Command load and execution in Bash
The booting post explains the process from powering on the computer until a bash prompt is presented to the user.
At this point the user can enter commands to perform any desired action.
Command launch process
Once you have access to a terminal (it does not matter whether it is an Alt-F(n) terminal or an X terminal), bash is already showing the defined prompt and waiting for a command to be typed. It is waiting for your input on standard input (stdin).
When you hit “Enter” after the command, bash performs these actions:
If the command name contains slashes, bash loads and executes the file at the given path (more on load and execute below).
If the command name contains no slashes, bash checks whether it is a function or a built-in command.
If it is, bash executes the function or built-in code, passing any command-line arguments.
If it is not, bash looks in its internal cache of previously run commands.
If the command is in the cache, bash loads and executes it, as the cache holds its full path name.
If it is not in the cache, bash starts a search in the directories listed in the PATH environment variable. If a file for the command exists, bash records its full path in the internal cache and then loads and executes the command (more on load and execute below).
The search in the PATH environment variable is done sequentially, following the order of the directories from left to right. When the command is found in a directory, bash stops the search. The command name, with its full path, is then stored in the cache. This internal cache is a hash table. Looking up a command in the hash table is much faster than scanning the directories listed in PATH every time.
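The left-to-right search can be imitated in a few lines of bash. This is only a sketch of the idea, not the code bash itself uses; find_in_path is a hypothetical helper name chosen for this example.

```bash
#!/usr/bin/env bash
# find_in_path: hypothetical helper that mimics the sequential PATH search.
# It prints the first executable file called NAME and stops there.
find_in_path() {
    local name=$1 dir dirs
    # Split PATH on ':' into an array, keeping the left-to-right order.
    IFS=: read -r -a dirs <<< "$PATH"
    for dir in "${dirs[@]}"; do
        # An empty PATH entry stands for the current directory.
        [ -n "$dir" ] || dir=.
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0    # first match wins, later directories are not visited
        fi
    done
    return 1            # not found in any directory
}

find_in_path ls
```

Bash does this walk only on a cache miss, which is why the hash table saves work on every later run of the same command.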
You can easily check the content of the hash table by typing the built-in command hash (do this after having run a couple of commands).
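As a small illustration, the built-ins type and hash let you watch the lookup steps described above. The function greet is a made-up name used only for this demo, and the results for ls assume it is not aliased or redefined in your session.

```bash
#!/usr/bin/env bash
# greet is a hypothetical function defined only for this demo.
greet() { echo "hello"; }

# Ask bash how it would resolve each name.
type -t greet     # function
type -t cd        # builtin
type -t ls        # file, when run as a script (an interactive session may report an alias)

hash -r           # empty the cache
ls > /dev/null    # first run: located through PATH, then remembered
hash -t ls        # print the full path now cached for ls
hash              # list the whole table, with a hit count per command
```

Running ls a few more times and typing hash again should show its hit count growing.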
Behind the curtains
Several interesting things are happening “behind the curtains”.
Let’s continue supposing that the command is an external command, that is, a program stored on disk rather than a function or a built-in. As mentioned, regardless of whether the command is in the cache or not, the binary file of the command has to be loaded into memory in order to execute it. This leads to several actions:
Creating a new process to run the command. This implies that bash has to fork and then exec the new command, which also implies some parent/child management. It does the fork first rather than the exec because doing the exec directly would replace our bash in memory with the binary image of the command, and there would be no bash left to show a prompt and wait for the next command. In other words, our bash session would be replaced by the launched command. So bash first forks a copy of itself, and the command is then loaded and executed in place of that copy. Our “original” bash process becomes the parent of the new process where the loaded command is running. This is often described as bash spawning a sub-shell and executing the command there. This is a high-level view of how processes/commands are executed. The section fork and exec has more details about how this is done.
As mentioned, the binary file containing the compiled code of the command has to be read (remember that bash knows its full path and name). This implies accessing the file system to find where the file is located on disk, so that the compiled code in it can be read. Here there is a deeper view on directories and inodes.
While being read, the file has to be loaded into memory so the system can execute its code. Remember that any code executed has to be in memory. For binary files, this implies mapping the different parts of an ELF file to memory. More about ELF here.
All this is happening while the computer does many other things (for instance playing an MP3 file or responding to the keyboard). This is possible because there are several mechanisms to share the CPU time among running processes. This is done by the scheduler, a very important component of the kernel. To get a feel for how fast a CPU is and how many things it can do, look at the Multitasking explanation in the fork and exec section or at the ‘speed of components’ explanations in the “Some numbers and magnitudes” section.
Interrupts are also a very important mechanism to get CPU time. This mechanism is used by the electronics that control the keyboard when a key is pressed, so the key can be shown on the screen “immediately” (from the user’s point of view…).
There are also several mechanisms to exchange information among processes (so the parent and the child can communicate with each other). See signals for information about signaling among processes (for instance, when the parent is ordered to die).
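The difference between "fork, then exec" and "exec alone" described above can be observed from the command line by printing process IDs. This is a minimal sketch, assuming sh and bash are both available on the PATH; the messages are just illustrative.

```bash
#!/usr/bin/env bash
echo "this shell has PID $$"

# Ordinary launch: bash forks, the child execs sh, and bash waits for it.
# The child has a new PID, and its parent PID should be the shell's PID above.
sh -c 'echo "child has PID $$ and parent PID $PPID"'

# exec with no fork: the program replaces the shell inside the same process.
# A throwaway bash is used so that the shell running this demo survives.
bash -c 'echo "throwaway bash has PID $$"; exec sh -c "echo \"after exec, sh has PID \$\$\""'

echo "back in the original shell, PID $$"
```

In the second case both lines should report the same PID, because exec swapped the program without creating a new process. Typing exec ls directly at an interactive prompt would do the same to your session: the listing is printed and the terminal closes or returns to its parent.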
Waiting for the child
As we are in an interactive session, and the command is launched in the foreground (see bash job control), bash will wait (and therefore not present the prompt) until the command running in the separate sub-shell terminates. It will then collect the exit status (also known as return code) of the child and make it available in the “$?” shell variable.
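A short sketch of both cases, foreground and background, may help here. The variable names fg_status, bg_pid and bg_status are made up for this example.

```bash
#!/usr/bin/env bash
# Foreground: bash waits for the child, then stores its exit status in $?.
sh -c 'exit 3'
fg_status=$?
echo "foreground exit status: $fg_status"    # prints 3

# Background: bash does not wait, so the next line runs immediately.
sleep 1 &
bg_pid=$!                                    # PID of the background child
echo "started background child with PID $bg_pid"

# The wait built-in blocks until that child ends and returns its exit status.
wait "$bg_pid"
bg_status=$?
echo "background exit status: $bg_status"    # prints 0, since sleep succeeded
```

Note that $? is overwritten by every command, so it has to be read (or saved, as done here) right after the command of interest.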
Another command??
When the prompt is back, bash is ready to accept new commands. These commands will go through the process described above.
From here, there are many options that bash and the whole system offer to us. Commands can be written (scripted) in files that will be interpreted by bash. As bash contains its own programming language, these scripts can control things and behave accordingly via flow control instructions like ‘if’, ‘for’ and many more.
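As a small taste of such a script, the sketch below uses ‘for’ and ‘if’ to report how bash would resolve a few command names. The function name classify and the name no-such-command-xyz are made up for the example.

```bash
#!/usr/bin/env bash
# classify: hypothetical helper that reports what kind of command each name is.
classify() {
    local name kind
    for name in "$@"; do
        # type -t prints the kind and succeeds, or prints nothing and fails.
        if kind=$(type -t "$name"); then
            echo "$name is a $kind"
        else
            echo "$name was not found"
        fi
    done
}

classify cd ls no-such-command-xyz
```

Saved in a file and run with bash, this should report cd as a builtin, ls as a file, and the last name as not found.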
You can find many good resources on the Net about programming in bash, but you can also check the things Mico Maco finds helpful at the Bash Programming section.
There are also some bash tips at Bash cmd line tips.
Bash code review
At the time of writing this post, Mico Maco downloaded the bash version in use (bash v4.4) to have a look at its code. Mico Maco highly recommends doing this; you can download the bash code at the GNU site.
If you are looking at version 4.4, look around line 5095 of the file “execute_cmd.c” to see the code that runs a command from disk in a subshell (what has been explained in this page…). It does many things, as bash is a very powerful shell. Keep reading until line 5251 (I recommend just reading the comments to get what is going on). There is a call to a function named “shell_execve()”. I leave it to you to imagine what this function does, but if you look at line 5445 in the same file, you will find it. That is where, among many other things needed by bash, the command is executed (bash has already done the fork…).