fork() and exec(): Creating and Replacing Processes
In Unix/Linux, process creation is split into two distinct operations: fork() to create a new process, and exec() to replace a process's program image with a different program. This separation is a deliberate Unix design choice: the child can adjust its own state between the two calls, which provides considerable flexibility.
fork() -- Clone the Current Process
The fork() system call creates a child process that is an almost exact copy of the parent:
- The child gets a new PID but inherits the parent's code, data, heap, stack, open file descriptors, signal handlers, and environment variables.
- The return value is the key to distinguishing parent from child: fork() returns the child's PID to the parent (a positive integer), and 0 to the child. On failure, it returns -1 to the parent (no child is created).
    pid_t pid = fork();
    if (pid == 0) {
        // Child process executes here
    } else if (pid > 0) {
        // Parent process executes here (pid = child's PID)
    } else {
        // fork() failed
    }
After fork(), both parent and child execute concurrently from the same point in the code (the instruction after fork). They have independent address spaces: changes in one do not affect the other. Copy-on-write (covered in the next tutorial) makes this cheap by sharing pages until one of the processes writes to them.
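The independence of the two address spaces can be checked with a small experiment. The sketch below is illustrative only: the function name demo_separate_memory and the parameter child_value are made up for this example. The child increments its own copy of a variable and reports the result through its exit status, while the parent's copy stays unchanged.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks once. The child increments its copy of `counter` and reports the
 * new value through its exit status; the parent's copy stays at 10.
 * Returns the parent's value of `counter`, or -1 if fork()/waitpid() fails.
 * `demo_separate_memory` and `child_value` are illustrative names. */
int demo_separate_memory(int *child_value) {
    int counter = 10;

    fflush(stdout);               /* avoid duplicating buffered output in the child */
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        counter++;                /* modifies the child's copy only */
        _exit(counter);           /* child exits with status 11 */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    *child_value = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return counter;               /* unchanged in the parent */
}
```

Called from main(), the function should return 10 and store 11 in child_value: the increment happened, but only in the child's memory.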
exec() -- Replace the Program Image
The exec() family of functions (execl, execv, execvp, execve, etc.) replaces the current process's code, data, heap, and stack with a new program loaded from an executable file. The PID stays the same -- it is the same process running a different program.
After exec(), the old program is gone entirely. exec() does not return on success (there is nothing to return to -- the old code no longer exists). It only returns on failure (e.g., file not found).
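The "returns only on failure" rule means that any code placed after an exec call is error handling by definition. A minimal sketch, assuming the path /nonexistent/hypothetical-program does not exist on the system (both the path and the function name try_exec_missing are invented for illustration):

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Attempts to exec a path that is assumed not to exist. Because the exec
 * fails, control comes back to this function, which reports errno.
 * If the path did exist, this function would never return. */
int try_exec_missing(void) {
    /* argv[0] is the program name by convention; the array must end with NULL. */
    char *const argv[] = { "hypothetical-program", NULL };

    execv("/nonexistent/hypothetical-program", argv);

    /* Reached only when execv() fails. */
    int saved = errno;            /* save errno before perror() can change it */
    perror("execv");
    return saved;
}
```

With a missing file, execv() returns -1 and the function returns ENOENT. There is no success branch to write, because on success the calling program no longer exists.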
The fork+exec Pattern
The canonical way to run a new program in Unix:
- The parent calls fork() to create a child.
- The child calls exec() to replace itself with the desired program.
- The parent calls waitpid() to wait for the child to finish.
This pattern is how shells work: when you type ls -la, the shell forks, the child execs /bin/ls, and the shell waits for it to complete.
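The three steps can be sketched as one helper function. This is a simplified illustration, not how any particular shell is implemented; the name run_and_wait is hypothetical, and the exit code 127 for a failed exec follows a common shell convention.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] (looked up in PATH) with the given arguments and waits for it.
 * Returns the program's exit status, 127 if exec failed in the child,
 * or -1 if fork()/waitpid() failed or the child was killed by a signal.
 * `run_and_wait` is an illustrative name. */
int run_and_wait(char *const argv[]) {
    fflush(stdout);               /* avoid duplicating buffered output in the child */

    pid_t pid = fork();           /* step 1: create the child */
    if (pid < 0) {
        perror("fork");
        return -1;
    }

    if (pid == 0) {
        execvp(argv[0], argv);    /* step 2: replace the child's program */
        perror("execvp");         /* reached only if exec failed */
        _exit(127);
    }

    int status;                   /* step 3: parent waits for the child */
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For the ls -la example, the caller would pass an argument vector such as { "ls", "-la", NULL }.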
Orphan and Zombie Processes
Orphan process: A child whose parent exits first. The child is adopted by the init process (PID 1, which is systemd on many Linux distributions), which becomes its new parent and will reap it when it exits.
Zombie process: A child that has exited but whose parent has not yet called wait() or waitpid() to read its exit status. The process has released all its resources (memory, open files), but its PCB entry remains in the process table so the exit status is preserved for the parent. Zombies consume a PID and a small amount of kernel memory. A parent that never calls wait() accumulates zombies.
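On Linux, the zombie state can be observed directly. The sketch below is Linux-specific and makes two assumptions: that /proc is mounted, and that /proc/<pid>/stat has the usual "pid (comm) state ..." layout. It uses waitid() with WNOWAIT to wait until the child has exited without reaping it, then reads the child's state letter. The function names are invented for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Linux-specific: reads the one-letter process state from /proc/<pid>/stat.
 * Returns '?' if the file cannot be read or parsed. */
static char proc_state(pid_t pid) {
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    FILE *f = fopen(path, "r");
    if (!f) return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    /* The command name may itself contain ')', so search from the end. */
    char *p = strrchr(buf, ')');
    return (p && p[1] == ' ' && p[2]) ? p[2] : '?';
}

/* Returns the state of a child that has exited but has not been reaped yet.
 * `observe_zombie` is an illustrative name. */
char observe_zombie(void) {
    pid_t pid = fork();
    if (pid < 0) return '?';
    if (pid == 0) _exit(0);       /* child exits immediately */

    /* WNOWAIT: block until the child has exited, but leave it unreaped. */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) < 0) return '?';

    char state = proc_state(pid); /* the child is a zombie at this point */

    waitpid(pid, NULL, 0);        /* now reap it, freeing its table entry */
    return state;
}
```

On a typical Linux system the returned letter should be Z, the same letter that ps shows for defunct processes.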
waitpid() -- Reap the Child
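waitpid(pid, &status, options) blocks the parent until the specified child exits. With the WNOHANG option it returns immediately instead, yielding 0 if the child is still running. It fills status with encoded termination information, which is decoded with macros such as WIFEXITED(status) and WEXITSTATUS(status) rather than read as a raw exit code, and it frees the zombie PCB entry. Properly reaping children prevents zombie accumulation.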
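The difference between a blocking wait and a WNOHANG poll can be sketched as follows. To make the outcome predictable, this example keeps the child alive by having it block on a pipe until the parent closes the write end; that pipe is only a synchronization device for the demonstration, and the names poll_then_wait and early_poll are hypothetical.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that blocks until the parent closes a pipe, then exits with
 * status 5. Stores the result of an early WNOHANG poll in *early_poll
 * (0 means "child still running") and returns the decoded exit status,
 * or -1 on error. All names here are illustrative. */
int poll_then_wait(pid_t *early_poll) {
    int fds[2];
    if (pipe(fds) < 0) {
        perror("pipe");
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        char c;
        close(fds[1]);                      /* child only reads */
        if (read(fds[0], &c, 1) < 0)        /* blocks until EOF from the parent */
            _exit(1);
        _exit(5);
    }

    close(fds[0]);                          /* parent only holds the write end */

    int status;
    /* The child cannot have exited yet, so this poll returns 0 at once. */
    *early_poll = waitpid(pid, &status, WNOHANG);

    close(fds[1]);                          /* EOF lets the child exit */
    if (waitpid(pid, &status, 0) < 0) {     /* blocking wait reaps the child */
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The early poll yields 0 because the child is still running, and the blocking call then returns once the child has exited with status 5.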
Real-Life: How a Shell Executes Commands
When you type gcc hello.c -o hello in a terminal, the shell performs this exact sequence:
Step 1: fork()
The shell (e.g., bash, PID 1000) calls fork(). Now there are two processes: the original bash (PID 1000, parent) and a copy of bash (PID 1001, child).
Step 2: exec() in the child
The child (PID 1001) calls execvp("gcc", argv), where argv is the NULL-terminated array {"gcc", "hello.c", "-o", "hello", NULL}. The child's memory is replaced with the gcc program. PID 1001 is now running gcc, not bash.
Step 3: waitpid() in the parent
The parent (PID 1000) calls waitpid(1001, &status, 0), which blocks until gcc finishes. The shell appears to "pause" while gcc runs.
Step 4: gcc exits
gcc finishes and calls exit(0). It becomes a zombie until the parent reaps it; because the parent is already blocked in waitpid(), this lasts only a moment. The parent's waitpid() returns, the zombie is cleared, and the shell reads the exit status (0 = success) and prints the next prompt.
Why separate fork and exec?
Between fork() and exec(), the child can set up its environment: redirect stdin/stdout (for > and | operators), close unnecessary file descriptors, change the working directory, or set environment variables. This is how shell pipes work:
ls | grep foo
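The shell first creates a pipe with pipe(), then forks twice so that both children inherit the pipe's file descriptors. Each child redirects one of its standard streams with dup2() (stdout of ls to the pipe's write end, stdin of grep to its read end) and then execs its respective program. The shell closes its own copies of both pipe ends so that grep sees end-of-file when ls finishes.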
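A two-command pipeline can be sketched along these lines. This is a simplified illustration rather than the code of any real shell: run_pipeline is a hypothetical name, and the sketch reports only the exit status of the right-hand command.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs `left | right` and returns the exit status of `right`
 * (-1 on setup failure or abnormal termination).
 * `run_pipeline` is an illustrative name. */
int run_pipeline(char *const left[], char *const right[]) {
    int fds[2];                             /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) < 0) {
        perror("pipe");
        return -1;
    }

    fflush(stdout);
    pid_t lpid = fork();
    if (lpid < 0) {
        perror("fork");
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (lpid == 0) {
        dup2(fds[1], STDOUT_FILENO);        /* stdout now goes into the pipe */
        close(fds[0]);
        close(fds[1]);
        execvp(left[0], left);
        perror("execvp");
        _exit(127);
    }

    pid_t rpid = fork();
    if (rpid < 0) {
        perror("fork");
        close(fds[0]);
        close(fds[1]);
        waitpid(lpid, NULL, 0);
        return -1;
    }
    if (rpid == 0) {
        dup2(fds[0], STDIN_FILENO);         /* stdin now comes from the pipe */
        close(fds[0]);
        close(fds[1]);
        execvp(right[0], right);
        perror("execvp");
        _exit(127);
    }

    /* The parent must close both ends, or `right` never sees end-of-file. */
    close(fds[0]);
    close(fds[1]);

    int status;
    waitpid(lpid, NULL, 0);                 /* reap both children */
    if (waitpid(rpid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For ls | grep foo, the two argument vectors would be { "ls", NULL } and { "grep", "foo", NULL }. All of the redirection happens in the children between fork() and exec(), so neither ls nor grep needs to know a pipe is involved.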
Zombie danger: A long-running server that forks children to handle requests but forgets to call waitpid() will accumulate thousands of zombies, eventually exhausting the PID space and preventing new process creation.
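One common way for a server to avoid this is to reap children from a SIGCHLD handler. The sketch below is an assumption-laden illustration, not a complete server: the children exit immediately instead of handling requests, and spawn_and_reap, on_sigchld, and the reaped counter are invented names. The handler loops because several child exits may be delivered as a single signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t reaped = 0;

/* SIGCHLD handler: reap every child that has exited, without blocking. */
static void on_sigchld(int signo) {
    (void)signo;
    int saved_errno = errno;                /* waitpid() may overwrite errno */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    errno = saved_errno;
}

/* Forks n short-lived children and returns once the handler has reaped all
 * of them. Returns the number reaped, or -1 if the handler cannot be set.
 * The handler stays installed afterwards, as it would in a server. */
int spawn_and_reap(int n) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    if (sigaction(SIGCHLD, &sa, NULL) < 0)
        return -1;

    /* Block SIGCHLD while testing the counter so that no wake-up is lost. */
    sigset_t block, old, wait_mask;
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);
    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);         /* mask used while sleeping */

    reaped = 0;
    int started = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);                       /* stand-in for request handling */
        if (pid > 0)
            started++;
    }

    while (reaped < started)
        sigsuspend(&wait_mask);             /* atomically unblock and sleep */

    sigprocmask(SIG_SETMASK, &old, NULL);   /* restore the original mask */
    return reaped;
}
```

A real server would do other work instead of sleeping in sigsuspend(); the point here is only that every child is reaped soon after it exits, so no zombies pile up.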