Threads: Lightweight Execution Units
A thread is the smallest unit of execution that the operating system can schedule on a CPU core. While a process defines an address space and a collection of resources (open files, signal handlers, etc.), a thread is a single flow of control within that process. Every process has at least one thread -- the main thread -- but can create additional threads that execute concurrently within the same address space.
What Threads Share (and What They Don't)
Within a process, some state is shared by all threads and some is private to each thread:
| Shared | Per-Thread (private) |
|---|---|
| Virtual address space (code, heap, globals) | Stack (each thread gets its own) |
| Open file descriptors | Register set (PC, SP, general-purpose) |
| Signal handlers and signal mask | Thread-local storage (TLS) |
| PID and UID | Thread ID (TID) |
| Memory mappings | Scheduling state and priority |
The fact that threads share the heap and global data is both their greatest strength (cheap communication via shared memory) and their greatest danger (race conditions, data corruption without proper synchronization).
Kernel Threads vs. User-Level Threads
Kernel threads (also called 1:1 threads) are threads that the OS kernel is aware of and schedules directly. Each user-space thread maps to exactly one kernel-schedulable entity. This is the model used by Linux NPTL (Native POSIX Thread Library), modern Windows threads, and macOS pthreads. The advantage is that if one thread blocks on I/O, the kernel can schedule another thread from the same process. The downside is that thread creation and context switching require kernel involvement (a syscall), which costs around 1-10 microseconds.
User-level threads (also called green threads or fibers) are managed entirely in user space by a runtime library. The kernel sees only a single thread (or a small pool). The runtime multiplexes many user-level threads onto these kernel threads. Creation is extremely fast (just allocate a small stack, ~2 KB for a Go goroutine vs. ~8 MB for a kernel thread). However, if one user-level thread makes a blocking syscall, it blocks the entire kernel thread it is running on.
Threading Models
- 1:1 (one-to-one): Each user thread maps to one kernel thread. Used by Linux, Windows, macOS. Simple, but thread creation is heavier.
- M:N (many-to-many): M user-level threads are multiplexed over N kernel threads (where M >> N). Used by Go (goroutines on a pool of OS threads), Erlang (processes on schedulers). Requires a sophisticated user-space scheduler.
- M:1 (many-to-one): All user threads map to a single kernel thread. Simple but no true parallelism -- if one blocks, all block. Largely obsolete.
Thread Creation Cost
Creating a new process via fork() requires duplicating the entire page table (even with copy-on-write, the page-table metadata itself must still be copied, which is expensive). Creating a thread is far cheaper because threads share the same page table. On Linux, clone() with shared-memory flags creates a thread in about 10-50 microseconds. A Go goroutine can be created in under 1 microsecond because it only allocates a tiny stack in user space.
Thread Safety
When multiple threads access shared data concurrently and at least one of them writes, race conditions can arise. More precisely, a data race occurs when two threads access the same memory location without synchronization and at least one access is a write. Preventing data races requires synchronization primitives: mutexes, semaphores, atomic operations, or lock-free data structures.
Real-Life: Web Server Thread Pool
A web server like Tomcat uses a thread pool to handle incoming requests:
- At startup, the server creates a pool of N worker threads (e.g., 200). This avoids the overhead of creating a new thread for each request.
- When a request arrives, the server picks an idle worker thread from the pool and assigns the request to it.
- The worker thread reads the HTTP request, processes it (queries a database, renders a template), and sends the response.
- After completing the request, the worker returns to the pool to await the next assignment.
Why threads, not processes? All worker threads share the same in-memory cache (e.g., session store, configuration objects). If each handler were a separate process, they would need expensive IPC (shared memory segments, pipes) to access shared state.
Why a pool, not thread-per-request? Creating and destroying 10,000 threads per second would consume significant CPU time on thread creation, stack allocation, and kernel bookkeeping. A pool of reusable threads amortizes creation cost to startup time.
Thread safety in practice: The shared session store must be protected by a lock or use a concurrent data structure (like ConcurrentHashMap). Without this, two threads handling requests for the same user could corrupt the session simultaneously.
Go's approach: Go uses goroutines (M:N model) to avoid the thread pool pattern entirely. You simply launch a goroutine per request (go handleRequest(conn)), and the Go runtime multiplexes thousands of goroutines onto a small number of OS threads. Goroutine creation costs ~2 KB of stack, so even millions of concurrent goroutines are feasible.