The POSIX Socket Lifecycle
The POSIX socket API is the standard interface for network programming on Unix-like systems. A socket is represented by a file descriptor — an integer that the kernel uses to track the open connection, just like it tracks open files. This unification of networking and file I/O under the same abstraction is a cornerstone of Unix design.
Server Socket Lifecycle
A server follows a specific sequence of system calls:
- socket() — creates a new socket file descriptor. You specify the address family (AF_INET for IPv4, AF_INET6 for IPv6), socket type (SOCK_STREAM for TCP, SOCK_DGRAM for UDP), and protocol.
- bind() — associates the socket with a local address and port (e.g., 0.0.0.0:8080). If you skip bind() or bind to port 0, the OS assigns an ephemeral port.
- listen() — marks the socket as passive, telling the kernel to queue incoming connections until the application accepts them. The backlog parameter sets the maximum queue length for pending connections.
- accept() — blocks until a client connects. Returns a new file descriptor for the established connection. The original socket continues listening.
- send()/recv() (or write()/read()) — exchange data over the connected socket.
- close() — releases the file descriptor and tears down the connection (sends FIN in TCP).
Client Socket Lifecycle
A client is simpler: socket() → connect() (initiates TCP three-way handshake to the server's address) → send()/recv() → close().
Blocking vs Non-Blocking Mode
By default, sockets are blocking: recv() will suspend the thread until data arrives, and accept() will suspend until a client connects. In non-blocking mode (set via fcntl(fd, F_SETFL, O_NONBLOCK)), these calls return immediately with EAGAIN/EWOULDBLOCK if no data or connection is available. Non-blocking sockets are essential for building high-performance servers that handle many connections on a single thread.
I/O Multiplexing
Handling thousands of connections requires monitoring many file descriptors at once. The OS provides three mechanisms:
- select() — the original multiplexer. Passes three fd_set bitmasks (read, write, exception) to the kernel. Limited to FD_SETSIZE (typically 1024) file descriptors. O(n) scan on every call.
- poll() — removes the fd limit by using an array of pollfd structs instead of bitmasks. Still O(n) because the kernel scans the entire array.
- epoll (Linux) — the modern solution. Uses three calls: epoll_create() makes an epoll instance, epoll_ctl() adds/removes fds, epoll_wait() returns only the ready fds. This is O(1) per event rather than O(n) per call. Supports edge-triggered mode (notify only on state change) and level-triggered mode (notify while the condition holds).
The C10K Problem
In the early 2000s, handling 10,000 concurrent connections on a single server was a major challenge. Thread-per-connection models consumed too much memory (each thread needs a stack, typically 1-8 MB). select() and poll() degraded linearly with connection count. epoll (Linux), kqueue (BSD/macOS), and IOCP (Windows) solved this by providing scalable event notification, enabling single-threaded event loops like those in nginx, Node.js, and Redis to handle hundreds of thousands of connections.
Server vs Client Socket Flow
TCP Server (pseudocode):
server_fd = socket(AF_INET, SOCK_STREAM, 0)
bind(server_fd, {0.0.0.0, 8080})
listen(server_fd, 128) // backlog of 128
while true:
    client_fd = accept(server_fd)   // blocks, returns new fd
    data = recv(client_fd, 4096)    // read up to 4096 bytes
    send(client_fd, response)
    close(client_fd)
This is a simple iterative server — it handles one client at a time. While processing one client, all others wait in the listen backlog queue.
TCP Client (pseudocode):
fd = socket(AF_INET, SOCK_STREAM, 0)
connect(fd, {93.184.216.34, 8080}) // TCP handshake
send(fd, request)
data = recv(fd, 4096)
close(fd)
epoll-based server (pseudocode):
epfd = epoll_create()
epoll_ctl(epfd, ADD, server_fd, {EPOLLIN})
while true:
    events = epoll_wait(epfd, max=64)
    for ev in events:
        if ev.fd == server_fd:
            client = accept(server_fd)
            set_nonblocking(client)
            epoll_ctl(epfd, ADD, client, {EPOLLIN | EPOLLET})
        else:
            data = recv(ev.fd, 4096)
            // process and respond
This single-threaded server can handle tens of thousands of concurrent connections because epoll_wait() returns only the file descriptors that are actually ready, avoiding the O(n) scan of select/poll.