Recent discussions on high-performance network servers are converging on event-driven designs built around modern OS primitives like epoll and kqueue, moving away from traditional worker-thread pools. One highlighted pattern favors a “one thread per CPU core” architecture that reduces coordination overhead, minimizes branching and state transitions, and can sustain 100k+ requests per second on contemporary hardware. In parallel, a deeper design comparison argues epoll’s handle-centric model is more composable than kqueue’s filter-centric approach, influencing how developers structure multiplexing, lifecycle management, and feature layering in production servers.
The author describes a high-performance network server pattern that departs from the canonical event-dispatch loop. Instead of a single demultiplexer and complex state machines, the recommended design uses one pinned thread per CPU core, each with its own epoll/kqueue descriptor, and models major state transitions (accept, read, etc.) as explicit thread-to-thread transfers of file descriptors. The piece includes practical implementation notes: creating detached threads, setting CPU affinity on Linux and macOS, raising RLIMIT_NOFILE, disabling socket lingering (SO_LINGER), using TCP_DEFER_ACCEPT on Linux, and an accept loop that hands new sockets to worker threads. The approach emphasizes simplicity and blocking I/O per thread, and claims roughly 100k requests/sec is easily attainable on modern hardware.
epoll's handle-centric design is more composable than kqueue's filter-centric design (2021)