Recent discussions on high-performance network servers are converging on event-driven designs built around modern OS primitives like epoll and kqueue, moving away from traditional worker-thread pools. One highlighted pattern favors a “one thread per CPU core” architecture that reduces coordination overhead, minimizes branching and state transitions, and can sustain 100k+ requests per second on contemporary hardware. In parallel, a deeper design comparison argues epoll’s handle-centric model is more composable than kqueue’s filter-centric approach, influencing how developers structure multiplexing, lifecycle management, and feature layering in production servers.
The author describes a high-performance network server pattern that departs from the canonical event-dispatch loop. Instead of a single demultiplexer and complex state machines, the recommended design uses one pinned thread per CPU core, each with its own epoll/kqueue descriptor, and models major state transitions (accept, read, etc.) as explicit thread-to-thread transfers of file descriptors. The piece includes practical implementation notes: creating detached threads, setting CPU affinity on Linux and macOS, raising RLIMIT_NOFILE, disabling socket lingering (SO_LINGER), using TCP_DEFER_ACCEPT on Linux, and an accept loop that hands new sockets to worker threads. The approach emphasizes simplicity and blocking I/O per thread, and claims roughly 100k requests/sec is easily attainable on modern hardware.
epoll's handle-centric design is more composable than kqueue's filter-centric design (2021)