Developers working on the sorted 16-bit arrays used in Roaring Bitmaps are showing that classic std::binary_search can be outperformed by algorithms tuned for modern CPUs. The proposed “SIMD Quad” approach combines a four-way (quaternary) narrowing step, often via interpolation over 16-element blocks, with wide SIMD comparisons using SSE2 on x64 and NEON on ARM. By treating each block as a SIMD-sized unit, it skips the final branchy steps a binary search would take inside a small range, probes fewer dependent memory locations, and exploits memory-level parallelism. The result is faster membership tests for array containers (typically 1–4096 elements), which are central to high-throughput analytics systems.
A developer built SIMD Quad, a faster search for the sorted arrays of 16-bit integers used in Roaring Bitmaps. Rather than running a standard binary search, the algorithm divides the array into 16-element blocks, uses the last element of each block as an interpolation key, and performs a quaternary interpolation search to quickly pick a candidate block. Arrays with fewer than 16 elements fall back to linear search. Once a block is selected, SIMD instructions (NEON on ARM, SSE2 on x86) compare all 16 elements against the target for equality in parallel, avoiding the deeper binary-search steps and exploiting memory-level parallelism. The hybrid approach uses hardware SIMD and interpolation to reduce comparisons and improve throughput for common bitmap workloads.
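The final step described above, checking one 16-element block against the target with SIMD, can be sketched as follows. This is a minimal illustration, not the original implementation: the function name `block16_contains` is hypothetical, only the SSE2 path is shown with intrinsics, and other targets (including ARM, where NEON would be used) fall through to a plain scalar loop here.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical helper: returns true if `target` occurs among the 16 values
// starting at `block`. The caller must guarantee 16 readable elements.
inline bool block16_contains(const std::uint16_t* block, std::uint16_t target) {
#if defined(__SSE2__) || defined(_M_X64)
    // Broadcast the target into all eight 16-bit lanes of a 128-bit register.
    const __m128i needle = _mm_set1_epi16(static_cast<short>(target));
    // Two unaligned 128-bit loads cover the 16 elements (8 lanes each).
    const __m128i lo = _mm_loadu_si128(reinterpret_cast<const __m128i*>(block));
    const __m128i hi = _mm_loadu_si128(reinterpret_cast<const __m128i*>(block + 8));
    // Lane-wise equality; a matching lane becomes all ones.
    const __m128i eq = _mm_or_si128(_mm_cmpeq_epi16(lo, needle),
                                    _mm_cmpeq_epi16(hi, needle));
    // Any set bit in the byte mask means at least one lane matched.
    return _mm_movemask_epi8(eq) != 0;
#else
    // Scalar fallback for targets without SSE2 (assumption: kept simple here).
    bool found = false;
    for (std::size_t i = 0; i < 16; ++i) {
        found |= (block[i] == target);
    }
    return found;
#endif
}
```

Because the comparison is a pure equality test, it does not depend on the block being sorted; the sort order only matters for choosing which block to inspect.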
A developer proposes SIMD Quad, a hybrid search algorithm that beats traditional binary search on sorted arrays of 16-bit integers (common in Roaring Bitmap containers). It divides the array into 16-element blocks, uses the last element of each block as a key, and performs a quaternary interpolation search across block boundaries to quickly narrow the search to one block. It then uses SIMD (NEON on ARM, SSE2 on x64) to compare all 16 elements in parallel. Arrays with fewer than 16 elements fall back to a linear scan. The approach uses wide SIMD comparisons and memory-level parallelism to reduce branching and comparisons, promising faster membership tests in bitmap-like data structures.
A developer proposes SIMD Quad, a faster search for sorted arrays of 16-bit integers (typical in Roaring Bitmaps) that beats std::binary_search by combining quaternary interpolation on 16-element blocks with SIMD comparisons. Arrays under 16 elements use linear search; larger arrays are split into 16-element blocks whose last elements act as interpolation keys. A quaternary interpolation narrows the search to a candidate block, then NEON (ARM) or SSE2 (x64) loads all 16 values and compares them to the target in parallel. The approach uses wide SIMD comparisons and modern CPUs' memory-level parallelism to reduce branching and runtime, trading a few extra instructions for far fewer memory-bound steps.
A developer found faster alternatives to std::binary_search for Roaring Bitmap 16-bit arrays (sizes 1–4096) by leveraging modern CPU features. Key insights: use SIMD comparisons that inspect eight or more 16-bit integers in one instruction, and exploit memory-level parallelism by replacing binary search with a quaternary (four-way) search to probe multiple pivots simultaneously. The approach avoids descending below SIMD-width blocks, enabling cheaper multi-element comparisons and fewer dependent memory accesses. This can outperform classic binary search on x64 and 64-bit ARM CPUs, improving membership queries in compact bitmap containers used in high-performance data systems.
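To make the four-way idea concrete, here is a minimal sketch of a quaternary search over block boundaries. It is an assumption-laden illustration rather than the article's code: the name `quad_block_contains` is hypothetical, the pivots are placed at plain quarter points instead of being interpolated, and the chosen block is scanned with `std::find` where a real implementation would use a SIMD comparison.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: membership test on a sorted uint16_t array.
// Blocks are 16 elements wide; the last element of each full block is its key.
inline bool quad_block_contains(const std::uint16_t* data, std::size_t n,
                                std::uint16_t target) {
    constexpr std::size_t kBlock = 16;
    if (n < kBlock) {
        // Tiny arrays: plain linear scan.
        return std::find(data, data + n, target) != data + n;
    }
    const std::size_t full_blocks = n / kBlock;
    auto key = [&](std::size_t b) { return data[b * kBlock + (kBlock - 1)]; };

    // Invariant: the first block whose key is >= target lies in [lo, hi];
    // hi == full_blocks stands for the trailing partial block.
    std::size_t lo = 0, hi = full_blocks;
    while (hi - lo >= 4) {
        const std::size_t quarter = (hi - lo) / 4;
        const std::size_t p1 = lo + quarter;
        const std::size_t p2 = lo + 2 * quarter;
        const std::size_t p3 = lo + 3 * quarter;
        // The three loads are independent, so the CPU can issue them together.
        const std::uint16_t k1 = key(p1), k2 = key(p2), k3 = key(p3);
        if (target <= k1)      { hi = p1; }
        else if (target <= k2) { lo = p1 + 1; hi = p2; }
        else if (target <= k3) { lo = p2 + 1; hi = p3; }
        else                   { lo = p3 + 1; }
    }
    // At most a few candidate blocks remain; step over those that end too early.
    while (lo < hi && key(lo) < target) {
        ++lo;
    }
    // Scan the selected block (or the partial tail, which may be empty).
    const std::size_t start = lo * kBlock;
    const std::size_t end = std::min(start + kBlock, n);
    return std::find(data + start, data + end, target) != data + end;
}
```

Each loop iteration cuts the candidate range to roughly a quarter while waiting on only one round of memory accesses, which is the memory-level parallelism the summary refers to. The search never descends below a 16-element block, so the last step can be a single wide comparison.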