Microsoft has released bitnet.cpp, an open-source inference framework aimed at making 1-bit and 1.58-bit LLMs practical on commodity CPUs. Built on llama.cpp and lookup-table (T-MAC-style) kernels, it targets fast, lossless inference, with reported CPU speedups of 1.37x–6.17x and up to ~82% lower energy use across ARM and x86. Microsoft claims a 100B-parameter BitNet b1.58 model can run on a single CPU at roughly 5–7 tokens per second, and recent updates add parallel kernels, configurable tiling, and embedding quantization for a further 1.15x–2.1x gain. The broader trend is cheaper, more accessible local and edge LLM inference.
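To make the "1.58-bit" idea concrete: BitNet b1.58 constrains each weight to the ternary set {-1, 0, +1} (log2(3) ≈ 1.58 bits), which the BitNet papers describe via an absmean scheme: scale the weight tensor by its mean absolute value, round, and clip. A minimal NumPy sketch of that scheme (illustrative only; `absmean_ternary` is a hypothetical helper, not part of bitnet.cpp):

```python
import numpy as np

def absmean_ternary(W, eps=1e-6):
    """Quantize a weight matrix to {-1, 0, +1} using the absmean
    scheme described for BitNet b1.58: scale by the tensor's mean
    absolute value, round to the nearest integer, clip to [-1, 1]."""
    gamma = np.mean(np.abs(W)) + eps          # per-tensor scale factor
    Wq = np.clip(np.round(W / gamma), -1, 1)  # ternary weights
    return Wq.astype(np.int8), gamma          # keep gamma for dequantization

# With ternary weights, the matrix product Wq @ x needs only
# additions and subtractions; gamma rescales the result once.
W = np.array([[0.4, -0.9, 0.05],
              [1.2, -0.1, -0.7]])
Wq, gamma = absmean_ternary(W)
x = np.array([1.0, 2.0, 3.0])
y_approx = (Wq @ x) * gamma   # cheap approximation of W @ x
```

This is why the CPU kernels can replace multiplications with table lookups and add/subtract loops, which is where the reported speed and energy gains come from.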
microsoft/BitNet: Official inference framework for 1-bit LLMs