Miami startup Subquadratic claims its first model, SubQ 1M-Preview, uses a new Subquadratic Sparse Attention (SSA) architecture that makes attention compute grow linearly with context length, producing up to ~1,000× lower attention cost at 12 million tokens versus current frontier models. The company launched three private-beta products (an API, the SubQ Code agent, and SubQ Search) and raised $29M in seed funding from investors including Justin Mateen and Javier Villamizar, at a reported $500M valuation. Researchers have reacted skeptically and called for independent validation, since prior attempts to beat transformers’ quadratic attention cost have struggled; if verified, SSA could materially change long-context LLM design and reduce the need for retrieval workarounds.
SubQ, from startup Subquadratic, is a subquadratic large language model designed to reason over a 12 million–token context window, with claimed linear attention cost and up to ~1,000× reduction in attention compute at that scale. The model targets use cases like processing entire code repositories, long PR histories, and persistent agent state, and is offered via a full-context API and a “long-context layer” that integrates with coding agents (Claude Code, Codex, Cursor). The benchmarks shown place SubQ near the top of long-context and coding evaluations, with third-party validation promised and technical reports forthcoming. The company credits an architectural breakthrough, sparse attention that focuses only on relevant token relationships, for the lower cost and faster exploration it promises developer and enterprise workflows.
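Subquadratic has not published how SSA decides which token relationships to keep, so the sketch below is only a generic illustration of the broader sparse-attention idea: if each query token attends to a small, fixed budget of keys (here a hypothetical local window of 64 recent tokens) rather than to every earlier token, attention cost grows linearly with sequence length instead of quadratically. The function name, window size, and attention pattern are assumptions for the sketch, not details of SSA.

```python
import numpy as np

def local_window_attention(Q, K, V, window=64):
    """Toy causal sparse attention with a fixed local window: token i attends
    only to the `window` most recent tokens (itself included). Each of the n
    queries touches at most `window` keys, so the cost is O(n * window * d),
    linear in sequence length, instead of the dense O(n^2 * d).
    Generic sketch only; not Subquadratic's (unpublished) SSA mechanism."""
    n, d = Q.shape
    out = np.zeros_like(V)
    for i in range(n):
        lo = max(0, i - window + 1)
        scores = K[lo:i + 1] @ Q[i] / np.sqrt(d)   # at most `window` logits
        weights = np.exp(scores - scores.max())    # softmax over kept keys only
        weights /= weights.sum()
        out[i] = weights @ V[lo:i + 1]
    return out

# Tiny demo on random data.
rng = np.random.default_rng(0)
n, d = 2_048, 32
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
print(local_window_attention(Q, K, V).shape)  # (2048, 32)
```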
Subquadratic announced SubQ 1M-Preview, the company’s first LLM built on a fully subquadratic attention architecture whose compute, the company claims, scales linearly with context length, with a research result at 12 million tokens. The startup says SubQ reduces attention compute by nearly 1,000× versus frontier transformer models and delivers state-of-the-art long-context accuracy on needle-in-a-haystack and exact-copy tasks. SubQ is launching private beta access to an API, SubQ Code (a CLI coding agent that loads entire repositories into one context), and SubQ Search for long-context research. Third-party-verified benchmark claims include 95% on RULER 128K, strong MRCR v2 results, and an SWE-Bench score of 81.8 against peers such as Claude Opus, GPT, and Gemini.
Subquadratic unveiled SubQ 1M-Preview, claiming it is the first fully subquadratic large language model whose attention compute grows linearly with context length, enabling practical multi-million-token contexts (with a research result at 12 million tokens). The company says SubQ delivers state-of-the-art long-context accuracy, faster inference, and far lower compute, reporting up to ~1,000× lower attention compute versus frontier models and architecture-level sparse-attention gains (52× faster than FlashAttention, 63% less compute). Subquadratic is opening private beta access to an API, a CLI coding agent (SubQ Code) that loads full codebases into one context, and SubQ Search. Third-party-verified benchmarks (RULER 128K, MRCR v2, SWE-Bench) are cited to position SubQ against Claude Opus, GPT, and Gemini.
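The ~1,000× figure cannot be checked until the technical reports appear, but a back-of-the-envelope calculation shows what it would imply under one simple assumption (ours, not Subquadratic's): dense causal attention scores about n/2 earlier tokens per query on average, while a sparse scheme scores a fixed budget of k keys per query. At n = 12 million tokens, a budget of roughly 6,000 keys per token would yield the claimed thousand-fold reduction in query-key interactions.

```python
# Back-of-the-envelope check of the claimed ~1,000x attention-compute reduction
# at a 12M-token context. Assumption (ours, not Subquadratic's): dense causal
# attention touches ~n/2 keys per query on average, while a sparse scheme
# touches a fixed budget of k keys per query.
n = 12_000_000
dense_pairs = n * (n // 2)            # ~n^2 / 2 query-key interactions
for k in (1_000, 6_000, 12_000):      # hypothetical per-token key budgets
    sparse_pairs = n * k
    print(f"k={k:>6,}: ~{dense_pairs / sparse_pairs:,.0f}x fewer interactions")
# k= 1,000: ~6,000x fewer interactions
# k= 6,000: ~1,000x fewer interactions
# k=12,000: ~500x fewer interactions
```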