Miami startup Subquadratic unveiled SubQ 1M-Preview, a claimed subquadratic sparse-attention LLM whose attention compute reportedly scales linearly with context length, with a research result at 12 million-token contexts. The company says SubQ cuts attention costs by up to ~1,000× versus frontier transformers, achieves strong third-party benchmark results, and, after raising $29M, is launching private-beta products including an API, a coding agent that loads whole repos into one context, and a long-context search tool. Researchers urge independent validation, noting prior skepticism about replacing transformers’ quadratic attention and calling for reproducible benchmarks to confirm SubQ’s potential to reshape long-context model design and reduce retrieval workarounds.
If Subquadratic's claims hold, developers and infrastructure teams could handle multi-million-token contexts at far lower attention cost, changing model design and retrieval patterns. Independent validation is essential before practitioners can assess the integration, benchmarking, and cost implications.
Dossier last updated: 2026-05-11 07:14:34
Miami startup Subquadratic announced a new AI model, SubQ, claiming up to 1,000× efficiency gains over current architectures, but researchers and experts are calling for independent benchmarks and transparent methodology. Subquadratic attributes the gains to architectural innovations that reduce compute and memory costs, positioning SubQ as a potential disruptor for large-model deployment and edge inference. Critics warn the claim lacks peer-reviewed results, open-source code, and standardized evaluation across workloads, noting past industry examples where early performance claims did not generalize. The dispute matters because unverified efficiency breakthroughs could, if true, reshape costs for cloud providers, startups, and developers, but they require reproducible evidence before influencing adoption, investment, or research directions.
Miami startup Subquadratic claims its first model, SubQ 1M-Preview, uses a new Subquadratic Sparse Attention (SSA) architecture that makes attention compute grow linearly with context length, producing up to ~1,000x lower attention cost at 12 million tokens versus current frontier models. The company launched three private-beta products (an API, SubQ Code agent, and SubQ Search) and raised $29M in seed funding from investors including Justin Mateen and Javier Villamizar, with a reported $500M valuation. Researchers have reacted skeptically and called for independent validation because prior attempts to beat transformers’ quadratic attention cost have struggled; if verified, SSA could materially change long-context LLM design and lower the need for retrieval workarounds.
Subquadratic announced SubQ 1M-Preview, the company’s first LLM, built on a fully subquadratic attention architecture that claims linear compute scaling with context length and a 12 million-token research result. The startup says SubQ reduces attention compute by nearly 1,000× versus frontier transformer models and delivers state-of-the-art long-context accuracy on needle-in-a-haystack and exact-copy tasks. SubQ is launching private-beta access to an API, SubQ Code (a CLI coding agent that loads entire repositories into one context), and SubQ Search for long-context research. The company cites benchmark results it describes as third-party verified, including 95% on RULER 128K, strong MRCR v2 results, and an SWE-Bench score of 81.8, positioning SubQ against peers such as Claude Opus, GPT, and Gemini.
Subquadratic unveiled SubQ 1M-Preview, claiming the first fully subquadratic large language model, one whose attention compute grows linearly with context length, enabling practical multi-million-token contexts (a research result at 12 million tokens). The company says SubQ delivers state-of-the-art long-context accuracy, faster inference, and far lower compute, reporting up to ~1,000× lower attention compute versus frontier models and architecture-level sparse-attention gains (52× faster than FlashAttention, 63% less compute). Subquadratic is opening private-beta access to an API, a CLI coding agent (SubQ Code) that loads full codebases into one context, and SubQ Search. Benchmarks the company describes as third-party verified (RULER 128K, MRCR v2, SWE-Bench) are cited to position SubQ against Claude Opus, GPT, and Gemini.
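The ~1,000× figure follows naturally from the difference between quadratic and linear scaling. Below is a back-of-envelope sketch, not Subquadratic's actual method: it assumes dense attention costs on the order of n² per head while a hypothetical linear-scaling scheme attends to a fixed-size window of s tokens per position. The head dimension (d = 128) and effective window (s = 12,000) are illustrative assumptions chosen only to show how such a ratio can arise at a 12 million-token context.

```python
def quadratic_attention_flops(n: int, d: int) -> int:
    # Dense attention: the QK^T and AV matmuls each cost ~n^2 * d multiply-adds.
    return 2 * n * n * d

def linear_attention_flops(n: int, d: int, s: int) -> int:
    # Hypothetical subquadratic scheme (an assumption, not SSA's real design):
    # each token attends to a fixed budget of s tokens, so cost is linear in n.
    return 2 * n * s * d

n = 12_000_000   # 12M-token context from the announcement
d = 128          # assumed head dimension
s = 12_000       # assumed per-token attention budget (illustrative)

ratio = quadratic_attention_flops(n, d) / linear_attention_flops(n, d, s)
print(f"attention-compute ratio at {n:,} tokens: ~{ratio:,.0f}x")
# → attention-compute ratio at 12,000,000 tokens: ~1,000x
```

The ratio reduces to n/s, which is why the claimed advantage grows with context length: at 128K tokens the same assumptions would yield only ~11×, while at 12M tokens it reaches ~1,000×. This is exactly the kind of scaling behavior that independent benchmarks would need to confirm.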