DeepSeek's V4 Pro is not billed as the strongest model, but its architecture and nearly free caching are reshaping the cost dynamics of large models. Commentators highlight architectural innovations (MoE, MLA, Engram, mHC) that sharply reduce KV-cache and compute requirements, and one claims that a model such as Opus could cut its costs roughly tenfold by adopting the same caching technique. They argue that these low-level, hardware-aware optimizations matter more than chasing leaderboard scores or quick revenue from coding plans and multimodal models. Proponents expect an anticipated V4.1, trained on real harness data, to improve further, and they place DeepSeek in the top tier of Chinese model developers.
DeepSeek's cache-first architecture and hardware-aware optimizations change the inference cost calculus for LLM deployers, with consequences for budgeting and system design. Engineering teams should weigh efficiency innovations alongside benchmark results when selecting models.
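To see why cheap cache hits matter for budgeting, here is a minimal sketch of a blended input-cost calculation. The prices and hit rates are hypothetical placeholders chosen for illustration, not DeepSeek's published rates, and `blended_input_cost` is an invented helper name.

```python
def blended_input_cost(tokens: int, hit_rate: float,
                       miss_price: float, hit_price: float) -> float:
    """Dollar cost of `tokens` input tokens, given per-million-token prices."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    hit_tokens = tokens * hit_rate          # tokens served from the prefix cache
    miss_tokens = tokens - hit_tokens       # tokens that must be recomputed
    return (hit_tokens * hit_price + miss_tokens * miss_price) / 1_000_000


# Hypothetical prices in dollars per million input tokens (assumptions).
MISS_PRICE = 1.00
HIT_PRICE = 0.10

for rate in (0.0, 0.5, 0.9):
    cost = blended_input_cost(1_000_000, rate, MISS_PRICE, HIT_PRICE)
    print(f"cache hit rate {rate:.0%}: ${cost:.2f} per million input tokens")
```

Under these assumed prices, a 90% hit rate brings the input cost to about $0.19 per million tokens, compared with $1.00 when nothing is cached. Multi-turn agent workloads resend long shared prefixes, so their hit rates tend to be high, which is where the pricing of cached tokens dominates the bill.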
Dossier last updated: 2026-05-29 01:39:41
StepFun (阶跃星辰) has open-sourced Step 3.7 Flash, a new Flash-series model with a sparse MoE architecture, optimized for production agent workloads. The model has 196B total parameters plus a 1.8B ViT, with 11B active parameters, and claims generation speeds of up to 400 tokens/sec, targeting high-frequency, multi-turn, low-latency agent applications. Key features include native multimodal understanding and execution (UI, charts, documents, images), strengthened web and visual search for cross-source evidence retrieval, robust tool calling and orchestration across long multi-step workflows, and compatibility optimizations for major agent frameworks and tool protocols (e.g., Claude Code, KiloCode, RooCode, MCP/Skills). Code and model artifacts are available on GitHub, Hugging Face, Modelscope, and StepFun's platform.
StepFun released Step 3.7 Flash, a multimodal mixture-of-experts (MoE) model with 196 billion total parameters but only 11 billion active, designed to run locally on a system with 128 GB of RAM. The model includes a built-in 1.8B ViT visual encoder and targets the “flash” class of efficient large models. Reported benchmarks show competitive performance: SWE-Bench Pro 56.26% (slightly ahead of DeepSeek V4 Flash and similar to Gemini 3.5 Flash) and a DeepSearchQA F1 of 92.82%, positioning it as a strong local inference option. This matters for developers and organizations seeking high-capability multimodal models that can run on modest hardware without cloud dependency. Key players: StepFun; competing models referenced include DeepSeek and Gemini.
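The 128 GB claim can be sanity-checked with a rough weight-memory estimate. This is a sketch under stated assumptions: the bit widths are illustrative choices (the summary does not say which precision StepFun targets), `weight_memory_gb` is an invented helper, and the estimate covers weights only, ignoring KV cache, activations, and runtime overhead.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


TOTAL_PARAMS_B = 196 + 1.8   # reported MoE total plus the ViT encoder
RAM_GB = 128                 # reported target system memory

# Bit widths below are assumptions for illustration only.
for bits in (16, 8, 4):
    gb = weight_memory_gb(TOTAL_PARAMS_B, bits)
    verdict = "fits" if gb < RAM_GB else "does not fit"
    print(f"{bits}-bit weights: {gb:.1f} GB ({verdict} in {RAM_GB} GB)")
```

All experts of an MoE model normally have to be resident in memory, so the 196B total drives the footprint while the 11B active parameters drive per-token compute. By this estimate the weights fit in 128 GB only at around 4 bits per weight, which suggests the local-inference figure assumes a quantized build.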
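@oran_ge: The heart of this article is this one chart. DeepSeek V4 Pro may not be the best model, but its caching is practically free. This is a technique every large model needs; Opus could cut its costs tenfold by using it. I also believe that once V4.1 is trained on real harness data, it will surely…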
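Why do I consider DeepSeek to be in the top tier of Chinese large-model developers? Everyone should read this article carefully. Liang Wenfeng did not chase quick money from coding plans or multimodal models. Instead, the team's aggressive architectural innovations (MoE, MLA, Engram, mHC, and others) have drastically cut KV-cache and compute requirements. MoE in particular greatly lowers the inference demands on AI hardware. These low-level architectural innovations are DeepSeek's real strength. By contrast, China's big tech companies do not settle down to do research and spend their days gaming leaderboards. I think the models these companies build are beyond saving: they score well but perform poorly on real work.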