JetBrains Open-Sources Mellum2 for Local AI

JetBrains has open-sourced Mellum2, a compact mixture-of-experts (MoE) model family built for local AI workflows and developer tooling. The model has 12B total parameters and activates 2.5B of them per token, targeting low-latency, resource-efficient on-device inference for code completion, IDE assistants, and embedded chat agents. JetBrains positions the models as a privacy-friendly, lower-cost alternative to cloud LLMs and has published the weights on Hugging Face and a technical report on arXiv. Early benchmarks show strong coding reasoning against some competitors but mixed results on general tasks, which points to a trade-off between specialization and broader capability. Engineers should benchmark Mellum2 on their own developer scenarios and validate inference speed and accuracy before adopting it.

Why It Matters

Mellum2 gives teams a privacy-friendly, low-latency option for on-device code assistance and IDE integrations, reducing reliance on cloud LLMs. Teams adopting a local MoE model should weigh its specialized coding performance against its more mixed results on general tasks.
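One way to make that trade-off concrete is to score a model separately on coding and general prompts while timing each call. The sketch below is a minimal harness under stated assumptions: `evaluate`, `stub_generate`, and the two sample cases are hypothetical names invented for illustration, and the stub stands in for whatever local inference client you actually use. It is not a JetBrains tool or an official benchmark.

```python
import statistics
import time
from typing import Callable, Dict, List, Tuple

# Each case is (category, prompt, checker). All names here are illustrative.
Case = Tuple[str, str, Callable[[str], bool]]


def evaluate(generate: Callable[[str], str], cases: List[Case]) -> Dict[str, Dict[str, float]]:
    """Run every case once; report pass rate and median latency per category."""
    latencies: Dict[str, List[float]] = {}
    passes: Dict[str, List[bool]] = {}
    for category, prompt, check in cases:
        start = time.perf_counter()
        answer = generate(prompt)
        elapsed = time.perf_counter() - start
        latencies.setdefault(category, []).append(elapsed)
        passes.setdefault(category, []).append(check(answer))
    return {
        category: {
            "pass_rate": sum(passes[category]) / len(passes[category]),
            "median_latency_s": statistics.median(latencies[category]),
        }
        for category in latencies
    }


def stub_generate(prompt: str) -> str:
    """Stand-in for a real model call (assumption); swap in your own client."""
    if "add" in prompt:
        return "def add(a, b):\n    return a + b"
    return "I am not sure."


cases: List[Case] = [
    ("coding", "Write a Python function add(a, b).", lambda out: "return a + b" in out),
    ("general", "What is the capital of France?", lambda out: "Paris" in out),
]

report = evaluate(stub_generate, cases)
for category, stats in report.items():
    print(category, stats)
```

Keeping the categories separate matters here: a single blended score would hide exactly the specialization gap the early benchmarks describe. A real evaluation would use many cases per category and repeated runs to smooth out latency noise.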

Latest Changes

JetBrains released Mellum2 as open-source under the Apache 2.0 license

Mellum2 is a 12B-parameter MoE that activates 2.5B parameters per token

Models and weights published on Hugging Face with a technical report on arXiv

Optimizations target low-latency, resource-efficient on-device inference

Variants include base, instruct, and thinker models
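The two headline numbers imply different things for compute and for memory, and a quick calculation shows why. The sketch below uses only the 12B and 2.5B figures from the announcement; the bytes-per-parameter table is a general assumption about common weight formats, not a statement about which formats Mellum2 ships in, and the helper names are hypothetical.

```python
# Back-of-the-envelope figures derived from the two numbers in the announcement.
TOTAL_PARAMS = 12e9    # total parameters
ACTIVE_PARAMS = 2.5e9  # parameters activated per token

# Bytes per parameter for common weight formats (assumed, not Mellum2-specific).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def active_fraction(total: float, active: float) -> float:
    """Share of the weights that take part in computing one token."""
    return active / total


def weight_memory_gb(total: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GB (10^9 bytes); ignores KV cache and activations."""
    return total * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")
    for fmt, size in BYTES_PER_PARAM.items():
        print(f"{fmt}: ~{weight_memory_gb(TOTAL_PARAMS, size):.0f} GB of weights")
```

Roughly 21% of the weights are used for any one token, which is where the per-token compute saving comes from. Memory is a separate matter: unless experts are offloaded, all 12B parameters typically need to be resident, so weight storage scales with the total count (about 24 GB at 2 bytes per parameter, less when quantized).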

Timeline

2026-06-01 — JetBrains blog announced Mellum2 as a fast model for local AI workflows and developer tooling

2026-06-01 — Initial public notes described Mellum2 as a small MoE family with coding-focused variants

2026-06-02 — JetBrains published Mellum2 weights and technical report, stating Apache 2.0 licensing

2026-06-02 — Company highlighted Mellum2's 131,072-token context window and upgrade from prior 4B code-completion model

Recent News (4)

JetBrains Open-Sources the Mellum2 Model: 12B Parameters, Upgraded to an AI Agent Coding Assistant

JetBrains open-sourced Mellum2, a 12B-parameter model tailored for software engineering, moving from its earlier 4B code-completion model to a full coding assistant with a 131,072-token context window. Released under Apache 2.0, Mellum2 comes in base, instruct, and thinker variants and uses a sparse Mixture-of-Experts design with 2.5B active parameters for efficient inference on standard hardware. It supports code generation and editing, external tool calls, multi-step agentic workflows, and long conversations. JetBrains positions it for AI workload routing, low-latency RAG pipelines, sub-agent orchestration, and private local deployment. Training followed a three-stage curriculum that moved from web data to curated code and math to better suit software tasks.

NewsNow, 2h ago

Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains

JetBrains released Mellum2, a 12-billion-parameter Mixture-of-Experts (MoE) model trained from scratch on text and code and licensed under Apache 2.0. Mellum2 activates only 2.5B parameters per token, delivering more than 2x faster inference than similar-sized open models while remaining competitive on code, reasoning, science, and math benchmarks. Designed for latency-sensitive, high-throughput tasks, it targets routing, RAG pipelines, summarization, sub-agent workloads, and private self-hosted deployments. The MoE architecture adds capacity without raising per-token compute, which makes the model suitable as a lightweight, focused component inside multi-model AI stacks. The model is available on Hugging Face and the technical report on arXiv for engineers to evaluate and deploy.
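The claim that MoE adds capacity without raising per-token compute follows from how sparse routing works in general. The sketch below is a toy top-k router in plain Python, not Mellum2's actual architecture: the expert count, `TOP_K`, the hidden size, and the use of single linear maps as "experts" are all assumptions made for illustration, since the summary above does not give those details.

```python
import math
import random

NUM_EXPERTS = 8  # hypothetical; Mellum2's expert count is not given above
TOP_K = 2        # hypothetical number of experts activated per token
DIM = 4          # toy hidden size


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def make_matrix(rows, cols, rng):
    """Random toy weights in [-1, 1]."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def moe_layer(token, router, experts, top_k=TOP_K):
    """Route one token to its top_k experts and mix their outputs.

    Returns the mixed output and the indices of the experts that ran.
    """
    logits = matvec(router, token)  # one score per expert
    ranked = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([logits[i] for i in chosen])  # renormalize over chosen experts
    output = [0.0] * len(token)
    for w, i in zip(weights, chosen):
        expert_out = matvec(experts[i], token)  # only chosen experts do any work
        output = [o + w * e for o, e in zip(output, expert_out)]
    return output, chosen


rng = random.Random(0)
router = make_matrix(NUM_EXPERTS, DIM, rng)
experts = [make_matrix(DIM, DIM, rng) for _ in range(NUM_EXPERTS)]
token = [rng.uniform(-1, 1) for _ in range(DIM)]
output, chosen = moe_layer(token, router, experts)
print(f"experts used: {sorted(chosen)} of {NUM_EXPERTS}")
```

Doubling `NUM_EXPERTS` in this sketch doubles the stored expert weights, but each token still runs only `TOP_K` expert multiplications plus a slightly larger router score. That is the general mechanism behind a model holding 12B parameters while spending compute on 2.5B per token.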
