JetBrains has open-sourced Mellum2, a compact mixture-of-experts (MoE) model family optimized for local AI workflows and developer tooling. With 12B total parameters, of which 2.5B are active per token, and tuning for coding, Mellum2 targets low-latency, resource-efficient on-device inference to power code completion, IDE assistants, and embedded chat agents. JetBrains positions the models as privacy-friendly, cost-saving alternatives to cloud LLMs, publishing weights on Hugging Face and a technical report on arXiv. By JetBrains' own account, coding reasoning is on par with Qwen 3.5 9B, while results on most other tasks trail Qwen 3.5 4B, highlighting the trade-off between specialization and broader capability. Engineers should benchmark Mellum2 on their specific developer scenarios and validate the trade-offs between inference speed and accuracy.
JetBrains open-sourcing Mellum2 matters because it offers a privacy-friendly, low-latency alternative for on-device code assistance and IDE integrations, reducing reliance on cloud LLMs. Tech professionals must evaluate trade-offs between specialized coding performance and general task capability when adopting local MoE models.
Dossier last updated: 2026-06-02 06:06:06
JetBrains open-sourced Mellum2, a 12B-parameter model tailored for software engineering, upgrading from a 4B code-completion model to a full coding assistant with a 131,072-token context window. Released under Apache 2.0, Mellum2 comes in base, instruct, and thinker variants and uses a sparse Mixture-of-Experts design with 2.5B active parameters for efficient inference on standard hardware. It supports code generation and editing, external tool calls, multi-step agentic workflows, and long conversations. It is positioned for AI workload routing, low-latency RAG pipelines, sub-agent orchestration, and private local deployment. Training used a three-stage curriculum that moved from web data to curated code and math to better suit software tasks.
JetBrains released Mellum2, a 12-billion-parameter Mixture-of-Experts (MoE) model trained from scratch on text and code and licensed under Apache 2.0. Mellum2 activates only 2.5B parameters per token, delivering more than 2x faster inference than similar-sized open models while remaining competitive on code, reasoning, science, and math benchmarks. Designed for latency-sensitive, high-throughput tasks, it targets routing, RAG pipelines, summarization, sub-agent workloads, and private self-hosted deployments. The model’s MoE architecture boosts capacity without raising per-token compute, making it suitable as a lightweight focal model inside multi-model AI stacks. The model and technical report are available on Hugging Face and arXiv respectively for engineers to evaluate and deploy.
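As a rough illustration of why a sparse MoE can add capacity without raising per-token compute, the sketch below computes the active-parameter fraction from the two figures quoted above (12B total, 2.5B active) and shows a toy top-k router. The expert count, the value of k, the scalar "experts", and all function names are hypothetical and chosen only for illustration; the summaries here do not describe Mellum2's actual routing scheme.

```python
import math

# Figures quoted in the announcement summaries above.
TOTAL_PARAMS_B = 12.0
ACTIVE_PARAMS_B = 2.5

# Hypothetical values for illustration only; not Mellum2's real configuration.
NUM_EXPERTS = 8
TOP_K = 2


def active_fraction(active_b, total_b):
    """Share of the model's parameters that participate in one token's forward pass."""
    return active_b / total_b


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, router_logits, k):
    """Run only the selected experts and mix their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


# Toy "experts": expert n simply multiplies its input by n + 1.
experts = [lambda x, s=n + 1: s * x for n in range(NUM_EXPERTS)]
logits = [0.1, 2.0, -1.0, 0.5, 2.0, 0.0, -0.3, 1.0]

print(f"active fraction: {active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B):.1%}")
print("selected experts:", [i for i, _ in top_k_route(logits, TOP_K)])
print("mixed output:", moe_forward(1.0, experts, logits, TOP_K))
```

With these toy values, only 2 of the 8 experts run for the token, which is the mechanism behind the claim that total capacity can grow while per-token compute stays roughly fixed.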
JetBrains announced Mellum2, an open-source, fast language model designed for local AI workflows and developer tooling. Mellum2 targets efficient on-device inference with optimizations for latency and resource use, making it suitable for code assistance, chat agents, and embedded AI features. JetBrains emphasizes integration into developer environments and compatibility with common tooling, aiming to give teams a privacy-friendly alternative to cloud LLMs while lowering infrastructure costs. The move matters because it expands accessible, performant models for developers, supports local-first AI adoption, and could shift expectations around latency, data privacy, and offline capabilities in software development and productivity tools.
JetBrains released Mellum2, a small mixture-of-experts (MoE) model family tuned for coding and focused on developer workflows, with 12B total parameters and 2.5B active parameters. JetBrains claims that Mellum2 matches Qwen 3.5 9B on coding reasoning but performs worse than Qwen 3.5 4B on most other tasks. Models and weights are available on Hugging Face, and a technical report is hosted on arXiv. This matters because a compact MoE optimized for code could lower inference costs for developer tools while shifting the trade-off between specialization and general reasoning. Engineers and platform builders should evaluate Mellum2 for code completion, IDE assistants, and cost-sensitive deployments, and validate benchmarks across diverse coding and non-coding tasks.