Developers working with Gemma 4 in GGUF format should update their local model files: patched Gemma 4 GGUF builds that fix a chat-template bug are now available on Hugging Face for the 31B and 26B variants. In parallel, tooling for local model grafting has improved: an author released a utility that extracts only the necessary MTP tensors from GGUF donor files, producing compact (~900 MB) faux-GGUF artifacts for use with MTP grafting scripts. Together, these updates reduce friction for local deployment and model-merging workflows, cutting storage and transfer overhead while restoring correct chat-template behavior for Gemma 4 users.
Gemma 4 GGUF fixes and lightweight donor GGUF tooling reduce deployment friction and storage needs for local model merging workflows. Tech professionals managing local inference, grafting, or custom chat templates should adapt pipelines to use patched models and compact donor artifacts.
Dossier last updated: 2026-05-11 21:05:49
A user discovered a tuned version of Nemotron on Hugging Face, Nemotron-3-Super-64B-A12B-Math-REAP-GGUF, that claims to run large-context workloads efficiently on 48 GB of VRAM, achieving about 21 tokens/sec for coding and supporting an extremely long 500k-token context. The model is presented as a math-focused, distilled/tuned variant intended to emulate much of the larger Nemotron Super (64B total, ~12B active parameters, per the name) at far lower resource requirements. This matters because compact, optimized model builds with GGUF packaging can let researchers and developers run near-large-model capabilities on desktop GPUs, lowering the barrier to experimenting with long-context agentic use cases and coding assistance. Key players: Hugging Face as host, Max-and-Omnis as the uploader, and the Nemotron family of models.
A contributor created a lightweight tool to extract MTP tensors from GGUF model files so the grafting script no longer needs a full GGUF donor. The result is two compact "faux GGUF" files (~900 MB) designed to contain only the tensors required for MTP grafting, with a Hugging Face upload provided as an example. This matters because smaller donor files reduce storage and transfer overhead for local model grafting workflows, making experiments with MTP-based model merging more accessible to developers and hobbyists. The post links the extraction script and the reduced GGUF artifacts, enabling easier reuse in local model modification pipelines.
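The core idea of the extraction tool can be sketched without depending on the full donor file: given a donor model's tensor listing, keep only the tensors belonging to the MTP (multi-token prediction) head and write those out as the compact faux-GGUF donor. The name prefixes below (`mtp.`, `nextn.`) are assumptions for illustration; the actual script's selection logic and the real tensor names vary by model family.

```python
def select_mtp_tensors(tensor_names, markers=("mtp.", "nextn.")):
    """Keep only tensor names that appear to belong to the MTP head.

    `markers` is a hypothetical set of name prefixes/substrings; real
    GGUF tensor naming differs per model, so adjust for your donor.
    """
    return [name for name in tensor_names if any(m in name for m in markers)]


# Hypothetical donor tensor listing; real GGUF names differ per model.
donor_tensors = [
    "blk.0.attn_q.weight",
    "blk.0.ffn_down.weight",
    "mtp.embed_tokens.weight",
    "mtp.blk.0.attn_q.weight",
    "output.weight",
]
print(select_mtp_tensors(donor_tensors))
```

Filtering by name is what makes the ~900 MB artifacts possible: the full donor's base-model weights are dropped entirely, and only the small MTP slice survives for the grafting script to consume.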
it's time to update your Gemma 4 GGUFs
A recent note alerts users to update their Gemma 4 GGUF model files after a Chat Template issue was fixed. The post points users to two Hugging Face uploads hosting updated Gemma 4 variants—google_gemma-4-31B-it-GGUF and google_gemma-4-26B-A4B-it-GGUF—so users can download fixed GGUF builds. This matters for developers and practitioners running local Gemma 4 models in GGUF format because outdated or buggy template handling could affect chat-based interfaces, integrations, or inference behavior. The update and direct links help maintainers and deployers keep local LLM deployments stable and compatible with chat templates and tooling.
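After swapping in replacement model files, a quick sanity check that a download is a well-formed GGUF file can be done by reading the fixed-size GGUF header (4-byte magic, uint32 version, uint64 tensor count, uint64 metadata key/value count, all little-endian). This is a minimal sketch, not a full parser, and it says nothing about the chat template itself; the synthetic header bytes are constructed purely for demonstration.

```python
import struct

def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed GGUF header: 4-byte magic 'GGUF', uint32 version,
    uint64 tensor count, uint64 metadata key/value count (little-endian)."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}


# Synthetic header bytes for demonstration: version 3, 2 tensors, 5 kv pairs.
fake_header = struct.pack("<4sIQQ", b"GGUF", 3, 2, 5)
print(read_gguf_header(fake_header))
```

In practice you would read the first 24 bytes of the downloaded file and run the same check; inspecting the actual chat template requires walking the metadata key/value section, which tools like llama.cpp's `gguf` utilities already do.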