A developer praises Qwen-3.5 27B after testing Qwen-3.5 122B and finding the larger model comparable to Google Gemini 3 Flash for coding tasks, but too costly to run locally because it would require motherboard upgrades and extra RTX 3090 GPUs. The author was initially skeptical of claims that the 27B version outperformed the 122B, then found a user (nemotr) demonstrating strong results with Qwen-3.5 27B, prompting a reevaluation. The post highlights the trade-offs between model size, performance, and local hardware cost, and underlines growing interest in smaller, efficient LLMs that can match larger models for developer workflows. It matters because models with modest hardware requirements widen access to capable coding assistants without expensive GPU investments.
A user with an RTX 5060 Ti 16GB and 64GB RAM is considering buying a second RTX 5060 8GB (~€280) to run Qwen-3.5 27B locally at Q4/Q5 with 100k+ context for agentic coding tasks. They currently run Qwen3-Coder-Next at Q5 achieving ~26 tokens/sec but want better throughput and larger context windows for coding and agent workflows. Key considerations include VRAM constraints (8GB likely limits model quantization and offloading), multi-GPU setups requiring NVLink or careful model sharding, CPU/RAM and storage I/O for memory-mapped models, and software support (quantization levels, offload libs like GGML, bitsandbytes, or vLLM). The value depends on whether the secondary 8GB card can be effectively used for offload/partitioning versus investing in a single larger-VRAM GPU or cloud access.
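The VRAM question above can be sanity-checked with a back-of-envelope estimate. The bits-per-weight figures below are rough averages for GGUF K-quants, assumed for illustration rather than measured:

```python
def weight_vram_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB.

    params_billion * 1e9 weights, each bits_per_weight/8 bytes,
    divided by 1e9 to get GB: the 1e9 factors cancel.
    """
    return params_billion * bits_per_weight / 8

# Assumed averages: Q4_K_M ~4.5 bits/weight, Q5_K_M ~5.5 bits/weight.
q4 = weight_vram_gb(27, 4.5)
q5 = weight_vram_gb(27, 5.5)
print(f"27B @ Q4: ~{q4:.1f} GB, @ Q5: ~{q5:.1f} GB")
```

Under these assumptions Q5 weights alone (~18.6 GB) already exceed the 16 GB card, and a 100k-token KV cache adds several GB more, so the second 8 GB GPU mainly buys headroom for the cache and spilled layers rather than a quantization upgrade.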
A buyer found an Nvidia GeForce RTX 3090 locally for $623 and asks whether it’s a good deal for running large language models like Qwen-3.5 27B. They request real-world metrics from owners, specifically token-generation (TG) and prompt-processing (PP) speeds in tokens per second and the quantization formats used, indicating interest in on-device inference performance. The post reflects uncertainty about current GPU market dynamics and the difficulty of building or upgrading systems amid fluctuating prices and supply. This matters for hobbyists and small labs evaluating cost-effective hardware for AI workloads, where older high-end GPUs can still be valuable if they meet memory and quantization needs.
A developer benchmarked various quantized builds (Q4 down to Q3) of Qwen3.5-35B-A3B on an NVIDIA RTX 3090 with a 10K-token context window, excluding quants smaller than Q3_K_S. The post reports throughput and likely memory/latency differences across Unsloth variants of Qwen3.5-35B-A3B to show the performance trade-offs of different quantization formats. This matters to practitioners running large LLMs on consumer GPUs because quantization and format choices directly affect inference speed, VRAM usage, and hosting cost; benchmarking helps users pick the best model file for local or edge deployment. Key players are the Qwen3.5-35B-A3B model, the Q4/Q3 quantization formats, and the RTX 3090.
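A minimal sketch of how throughput numbers like these are typically collected: time a fixed-length generation and divide. The `generate` argument is a stand-in for whatever inference backend is being benchmarked (llama.cpp ships a dedicated `llama-bench` tool for exactly this):

```python
import time

def tokens_per_second(generate, prompt: str, n_tokens: int) -> float:
    """Time one generation call and return throughput in tokens/sec.

    `generate` is a placeholder for the inference backend's API;
    real benchmarks would also warm up and average multiple runs.
    """
    start = time.perf_counter()
    generate(prompt, max_tokens=n_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Stub backend simulating ~1 ms per token, for demonstration only.
def fake_generate(prompt: str, max_tokens: int) -> None:
    time.sleep(max_tokens * 0.001)

tps = tokens_per_second(fake_generate, "fn main() {", 100)
```

Note that a single timed run conflates prompt processing and token generation; tools like `llama-bench` report the two separately (PP and TG), which is why both figures appear in posts like these.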
A user benchmarking Qwen3-code-next against the newer Qwen3.5-35B-A3B reports that the older code-focused model outperforms 3.5-35B-A3B on tool-calling tasks inside VS Code using the Continue extension, despite running with more aggressive quantization. The tester is surprised because quantization usually reduces precision, yet the code-focused model invoked external tools more reliably. This matters to developers and teams choosing LLMs for IDE tool-calling and code workflows: model architecture, training focus, and task alignment can beat raw parameter counts or version numbers. Key players: the Qwen models (Qwen3-code-next, Qwen3.5-35B-A3B) and the VS Code Continue extension. Further controlled benchmarks would clarify whether this result generalizes.
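For context, tool calling in extensions like Continue generally follows the OpenAI-style function-calling shape sketched below. The `read_file` tool and its schema are hypothetical, chosen only to illustrate the structure the model must emit correctly, which is exactly what these tool-calling benchmarks stress:

```python
import json

# A tool definition of the kind an IDE extension sends to the model.
# "read_file" and its schema are made up for illustration.
tool = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from the current workspace",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}

# A well-formed call the model is expected to produce in response:
# the tool name plus a JSON-encoded arguments string matching the schema.
call = {"name": "read_file", "arguments": json.dumps({"path": "src/main.py"})}

# The client parses the arguments back out before executing the tool;
# malformed JSON or a wrong tool name here is what makes a model
# "worse at tool calling" regardless of its raw coding ability.
args = json.loads(call["arguments"])
```

A model tuned on code and structured output can be more reliable at emitting this JSON exactly than a larger general model, which is one plausible reading of the result reported above.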