Local LLMs and desktop agents enable low-latency, private AI interactions on personal hardware and mobile devices, shifting workloads off cloud APIs. Tech professionals must weigh deployment, resource limits, and reliability when adopting local model hosts such as Ollama.
Dossier last updated: 2026-05-14 11:51:11
A 2015 desktop with an i5-6400, 24 GB of RAM, and a GTX 950 (2 GB VRAM) can run smaller Gemma 4 variants locally, the author reports, using Ollama as the local model manager. Based on memory requirements, Gemma 4 E2B (≈2B params) and E4B (≈4B params) are realistic for this hardware, while the 26B and 31B variants are impractical. The article outlines selecting the right Gemma 4 variant, installing Ollama, and benchmarking speed, reasoning, knowledge, code generation, structured output, instruction following, and system metrics to assess usability. The piece argues smaller, optimized LLMs are opening up local AI on aging consumer hardware, enabling offline workflows and edge deployments without high-end GPUs or cloud costs.
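A minimal sketch of the kind of speed check the article describes, assuming Ollama is running on its default local port and a small Gemma 4 tag such as gemma4:e2b has already been pulled (the exact tag name is an assumption; substitute whatever `ollama list` shows):

```python
import requests

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local API port
MODEL = "gemma4:e2b"                   # assumed tag for the small Gemma 4 variant

def tokens_per_second(prompt: str) -> float:
    """Run one non-streaming generation and derive decode speed from Ollama's timing fields."""
    resp = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={"model": MODEL, "prompt": prompt, "stream": False},
        timeout=600,  # old hardware can take a while on the first (cold) load
    )
    resp.raise_for_status()
    data = resp.json()
    # eval_count = generated tokens, eval_duration = decode time in nanoseconds
    return data["eval_count"] / (data["eval_duration"] / 1e9)

if __name__ == "__main__":
    print(f"{tokens_per_second('Explain VRAM in one sentence.'):.1f} tokens/s")
```

On a 2 GB GPU most of the weights stay in system RAM, so a number like this is the quickest way to tell whether a given variant is usable or merely loadable.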
A software engineering student documented running Google’s Gemma 4 locally on an Android phone using Termux and a community build of Ollama, demonstrating offline LLM inference without cloud APIs or billing. He deployed the E2B variant (2.3B effective parameters, 128K context): after compiling Ollama in Termux he pulled gemma4:2b and ran it locally; the model served responses and could be exposed via Ollama’s local API (port 11434) so other devices on the same Wi‑Fi could query the phone as a private LLM server. The guide notes practical tradeoffs—multi-gigabyte downloads, long compile times, thermal throttling, memory limits, and occasional reasoning errors—while highlighting expanded access, privacy, and new edge deployment patterns. This matters because mobile-first, offline LLMs lower barriers for developers without cloud access and shift the client/server calculus for AI services.
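A minimal sketch of querying the phone from another machine on the same Wi‑Fi, assuming Ollama on the phone was started listening on all interfaces (e.g. OLLAMA_HOST=0.0.0.0) and that 192.168.1.50 is a stand-in for the phone's LAN address:

```python
import requests

PHONE = "http://192.168.1.50:11434"  # hypothetical LAN address of the phone running Ollama

def ask_phone(question: str) -> str:
    """Send a single chat turn to the phone's Ollama instance and return the reply text."""
    resp = requests.post(
        f"{PHONE}/api/chat",
        json={
            "model": "gemma4:2b",  # the tag pulled in the guide
            "messages": [{"role": "user", "content": question}],
            "stream": False,
        },
        timeout=300,  # generous: a thermally throttled phone can be slow to respond
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]

if __name__ == "__main__":
    print(ask_phone("Summarize why on-device inference avoids cloud billing."))
```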
The author deployed local LLM features in TextStack but discovered the production server never loaded any models: Ollama ran 60+ days with no models pulled, causing silent fallback responses. To get local inference working they first swapped qwen3:8b → gemma4:e4b, then e4b → gemma4:e2b after e4b strained the CPU. Six production bugs emerged during the rollout; the final e2b deployment passed a 63,000-request load test with 100% success, p95=20.5 ms, and negligible OpenAI cost. TextStack uses Gemma 4 e2b locally for distractors, hints, and enrichment while retaining OpenAI gpt-5-mini for translations; it runs on a single-CPU, 30 GB VPS and is open-source (AGPL-3.0). This matters for teams shipping local-LLM features, where cost, reliability, and graceful failure handling decide whether the rollout holds up.
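Not TextStack's actual code, but a sketch of the check that would have caught the empty-model condition: verify the model is really present on the Ollama daemon before routing a request locally, and fall back to the hosted provider otherwise. Names such as generate_hint and remote_fallback are illustrative.

```python
import requests

OLLAMA_URL = "http://localhost:11434"
LOCAL_MODEL = "gemma4:e2b"

def local_model_available(name: str) -> bool:
    """True only if Ollama is reachable AND the named model has actually been pulled."""
    try:
        resp = requests.get(f"{OLLAMA_URL}/api/tags", timeout=2)
        resp.raise_for_status()
        pulled = {m["name"] for m in resp.json().get("models", [])}
        return name in pulled
    except requests.RequestException:
        return False  # daemon down or unreachable: treat as unavailable, never as a silent success

def generate_hint(prompt: str) -> str:
    if local_model_available(LOCAL_MODEL):
        resp = requests.post(
            f"{OLLAMA_URL}/api/generate",
            json={"model": LOCAL_MODEL, "prompt": prompt, "stream": False},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["response"]
    return remote_fallback(prompt)

def remote_fallback(prompt: str) -> str:
    # placeholder for the hosted-API path (TextStack keeps OpenAI for translations)
    raise NotImplementedError("route to the remote provider here")
```

The point of the explicit check is that "Ollama is running" and "a model is loaded" are different conditions; conflating them is exactly what produced 60+ days of silent fallbacks.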
A developer built a desktop 'crab'—a transparent on-screen pet that interacts with users, accepts input like talking or being thrown, and responds with a cheeky, bullying personality. It runs entirely locally using an Ollama-hosted model and leverages completion-format prompting instead of instruction-following to keep behavior coherent on smaller models. The project highlights desktop agents and local LLM deployment, prioritizing privacy and low-latency interactions without cloud dependency. It matters because it showcases creative, user-facing applications of local AI models, demonstrates prompting strategies for constrained models, and points to growing interest in desktop overlays and playful agents as new UI/UX experiments in consumer AI.
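The article does not publish its prompt, but a minimal sketch of completion-format prompting against a local Ollama model looks like the following: the pet's persona and recent events are written as a transcript the model simply continues, using raw mode so no chat/instruction template is applied. The persona text, stop sequence, and model tag are all assumptions for illustration.

```python
import requests

OLLAMA_URL = "http://localhost:11434"

# Illustrative transcript-style prompt: the model is asked to *continue* the text
# rather than follow an instruction, which tends to stay coherent on small models.
TRANSCRIPT = """The desktop crab is rude, sarcastic, and easily offended.

User: (throws the crab across the screen)
Crab: Oh, real mature. Do that again and I'm hiding your cursor.
User: {event}
Crab:"""

def crab_reply(event: str, model: str = "gemma4:e2b") -> str:
    """Ask the model to continue the transcript; raw=True skips Ollama's prompt template."""
    resp = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={
            "model": model,
            "prompt": TRANSCRIPT.format(event=event),
            "raw": True,            # send the prompt verbatim, no instruction formatting
            "stream": False,
            "options": {"stop": ["\nUser:"]},  # stop before the model writes the next user turn
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"].strip()

if __name__ == "__main__":
    print(crab_reply("says hello"))
```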