Tech teams building retrieval-augmented systems need scalable, maintainable pipelines for embedding generation and vector storage; the described stack shows how managed Google Cloud services can reduce operational burden while supporting large document volumes. Understanding these patterns helps engineers design cost-effective, parallelized workflows and plan safe embedding upgrades.
Dossier last updated: 2026-05-14 14:38:01
Gemma 4, Google DeepMind’s new open model family, has reached over 50 million downloads and aims to deliver high intelligence-per-parameter across phones, browsers, and consumer GPUs. The family includes ultra-mobile E2B/E4B sizes, a 31B dense model for local server-grade workloads, and a 26B Mixture-of-Experts variant for high-throughput reasoning. Released under an Apache 2 license, Gemma 4 lets developers modify and commercialize locally deployed agents. Demos showed multiple local Gemma instances creating SVGs, an Android agent selecting skills, a food-tour agent using the Agent Development Kit and Google Maps, and autonomous Python execution where Gemma 4 wrote and fixed Matplotlib physics code to animate a bouncing ball. The release signals broader on-device agent capabilities and easier startup integration.
A developer built Y&Y App, an industrial-grade SaaS that combines live ERP inventory with an AI domain-expert agent to diagnose equipment faults using a Gemma 4 26B MoE model. The microservices stack uses .NET 8 for ERP logic, FastAPI/Python for AI, and React on the frontend; embeddings from gemini-embedding-001 are stored in PostgreSQL with pgvector for a RAG pipeline. Queries compute cosine similarity to retrieve exact OEM manual excerpts, then gemma-4-26b-a4b-it synthesizes safe, grounded troubleshooting steps with low latency thanks to MoE sparsity. The project is production-oriented (deployed on Vercel/Cloud Run) and open-sourced on GitHub, demonstrating a practical, safety-minded application of LLMs in industrial maintenance.
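The cosine-similarity retrieval step can be sketched in plain Python. pgvector exposes cosine distance (1 minus cosine similarity) as the `<=>` operator in SQL, but the underlying math is the same; the `manual_chunks` table and its column names below are illustrative assumptions, not taken from the project's actual schema.

```python
import math

def cosine_distance(a, b):
    """Cosine distance between two vectors, matching pgvector's <=> semantics:
    1 - (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def top_k(query_vec, rows, k=3):
    """Rank (chunk_id, vector) rows by ascending cosine distance to the query,
    i.e. most similar first -- the in-memory equivalent of ORDER BY ... <=> ..."""
    return sorted(rows, key=lambda r: cosine_distance(query_vec, r[1]))[:k]

# The equivalent SQL against a hypothetical pgvector-backed table:
RETRIEVAL_SQL = """
SELECT chunk_id, chunk_text
FROM manual_chunks
ORDER BY embedding <=> %(query_embedding)s
LIMIT %(k)s;
"""
```

The retrieved excerpts would then be inserted into the Gemma prompt so the model's troubleshooting steps stay grounded in the OEM manual text rather than free generation.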
A developer post outlines how to migrate vector embeddings in a production retrieval-augmented generation (RAG) system without downtime when upgrading embedding models. Because embeddings are model-specific, existing vectors become incompatible with queries from a new model, making a “re-embed everything” cutover risky and disruptive. The article proposes a shadow deployment approach using a dual-column schema in AlloyDB for PostgreSQL: add a new column (e.g., embedding_v2), backfill it in the background, then switch application logic once all rows are updated. Background processing is handled with Cloud Run Jobs, and the broader pipeline references BigQuery and Vertex AI embeddings. The post also covers consistency checks and handling migration failures, with code in a public GitHub repository.
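The dual-column shadow migration can be sketched as follows; this is a minimal illustration of the pattern, not the post's actual code, and the `documents` table, column names, and vector dimension are assumptions. The idea: add `embedding_v2` alongside the old column, backfill it in batches (as a Cloud Run Job would), and use a consistency check to confirm no rows remain before switching query logic.

```python
# Step 1: add the shadow column without touching existing vectors.
ADD_COLUMN_SQL = """
ALTER TABLE documents ADD COLUMN IF NOT EXISTS embedding_v2 vector(768);
"""

# Step 2: repeatedly grab un-migrated rows; NULL embedding_v2 marks pending work.
BACKFILL_SELECT_SQL = """
SELECT id, content FROM documents
WHERE embedding_v2 IS NULL
ORDER BY id
LIMIT %(batch_size)s;
"""

# Step 3: consistency check before cutting application reads over to v2.
CONSISTENCY_CHECK_SQL = """
SELECT count(*) AS remaining FROM documents WHERE embedding_v2 IS NULL;
"""

def backfill(fetch_batch, embed, write_batch, batch_size=100):
    """Re-embed rows in batches until no pending rows remain.

    fetch_batch(n) -> list of (id, text) rows still lacking embedding_v2;
    embed(texts)   -> list of new-model vectors, one per text;
    write_batch(ids, vectors) persists vectors into embedding_v2.
    Returns the number of rows processed. Because pending rows are defined
    by NULLs, a failed batch is simply retried on the next loop iteration.
    """
    total = 0
    while True:
        rows = fetch_batch(batch_size)
        if not rows:
            return total
        ids = [r[0] for r in rows]
        vectors = embed([r[1] for r in rows])
        write_batch(ids, vectors)
        total += len(rows)
```

Keyed on NULL, the backfill is idempotent and restartable, which is what makes it safe to run as a background job against live production traffic.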
A how-to shows building a production-ready RAG backend that scales from hundreds to millions of documents by parallelizing embedding generation and using managed cloud services. The author outlines an architecture using BigQuery as the data source, Cloud Run Jobs to run hundreds of parallel workers, Vertex AI’s text-embedding-005 via the google-genai SDK, and AlloyDB for PostgreSQL with pgvector for storing and querying vectors. The post includes Terraform-based provisioning, Cloud Run Jobs patterns to avoid API rate limits, and a reference GitHub repo with code and setup steps. This matters because naive sequential embedding pipelines fail at scale; the described serverless, parallel approach reduces latency and operational friction for large-scale RAG systems.
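The fan-out across hundreds of workers can be sketched with Cloud Run Jobs' task environment variables: each task of a job receives `CLOUD_RUN_TASK_INDEX` and `CLOUD_RUN_TASK_COUNT`, which a worker can use to claim a disjoint slice of the document set. The range-splitting below is an illustrative sketch (the BigQuery table name is a placeholder), not the post's actual code.

```python
import os

def task_shard(num_rows, task_index, task_count):
    """Return the half-open (start, end) row range this worker owns.

    Rows are split into task_count contiguous slices; slices are disjoint
    and together cover [0, num_rows), so no document is embedded twice.
    """
    per_task = -(-num_rows // task_count)  # ceiling division
    start = task_index * per_task
    end = min(start + per_task, num_rows)
    return start, end

def shard_from_env(num_rows):
    """Cloud Run Jobs inject CLOUD_RUN_TASK_INDEX / CLOUD_RUN_TASK_COUNT
    into each task; defaults make the code runnable locally as one worker."""
    idx = int(os.environ.get("CLOUD_RUN_TASK_INDEX", "0"))
    cnt = int(os.environ.get("CLOUD_RUN_TASK_COUNT", "1"))
    return task_shard(num_rows, idx, cnt)

# Each worker would then page through only its slice of the source table,
# e.g. (placeholder table name):
QUERY_TEMPLATE = """
SELECT doc_id, text
FROM `my-project.my_dataset.documents`
ORDER BY doc_id
LIMIT {limit} OFFSET {offset};
"""
```

Keeping each worker's batch size modest, and the worker count configurable, is one way to stay under the embedding API's rate limits while still saturating throughput across the job as a whole.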