Tech teams building retrieval-augmented systems need scalable, maintainable pipelines for embedding generation and vector storage; the described stack shows how managed Google Cloud services can reduce operational burden while supporting large document volumes. Understanding these patterns helps engineers design cost-effective, parallelized workflows and plan safe embedding upgrades.
Dossier last updated: 2026-05-14 14:38:01
Gemma 4, Google DeepMind’s new open model family, has reached over 50 million downloads and aims to deliver high intelligence-per-parameter across phones, browsers, and consumer GPUs. The family includes ultra-mobile E2B/E4B sizes, a 31B dense model for local server-grade workloads, and a 26B Mixture-of-Experts variant for high-throughput reasoning. Released under an Apache 2 license, Gemma 4 lets developers modify and commercialize locally deployed agents. Demos showed multiple local Gemma instances creating SVGs, an Android agent selecting skills, a food-tour agent using the Agent Development Kit and Google Maps, and autonomous Python execution where Gemma 4 wrote and fixed Matplotlib physics code to animate a bouncing ball. The release signals broader on-device agent capabilities and easier startup integration.
A developer built Y&Y App, an industrial-grade SaaS that combines live ERP inventory with an AI domain-expert agent to diagnose equipment faults using a Gemma 4 26B MoE model. The microservices stack uses .NET 8 for ERP logic, FastAPI/Python for AI, and React on the frontend; embeddings from gemini-embedding-001 are stored in PostgreSQL with pgvector for a RAG pipeline. Queries compute cosine similarity to retrieve exact OEM manual excerpts, then gemma-4-26b-a4b-it synthesizes safe, grounded troubleshooting steps with low latency thanks to MoE sparsity. The project is production-oriented (deployed on Vercel/Cloud Run) and open-sourced on GitHub, demonstrating a practical, safety-minded application of LLMs in industrial maintenance.
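The cosine-similarity retrieval step can be sketched in plain Python. pgvector exposes cosine distance (1 minus cosine similarity) as the `<=>` operator in SQL, but the underlying math is the same; the `manual_chunks` table and its column names below are illustrative assumptions, not taken from the project's actual schema.

```python
import math

def cosine_distance(a, b):
    """Cosine distance between two vectors, matching pgvector's <=> semantics:
    1 - (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def top_k(query_vec, rows, k=3):
    """Rank (chunk_id, vector) rows by ascending cosine distance to the query,
    i.e. most similar first -- the in-memory equivalent of ORDER BY ... <=> ..."""
    return sorted(rows, key=lambda r: cosine_distance(query_vec, r[1]))[:k]

# The equivalent SQL against a hypothetical pgvector-backed table:
RETRIEVAL_SQL = """
SELECT chunk_id, chunk_text
FROM manual_chunks
ORDER BY embedding <=> %(query_embedding)s
LIMIT %(k)s;
"""
```

The retrieved excerpts would then be inserted into the Gemma prompt so the model's troubleshooting steps stay grounded in the OEM manual text rather than free generation.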
A developer post outlines how to migrate vector embeddings in a production retrieval-augmented generation (RAG) system without downtime when upgrading embedding models. Because embeddings are model-specific, existing vectors become incompatible with queries from a new model, making a “re-embed everything” cutover risky and disruptive. The article proposes a shadow deployment approach using a dual-column schema in AlloyDB for PostgreSQL: add a new column (e.g., embedding_v2), backfill it in the background, then switch application logic once all rows are updated. Background processing is handled with Cloud Run Jobs, and the broader pipeline references BigQuery and Vertex AI embeddings. The post also covers consistency checks and handling migration failures, with code in a public GitHub repository.
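The dual-column shadow migration can be sketched as follows; this is a minimal illustration of the pattern, not the post's actual code, and the `documents` table, column names, and vector dimension are assumptions. The idea: add `embedding_v2` alongside the old column, backfill it in batches (as a Cloud Run Job would), and use a consistency check to confirm no rows remain before switching query logic.

```python
# Step 1: add the shadow column without touching existing vectors.
ADD_COLUMN_SQL = """
ALTER TABLE documents ADD COLUMN IF NOT EXISTS embedding_v2 vector(768);
"""

# Step 2: repeatedly grab un-migrated rows; NULL embedding_v2 marks pending work.
BACKFILL_SELECT_SQL = """
SELECT id, content FROM documents
WHERE embedding_v2 IS NULL
ORDER BY id
LIMIT %(batch_size)s;
"""

# Step 3: consistency check before cutting application reads over to v2.
CONSISTENCY_CHECK_SQL = """
SELECT count(*) AS remaining FROM documents WHERE embedding_v2 IS NULL;
"""

def backfill(fetch_batch, embed, write_batch, batch_size=100):
    """Re-embed rows in batches until no pending rows remain.

    fetch_batch(n) -> list of (id, text) rows still lacking embedding_v2;
    embed(texts)   -> list of new-model vectors, one per text;
    write_batch(ids, vectors) persists vectors into embedding_v2.
    Returns the number of rows processed. Because pending rows are defined
    by NULLs, a failed batch is simply retried on the next loop iteration.
    """
    total = 0
    while True:
        rows = fetch_batch(batch_size)
        if not rows:
            return total
        ids = [r[0] for r in rows]
        vectors = embed([r[1] for r in rows])
        write_batch(ids, vectors)
        total += len(rows)
```

Keyed on NULL, the backfill is idempotent and restartable, which is what makes it safe to run as a background job against live production traffic.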
A how-to shows building a production-ready RAG backend that scales from hundreds to millions of documents by parallelizing embedding generation and using managed cloud services. The author outlines an architecture using BigQuery as the data source, Cloud Run Jobs to run hundreds of parallel workers, Vertex AI’s text-embedding-005 via the google-genai SDK, and AlloyDB for PostgreSQL with pgvector for storing and querying vectors. The post includes Terraform-based provisioning, Cloud Run Jobs patterns to avoid API rate limits, and a reference GitHub repo with code and setup steps. This matters because naive sequential embedding pipelines fail at scale; the described serverless, parallel approach reduces latency and operational friction for large-scale RAG systems.
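The fan-out across hundreds of workers can be sketched with Cloud Run Jobs' task environment variables: each task of a job receives `CLOUD_RUN_TASK_INDEX` and `CLOUD_RUN_TASK_COUNT`, which a worker can use to claim a disjoint slice of the document set. The range-splitting below is an illustrative sketch (the BigQuery table name is a placeholder), not the post's actual code.

```python
import os

def task_shard(num_rows, task_index, task_count):
    """Return the half-open (start, end) row range this worker owns.

    Rows are split into task_count contiguous slices; slices are disjoint
    and together cover [0, num_rows), so no document is embedded twice.
    """
    per_task = -(-num_rows // task_count)  # ceiling division
    start = task_index * per_task
    end = min(start + per_task, num_rows)
    return start, end

def shard_from_env(num_rows):
    """Cloud Run Jobs inject CLOUD_RUN_TASK_INDEX / CLOUD_RUN_TASK_COUNT
    into each task; defaults make the code runnable locally as one worker."""
    idx = int(os.environ.get("CLOUD_RUN_TASK_INDEX", "0"))
    cnt = int(os.environ.get("CLOUD_RUN_TASK_COUNT", "1"))
    return task_shard(num_rows, idx, cnt)

# Each worker would then page through only its slice of the source table,
# e.g. (placeholder table name):
QUERY_TEMPLATE = """
SELECT doc_id, text
FROM `my-project.my_dataset.documents`
ORDER BY doc_id
LIMIT {limit} OFFSET {offset};
"""
```

Keeping each worker's batch size modest, and the worker count configurable, is one way to stay under the embedding API's rate limits while still saturating throughput across the job as a whole.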