Across research roundups and hands-on training notes, the common thread is how Transformers consolidated their lead through scale, tooling, and growing interpretability. Gwern.net's 2021 newsletters track rapid acceleration, from GPT-3's API-driven adoption to a wave of 100B+ industrial models such as Naver's HyperCLOVA, alongside new architectures (Perceiver, Set Transformers) and multimodal advances (CLIP, SEER). Complementing the macro view, a checkpoint-by-checkpoint GPT-2-style training experiment shows coherence emerging quickly and inheriting web-data biases, underscoring evaluation and dataset-curation concerns. A technical explainer on encoder–decoder attention ties these trends back to the core mechanism that makes large models practical to scale and stack.
Gwern.net's March 2021 newsletter aggregates recent updates to the site and a curated list of AI, genetics, and broader science links. Site updates include mobile "popins" and new recursive Wikipedia popups; the AI roundup covers the dissection of CLIP's multimodal neurons (Goh et al.), large-scale self-supervised vision work such as SEER, critiques of ImageNet transfer learning, Vision Transformer robustness studies, and discussion of GPT-3 API adoption. The newsletter also links reinforcement-learning papers, autonomous-vehicle simulation analyses, and debates about global AI leadership. Other topics include genetics GWAS findings, evolutionary biology, and meta-science items. It matters as a concise signal of influential papers and trends for researchers and practitioners tracking ML, self-supervision, and AI tool adoption.
Gwern.net's April 2021 newsletter compiles links and commentary on AI/ML research and large-model developments, noting GPT-3's continuing impact and a wave of giant language and multimodal models from industry (OpenAI, Naver HyperCLOVA, Huawei PanGu-α, Google LaMDA/MUM, Alibaba PLUG). It highlights papers on Set Transformers, Perceiver, Z-IL (local learning rules matching backprop), super-convergence, and creative uses of generative models (CogView, VideoGPT, GODIVA). The newsletter flags trends: continued Transformer dominance, rapid scale-ups to 100B+ parameters, Chinese and multinational efforts, and open checkpoints and releases. This matters for researchers, engineers, and policy watchers because it maps where compute, datasets, and architectures are driving AI capability gains, and where reproducibility, efficiency, and governance questions will arise.
A developer trained a GPT-2-small-style transformer (163M parameters) on 3.2B tokens from Hugging Face's FineWeb dataset, saving 57 checkpoints across a two-day run to sample how generations evolve during training. Prompted with "Every effort moves you," each checkpoint's output evolves from token salad with partial words, to common-token guesses, to plausible but generic sentences, and finally to content reflecting the web-scraped training distribution (business/self-help tones). The piece highlights how quickly modern token-based LLMs acquire fluency and topical biases compared with older character-level RNNs, and shows how the prompt seed and training data shape early emergent coherence. It matters for model evaluation, dataset curation, and understanding emergent behaviors during training.
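To make the methodology concrete, here is a minimal sketch of that checkpoint-sampling loop, assuming Hugging Face-style `checkpoint-*` directories under a hypothetical `runs/gpt2-small-fineweb` path; the author's actual layout, step counts, and sampling settings are not given in the post:

```python
from pathlib import Path
from transformers import AutoModelForCausalLM, AutoTokenizer

PROMPT = "Every effort moves you"
run_dir = Path("runs/gpt2-small-fineweb")  # hypothetical run directory

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # GPT-2 BPE tokenizer
inputs = tokenizer(PROMPT, return_tensors="pt")

# Sort checkpoints numerically by training step so generations appear
# in training order (lexicographic sort would put 1000 before 200).
checkpoints = sorted(run_dir.glob("checkpoint-*"),
                     key=lambda p: int(p.name.split("-")[-1]))

for ckpt in checkpoints:
    model = AutoModelForCausalLM.from_pretrained(ckpt)
    model.eval()
    out = model.generate(
        **inputs,
        max_new_tokens=40,
        do_sample=True,    # sample rather than greedy-decode, to expose
        temperature=0.8,   # what the model's output distribution looks like
        top_k=50,
    )
    print(f"{ckpt.name}: {tokenizer.decode(out[0], skip_special_tokens=True)!r}")
```

Fixing the prompt and sampling settings across all 57 checkpoints isolates the training step as the only variable, which is what makes the progression from token salad to fluent generic prose visible.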
The article explains how encoder–decoder attention in transformers combines value vectors: it scales each source token's value vector by a softmax attention weight (derived from queries and keys) and sums the results to produce the encoder–decoder attention output. It emphasizes that the weight matrices for queries, keys, and values in encoder–decoder attention differ from those in self-attention, yet are shared across token positions, which is what lets the layer handle variable input and output lengths. The piece notes that encoder–decoder attention layers can be stacked, like self-attention layers, to model more complex phrases, and signals that follow-up articles will provide additional detail. The post also includes a brief promotional blurb for Installerpedia, a community installer tool.
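As a concrete illustration of that weighted sum, here is a minimal single-head cross-attention sketch in NumPy; the dimensions, random weights, and variable names are illustrative, not taken from the article:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8
src_len, tgt_len = 5, 3  # encoder and decoder sequence lengths

enc_out = rng.normal(size=(src_len, d_model))  # encoder outputs (keys/values)
dec_hid = rng.normal(size=(tgt_len, d_model))  # decoder states (queries)

W_q = rng.normal(size=(d_model, d_model))  # cross-attention projections:
W_k = rng.normal(size=(d_model, d_model))  # distinct from self-attention's
W_v = rng.normal(size=(d_model, d_model))  # weights, but reused at every position

Q = dec_hid @ W_q   # queries come from the decoder
K = enc_out @ W_k   # keys and values come from the encoder
V = enc_out @ W_v

scores = Q @ K.T / np.sqrt(d_model)  # (tgt_len, src_len) similarity scores
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # softmax over source positions

output = weights @ V      # weighted sum of value vectors
print(output.shape)       # (tgt_len, d_model)
```

Because W_q, W_k, and W_v are applied identically at every position, the same layer accepts any src_len and tgt_len, which is the variable-length property the article highlights.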