Continual learning and online adaptation let foundation models improve from live interactions, which reduces manual retraining and keeps their outputs relevant to users. Engineers deploying such systems must balance learning speed against model stability and safety.
Dossier last updated: 2026-05-18 03:25:32
Researchers propose Self-Distillation Fine-Tuning (SDFT), a method that turns a demonstration-conditioned model into its own on-policy teacher to enable continual learning without catastrophic forgetting. Authors Idan Shenfeld, Mehul Damani and Jonas Hübotter show that SDFT uses in-context learning to generate on-policy training signals from demonstrations, outperforming supervised fine-tuning (SFT) across skill learning and knowledge acquisition tasks. In sequential experiments, a single model accumulates multiple skills over time without performance regression, suggesting on-policy distillation as a practical route for foundation models to learn new tasks while preserving prior capabilities. The approach matters for deploying adaptive AI agents where reward functions are unavailable and long-term model competence is critical.
Researchers introduce Self-Distillation Fine-Tuning (SDFT), a method that enables on-policy continual learning from demonstrations by having a model use its own demonstration-conditioned in-context outputs as a teacher. Authored by Idan Shenfeld, Mehul Damani and Jonas Hübotter (with Pulkit Agrawal), the paper shows SDFT outperforms standard supervised fine-tuning (SFT) on skill learning and knowledge acquisition tasks: it raises new-task accuracy while substantially reducing catastrophic forgetting. In sequential learning experiments, a single model accumulates multiple skills without performance regression, suggesting on-policy distillation from demonstrations is a practical path for foundation models to learn continuously when explicit reward functions are unavailable. This could change how large models are updated as the capabilities required of them evolve.
Researchers Idan Shenfeld, Mehul Damani and Jonas Hübotter introduce Self-Distillation Fine-Tuning (SDFT), a simple on-policy method that enables continual learning from demonstrations without explicit reward functions. Instead of standard supervised fine-tuning (SFT), SDFT uses a demonstration-conditioned model as its own teacher to generate on-policy training signals via in-context learning. Across skill learning and knowledge acquisition tasks, SDFT outperforms SFT, improving new-task accuracy while substantially reducing catastrophic forgetting, and it enables sequential accumulation of multiple skills without regression. The paper positions on-policy distillation from demonstrations as a practical route for foundation models to learn continually while preserving prior capabilities.
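The self-teaching loop described in these summaries can be sketched on a toy problem. The following is a minimal illustration, not the authors' implementation. It assumes a single-step categorical policy, models "conditioning on a demonstration" as a logit boost on the demonstrated token (the hypothetical `demo_bonus`), and assumes a reverse-KL objective estimated on samples drawn from the student. The summaries do not specify the loss, and every function name here is invented for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def teacher_probs(logits, demo_token, demo_bonus=2.0):
    """Hypothetical stand-in for the same model with a demonstration in context.

    Assumption: seeing the demonstration boosts the demonstrated token's logit.
    """
    conditioned = list(logits)
    conditioned[demo_token] += demo_bonus
    return softmax(conditioned)


def sdft_step(logits, demo_token, rng, lr=0.5, batch_size=256):
    """One update: sample from the student, pull it toward its own teacher."""
    student = softmax(logits)
    # The teacher shares the student's parameters but is treated as a constant.
    teacher = teacher_probs(logits, demo_token)
    grad = [0.0] * len(logits)
    for _ in range(batch_size):
        # On-policy: the training sample comes from the student itself.
        a = rng.choices(range(len(logits)), weights=student)[0]
        log_ratio = math.log(student[a]) - math.log(teacher[a])
        # Score-function estimate of the gradient of KL(student || teacher).
        for k in range(len(logits)):
            indicator = 1.0 if k == a else 0.0
            grad[k] += log_ratio * (indicator - student[k]) / batch_size
    return [x - lr * g for x, g in zip(logits, grad)]


def train(steps=200, vocab_size=4, demo_token=2, seed=0):
    """Run the toy loop from a uniform policy and return the final distribution."""
    rng = random.Random(seed)
    logits = [0.0] * vocab_size
    for _ in range(steps):
        logits = sdft_step(logits, demo_token, rng)
    return softmax(logits)
```

Calling `train()` returns the student's final distribution. Under these assumptions, probability mass shifts toward the demonstrated token even though the student is only ever updated on its own samples, which is the property the summaries attribute to on-policy distillation.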
A new paper, "Continual Harness: Online Adaptation for Self-Improving Foundation Agents," proposes methods for foundation models to adapt continuously online using interaction data so they self-improve over time. The work presents algorithms and an experimental framework for continual learning that mitigates catastrophic forgetting, balances stability and plasticity, and leverages streaming feedback from users or environments. Key players are the paper authors and the broader ML community seeking robust deployment of large pretrained models. This matters because enabling safe, efficient online adaptation could reduce offline retraining costs, improve personalization, and keep deployed AI systems up to date without manual dataset curation. The approach impacts production ML, RL, and applications relying on long-lived agents.
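The stability-plasticity balance mentioned above can be made concrete with a generic example. This sketch is not the paper's algorithm, which the summary does not describe. It assumes a one-parameter model `y ≈ weight * x` updated by online gradient steps on streaming feedback, with a hypothetical `stability` coefficient that penalizes drift from the pretrained value `anchor`.

```python
def online_update(weight, anchor, x, y, lr=0.1, stability=1.0):
    """One online gradient step on squared error plus a pull toward the anchor."""
    error = weight * x - y
    # Plasticity term fits the new observation; stability term resists drift.
    grad = 2 * error * x + 2 * stability * (weight - anchor)
    return weight - lr * grad


def adapt(stream, anchor, stability):
    """Consume (x, y) feedback pairs one at a time, starting from the anchor."""
    weight = anchor
    for x, y in stream:
        weight = online_update(weight, anchor, x, y, stability=stability)
    return weight


# Hypothetical stream: the environment now wants weight = 3, pretraining gave 1.
stream = [(1.0, 3.0)] * 50
fully_plastic = adapt(stream, anchor=1.0, stability=0.0)
balanced = adapt(stream, anchor=1.0, stability=1.0)
```

With `stability=0.0` the weight moves all the way to the new target, while `stability=1.0` settles halfway between the pretrained value and the target. Tuning that coefficient is one simple way to trade adaptation speed against retention of prior behavior.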