continual learning / foundation models / online adaptation — Topic | TechScan AI — Tech & AI News

A new paper, "Continual Harness: Online Adaptation for Self-Improving Foundation Agents," proposes methods for foundation models to adapt continuously online using interaction data so they self-improve over time. The work presents algorithms and an experimental framework for continual learning that mitigates catastrophic forgetting, balances stability and plasticity, and leverages streaming feedback from users or environments. Key players are the paper authors and the broader ML community seekin

First Seen: 2026-05-14 07:21:48

7-Day Trend (chart; axis dates 05-17 to 05-20)

Source Breakdown

Zeli (1), Reddit (1), HN (1), agent-collect (1)

Key Entities

Mehul Damani, Idan Shenfeld, Jonas Hübotter, Pulkit Agrawal, Self-Distillation Fine-Tuning (SDFT), supervised fine-tuning (SFT), Continual Harness

Why It Matters

Continual learning and online adaptation let foundation models improve from live interactions, reducing manual retraining and keeping outputs relevant to users. Engineers deploying such systems must balance learning speed against model stability and safety.

Latest Changes

New paper 'Continual Harness' presents framework and algorithms for online self-improvement of foundation agents.
Multiple releases describe Self-Distillation Fine-Tuning (SDFT) as an on-policy method to enable continual learning.
SDFT converts demonstration-conditioned models into on-policy teachers to mitigate catastrophic forgetting.
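
The teacher/student idea in the items above can be illustrated with a toy sketch. This is not the authors' implementation: `toy_model`, `reverse_kl`, `self_distillation_signal`, the three-token vocabulary, the fixed logit shift that stands in for in-context learning, and the choice of reverse KL as the divergence are all assumptions made for illustration. It only shows the general shape: one set of weights is queried twice, once without the demonstration (student) and once with it in context (teacher), and the training signal is measured on outputs the student itself samples.

```python
import math
import random

VOCAB = ["a", "b", "c"]  # hypothetical three-token vocabulary


def toy_model(params, prompt, demo=None):
    """Hypothetical stand-in for a language model's next-token distribution.

    `params` are per-token logits. `prompt` is ignored by this toy. If a
    demonstration token is placed in context, its logit is raised by a
    fixed amount, which crudely imitates in-context learning.
    """
    logits = list(params)
    if demo is not None:
        logits[VOCAB.index(demo)] += 2.0
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def reverse_kl(student, teacher):
    """KL(student || teacher), assumed here as the distillation divergence."""
    return sum(s * math.log(s / t) for s, t in zip(student, teacher) if s > 0)


def self_distillation_signal(params, prompt, demo, rng):
    """Return an on-policy sample and the student-to-teacher divergence."""
    student = toy_model(params, prompt)        # same weights, no demonstration
    teacher = toy_model(params, prompt, demo)  # same weights, demo in context
    sampled = rng.choices(VOCAB, weights=student)[0]  # drawn from the student
    return sampled, reverse_kl(student, teacher)


if __name__ == "__main__":
    rng = random.Random(0)
    token, loss = self_distillation_signal([0.0, 0.0, 0.0], "question", "b", rng)
    print(token, round(loss, 3))
```

In a real system the divergence would be minimized by gradient descent on the student's weights; the sketch stops at computing the signal because the update rule is not described in the summaries here.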

Timeline

2026-05-14 — Publication of 'Continual Harness' framing online adaptation for self-improving foundation agents.
2026-05-17 — Researchers Idan Shenfeld, Mehul Damani and Jonas Hübotter publish 'Self-Distillation Enables Continual Learning' describing SDFT.
2026-05-17 — Duplicate release of the SDFT paper PDF appears, highlighting demonstration-conditioned in-context outputs used as teachers.
2026-05-18 — A news item summarizes SDFT in Chinese, emphasizing self-distillation to solve catastrophic forgetting.

What to Watch

Evaluation outcomes showing SDFT's effectiveness at preventing catastrophic forgetting in large foundation models.
How continual harness algorithms balance stability versus plasticity when learning from streaming user feedback.
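
One common way to quantify the two items above is to track an accuracy matrix over a task sequence and derive a forgetting score (stability) and a new-task score (plasticity) from it. The sketch below uses the standard average-forgetting definition from the continual learning literature, not a metric confirmed to be the one reported in either paper. The function names and the accuracy numbers are hypothetical.

```python
def average_forgetting(acc):
    """Mean drop from each earlier task's best accuracy to its final accuracy.

    acc[i][j] is the accuracy on task j measured after training on tasks
    0..i. Only entries with j <= i are read.
    """
    n = len(acc)
    if n < 2:
        return 0.0
    drops = []
    for j in range(n - 1):
        # Best accuracy on task j before the final training stage.
        best_earlier = max(acc[i][j] for i in range(j, n - 1))
        drops.append(best_earlier - acc[n - 1][j])
    return sum(drops) / len(drops)


def new_task_accuracy(acc):
    """Mean accuracy on each task right after it was learned (plasticity proxy)."""
    return sum(acc[i][i] for i in range(len(acc))) / len(acc)


if __name__ == "__main__":
    # Made-up numbers for three tasks learned in sequence.
    acc = [
        [0.90, 0.00, 0.00],
        [0.70, 0.80, 0.00],
        [0.60, 0.75, 0.85],
    ]
    print(round(average_forgetting(acc), 3))  # 0.175
    print(round(new_task_accuracy(acc), 3))   # 0.85
```

A method that claims to mitigate catastrophic forgetting should push the first number toward zero without lowering the second.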

Dossier last updated: 2026-05-18 03:25:32

Recent News (4)

SDFT: Self-Distillation Enables Continual Learning, Solving Catastrophic Forgetting

Researchers propose Self-Distillation Fine-Tuning (SDFT), a method that converts demonstration-conditioned models into their own on-policy teachers to enable continual learning without catastrophic forgetting. Authors Idan Shenfeld, Mehul Damani and Jonas Hübotter show SDFT uses in-context learning to generate on-policy training signals from demonstrations, outperforming supervised fine-tuning (SFT) across skill learning and knowledge acquisition tasks. In sequential experiments a single model accumulates multiple skills over time with less performance regression, suggesting on-policy distillation as a practical route for foundation models to learn new tasks while preserving prior capabilities. The approach matters for deploying adaptive AI agents where reward functions are unavailable and maintaining long-term model competence is critical.

agent-collect, arXiv, 2d ago

Self-Distillation Enables Continual Learning [pdf]

Researchers introduce Self-Distillation Fine-Tuning (SDFT), a method that enables on-policy continual learning from demonstrations by having a model use its own demonstration-conditioned in-context outputs as a teacher. Authored by Idan Shenfeld, Mehul Damani and Jonas Hübotter (with Pulkit Agrawal), the paper shows SDFT outperforms standard supervised fine-tuning (SFT) on skill acquisition and knowledge tasks: it raises new-task accuracy while substantially reducing catastrophic forgetting. In sequential learning experiments, a single model accumulates multiple skills without performance regression, suggesting on-policy distillation from demonstrations is a practical path for foundation models to learn continuously when explicit reward functions are unavailable. This could impact how large models are updated for evolving capabilities.

18 pts

Zeli, teleforce