
Quick answer

AI Summary: The proposed framework reduces "distributional drift" in LLMs, bringing the model's post-SFT behavior closer to its fine-tuning objectives.

Claim

Towards On-Policy SFT: Distribution Discriminant Theory and its Applications

Authors
Miaosen Zhang · Xu Yang · Qi Dai · Chong Luo

ABSTRACT

Supervised fine-tuning (SFT) is efficient but often yields inferior generalization compared to RL, a gap driven by RL's use of on-policy data. We propose a framework to bridge this chasm by enabling On-Policy SFT. We present Distribution Discriminant Theory (DDT), which quantifies the alignment between data and the model-induced distribution. We introduce two techniques: In-Distribution Finetuning (IDFT), a loss-level method, and Hinted Decoding, which re-aligns the training corpus to the model's distribution. Experiments demonstrate that our framework achieves generalization performance on par with DPO and SimPO while maintaining the computational efficiency of a standard SFT pipeline.
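
The abstract only names the two techniques, so the sketch below is purely illustrative intuition, not the paper's method. It shows what a "loss-level" reweighting in the spirit of IDFT and a hint-conditioned regeneration in the spirit of Hinted Decoding could look like, assuming a Hugging Face-style causal LM. The function names `idft_style_loss` and `hinted_decode`, the `tau` threshold, the `response_mask` convention, and the hint format are all invented for this example.

```python
# Hypothetical sketch only: the abstract does not specify the actual DDT
# statistic, IDFT loss, or Hinted Decoding procedure. Assumes a Hugging
# Face-style causal LM whose forward pass returns `.logits`.
import torch

def idft_style_loss(model, input_ids, response_mask, tau=-2.0):
    """One plausible 'loss-level' reweighting: score each example by the
    model's own average log-likelihood of the response (a crude proxy for
    alignment with the model-induced distribution), then down-weight
    out-of-distribution examples. `tau` is a made-up threshold."""
    logits = model(input_ids).logits[:, :-1, :]         # next-token logits
    targets = input_ids[:, 1:]
    token_logp = torch.log_softmax(logits, dim=-1).gather(
        -1, targets.unsqueeze(-1)).squeeze(-1)
    mask = response_mask[:, 1:].float()                 # response tokens only
    denom = mask.sum(-1).clamp(min=1)
    seq_logp = (token_logp * mask).sum(-1) / denom      # alignment proxy
    weight = torch.sigmoid(seq_logp.detach() - tau)     # ~1 if in-distribution
    nll = -(token_logp * mask).sum(-1) / denom          # standard SFT loss
    return (weight * nll).mean()

def hinted_decode(model, tokenizer, prompt, reference, max_new_tokens=256):
    """One plausible reading of 'Hinted Decoding': let the model regenerate
    the target response while conditioning on the reference answer as a
    hint, so the new training text lies nearer the model's own distribution
    while preserving the reference content."""
    hinted = f"{prompt}\n(Hint: the intended answer is: {reference})\n"
    ids = tokenizer(hinted, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=max_new_tokens, do_sample=True)
    return tokenizer.decode(out[0, ids.shape[1]:], skip_special_tokens=True)
```

Note that the gating weight is computed on a detached score, so the gate itself receives no gradient; only the weighted negative log-likelihood is optimized, keeping the pipeline as cheap as standard SFT.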
