DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs
Paper • Dec 10, 2024 • arxiv.org • Zhihan Liu, Shenao Zhang, Yongfei Liu, Boyi Liu, Yingxiang Yang, Zhaoran Wang
Direct preference learning offers a promising and computation-efficient approach beyond supervised fine-tuning (SFT) for improving code generation in coding large language models (LMs). However, the scarcit...