Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning · Paper · Qinghao Ye, Xianhan Zeng, Fu Li, Chunyuan Li, Haoqi Fan · 3/10/2025
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model · Paper · Lixue Gong, Xiaoxia Hou, Fanshi Li, Liang Li, Xiaochen Li… · 3/10/2025
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos · Paper · Zhongwei Ren, Yunchao Wei, Xun Guo, Yao Zhao, Bingyi Kang… · 3/5/2025
Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts · Paper · Shulai Zhang, Ningxin Zheng, Haibin Lin, Ziheng Jiang, We… · 3/4/2025
Let the Code LLM Edit Itself When You Edit the Code · Paper · Zhenyu He, Jun Zhang, Shengjie Luo, Jingjing Xu, Zhi Zhan… · 3/4/2025
The Rise and Down of Babel Tower: Investigating the Evolution Process of Multilingual Code Large Language Model · Paper · Jiawei Chen, Wentao Chen, Jing Su, Jingjing Xu, Hongyu Li… · 3/3/2025
Vista-LLaMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens · Paper · Fan Ma, Xiaojie Jin, Heng Wang, Yuchen Xian, Jiashi Feng,… · 3/3/2025
KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks · Paper · Kaijing Ma, Xinrun Du, Yunran Wang, Haoran Zhang, Zhoufut… · 3/1/2025
FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference · Paper · Xunhao Lai, Jianqiao Lu, Yao Luo, Yiyuan Ma, Xun Zhou · 2/28/2025
Towards Semantic Equivalence of Tokenization in Multimodal LLM · Paper · Shengqiong Wu, Hao Fei, Xiangtai Li, Jiayi Ji, Hanwang Zh… · 2/26/2025
You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs · Paper · Yihong Luo, Xiaolong Chen, Xinghua Qu, Tianyang Hu, Jing … · 2/25/2025
MagicArticulate: Make Your 3D Models Articulation-Ready · Paper · Chaoyue Song, Jianfeng Zhang, Xiu Li, Fan Yang, Yiwen Che… · 2/18/2025
Ultra-Sparse Memory Network · Paper · Zihao Huang, Qiyang Min, Hongzhi Huang, Defa Zhu, Yutao Z… · 2/6/2025
Shot2Story: A New Benchmark for Comprehensive Understanding of Multi-shot Videos · Paper · Mingfei Han, Linjie Yang, Xiaojun Chang, Lina Yao, Heng Wang · 2/5/2025
UI-TARS: Pioneering Automated GUI Interaction with Native Agents · Paper · Yujia Qin, Yining Ye, Junjie Fang, Haoming Wang, Shihao L… · 1/21/2025
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words · Paper · Junyi Ao, Yuancheng Wang, Xiaohai Tian, Dekun Chen, Jun Z… · 1/16/2025
Magic-Boost: Boost 3D Generation with Multi-View Conditioned Diffusion · Paper · Fan Yang, Jianfeng Zhang, Yichun Shi, Bowen Chen, Chenxu … · 1/9/2025
DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs · Paper · Zhihan Liu, Shenao Zhang, Yongfei Liu, Boyi Liu, Yingxian… · 12/10/2024
MaskBit: Embedding-free Image Generation via Bit Tokens · Paper · Mark Weber, Lijun Yu, Qihang Yu, Xueqing Deng, Xiaohui Sh… · 12/8/2024
LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing · Paper · Xiaonan Nie, Qibin Liu, Fangcheng Fu, Shenhan Zhu, Xupeng… · 11/13/2024