Best Multimodal Papers

The highest-signal Multimodal papers, ranked by community reviews and momentum.
Canonical intent: topic=multimodal|type=paper|year=evergreen


Top Picks

4 · EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Kai Chen, Yunhao Gou, Runhui Huang, Zhili Liu, Daxin Tan, Jing Xu, Chunwei Wang, Yi Zhu, Yihan Zeng, Kuo Yang, Dingdong Wang, Kun Xiang, Haoyuan Li, Haoli Bai, Jianhua Han, Xiaohui Li, Weike Jin, Nian Xie, Yu Zhang, James T. Kwok, Hengshuang Zhao, Xiaodan Liang, Dit-Yan Yeung, Xiao Chen, Zhenguo Li, Wei Zhang, Qun Liu, Jun Yao, Lanqing Hong, Lu Hou, Hang Xu
Mar 20, 2025 · 9986 checkouts · arxiv.org
10 · InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu, Weiyun Wang, Zhe Chen, Zhaoyang Liu, Shenglong Ye, Lixin Gu, Hao Tian, Yuchen Duan, Weijie Su, Jie Shao, Zhangwei Gao, Erfei Cui, Xuehui Wang, Yue Cao, Yangzhou Liu, Xingguang Wei, Hongjie Zhang, Haomin Wang, Weiye Xu, Hao Li, Jiahao Wang, Nianchen Deng, Songze Li, Yinan He, Tan Jiang, Jiapeng Luo, Yi Wang, Conghui He, Botian Shi, Xingcheng Zhang, Wenqi Shao, Junjun He, Yingtong Xiong, Wenwen Qu, Peng Sun, Penglong Jiao, Han Lv, Lijun Wu, Kaipeng Zhang, Huipeng Deng, Jiaye Ge, Kai Chen, Limin Wang, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang
Apr 19, 2025 · 9869 checkouts · arxiv.org
26 · Baichuan-Omni Technical Report
Yadong Li, Haoze Sun, Mingan Lin, Tianpeng Li, Guosheng Dong, Tao Zhang, Bowen Ding, Wei Song, Zhenglin Cheng, Yuqi Huo, Song Chen, Xu Li, Da Pan, Shusen Zhang, Xin Wu, Zheng Liang, Jun Liu, Keer Lu, Yaqi Zhao, Yanjun Shen, Fan Yang, Kaicheng Yu, Tao Lin, Jianhua Xu, Zenan Zhou, Weipeng Chen
Dec 27, 2024 · 9380 checkouts · arxiv.org

FAQ

How is this “best Multimodal Papers” collection ranked?

This page ranks Multimodal Papers using topic relevance, checkout momentum, source diversity, and freshness signals. Rankings are recalculated as new items and engagement arrive, so readers see resources that are both high quality and currently useful for implementation, research, and practical decision making. Canonical intent key: topic=multimodal|type=paper|year=evergreen.
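Attendemia does not publish its exact formula, but the answer above names four signals. A minimal sketch of how such a blend could look, assuming illustrative weights, field names, and a 90-day freshness half-life (none of which come from the site; the example values below, apart from EMOVA's date and checkout total, are also invented):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Paper:
    title: str
    topic_relevance: float   # 0..1, e.g. from a topic classifier (assumed field)
    checkouts_7d: int        # recent engagement, the "momentum" signal (assumed field)
    checkouts_total: int
    n_sources: int           # distinct hosting/citing sources (assumed field)
    published: datetime

def score(p: Paper, now: datetime, half_life_days: float = 90.0) -> float:
    """Illustrative blend of topic relevance, momentum, diversity, freshness.

    Weights and the half-life are assumptions for this sketch, not
    Attendemia's actual formula.
    """
    momentum = p.checkouts_7d / max(p.checkouts_total, 1)   # share of recent activity
    diversity = 1.0 - 1.0 / (1 + p.n_sources)               # saturates as sources grow
    age_days = (now - p.published).total_seconds() / 86400
    freshness = 0.5 ** (age_days / half_life_days)          # exponential decay
    return (0.4 * p.topic_relevance + 0.3 * momentum
            + 0.15 * diversity + 0.15 * freshness)

if __name__ == "__main__":
    emova = Paper("EMOVA", 0.92, 310, 9986, 3,
                  datetime(2025, 3, 20, tzinfo=timezone.utc))
    print(f"{emova.title}: {score(emova, datetime.now(timezone.utc)):.3f}")
```

Recomputing such a score as new checkouts arrive is what lets the ordering shift over time without anyone editing the page.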

How do you prevent duplicate collection pages?

Attendemia maps each slug variant, including best-of and year forms, to one canonical intent key. If two URLs describe the same topic, type, and timeframe, the non-canonical versions permanently redirect (HTTP 301) to the canonical URL. This consolidates crawl signals, avoids duplicate-content dilution, and helps search engines index the strongest single page.
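As a rough illustration of that slug-to-canonical mapping: the table layout, slug variants, and resolve helper below are invented for the example; only the canonical intent key is taken from this page.

```python
# Hypothetical slug table: every variant points at one canonical intent key.
CANONICAL_KEY = {
    "best-multimodal-papers":      "topic=multimodal|type=paper|year=evergreen",
    "multimodal-papers":           "topic=multimodal|type=paper|year=evergreen",
    "best-multimodal-papers-2025": "topic=multimodal|type=paper|year=2025",
}

# One canonical slug per intent key.
CANONICAL_SLUG = {
    "topic=multimodal|type=paper|year=evergreen": "best-multimodal-papers",
    "topic=multimodal|type=paper|year=2025":      "best-multimodal-papers-2025",
}

def resolve(slug: str) -> tuple[int, str]:
    """Return an (HTTP status, target slug) pair for a requested slug."""
    key = CANONICAL_KEY.get(slug)
    if key is None:
        return 404, slug
    canonical = CANONICAL_SLUG[key]
    if slug != canonical:
        # Permanent redirect: consolidates crawl signals on a single URL.
        return 301, canonical
    return 200, slug

print(resolve("multimodal-papers"))  # (301, 'best-multimodal-papers')
```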

When does a year page stay separate from evergreen?

A year-specific page stays separate only when its item set is materially different from evergreen and has enough ranking depth. When overlap is high, the year URL redirects to the evergreen canonical page. This avoids thin duplication while preserving genuinely distinct annual collections for search users.
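Stated as code, that rule reduces to an overlap check plus a depth floor. A sketch under assumed thresholds; the 80% overlap cap and 10-item minimum are invented, since the answer gives the rule only qualitatively:

```python
def keep_year_page(year_items: set[str], evergreen_items: set[str],
                   max_overlap: float = 0.8, min_depth: int = 10) -> bool:
    """True if the year page stays separate; False means redirect to evergreen.

    The threshold values are illustrative assumptions, not site policy.
    """
    if len(year_items) < min_depth:
        return False  # too thin: not enough ranking depth to stand alone
    shared = len(year_items & evergreen_items)
    overlap = shared / len(year_items)
    return overlap <= max_overlap  # high overlap -> fold into evergreen

# Example: 9 of 12 year items also rank on the evergreen page -> 75% overlap, kept.
print(keep_year_page({f"p{i}" for i in range(12)},
                     {f"p{i}" for i in range(9)} | {"x", "y"}))
```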

Are these paid recommendations?

No. These recommendations are not paid placements. Attendemia ranks items from public metadata, source quality and coverage, and user engagement signals, then orders them by practical usefulness. Sponsorship does not buy rank position, so this page should be read as editorial curation rather than advertising inventory.