Multimodal
Awesome Multimodal Machine Learning: From Video Understanding to Vibe Coding
A curated, high-quality list of must-read papers and resources tracing the evolution of Multimodal Machine Learning. This repository covers the foundational shift from Video Understanding and Generative Video (Diffusion/Autoregressive) to the frontiers of UX/GUI Design Agents and Vibe Coding. Whether you are looking for landmark papers in CLIP-based alignment or the latest in vision-language-action (VLA) models for interface interaction, this list provides a structured roadmap through the most influential research in the field.
- Chengchao Shen, Hourun Zhu, Gongfan Fang, Jianxin Wang, Xinchao Wang (2025), 7,414 checkouts
- Hao Tang, Chengchao Shen (2025), 5,147 checkouts
- Jinguo Zhu, Weiyun Wang, Zhe Chen, Zhaoyang Liu, Shenglong Ye, Lixin Gu, Hao Tian, Yuchen Duan, Weijie Su, Jie Shao, Zhangwei Gao, Erfei Cui, Xuehui Wang, Yue Cao, Yangzhou Liu, Xingguang Wei, Hongjie Zhang, Haomin Wang, Weiye Xu, Hao Li, Jiahao Wang, Nianchen Deng, Songze Li, Yinan He, Tan Jiang, Jiapeng Luo, Yi Wang, Conghui He, Botian Shi, Xingcheng Zhang, Wenqi Shao, Junjun He, Yingtong Xiong, Wenwen Qu, Peng Sun, Penglong Jiao, Han Lv, Lijun Wu, Kaipeng Zhang, Huipeng Deng, Jiaye Ge, Kai Chen, Limin Wang, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang (2025), 9,869 checkouts
- Dongchen Lu, Yuyao Sun, Zilu Zhang, Leping Huang, Jianliang Zeng, Mao Shu, Huo Cao (2025), 8,078 checkouts
- Kai Hu, Feng Gao, Xiaohan Nie, Peng Zhou, Son Tran, Tal Neiman, Lingyun Wang, Mubarak Shah, Raffay Hamid, Bing Yin, Trishul Chilimbi (2025), 5,956 checkouts
- Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara, Filip Ilievski (2025), 9,707 checkouts
- Yi Wang, Xinhao Li, Ziang Yan, Yinan He, Jiashuo Yu, Xiangyu Zeng, Chenting Wang, Changlian Ma, Haian Huang, Jianfei Gao, Min Dou, Kai Chen, Wenhai Wang, Yu Qiao, Yali Wang, Limin Wang (2025), 6,976 checkouts
- Hongyu Li, Jinyu Chen, Ziyu Wei, Shaofei Huang, Tianrui Hui, Jialin Gao, Xiaoming Wei, Si Liu (2025), 5,649 checkouts
- Jiaxing Zhao, Boyuan Sun, Xiang Chen, Xihan Wei, Qibin Hou (2025), 8,495 checkouts
- Haobo Yuan, Xiangtai Li, Tao Zhang, Yueyi Sun, Zilong Huang, Shilin Xu, Shunping Ji, Yunhai Tong, Lu Qi, Jiashi Feng, Ming-Hsuan Yang (2025), 7,418 checkouts
- Tianyu Fu, Tengxuan Liu, Qinghao Han, Guohao Dai, Shengen Yan, Huazhong Yang, Xuefei Ning, Yu Wang (2025), 6,611 checkouts
- Orr Zohar, Xiaohan Wang, Yann Dubois, Nikhil Mehta, Tong Xiao, Philippe Hansen-Estruch, Licheng Yu, Xiaofang Wang, Felix Juefei-Xu, Ning Zhang, Serena Yeung-Levy, Xide Xia (2024), 7,681 checkouts
- Xiaohui Chen, Satya Narayan Shukla, Mahmoud Azab, Aashu Singh, Qifan Wang, David Yang, ShengYun Peng, Hanchao Yu, Shen Yan, Xuewen Zhang, Baosheng He (2024), 7,939 checkouts
- Zhe Chen, Weiyun Wang, Yue Cao, Yangzhou Liu, Zhangwei Gao, Erfei Cui, Jinguo Zhu, Shenglong Ye, Hao Tian, Zhaoyang Liu, Lixin Gu, Xuehui Wang, Qingyun Li, Yiming Ren, Zixuan Chen, Jiapeng Luo, Jiahao Wang, Tan Jiang, Bo Wang, Conghui He, Botian Shi, Xingcheng Zhang, Han Lv, Yi Wang, Wenqi Shao, Pei Chu, Zhongying Tu, Tong He, Zhiyong Wu, Huipeng Deng, Jiaye Ge, Kai Chen, Kaipeng Zhang, Limin Wang, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang (2025), 5,943 checkouts
- Michael S. Ryoo, Honglu Zhou, Shrikant Kendre, Can Qin, Le Xue, Manli Shu, Jongwoo Park, Kanchana Ranasinghe, Silvio Savarese, Ran Xu, Caiming Xiong, Juan Carlos Niebles (2025), 5,335 checkouts
- Renjie Pi, Jianshu Zhang, Tianyang Han, Jipeng Zhang, Rui Pan, Tong Zhang (2024), 6,143 checkouts
- Yuanhan Zhang, Jinming Wu, Wei Li, Bo Li, Zejun Ma, Ziwei Liu, Chunyuan Li (2025)
- Kai Chen, Yunhao Gou, Runhui Huang, Zhili Liu, Daxin Tan, Jing Xu, Chunwei Wang, Yi Zhu, Yihan Zeng, Kuo Yang, Dingdong Wang, Kun Xiang, Haoyuan Li, Haoli Bai, Jianhua Han, Xiaohui Li, Weike Jin, Nian Xie, Yu Zhang, James T. Kwok, Hengshuang Zhao, Xiaodan Liang, Dit-Yan Yeung, Xiao Chen, Zhenguo Li, Wei Zhang, Qun Liu, Jun Yao, Lanqing Hong, Lu Hou, Hang Xu (2025), 9,986 checkouts
- Li Zhou, Xu Yuan, Zenghui Sun, Zikun Zhou, Jingsong Lan (2024), 8,246 checkouts
- Yuan Liu, Zhongyin Zhao, Ziyuan Zhuang, Le Tian, Xiao Zhou, Jie Zhou (2024), 8,821 checkouts
FAQ
What is Multimodal?
Multimodal is an expert-curated awesome list on Attendemia that groups high-signal resources for fast learning. Items are reviewed and refreshed over time, so readers can start with a practical shortlist instead of searching across fragmented sources and low-context recommendation threads.
How are items ranked here?
Items are ranked using maintainer curation, content quality notes, engagement momentum, and freshness indicators. This ranking method keeps the top of the awesome list actionable for current workflows, while still preserving evergreen references that are widely cited and useful for deeper technical understanding.
Can I follow this list?
Yes. Use the follow button near the page header to be notified when new resources are added or promoted. Following the list lets you track changes without rechecking the page manually and keeps your learning feed aligned with this topic over time.