Multimodal
Awesome Multimodal Machine Learning: From Video Understanding to Vibe Coding
A curated, high-quality list of must-read papers and resources tracing the evolution of Multimodal Machine Learning. This repository covers the foundational shift from Video Understanding and Generative Video (Diffusion/Autoregressive) to the frontiers of UX/GUI Design Agents and Vibe Coding. Whether you are looking for landmark papers in CLIP-based alignment or the latest in vision-language-action (VLA) models for interface interaction, this list provides a structured roadmap through the most influential research in the field.
- Chao Liao, Liyang Liu, Xun Wang, Zhengxiong Luo, Xinyu Zhang, Wenliang Zhao, Jie Wu, Liang Li, Zhi Tian, Weilin Huang20258,458 checkouts
- Shanshan Zhao, Xinjie Zhang, Jintao Guo, Jiakui Hu, Lunhao Duan, Minghao Fu, Yong Xien Chng, Guo-Hua Wang, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang20265,856 checkouts
- Yibin Wang, Yuhang Zang, Hao Li, Cheng Jin, Jiaqi Wang20259,061 checkouts
- Chuofan Ma, Yi Jiang, Junfeng Wu, Jihan Yang, Xin Yu, Zehuan Yuan, Bingyue Peng, Xiaojuan Qi20258,754 checkouts
- Xiaokang Chen, Zhiyu Wu, Xingchao Liu, Zizheng Pan, Wen Liu, Zhenda Xie, Xingkai Yu, Chong Ruan2025
- Weijia Shi, Xiaochuang Han, Chunting Zhou, Weixin Liang, Xi Victoria Lin, Luke Zettlemoyer, Lili Yu20257,398 checkouts
- Hao Li, Changyao Tian, Jie Shao, Xizhou Zhu, Zhaokai Wang, Jinguo Zhu, Wenhan Dou, Xiaogang Wang, Hongsheng Li, Lewei Lu, Jifeng Dai20248,451 checkouts
- Chunwei Wang, Guansong Lu, Junwei Yang, Runhui Huang, Jianhua Han, Lu Hou, Wei Zhang, Hang Xu20248,825 checkouts
- Yuying Ge, Yizhuo Li, Yixiao Ge, Ying Shan20248,071 checkouts
- Junfeng Wu, Yi Jiang, Chuofan Ma, Yuliang Liu, Hengshuang Zhao, Zehuan Yuan, Song Bai, Xiang Bai20255,912 checkouts
- Liao Qu, Huichao Zhang, Yiheng Liu, Xu Wang, Yi Jiang, Yiming Gao, Hu Ye, Daniel K. Du, Zehuan Yuan, Xinglong Wu20257,582 checkouts
- Siqi Kou, Jiachun Jin, Zhihong Liu, Chang Liu, Ye Ma, Jian Jia, Quan Chen, Peng Jiang, Zhijie Deng20259,864 checkouts
- Chengyue Wu, Xiaokang Chen, Zhiyu Wu, Yiyang Ma, Xingchao Liu, Zizheng Pan, Wen Liu, Zhenda Xie, Xingkai Yu, Chong Ruan, Ping Luo2024
- Xinlong Wang, Xiaosong Zhang, Zhengxiong Luo, Quan Sun, Yufeng Cui, Jinsheng Wang, Fan Zhang, Yueze Wang, Zhen Li, Qiying Yu, Yingli Zhao, Yulong Ao, Xuebin Min, Tao Li, Boya Wu, Bo Zhao, Bowen Zhang, Liangdong Wang, Guang Liu, Zheqi He, Xi Yang, Jingjing Liu, Yonghua Lin, Tiejun Huang, Zhongyuan Wang20249,062 checkouts
- Jinheng Xie, Weijia Mao, Zechen Bai, David Junhao Zhang, Weihao Wang, Kevin Qinghong Lin, Yuchao Gu, Zhijie Chen, Zhenheng Yang, Mike Zheng Shou20259,304 checkouts
- Chunting Zhou, Lili Yu, Arun Babu, Kushal Tirumala, Michihiro Yasunaga, Leonid Shamis, Jacob Kahn, Xuezhe Ma, Luke Zettlemoyer, Omer Levy20246,292 checkouts
- Qihang Yu, Mark Weber, Xueqing Deng, Xiaohui Shen, Daniel Cremers, Liang-Chieh Chen2024
- Yanwei Li, Yuechen Zhang, Chengyao Wang, Zhisheng Zhong, Yixin Chen, Ruihang Chu, Shaoteng Liu, Jiaya Jia20247,189 checkouts
- Yuying Ge, Sijie Zhao, Jinguo Zhu, Yixiao Ge, Kun Yi, Lin Song, Chen Li, Xiaohan Ding, Ying Shan20255,354 checkouts
- Jun Zhan, Junqi Dai, Jiasheng Ye, Yunhua Zhou, Dong Zhang, Zhigeng Liu, Xin Zhang, Ruibin Yuan, Ge Zhang, Linyang Li, Hang Yan, Jie Fu, Tao Gui, Tianxiang Sun, Yu-Gang Jiang, Xipeng Qiu20255,263 checkouts
FAQ
What is Multimodal?
Multimodal is an expert-curated awesome list on Attendemia that groups high-signal resources for fast learning. Items are reviewed and refreshed over time, so readers can start with a practical shortlist instead of searching across fragmented sources and low-context recommendation threads.
How are items ranked here?
Items are ranked using maintainer curation, content quality notes, engagement momentum, and freshness indicators. This ranking method keeps the top of the awesome list actionable for current workflows, while still preserving evergreen references that are widely cited and useful for deeper technical understanding.
Can I follow this list?
Yes. Use the follow button near the page header to receive update visibility when new resources are added or promoted. Following this list helps you monitor changes without rechecking manually and keeps your learning feed aligned with this specific topic over time.