Multimodal
Awesome Multimodal Machine Learning: From Video Understanding to Vibe Coding
A curated, high-quality list of must-read papers and resources tracing the evolution of Multimodal Machine Learning. This repository covers the foundational shift from Video Understanding and Generative Video (Diffusion/Autoregressive) to the frontiers of UX/GUI Design Agents and Vibe Coding. Whether you are looking for landmark papers in CLIP-based alignment or the latest in vision-language-action (VLA) models for interface interaction, this list provides a structured roadmap through the most influential research in the field.
- Huaying Yuan, Jian Ni, Zheng Liu, Yueze Wang, Junjie Zhou, Zhengyang Liang, Bo Zhao, Zhao Cao, Zhicheng Dou, Ji-Rong Wen20269,263 checkouts
- Ze Liu, Zhengyang Liang, Junjie Zhou, Zheng Liu, Defu Lian20256,306 checkouts
- Yifan Xu, Xinhao Li, Yichun Yang, Desen Meng, Rui Huang, Limin Wang20256,190 checkouts
- Jiangwei Ren, Xingyu Jiang, Zizhuo Li, Dingkang Liang, Xin Zhou, Xiang Bai20259,354 checkouts
- Xin Zhang, Yanzhao Zhang, Wen Xie, Mingxin Li, Ziqi Dai, Dingkun Long, Pengjun Xie, Meishan Zhang, Wenjie Li, Min Zhang20258,711 checkouts
- Junjie Zhou, Zheng Liu, Ze Liu, Shitao Xiao, Yueze Wang, Bo Zhao, Chen Jason Zhang, Defu Lian, Yongping Xiong20247,746 checkouts
- Yuanmin Tang, Xiaoting Qin, Jue Zhang, Jing Yu, Gaopeng Gou, Gang Xiong, Qingwei Ling, Saravan Rajmohan, Dongmei Zhang, Qi Wu20248,514 checkouts
- Yikun Liu, Pingan Chen, Jiayin Cai, Xiaolong Jiang, Yao Hu, Jiangchao Yao, Yanfeng Wang, Weidi Xie20249,197 checkouts
- Sheng-Chieh Lin, Chankyu Lee, Mohammad Shoeybi, Jimmy Lin, Bryan Catanzaro, Wei Ping20258,909 checkouts
- Ziyan Jiang, Rui Meng, Xinyi Yang, Semih Yavuz, Yingbo Zhou, Wenhu Chen20257,485 checkouts
- Ting Jiang, Minghui Song, Zihan Zhang, Haizhen Huang, Weiwei Deng, Feng Sun, Qi Zhang, Deqing Wang, Fuzhen Zhuang20248,900 checkouts
- Chankyu Lee, Rajarshi Roy, Mengyao Xu, Jonathan Raiman, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping20257,372 checkouts
- Kai Zhang, Yi Luan, Hexiang Hu, Kenton Lee, Siyuan Qiao, Wenhu Chen, Yu Su, Ming-Wei Chang20248,145 checkouts
- Yimu Wang, Shuai Yuan, Bo Xue, Xiangru Jian, Wei Pang, Mushi Wang, Ning Yu20259,923 checkouts
- Omkar Thawakar, Muzammal Naseer, Rao Muhammad Anwer, Salman Khan, Michael Felsberg, Mubarak Shah, Fahad Shahbaz Khan20246,807 checkouts
- Cong Wei, Yang Chen, Haonan Chen, Hexiang Hu, Ge Zhang, Jie Fu, Alan Ritter, Wenhu Chen20235,669 checkouts
- Lucas Ventura, Antoine Yang, Cordelia Schmid, Gül Varol20246,171 checkouts
- Sihan Chen, Handong Li, Qunbo Wang, Zijia Zhao, Mingzhen Sun, Xinxin Zhu, Jing Liu20237,059 checkouts
- Simion-Vlad Bogolin, Ioana Croitoru, Hailin Jin, Yang Liu, Samuel Albanie20226,096 checkouts
FAQ
What is Multimodal?
Multimodal is an expert-curated awesome list on Attendemia that groups high-signal resources for fast learning. Items are reviewed and refreshed over time, so readers can start with a practical shortlist instead of searching across fragmented sources and low-context recommendation threads.
How are items ranked here?
Items are ranked using maintainer curation, content quality notes, engagement momentum, and freshness indicators. This ranking method keeps the top of the awesome list actionable for current workflows, while still preserving evergreen references that are widely cited and useful for deeper technical understanding.
Can I follow this list?
Yes. Use the follow button near the page header to receive update visibility when new resources are added or promoted. Following this list helps you monitor changes without rechecking manually and keeps your learning feed aligned with this specific topic over time.