Multimodal

Awesome Multimodal Machine Learning: From Video Understanding to Vibe Coding

machine-learning · deep-learning · multimodal · 199 items · 0 followers

A curated list of must-read papers and resources tracing the evolution of Multimodal Machine Learning. This repository covers the shift from Video Understanding and Generative Video (diffusion and autoregressive models) to the frontiers of UX/GUI Design Agents and Vibe Coding. Whether you are looking for landmark papers on CLIP-based alignment or the latest vision-language-action (VLA) models for interface interaction, this list provides a structured roadmap through the most influential research in the field.

Managed by Attendemia