Learning Compact Vision Tokens for Efficient Large Multimodal Models
Paper • Jun 8, 2025 • arxiv.org • Hao Tang, Chengchao Shen
Large multimodal models (LMMs) suffer significant computational challenges due to the high cost of Large Language Models (LLMs) and the quadratic complexity of processing long vision token sequence...