Quick answer
AI Summary: MiRA applies a 'divide and conquer' strategy to multimodal tasks, using three distinct agents: one for vision, one for text, and one for final judgment. The vision and text agents reason independently, and a 'judge' agent then combines their findings. This separation is intended to prevent the 'hallucination leakage' often seen in unified multimodal models, where an error in one modality contaminates reasoning about the other.
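
As a rough illustration of the flow summarized above, the following Python sketch wires three agents together so that the vision and text agents never see each other's input or output, and only the judge sees both findings. None of the names here (`Finding`, `run_pipeline`, the `stub_*` functions) come from MiRA itself. They are hypothetical, and the stubs are trivial placeholders for real models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    """One agent's independent conclusion (hypothetical structure)."""
    source: str     # "vision" or "text"
    answer: str
    rationale: str


# Hypothetical agent signatures: each specialist sees only its own modality.
VisionAgent = Callable[[bytes, str], Finding]
TextAgent = Callable[[str, str], Finding]
JudgeAgent = Callable[[str, Finding, Finding], str]


def run_pipeline(
    image: bytes,
    passage: str,
    question: str,
    vision_agent: VisionAgent,
    text_agent: TextAgent,
    judge_agent: JudgeAgent,
) -> str:
    """Run both specialists in isolation, then let the judge combine them."""
    # Neither call receives the other agent's input or output, so a mistake
    # in one modality cannot influence the other agent's reasoning.
    vision_finding = vision_agent(image, question)
    text_finding = text_agent(passage, question)
    # The judge is the only component that sees both findings.
    return judge_agent(question, vision_finding, text_finding)


# Placeholder agents (assumptions for illustration, not real models).
def stub_vision(image: bytes, question: str) -> Finding:
    return Finding("vision", "red", f"guess from {len(image)} image bytes")


def stub_text(passage: str, question: str) -> Finding:
    answer = "red" if "red" in passage.lower() else "unknown"
    return Finding("text", answer, "keyword match in passage")


def stub_judge(question: str, vision: Finding, text: Finding) -> str:
    # Agreement is accepted; disagreement is surfaced rather than hidden.
    if vision.answer == text.answer:
        return vision.answer
    return f"conflict: vision={vision.answer}, text={text.answer}"


if __name__ == "__main__":
    print(run_pipeline(b"\x00\x01", "The car is red.", "What color is the car?",
                       stub_vision, stub_text, stub_judge))
```

In this sketch the isolation is enforced by the function signatures: a specialist cannot read what it is never passed. A real system would replace the stubs with model calls and would likely give the judge a richer policy than simple agreement checking.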