Quick answer
AI Summary: Uses 'scene imagination' to allow VLMs to navigate complex homes without pre-existing maps, relying on visual prediction instead of text-based planning.
Visual navigation in home environments often fails because text-based planning cannot capture scene geometry. We propose ImagineNav++, which uses a VLM to 'imagine' future viewpoints from candidate robot views, recasting navigation as a 'best-view' selection problem. Its Selective Foveation Memory mechanism integrates keyframe observations into a compact representation for long-term spatial reasoning. ImagineNav++ achieves state-of-the-art performance in mapless settings and outperforms most traditional map-based methods.
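The 'best-view' framing can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the ImagineNav++ implementation: `ImaginedView`, `select_best_view`, and `keyword_score` are hypothetical names, a text description stands in for an imagined image, and a keyword match stands in for the VLM's judgment of which view is most promising.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ImaginedView:
    """Hypothetical stand-in for one imagined future viewpoint."""
    heading_deg: float   # direction the robot would move to reach this view
    description: str     # placeholder for the imagined image content


def select_best_view(
    candidates: Sequence[ImaginedView],
    score_fn: Callable[[ImaginedView], float],
) -> ImaginedView:
    """Return the highest-scoring candidate view.

    In this sketch score_fn replaces the VLM; ties go to the earliest candidate.
    """
    if not candidates:
        raise ValueError("at least one candidate view is required")
    return max(candidates, key=score_fn)


def keyword_score(goal: str) -> Callable[[ImaginedView], float]:
    """Toy scorer (assumption): 1.0 if the goal word appears in the view, else 0.0."""
    def score(view: ImaginedView) -> float:
        return 1.0 if goal.lower() in view.description.lower() else 0.0
    return score


candidates = [
    ImaginedView(0.0, "hallway with a closed door"),
    ImaginedView(90.0, "kitchen counter with a sink"),
    ImaginedView(180.0, "blank wall"),
]
best = select_best_view(candidates, keyword_score("sink"))
print(best.heading_deg)
```

The point of the sketch is only the control flow: navigation reduces to repeatedly choosing one view from a small candidate set, so no map has to be built or queried.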