AI Summary: Bridges the gap between vision and touch by distilling force-feedback into standard VLA architectures, enabling better handling of occluded or delicate objects.
Current VLA models rely primarily on visual feedback, which is insufficient for contact-rich tasks such as precision assembly or handling delicate objects. We introduce FD-VLA, a force-distilled framework that integrates haptic and force-torque data into the action-prediction loop. By distilling the force-sensing capability of a teacher model trained in simulation into a student VLA model, we enable successful 'blind' manipulation under visual occlusion. Our results show a 34% improvement in task success rate for tight-tolerance peg-in-hole assembly under varying lighting and occlusion conditions.
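The abstract describes a teacher-student distillation setup in which a force-aware teacher supervises a vision-only student policy. As a rough illustration of what such an objective might look like, here is a minimal sketch assuming the student is trained to match the teacher's predicted action vectors, with an optional re-weighting of contact-rich timesteps; all function names, shapes, and the loss form are illustrative assumptions, not the paper's actual method or API.

```python
import numpy as np

def force_distillation_loss(student_actions, teacher_actions, contact_weight=None):
    """Hypothetical distillation objective: mean-squared error between the
    vision-only student's actions and the force-aware teacher's actions.

    student_actions, teacher_actions: (T, D) arrays of predicted actions.
    contact_weight: optional (T,) array that up-weights timesteps where the
    teacher's force-torque sensor reports contact (an assumption here).
    """
    err = np.sum((student_actions - teacher_actions) ** 2, axis=-1)  # (T,)
    if contact_weight is not None:
        err = err * contact_weight
    return float(np.mean(err))

# Toy trajectory: 4 timesteps of 6-DoF actions.
T, D = 4, 6
teacher = np.zeros((T, D))          # teacher sees vision + force
student = np.full((T, D), 0.1)     # student sees vision only
loss = force_distillation_loss(student, teacher)
```

In this toy case each timestep contributes 6 × 0.1² = 0.06 to the error, so `loss` is 0.06; passing a `contact_weight` that zeroes out half the timesteps halves it. A real implementation would likely use trajectory batches and combine this term with the usual behavior-cloning loss.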