
FD-VLA: Force-Distilled Vision-Language-Action Model for Contact-Rich Manipulation

Authors
Ruiteng Zhao · Wenshuo Wang · Marcelo H. Ang Jr. · Haiyue Zhu

ABSTRACT

Current VLA models rely primarily on visual feedback, which is insufficient for contact-rich tasks such as precision assembly or handling delicate objects. We introduce FD-VLA, a force-distilled framework that integrates haptic and force-torque data into the action-prediction loop. By distilling force-sensing capability from a teacher model trained in simulation into a student VLA model, we enable successful 'blind' manipulation when the workspace is visually occluded. Our results show a 34% improvement in task success on tight-tolerance peg-in-hole assembly under varying lighting and occlusion conditions.
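The abstract describes a teacher-student setup: a simulation-trained teacher that observes force/torque signals supervises a student that must act without them. As a rough illustration only, the sketch below distills a force-aware linear teacher into a vision-only linear student with a plain MSE loss. All dimensions, the linear policies, and the loss choice are illustrative assumptions, not the authors' FD-VLA architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- not taken from the paper.
VIS_DIM, FORCE_DIM, ACT_DIM = 16, 6, 7

# Teacher policy: sees vision + force/torque (stand-in for the
# simulation-trained teacher in the abstract).
W_teacher = rng.normal(size=(VIS_DIM + FORCE_DIM, ACT_DIM)) * 0.1

# Student policy: vision only at deployment; force knowledge must be
# distilled into its weights.
W_student = np.zeros((VIS_DIM, ACT_DIM))

# Paired (vision, force) rollouts collected in simulation.
N = 64
VIS = rng.normal(size=(N, VIS_DIM))
FORCE = rng.normal(size=(N, FORCE_DIM))
TARGET = np.hstack([VIS, FORCE]) @ W_teacher  # teacher actions

lr = 0.05
losses = []
for _ in range(200):
    pred = VIS @ W_student
    # Distillation loss: mean squared error between student and
    # teacher actions; full-batch gradient step on W_student.
    losses.append(float(np.mean((pred - TARGET) ** 2)))
    grad = 2.0 / N * VIS.T @ (pred - TARGET)
    W_student -= lr * grad

# At deployment the student acts from vision alone, e.g. under
# occlusion, having absorbed what the force channel taught it.
print(f"distillation loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

The student cannot drive the loss to zero, because the teacher also uses the force channel the student never sees; the residual floor is the part of the teacher's behavior that vision alone cannot explain, which is why the real system needs force features distilled into richer representations than this toy linear map.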
