Scaling Verification Can Be More Effective than Scaling Policy Learning for VLA
AI Summary: Shows that investing in better outcome verifiers yields higher performance than scaling actor policies alone.
We investigate test-time verification as a means to shrink the 'intention-action gap' in embodied AI. We characterize the test-time scaling law for embodied instruction following and demonstrate that jointly scaling the number of rephrased instructions and generated actions increases test-time sample diversity. We present CoVer, a contrastive verifier for vision-language-action alignment, and introduce 'boot-time compute'—a hierarchical verification pipeline. Compared to scaling policy pre-training on the same data, our verification approach yields 22% gains in-distribution and a 45% improvement in real-world experiments, suggesting verification is a more compute-efficient path to alignment.
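The core test-time recipe described above, jointly sampling rephrased instructions and candidate actions and letting a verifier pick the winner, can be sketched as a simple best-of-N loop. This is a minimal illustration, not the paper's implementation: `best_of_n`, the stub policy, and the stub verifier are all hypothetical names, and a real system would use a language model for rephrasing and CoVer-style contrastive scoring for verification.

```python
import random

def best_of_n(instruction, policy, verifier, n_rephrasings=4, n_actions=8):
    """Jointly scale rephrased instructions and sampled actions for
    test-time diversity, then let the verifier pick the best pair."""
    # Hypothetical rephraser: in practice a language model paraphrases.
    rephrasings = [f"{instruction} (variant {i})" for i in range(n_rephrasings)]
    # Sample n_actions candidate actions per rephrased instruction.
    candidates = [(r, policy(r)) for r in rephrasings for _ in range(n_actions)]
    # The verifier scores instruction-action alignment; take the argmax.
    _, best_action = max(candidates, key=lambda c: verifier(*c))
    return best_action

# Toy usage with stand-in policy/verifier (purely illustrative):
rng = random.Random(0)
stub_policy = lambda instr: rng.randrange(10)    # random integer "action"
stub_verifier = lambda instr, action: action     # prefers larger actions
chosen = best_of_n("pick up the red cup", stub_policy, stub_verifier)
```

Because the stub verifier simply prefers larger actions, the loop returns the highest-scoring sample out of all 32 candidates, which is the same diversity-then-select pattern the abstract attributes its gains to.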