Learning Consistent Temporal Grounding between Related Tasks in Sports Coaching
Video-LLMs often attend to irrelevant frames, which is especially detrimental to sports coaching tasks that require precise temporal grounding. Yet obtaining frame-level supervision is challenging: expe