Publications

SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situations