SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situations
Hao Du, Bo Wu, et al.
CVPR 2025