SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situations. Hao Du, Bo Wu, et al. 2025. CVPR 2025. Conference paper.
Selective Prompting Tuning for Personalized Conversations with LLMs. Qiushi Huang, Xubo Liu, et al. 2024. ACL 2024. Paper.
Learning from Children: Improving Image-Caption Pretraining via Curriculum. Hammad Ayyubi, Rahul Lokesh, et al. 2023. ACL 2023. Paper.
Scene Graph Refinement Network for Visual Question Answering. Tianwen Qian, Jingjing Chen, et al. 2023. IEEE TMM. Paper.
STAR: A Benchmark for Situated Reasoning in Real-World Videos. Bo Wu, Shoubin Yu, et al. 2021. NeurIPS 2021. Conference paper.