SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situations. Hao Du, Bo Wu, et al. 2025. CVPR 2025.
Selective Prompting Tuning for Personalized Conversations with LLMs. Qiushi Huang, Xubo Liu, et al. 2024. ACL 2024.
Learning from Children: Improving Image-Caption Pretraining via Curriculum. Hammad Ayyubi, Rahul Lokesh, et al. 2023. ACL 2023.
Scene Graph Refinement Network for Visual Question Answering. Tianwen Qian, Jingjing Chen, et al. 2023. IEEE TMM.