FlexAttention for Efficient High-Resolution Vision-Language Models. Junyan Li, Delin Chen, et al. ECCV 2024. Conference paper.
3D-LLM: Injecting the 3D World into Large Language Models. Yining Hong, Haoyu Zhen, et al. NeurIPS 2023. Conference paper.
PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning. Yining Hong, Li Yi, et al. NeurIPS 2021. Conference paper.