The RealHumanEval: Evaluating Large Language Models’ Abilities to Support Programmers. Hussein Mozannar, Valerie Chen, et al. 2025. TMLR.
Video-text compliance: Activity verification based on natural language instructions. Mayoore Jaiswal, Frank Liu, et al. 2019. ICCVW 2019.