Interpretable and globally optimal prediction for textual grounding using image concepts. Raymond Yeh, Jinjun Xiong, et al. NeurIPS 2017.
Learning motion in feature space: Locally-consistent deformable convolution networks for fine-grained action detection. Khoi Nguyen Mac, Dhiraj Joshi, et al. ICCV 2019.