Mind the Context: The Impact of Contextualization in Neural Module Networks for Grounding Visual Referring Expressions

ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension

Bottom-Up and Bidirectional Alignment for Referring Expression Comprehension

Exploring Logical Reasoning for Referring Expression Comprehension