Abstract: Over the past year, a large body of multimodal research has emerged around zero-shot evaluation using GPT descriptors. These studies boost the zero-shot accuracy of pretrained VL models ...