Helping computer vision and language models understand what they see | MIT News

December 9, 2023

Powerful machine-learning algorithms known as vision and language models, which learn to match text with images, have shown remarkable results when asked to generate captions or summarize videos. While these models excel at identifying objects, they often struggle to understand concepts, like object attributes or the arrangement of items in a scene. For instance, a…