Alessandro Suglia
Assistant Professor @HeriotWattUni/@NRobotarium & Head of Visual Dialogue at @helloalana; PhD @EDINRobotics; Former Research Intern @MetaAI and @AmazonScience.

Nov 10, 2022, 8 tweets

Happy to release GOAL ⚽️, a multimodal dataset based on football highlights that includes 1) videos, 2) human transcriptions, and 3) Wikidata-based KB with statistics about players and teams for every match.

arxiv.org/abs/2211.04534

Interested? Read the thread below! #NLProc

[1/7]: Previous video benchmarks consider movies or TV series, which typically involve scripted interaction between characters rather than visually grounded language. In GOAL, by contrast, we focus on football commentaries precisely because they involve visually grounded language.

[2/7]: GOAL pushes the boundaries of current multimodal models because it requires the encoding of 1) videos; 2) commentary; 3) KB information. All these elements are essential when generating a sound and coherent commentary for a football video.

[3/7]: GOAL contains highlights derived from several football leagues (e.g., Serie A, Premier League, etc.), professional human transcriptions, as well as information about the match derived from Wikidata. Just like real commentators, models should use all these modalities.
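To make the three modalities concrete, here is a minimal sketch of what a single merged GOAL-style sample might look like. The field names and values below are illustrative assumptions, not the dataset's actual schema:

```python
# Hypothetical sketch of one sample combining the three modalities described
# above: highlight video, timed transcription, and Wikidata-derived KB facts.
# All field names and values are illustrative assumptions, not the real schema.
sample = {
    "match_id": "example-match-001",           # hypothetical identifier
    "video": "highlights/example_match.mp4",   # path to the highlight clip
    "transcription": [
        {"start": 12.4, "end": 17.9,
         "text": "A lovely through ball splits the defence..."},
    ],
    "kb": {                                    # Wikidata-style match facts
        "home_team": "Team A",
        "away_team": "Team B",
        "players": {"Player X": {"position": "forward", "goals_season": 14}},
    },
}

def kb_facts_for_mention(sample, player_name):
    """Toy lookup: fetch KB stats for a player mentioned in the commentary."""
    return sample["kb"]["players"].get(player_name, {})

print(kb_facts_for_mention(sample, "Player X"))
```

A commentary model would condition on all three fields jointly, e.g. grounding a player mention in the transcript against the KB entry while attending to the video.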

[4/7]: We set up a multi-faceted evaluation based on several tasks: 1) commentary retrieval, 2) frame reordering, 3) moment retrieval and 4) commentary generation. We use the HERO model for our evaluation and demonstrate that the commentary generation task is really challenging.
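As an illustration of how a frame-reordering task can be scored, here is a small sketch using pairwise ordering accuracy; this is an assumed metric for demonstration, not necessarily the exact one used in the paper:

```python
# Hedged sketch of scoring a frame-reordering prediction: given the model's
# predicted ordering of shuffled frames, measure the fraction of frame pairs
# whose relative order it got right. Illustrative metric, not the paper's.
from itertools import combinations

def pairwise_order_accuracy(true_order, pred_order):
    """Fraction of frame pairs placed in the correct relative order."""
    pos = {frame: i for i, frame in enumerate(pred_order)}
    pairs = list(combinations(true_order, 2))
    correct = sum(1 for a, b in pairs if pos[a] < pos[b])
    return correct / len(pairs)

# A perfect ordering scores 1.0; one adjacent swap gets 5 of 6 pairs right.
print(pairwise_order_accuracy([0, 1, 2, 3], [0, 2, 1, 3]))  # 5/6 ≈ 0.833
```

Retrieval and generation tasks would instead use standard recall@k and text-overlap metrics, respectively.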

[5/7]: We analysed both HERO and BART predictions when conditioning on just the language or on the full multimodal context. This highlighted the poor visual grounding ability of these models. We also show that incorporating the KB information is important for boosting performance.

[6/7]: GOAL can also serve as a resource for visual context-aware speech recognition, multimodal fact-checking, and multimodal activity recognition. Additionally, GOAL represents an interesting benchmark for models that directly solve the task using audio information.

[7/7]: This work was the last chapter of my PhD thesis and is a result of a great collaboration with @zedavid @andrea_vanzo @emabastiano @verena_rieser @sinantie @MNikandrou Lu Yu and @shubhamag1992
