5. "VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text" by @AkbariH70 Liangzhe Yuan @RuiQian3 Wei-Hong Chuang, Shih-Fu Chang, @YinCui1@BoqingGo.
Google just published a paper about Gecko, their new text embedding model that punches way above its weight: its 768-dim vectors are competitive with models that have 7x more parameters and 5x larger embeddings.
Here is a thread with our key takeaways: 🧵🦎
Gecko relies on knowledge distillation from LLMs in the form of synthetic queries, similar to previous work such as InPars and Promptagator.
They propose a two-step approach, where (1) a query is generated given a task description and a passage, and (2) an LLM reranks the top-N retrieved passages, using the highest-scoring one as the positive and the lowest as the negative.
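As a rough sketch of step (1) in Python (the task description, prompt template, and `generate` hook below are illustrative placeholders, not Gecko's actual prompts):

```python
# Sketch of step (1): ask an LLM to write a synthetic query for a passage.
def build_query_prompt(task_description: str, passage: str) -> str:
    # Hypothetical prompt wording; the paper uses its own task-specific templates.
    return (
        f"{task_description}\n\n"
        f"Passage: {passage}\n\n"
        "Write a search query that this passage answers.\nQuery:"
    )

def synthesize_query(generate, task_description: str, passage: str) -> str:
    # `generate` stands in for any LLM completion call (API or local model).
    return generate(build_query_prompt(task_description, passage)).strip()
```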
Two LLM-based ranking functions are used (a rough sketch of both follows the list):
- Query Likelihood: how likely the query is given the passage (i.e., its perplexity under the LLM)
- Relevance Classification: what's the probability of the "true" token, given the query-passage pair (à la MonoT5)
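Roughly, both scores could be computed like this (a sketch with small off-the-shelf checkpoints standing in for the paper's LLM; the model names, prompt wording, and true/false normalization are assumptions):

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          T5ForConditionalGeneration, T5Tokenizer)

# Placeholder checkpoints; the paper distills from a much larger LLM.
lm_tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()
t5_tok = T5Tokenizer.from_pretrained("t5-small")
t5 = T5ForConditionalGeneration.from_pretrained("t5-small").eval()

@torch.no_grad()
def query_likelihood(query: str, passage: str) -> float:
    """Query Likelihood: sum of log-probs of the query tokens given the passage
    (higher = more likely, i.e. lower conditional perplexity)."""
    prefix_ids = lm_tok(f"Passage: {passage}\nQuery:", return_tensors="pt").input_ids
    query_ids = lm_tok(" " + query, return_tensors="pt").input_ids
    input_ids = torch.cat([prefix_ids, query_ids], dim=1)
    logits = lm(input_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # position t predicts token t+1
    start = prefix_ids.shape[1] - 1
    targets = input_ids[0, prefix_ids.shape[1]:]
    return log_probs[start:start + targets.shape[0]].gather(-1, targets[:, None]).sum().item()

@torch.no_grad()
def relevance_score(query: str, passage: str) -> float:
    """Relevance Classification: probability mass on 'true' vs 'false' as the
    first decoded token, MonoT5-style (prompt wording is an assumption)."""
    enc = t5_tok(f"Query: {query} Document: {passage} Relevant:", return_tensors="pt")
    start_id = torch.tensor([[t5.config.decoder_start_token_id]])
    logits = t5(**enc, decoder_input_ids=start_id).logits[0, 0]
    probs = torch.softmax(logits, dim=-1)
    true_id = t5_tok.encode("true", add_special_tokens=False)[0]
    false_id = t5_tok.encode("false", add_special_tokens=False)[0]
    return (probs[true_id] / (probs[true_id] + probs[false_id])).item()
```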
The two lists are combined using Reciprocal Rank Fusion, and the final order is used to pick the positive and negative pair for each synthetic query.
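A minimal sketch of that fusion-and-selection step (the toy rankings and the standard RRF constant k=60 are illustrative, not necessarily the paper's exact values):

```python
def reciprocal_rank_fusion(rankings, k: int = 60):
    """Fuse several rankings (lists of passage ids, best first) via RRF:
    score(p) = sum over rankings of 1 / (k + rank_of_p)."""
    scores = {}
    for ranking in rankings:
        for rank, pid in enumerate(ranking, start=1):
            scores[pid] = scores.get(pid, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: top-N retrieved passage ids, ordered by each LLM score.
ql_order = ["p3", "p1", "p2", "p4"]   # Query Likelihood ranking
rc_order = ["p1", "p3", "p4", "p2"]   # Relevance Classification ranking

fused = reciprocal_rank_fusion([ql_order, rc_order])
positive, hard_negative = fused[0], fused[-1]  # top as positive, bottom as negative
```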
Big investments, more diffusion model applications, FLAN-T5 from @GoogleAI, Neural Audio Compression from @MetaAI, Single-Life RL, and much more by @SergiCastellaSa 👇
Company news, future, a little history on Transformers, and an overview of the workshop!
2/9 "From Transformers to Work: Advances in Neural Search" by Marzieh Fadaee
Research at @ZetaVector: Leveraging LMs to generate training data? What about multilingual data? In-domain vs. out-of-domain evaluation? Distilled models vs. teacher models?
If the show still doesn't show up wherever you get your pods, please reach out and we'll try to make sure it's available there (some providers might take a bit longer to publish it).
In his @NVIDIAGTC keynote, Jensen Huang demonstrates @NVIDIAAI's leading position in powering the AI ecosystem across R&D, enterprise, and edge computing, with a zillion new announcements. Here are a few notable ones.
Graph Neural Network acceleration with CUDA-X.
NeMo Megatron allows training GPT-3-scale Large Language Models on distributed hardware.
At #EMNLP2021 Evelina Fedorenko makes a strong case to defuse criticism that neural language models cannot "think". Neither can the human language modules in the brain, she argues, based on human brain studies. #EMNLP2021livetweet
In contrast, due to its predictive-coding nature, language is inherently very well-suited to communication. #EMNLP2021livetweet
As far as human brain studies suggest, language is *not suitable for complex thought*, Fedorenko concludes her keynote at #EMNLP2021, as she outlines her future research. #EMNLP2021livetweet