We build models optimized for a specific type of data, such as:
- text
- audio
- images
- etc.
Is it possible to create a general model? @DeepMind unveils the answer⬇️ 1/5
Recently, DeepMind published two papers about general-purpose architectures that can process different types of input datasets.
1) Perceiver supports any kind of input
2) Perceiver IO supports any kind of output
More⬇️
Perceivers can handle new types of data with only minimal modifications.
They process inputs using domain-agnostic Transformer-style attention.
Perceiver IO matches a Transformer-based BERT baseline on the GLUE language benchmark.
3/5
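To make "domain-agnostic attention" concrete, here is a minimal sketch of the idea, not DeepMind's implementation. It assumes a small latent array that queries the input through one cross-attention step, so the output size depends on the latents rather than on the input length. The learned query/key/value projections are left out, and the names `cross_attend`, `random_array`, `text_like`, and `audio_like` are made up for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(latents, inputs):
    """Latents (N x D) query the inputs (M x D); the result is N x D.

    Assumption: learned projections are omitted, so the inputs act as
    both keys and values.
    """
    dim = len(latents[0])
    keys_t = [list(col) for col in zip(*inputs)]   # D x M
    scores = matmul(latents, keys_t)               # N x M
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    return matmul(weights, inputs)                 # N x D


rng = random.Random(0)


def random_array(rows, cols):
    """Hypothetical stand-in for embedded input data."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


latents = random_array(4, 8)        # small, fixed-size latent array
text_like = random_array(50, 8)     # e.g. 50 token embeddings
audio_like = random_array(300, 8)   # e.g. 300 audio frames

text_out = cross_attend(latents, text_like)
audio_out = cross_attend(latents, audio_like)
print(len(text_out), len(text_out[0]))    # 4 8
print(len(audio_out), len(audio_out[0]))  # 4 8
```

Both inputs come out as a 4 x 8 array even though one has 50 rows and the other 300, which is why the same downstream layers can be reused across modalities.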
Perceiver IO achieves strong results on tasks with highly structured output spaces, such as:
- natural language
- visual understanding
- StarCraft II
- multi-task and multi-modal domains.
4/5
Perceiver is competitive with or outperforms strong, specialized models on classification tasks across various modalities:
- images
- point clouds
- audio
- video
- video+audio
If you are curious about general-purpose architectures, here is the link for you: github.com/deepmind/deepm…
5/5
.@OpenAI ImageGPT is one of the first transformer architectures applied to computer vision scenarios.👇
In language, unsupervised learning algorithms that rely on word prediction (like GPT-2 and BERT) are extremely successful.
One possible reason for this success is that instances of downstream language tasks appear naturally in the text.
2/4
In contrast, sequences of pixels do not clearly contain labels for the images they belong to.
However, OpenAI believes that sufficiently large transformer models:
- could be applied to 2D image analysis
- could learn strong representations of a dataset
3/4
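A rough sketch of what "sequences of pixels" can look like in practice, assuming a GPT-style next-pixel objective and plain raster ordering. This is a toy illustration, not OpenAI's preprocessing: the tiny 2x3 grayscale image and the helper names `to_pixel_sequence` and `next_pixel_pairs` are made up.

```python
def to_pixel_sequence(image):
    """Flatten a 2D image (list of rows) into a 1D sequence in raster order."""
    return [pixel for row in image for pixel in row]


def next_pixel_pairs(sequence):
    """Pair each prefix with the pixel that follows it (next-pixel prediction)."""
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]


# Toy 2x3 grayscale image; real inputs would be far larger.
image = [
    [0, 1, 2],
    [3, 4, 5],
]

sequence = to_pixel_sequence(image)
pairs = next_pixel_pairs(sequence)
print(sequence)  # [0, 1, 2, 3, 4, 5]
print(pairs[2])  # ([0, 1, 2], 3)
```

Note that no class label appears anywhere in the training pairs: the model only ever sees pixels, which is the contrast with text that the thread points out.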
AllenNLP provides a simple & modular programming model for:
1. Applying advanced deep learning techniques to NLP research
2. Streamlining the creation of NLP experiments
3. Abstracting the core building blocks of NLP models
2/5
Portfolio of NLP tasks under AllenNLP:
- Text Generation
- Language Modeling
- Multiple Choice
- Pair Classification
- Structured Prediction
- Sequence Tagging
- Text + vision
3/5