🔥 2 New Super Models to Handle Any Type of Dataset

We usually build models optimized for one specific type of data, like:
- text
- audio
- images (computer vision)
- etc.

Is it possible to create a general model? @DeepMind unveils the answer.
1/7
Recently, DeepMind published two papers about general-purpose architectures that can process different types of input datasets.

1) Perceiver supports any kind of input
2) Perceiver IO supports any kind of output

2/7
Perceivers can handle new types of data with only minimal modifications.

They process inputs using domain-agnostic Transformer-style attention.

Perceiver IO matches a Transformer-based BERT baseline on the GLUE language benchmark.
3/7
Unlike Transformers, Perceivers first map inputs to a small latent space where processing is cheap and doesn't depend on the input size.

See the architectures of both Perceiver (pic 1) and Perceiver IO (pic 2).
4/7
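
To make the latent idea concrete, here is a minimal NumPy sketch (not DeepMind's code; the random projection matrices stand in for learned weights, and the sizes are made up): a small, fixed-size latent array cross-attends to an arbitrarily large flattened input, so the expensive self-attention stack only ever sees the latents.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def cross_attend(latents, inputs, d_k=64, rng=np.random.default_rng(0)):
    """Latents (N x D) read from inputs (M x C). Cost is O(N*M), not O(M^2)."""
    n, d = latents.shape
    m, c = inputs.shape
    # Random projections stand in for learned weights.
    w_q = rng.normal(size=(d, d_k)) / np.sqrt(d)
    w_k = rng.normal(size=(c, d_k)) / np.sqrt(c)
    w_v = rng.normal(size=(c, d)) / np.sqrt(c)
    attn = softmax((latents @ w_q) @ (inputs @ w_k).T / np.sqrt(d_k))  # (N, M)
    return latents + attn @ (inputs @ w_v)  # residual update of the latents

rng = np.random.default_rng(1)
# 256 latent vectors summarize 50,176 "pixels" (a flattened 224x224 RGB image).
latents = rng.normal(size=(256, 512))
inputs = rng.normal(size=(224 * 224, 3))
latents = cross_attend(latents, inputs)
# From here on, self-attention runs only on the 256 latents, so its cost
# no longer depends on how big the input was.
print(latents.shape)  # (256, 512)
```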
Results:

Perceiver outperforms strong, specialized models on classification tasks across various modalities:
- images
- point clouds
- audio
- video
- video+audio.
5/7
Perceiver IO achieves strong results on tasks with highly structured output spaces, such as:
- natural language
- visual understanding
- StarCraft II
- multi-task and multi-modal domains.
6/7
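
How can one model emit such different outputs? In Perceiver IO, outputs are read out by a set of query vectors that cross-attend to the latent array. A rough NumPy sketch of that idea follows (random weights stand in for learned parameters; the sizes are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attend(queries, keys_values, d_k=64):
    """Generic cross-attention: `queries` read from `keys_values`."""
    dq, dkv = queries.shape[1], keys_values.shape[1]
    w_q = rng.normal(size=(dq, d_k)) / np.sqrt(dq)    # random stand-ins
    w_k = rng.normal(size=(dkv, d_k)) / np.sqrt(dkv)  # for learned weights
    w_v = rng.normal(size=(dkv, dq)) / np.sqrt(dkv)
    attn = softmax((queries @ w_q) @ (keys_values @ w_k).T / np.sqrt(d_k))
    return attn @ (keys_values @ w_v)

# Encode: 256 latents summarize 10,000 input elements of any modality.
inputs = rng.normal(size=(10_000, 64))
latents = attend(rng.normal(size=(256, 512)), inputs)

# Decode: one query per desired output element, whatever the task needs,
# e.g. per-pixel predictions or per-token logits.
queries = rng.normal(size=(1_000, 512))
outputs = attend(queries, latents)
print(outputs.shape)  # (1000, 512): output size is set by the queries, not the inputs
```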
Thanks for learning ML and AI with us!

If you are curious about general-purpose architectures, here is the link for you: github.com/deepmind/deepm…

Share this thread with your friends and spread the open ML knowledge!
7/7

More from @TheSequenceAI

4 Dec
@OpenAI's ImageGPT is one of the first transformer architectures applied to computer vision.
1/4
In language, unsupervised learning algorithms that rely on word prediction (like GPT-2 and BERT) are extremely successful.

One possible reason for this success is that instances of downstream language tasks appear naturally in the text.
2/4
In contrast, sequences of pixels do not clearly contain labels for the images they belong to.

However, OpenAI believes that sufficiently large transformer models:
- can be applied to 2D image analysis
- can learn strong representations of a dataset
3/4
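
As a rough illustration (not OpenAI's code): ImageGPT-style next-pixel prediction flattens an image into a 1D sequence so a GPT-style model can treat pixels like tokens. A toy sketch, with a made-up 8x8 image standing in for the downsampled, color-quantized inputs the paper uses:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy 8x8 grayscale "image". ImageGPT itself works on downsampled,
# color-quantized images, but the flattening idea is the same.
image = rng.integers(0, 256, size=(8, 8))

# Raster-order flattening turns the 2D image into a 1D sequence,
# so a GPT-style model can treat pixels exactly like tokens in a sentence.
sequence = image.reshape(-1)

# Autoregressive training pairs: predict pixel t from pixels 0..t-1.
contexts = [sequence[:t] for t in range(1, len(sequence))]
targets = sequence[1:]
print(len(contexts), "next-pixel prediction examples from one image")  # 63
```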
24 Nov
What are the most intriguing areas of deep learning?

Let's look at the biggest AI labs in the world. What do they believe is key to the near future of AI?

7 companies and the techniques they're focused on.
⬇️
1. @DeepMind — Reinforcement Learning

Evolution:
1) AlphaGo mastered Go
2) AlphaGo Zero mastered Go from self-play, with no human data
3) AlphaZero mastered perfect-information games: chess & shogi
4) MuZero masters complex environments like Atari, without even being told the rules
2. @facebookai — Self-Supervised Learning

Examples:
1) DINO discovers and segments objects in an image or a video with absolutely no supervision
2) HuBERT learns both acoustic and language models from continuous inputs
6 Nov
@parlai_parley = a unified framework for sharing, training, and testing dialog models.

5 ParlAI features you should know about

1) 100+ popular datasets
2) Vast model zoo
3) Useful integrations
4) Wide set of reference models
5) Interactive Colab tutorial
1) 100+ popular datasets available all in one place
ParlAI gives you access to many datasets from recent research papers or collected by researchers.

2) Vast model zoo
A long list of pre-trained models, organized by task.

They cover a wide range of tasks.
3) Useful integrations

- Amazon Mechanical Turk for data collection and human evaluation

- Facebook Messenger to connect agents with humans in a chat interface
4 Nov
3 reasons why you need to outsource data labeling.

1) Teams want to invest time in ML models, not in data-centric operations

2) You care about the amount and quality of labeled data

3) The entire data annotation process involves a lot of steps
⬇️
1) The majority of the time invested in an AI project goes to data-centric operations

Data labeling is increasingly important to the success of ML solutions.

The process can be overwhelming, especially for startups and small companies.
2) You care about the amount and quality of labeled data

The success of supervised learning depends heavily on both.

Labels guide the ML model in the right direction so that it can classify unseen samples accurately.
4 Nov
Transformers popularized the use of attention mechanisms to access past information.

However, most Transformer models discard older memories to prioritize more recent activations.

@DeepMind's Compressive Transformer tackles that problem.
1/4
The Compressive Transformer tries to imitate the process of consolidating memories.

Under that approach, previous activations are compacted into a "compressed memory" that can be used in long-range tasks.
2/4
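
A toy sketch of that idea (mean-pooling is just one of the compression functions the paper considers; the memory sizes and compression rate here are made up): instead of discarding the oldest activations when the regular memory is full, compress them and keep the result in a second, compressed memory.

```python
import numpy as np

def update_memories(memory, comp_memory, new_activations, mem_size=8, rate=2):
    """Toy memory update in the spirit of the Compressive Transformer.

    New activations push the oldest entries out of the regular memory;
    instead of discarding them, we compress them (here: mean-pooling groups
    of `rate` vectors) and append the result to the compressed memory.
    """
    memory = np.concatenate([memory, new_activations])
    overflow = len(memory) - mem_size
    if overflow > 0:
        old, memory = memory[:overflow], memory[overflow:]
        pad = (-len(old)) % rate  # pad so `old` splits evenly into groups
        if pad:
            old = np.concatenate([old, np.zeros((pad, old.shape[1]))])
        compressed = old.reshape(-1, rate, old.shape[1]).mean(axis=1)
        comp_memory = np.concatenate([comp_memory, compressed])
    return memory, comp_memory

d = 16
memory, comp_memory = np.zeros((0, d)), np.zeros((0, d))
for _ in range(5):  # five segments of 4 new activation vectors each
    memory, comp_memory = update_memories(memory, comp_memory, np.random.randn(4, d))
print(memory.shape, comp_memory.shape)  # (8, 16) regular, (6, 16) compressed
```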
The Compressive Transformer was evaluated against state-of-the-art memory models on WikiText-103 and Enwik8.

In both cases, it showed significant improvements over more established models in both memory and efficiency.
3/4