Hugging Face Transformers

Building a transformer architecture from scratch remains an incredible challenge.

Hugging Face Transformers is one of the most popular and advanced frameworks for Transformer architectures.
⬇️ 1/3
Hugging Face Transformers provides

- implementations of 100s of pre-trained transformer models
- APIs to quickly download and use
- implementations for both #PyTorch and #TensorFlow (usage sketch below)
2/3
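A minimal usage sketch of those APIs (assuming the transformers package and PyTorch are installed; the checkpoint name is just a common example):

```python
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

# High-level pipeline API: downloads a suitable pre-trained model and tokenizer.
classifier = pipeline("sentiment-analysis")
print(classifier("Transformers make NLP development much easier."))

# Lower-level API: load a specific checkpoint by name.
name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

inputs = tokenizer("Transformers make NLP development much easier.", return_tensors="pt")
print(model(**inputs).logits)

# The same checkpoint loads in TensorFlow via TFAutoModelForSequenceClassification.
```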
This is a thread from Edge#111 – our series about transformers.

Keep learning every day. Subscribe to our Twitter @TheSequenceAI.
3/3

More from @TheSequenceAI

5 Nov
🔥2 New Super Models to Handle Any Type of Dataset

We build models optimized for a specific type of dataset like:
- text
- audio
- computer vision
- etc.

Is it possible to create a general model? @DeepMind unveils the answer⬇️
1/5
Recently, DeepMind published two papers about general-purpose architectures that can process different types of input datasets.

1) Perceiver supports any kind of input
2) Perceiver IO supports any kind of output

More⬇️
Perceivers can handle new types of data with only minimal modifications.

They process inputs using domain-agnostic Transformer-style attention.

Perceiver IO matches a Transformer-based BERT baseline on the GLUE language benchmark.
3/5
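A toy PyTorch sketch of that domain-agnostic attention idea (not DeepMind's implementation; the class name and dimensions are illustrative): a small, fixed-size latent array cross-attends to an arbitrarily long input array, so attention cost grows only linearly with input size.

```python
import torch
import torch.nn as nn

class PerceiverStyleEncoder(nn.Module):
    """Toy sketch: a fixed-size latent array cross-attends to a long,
    domain-agnostic input array (text tokens, pixels, audio frames, ...)."""

    def __init__(self, input_dim=64, latent_dim=128, num_latents=32, num_heads=4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, latent_dim))
        self.input_proj = nn.Linear(input_dim, latent_dim)
        self.cross_attn = nn.MultiheadAttention(latent_dim, num_heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(latent_dim, num_heads, batch_first=True)

    def forward(self, inputs):                       # inputs: (batch, seq_len, input_dim)
        batch = inputs.shape[0]
        x = self.input_proj(inputs)                  # (batch, seq_len, latent_dim)
        latents = self.latents.expand(batch, -1, -1)
        # Cross-attention: cost is linear in seq_len, so very long inputs are fine.
        latents, _ = self.cross_attn(latents, x, x)
        # Self-attention over the small latent array (cheap, independent of seq_len).
        latents, _ = self.self_attn(latents, latents, latents)
        return latents                               # (batch, num_latents, latent_dim)

encoder = PerceiverStyleEncoder()
out = encoder(torch.randn(2, 5000, 64))              # 5,000 "pixels"/tokens -> 32 latents
print(out.shape)                                     # torch.Size([2, 32, 128])
```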
4 Nov
3 reasons why you need to outsource data labeling.

1) Teams want to invest time in ML models, not in data-centric operations

2) You care about the amount and quality of labeled data

3) The entire data annotation process involves a lot of steps
⬇️
1) The majority of the time invested in an AI project is allotted to data-centric operations

Data labeling methods are becoming increasingly important to the success of ML solutions.

The process can be overwhelming, especially for startups and small companies.
2) You care about the amount and quality of labeled data

The success of supervised learning depends extensively on these parameters.

Labels guide the ML model in the right direction such that it can classify unseen samples accurately.
4 Nov
Transformers pioneered the principle of attention mechanisms to access past information.

However, most Transformer models discard older memories to prioritize more recent activations.

@DeepMind's Compressive Transformer tackles that problem.
1/4
The Compressive Transformer tries to imitate the process of consolidating memories.

Under that approach, previous activations are compacted into a "compressed memory" that can be used in long-range tasks.
2/4
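A toy sketch of that memory update (a hypothetical helper, using average pooling as the compression function; the paper explores several compression functions, and this is not DeepMind's code):

```python
import torch
import torch.nn.functional as F

def update_memories(new_activations, memory, compressed_memory,
                    mem_size=512, compression_rate=4):
    """Toy Compressive Transformer memory update.

    new_activations:   (seg_len, d_model) activations from the current segment.
    memory:            (<=mem_size, d_model) recent, uncompressed activations.
    compressed_memory: older activations squashed by `compression_rate`.
    """
    memory = torch.cat([memory, new_activations], dim=0)
    if memory.shape[0] > mem_size:
        overflow = memory[:-mem_size]          # oldest activations fall out of memory...
        memory = memory[-mem_size:]
        # ...and are compacted (here: average pooling) instead of being discarded.
        compressed = F.avg_pool1d(
            overflow.t().unsqueeze(0), kernel_size=compression_rate,
            stride=compression_rate, ceil_mode=True,
        ).squeeze(0).t()
        compressed_memory = torch.cat([compressed_memory, compressed], dim=0)
    return memory, compressed_memory

d_model = 16
mem, cmem = torch.zeros(0, d_model), torch.zeros(0, d_model)
for _ in range(3):
    mem, cmem = update_memories(torch.randn(256, d_model), mem, cmem, mem_size=256)
print(mem.shape, cmem.shape)   # memory stays capped; older steps live on in cmem
```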
The Compressive Transformer was evaluated against state-of-the-art memory models on WikiText-103 and Enwik8.

In both cases, it showed significant improvements over more established models in both memory use and efficiency.
3/4
30 Oct
.@OpenAI ImageGPT is one of the first transformer architectures applied to computer vision scenarios.👇
In language, unsupervised learning algorithms that rely on word prediction (like GPT-2 and BERT) are extremely successful.

One possible reason for this success is that instances of downstream language tasks appear naturally in the text.
2/4
In contrast, sequences of pixels do not clearly contain labels for the images they belong to.

However, OpenAI believes that sufficiently large transformer models:
- could be applied to 2D image analysis
- could learn strong representations of a dataset (toy sketch below)
3/4
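A toy sketch of the idea, using a generic PyTorch Transformer encoder with a causal mask rather than OpenAI's ImageGPT code: quantized pixels become a flat token sequence, and the model trains on next-pixel prediction exactly as GPT-2 trains on next-word prediction.

```python
import torch
import torch.nn as nn

# Toy next-pixel prediction: treat a quantized image as a 1-D sequence of
# "pixel tokens", just as GPT treats text as a sequence of word tokens.
vocab_size, d_model, seq_len = 256, 128, 32 * 32       # 8-bit grayscale, 32x32 images

embed = nn.Embedding(vocab_size, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
backbone = nn.TransformerEncoder(layer, num_layers=2)
to_logits = nn.Linear(d_model, vocab_size)

pixels = torch.randint(0, vocab_size, (1, seq_len))    # fake image, flattened row by row
causal_mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)

hidden = backbone(embed(pixels), mask=causal_mask)      # each position only sees its past
logits = to_logits(hidden)                              # predict the next pixel value
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size), pixels[:, 1:].reshape(-1)
)
print(loss.item())
```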
29 Oct
Forecasting high-dimensional time series plays a crucial role in many applications like:
- demand forecasting
- financial predictions

You can use @AmazonScience's DeepGLO for these problems.⬇️
The challenge with multi-dimensional time-series datasets is a serious one.

1) Traditional methods (like ARIMA) can't scale to large datasets with millions of time series.

2) Deep neural networks have been proven to handle scalability more effectively. BUT⬇️
BUT many deep neural nets:

- only forecast values from the same dimension
- require different time series to be normalized on a single scale

DeepGLO addresses these challenges (sketch below).
3/6
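A heavily simplified sketch of the global low-rank idea DeepGLO builds on (it omits DeepGLO's temporal-convolution regularizer and local per-series models; all names and sizes here are illustrative): the whole panel of series is factored into a few shared basis series, so one global model can capture patterns across dimensions and scales without per-series normalization.

```python
import torch

# Approximate a large panel of time series Y (n_series x T) as F_emb @ X,
# where X holds a handful of shared basis series. Forecasting the k basis
# rows of X (instead of n raw series) captures cross-series structure.
n_series, T, k = 1000, 200, 8
Y = torch.randn(n_series, T).cumsum(dim=1)             # fake, unnormalized series

F_emb = torch.randn(n_series, k, requires_grad=True)   # per-series loadings
X = torch.randn(k, T, requires_grad=True)              # shared basis time series
opt = torch.optim.Adam([F_emb, X], lr=0.05)

for _ in range(500):
    opt.zero_grad()
    loss = ((F_emb @ X - Y) ** 2).mean()               # reconstruction error
    loss.backward()
    opt.step()

print(loss.item())   # a forecaster now only needs to extrapolate the k basis series
```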
29 Oct
There are a handful of frameworks for implementing basic NLP.

But what about implementing models like BERT or GPT-3? That calls for a framework that does not require monumental development effort.

@allen_ai created one for you. It's AllenNLP.⬇️
AllenNLP provides a simple & modular programming model for:

1. Applying advanced deep learning techniques to NLP research
2. Streamlining the creation of NLP experiments
3. Abstracting the core building blocks of NLP models

2/5
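A minimal sketch of AllenNLP's high-level Predictor API (assuming the allennlp and allennlp-models packages are installed; the archive path is a placeholder, not a real model URL):

```python
from allennlp.predictors.predictor import Predictor

# Load a trained model archive into a task-specific predictor.
predictor = Predictor.from_path("path/to/pretrained-model.tar.gz")  # placeholder path

# predict_json takes a task-specific JSON payload; many predictors also expose
# keyword-style helpers such as predict(sentence=...).
result = predictor.predict_json(
    {"sentence": "AllenNLP abstracts the core building blocks of NLP models."}
)
print(result)
```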
Portfolio of NLP tasks under AllenNLP:

- Text Generation
- Language Modeling
- Multiple Choice
- Pair Classification
- Structured Prediction
- Sequence Tagging
- Text + vision
3/5
