Tweet

@TolokaAI

This Thread may be Removed Anytime!

Twitter may remove this content at anytime! Save it as PDF for later use!

More from @TheSequenceAI

TheSequence

@TheSequenceAI

5 Nov

@DeepMind

🔥2 New Super Models to Handle Any Type of Dataset

We build models optimized for a specific type of dataset like:
- text
- audio
- computer vision
- etc.

Is it possible to create a general model? @DeepMind unveils the answer⬇️
1/5

Recently, DeepMind published two papers about general-purpose architectures that can process different types of input datasets.

1) Perceiver supports any kind of input
2) Perceiver IO supports any kind of output

More⬇️

Perceivers can handle new types of data with only minimal modifications.

They process inputs using domain-agnostic Transformer-style attention.

Perceiver IO matches a Transformer-based BERT baseline on the GLUE language benchmark.
3/5

Read 5 tweets

TheSequence

@TheSequenceAI

4 Nov

@DeepMind

Transformers pioneered the principle of attention mechanisms to access past information.

However, most Transformer models discard older memories to prioritize more recent activations.

@DeepMind's Compressive Transformer tackles that problem.
1/4

The Compressive Transformer tries to imitate the process of consolidating memories.

Under that approach, previous activations are compacted into a "compressed memory" that can be used in long-range tasks.
2/4

Compressive Transformer was evaluated against state-of-the-art memory models using WikiText-103  and  Enwik8. 

In both cases, it showed significant improvements over more established models both in memory and efficiency.
3/4

Read 4 tweets

TheSequence

@TheSequenceAI

30 Oct

@OpenAI

.@OpenAI ImageGPT is one of the first transformer architectures applied to computer vision scenarios.👇

In language, unsupervised learning algorithms that rely on word prediction (like GPT-2 and BERT) are extremely successful.

One possible reason for this success is that instances of downstream language tasks appear naturally in the text.
2/4

In contrast, sequences of pixels do not clearly contain labels for the images they belong to.

However, OpenAI believes that sufficiently large transformer models:
- could be applied to 2D image analysis
- learn strong representations of a dataset
3/4

Read 4 tweets

TheSequence

@TheSequenceAI

29 Oct

@AmazonScience

Forecasting high-dimensional time series plays a crucial role in many applications like:
- demand forecasting
- financial predictions

You can use @AmazonScience's DeepGLO for these problems.⬇️

The challenge with multi-dimensional time-series datasets is a serious one.

1) Traditional methods (like ARIMA) can't scale to large datasets with millions of time series.

2) Deep neural networks have been proven to handle scalability more effectively. BUT⬇️

BUT many deep neural nets:

- only forecast values from the same dimension
- require different time series to be normalized on a single scale

DeepGLO addresses these challenges.
3/6

Read 6 tweets

TheSequence

@TheSequenceAI

29 Oct

@allen_ai

There are a handful of frameworks to implement basic NLP.

And what about implementing models like BERT or GPT-3? A framework that does not require monumental development efforts.

@allen_ai created one for you. It's AllenNLP.⬇️

AllenNLP provides a simple & modular programming model for:

1. Applying advanced deep learning techniques to NLP research
2. Streamlining the creation of NLP experiments
3. Abstracting the core building blocks of NLP models

2/5

Portfolio of NLP tasks under AllenNLP:

- Text Generation
- Language Modeling
- Multiple Choice
- Pair Classification
- Structured Prediction
- Sequence Tagging
- Text + vision
3/5

Read 5 tweets

TheSequence

@TheSequenceAI

27 Oct

3 big AI industry insights🔥

1) Companies are big spenders on AI but lack confidence
2) AI is a cloud-native world
3) Budgets are growing, despite challenges

Fascinating details👀⬇️

1) Big spenders, but a lack of confidence

- 38% of companies have a budget of more than $1M per year for AI infrastructure alone!

- However, for 77% of companies, less than half of models make it to production

@kubernetesio

3) AI is a cloud-native world

- 81% of companies use containers and cloud technologies for their AI workloads

- Nearly 1/2 of them are using @kubernetesio

=> AI is a leader in cloud-native adoption

Read 5 tweets

Share this page!

TheSequence

Try unrolling a thread yourself!

More from @TheSequenceAI

TheSequence

TheSequence

TheSequence

TheSequence

TheSequence

TheSequence

Did Thread Reader help you today?

Like this author's thread?