Pre-trained language models have been one of the most important breakthroughs in deep learning in recent years.
What models are used in super large-scale language tasks?
Thread👇
Pre-trained language models are trained on massive text datasets.
Thanks to transformer architectures, pre-trained language models can be adapted to specific tasks such as question answering or language modeling.
2/⬇️
Transformers opened the door to a new era of innovation in NLU. And the attention mechanism used in transformers = one of the most impactful developments in ML in recent years.
3/⬇️
Researchers from @MSFTResearch went further. They introduced one of the first generative models that could be used in super large-scale language tasks. The team: @ChunyuanLi, @icaruszyz, @JianfengGao0217, Xiang Gao, Yuan Li, Xiujun Li, Baolin Peng.
4/⬇️
They called their model Optimus.
Optimus combines large pre-trained language models (language understanding) with generation tasks in a very clever architecture built on generative models.
5/⬇️
The Optimus architecture includes a BERT-based encoder and a GPT-2-based decoder.
From that perspective, Optimus = a variational auto-encoder (VAE) architecture that connects the encoder and decoder through a latent space.
6/⬇️
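The encoder-latent-decoder idea can be sketched with a toy numpy VAE. This is not the actual Optimus code: BERT and GPT-2 are stood in for by random linear maps, and all names and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_enc, d_lat, d_dec = 8, 4, 8  # toy dims standing in for BERT/GPT-2 hidden sizes

# "Encoder" (BERT stand-in): sentence embedding -> mean and log-variance of latent z
W_mu, W_logvar = rng.normal(size=(d_lat, d_enc)), rng.normal(size=(d_lat, d_enc))
# "Decoder" (GPT-2 stand-in): latent z -> embedding that conditions generation
W_dec = rng.normal(size=(d_dec, d_lat))

def encode(h):
    return W_mu @ h, W_logvar @ h

def reparameterize(mu, logvar):
    # sample z = mu + sigma * eps so gradients can flow through the sampling step
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    # KL(q(z|x) || N(0, I)): the VAE regularizer that organizes the latent space
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

h = rng.normal(size=d_enc)      # pretend this is a BERT sentence embedding
mu, logvar = encode(h)
z = reparameterize(mu, logvar)  # latent code for the sentence
h_dec = W_dec @ z               # fed to the GPT-2 stand-in to condition generation
print(z.shape, h_dec.shape)
```

The key design point the sketch captures: all of language understanding is compressed into the low-dimensional latent `z`, and generation is conditioned only on that code.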
The full paper about Optimus is called "Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space".
If you'd like a bite-sized, highly concentrated overview of this paper, click the link below. It'll take you to TheSequence Edge#7, our educational newsletter. thesequence.substack.com/p/edge7
8/8
The Adversarial Robustness Toolbox (ART) = framework to defend deep learning models against adversarial security attacks and evaluate their robustness
Thread⬇️
Adversarial examples = inputs crafted to fool ML models.
Two attack settings:
+White Box Attacks: The adversary has access to the training environment and full knowledge of the model, including the training algorithm and parameters
+Black Box Attacks: The adversary has no knowledge of the model internals and can only query it
2/⬇️
The goal of ART = to provide a framework to evaluate the robustness of a neural network.
The current version of ART focuses on four types of adversarial attacks:
+evasion
+inference
+extraction
+poisoning
3/⬇️
🤖@Uber Ludwig = Open Source Framework for Creating ML Models Without Writing Any Code.
To use Ludwig, all you need is a data file with the input attributes and the desired outputs; Ludwig does the rest.
Thread🧵👇
The main innovation behind Ludwig = data-type-specific encoders and decoders. Ludwig uses a specific encoder and decoder for each supported data type.
2/6⬇️
Ludwig is based on a series of principles:
+No Coding Required
+Generality
+Flexibility
+Extensibility
+Interpretability
3/6⬇️
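The "No Coding Required" principle in practice: you describe inputs and outputs in a declarative config and Ludwig assembles the model. A hypothetical config (the feature names `review` and `sentiment` are made up for illustration):

```yaml
input_features:
  - name: review
    type: text
output_features:
  - name: sentiment
    type: category
```

Training is then a single command against your data file, along the lines of `ludwig train --dataset reviews.csv --config config.yaml` (flag names have varied across Ludwig versions, so check the docs for yours).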
The centralized nature of AI makes it difficult for startups to compete with the large tech incumbents that have access to:
+massive datasets
+virtually unlimited computing resources
+world-class research talent
Decentralized AI is the key
Thread⬇️
Research in decentralized ML is nothing new and can be traced back to the late 1970s
But the space has caught new momentum w/ blockchains and distributed ledger technologies
2/⬇️
However, blockchains are not the only technology trend influencing decentralized ML
Decentralized ML has benefited from:
+Blockchains
+Federated Learning
+Private ML
3/⬇️
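Of these, federated learning is the most directly about decentralization: clients train on their own data and share only model updates. A minimal FedAvg-style sketch in numpy (not any specific framework's API; all names are illustrative):

```python
import numpy as np

def local_step(w, X, y, lr=0.1):
    # One gradient step of linear regression on a client's private data.
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def fedavg(updates, sizes):
    # Server aggregates: average client weights, weighted by local dataset size.
    total = sum(sizes)
    return sum(n / total * w for w, n in zip(updates, sizes))

rng = np.random.default_rng(0)
w_global = np.zeros(2)
# Each client holds its own data; the raw data never leaves the client.
clients = [(rng.normal(size=(20, 2)), rng.normal(size=20)) for _ in range(3)]

for _round in range(5):
    updates = [local_step(w_global.copy(), X, y) for X, y in clients]
    w_global = fedavg(updates, [len(y) for _, y in clients])

print(w_global)  # the server only ever sees weights, not data
```

This is why federated learning fits the decentralized-AI thesis: the datasets stay with their owners, so no single party needs the "massive datasets" advantage.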