Daily AI Papers
Tweets popular AI papers; by @labmlai Chrome extension https://t.co/SyfujSq5df
Mar 16, 2024 • 5 tweets • 2 min read
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking



It trains a language model to generate rationales at each token that explain future text, improving its predictions.

🧵👇 arxiv.org/abs/2403.09629
2/4) These rationales (thoughts) help the model predict tokens that are otherwise difficult to predict.

The paper shows zero-shot improvements on GSM8K (5.9%β†’10.9%) and CommonsenseQA (36.3%β†’47.2%) after continued pretraining of an LM with Quiet-STaR.

👇
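The token-level mechanics can be sketched as follows. This is a toy illustration, not the paper's implementation: `lm_logits` is a hypothetical stand-in for a real model, and the fixed `mix_weight` replaces the mixing head that Quiet-STaR learns end-to-end (along with REINFORCE training of the thoughts).

```python
import math
import random

VOCAB = 16

def lm_logits(context):
    # Hypothetical stand-in for a language model: deterministic
    # pseudo-random next-token logits keyed on the context.
    r = random.Random(sum(context) + len(context))
    return [r.gauss(0.0, 1.0) for _ in range(VOCAB)]

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def quiet_star_step(context, n_thought_tokens=4, mix_weight=0.5, seed=0):
    """One Quiet-STaR-style prediction step (sketch).

    1. Predict the next token without a thought.
    2. Sample a short rationale (thought tokens) after the context.
    3. Predict again conditioned on context + thought.
    4. Mix the two distributions (a constant gate here; learned in the paper).
    """
    rng = random.Random(seed)
    base = softmax(lm_logits(context))
    thought = []
    for _ in range(n_thought_tokens):
        probs = softmax(lm_logits(context + thought))
        thought.append(rng.choices(range(VOCAB), weights=probs)[0])
    with_thought = softmax(lm_logits(context + thought))
    return [(1 - mix_weight) * b + mix_weight * t
            for b, t in zip(base, with_thought)]
```

Since both mixed distributions are normalized, the output is a valid distribution over the vocabulary; tokens that the thought makes easier to predict gain probability mass.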
Jan 30, 2023 • 5 tweets • 2 min read
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

ai.papers.bar/paper/1fe37916…

Language models demonstrate both quantitative improvement and new qualitative capabilities with...

🧵 👇 (2/5) .. increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench)..
Jan 30, 2023 • 5 tweets • 2 min read
Exploration via Elliptical Episodic Bonuses

ai.papers.bar/paper/f019c2d0…

Exploration via Elliptical Episodic Bonuses (E3B) is a new method which extends count-based episodic bonuses to continuous state spaces. The...

🧵 👇 (2/5) .. embedding is learned using an inverse dynamics model in order to capture controllable aspects of the environment.
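The core of the elliptical bonus can be sketched with a rank-one inverse update. This is a minimal sketch: the embedding function φ (learned via the inverse dynamics model in the paper) is abstracted away, and the episodic bonus for embedding φ is φᵀC⁻¹φ with C = λI + Σ φφᵀ over the current episode.

```python
class EllipticalBonus:
    """E3B-style episodic bonus (sketch).

    Maintains C^-1, where C = lam*I + sum_t phi_t phi_t^T over the
    episode, via the Sherman-Morrison rank-one update.
    The exploration bonus for an embedding phi is phi^T C^-1 phi.
    """
    def __init__(self, dim, lam=0.1):
        self.dim = dim
        # C starts as lam * I, so C^-1 = I / lam.
        self.c_inv = [[(1.0 / lam) if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]

    def _matvec(self, v):
        return [sum(self.c_inv[i][j] * v[j] for j in range(self.dim))
                for i in range(self.dim)]

    def bonus(self, phi):
        u = self._matvec(phi)
        return sum(p * x for p, x in zip(phi, u))

    def update(self, phi):
        # Sherman-Morrison: (C + phi phi^T)^-1 from C^-1.
        u = self._matvec(phi)
        denom = 1.0 + sum(p * x for p, x in zip(phi, u))
        for i in range(self.dim):
            for j in range(self.dim):
                self.c_inv[i][j] -= u[i] * u[j] / denom

eb = EllipticalBonus(dim=2, lam=0.1)
novelty_before = eb.bonus([1.0, 0.0])   # large: direction not yet visited
eb.update([1.0, 0.0])
novelty_after = eb.bonus([1.0, 0.0])    # smaller after visiting
```

Revisiting the same direction of embedding space shrinks the ellipse along it, so the bonus decays, which generalizes count-based bonuses to continuous states.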
Jan 29, 2023 • 4 tweets • 2 min read
Is ChatGPT A Good Translator? A Preliminary Study

ai.papers.bar/paper/773c7fa0…

ChatGPT performs competitively with commercial translation products (e.g., Google Translate) on high-resource European languages. It lags...

🧵 👇 (2/4) .. behind significantly on low-resource or distant languages. ChatGPT does not perform as well as the commercial systems on biomedical abstracts or Reddit comments.
Jan 29, 2023 • 5 tweets • 2 min read
Prediction-Powered Inference

ai.papers.bar/paper/822c5638…

Prediction-powered inference is a framework for performing valid statistical inference when an experimental data set is supplemented with predictions from a...

🧵 👇 (2/5) .. machine-learning system such as AlphaFold. Higher accuracy of the predictions translates to smaller confidence intervals, permitting more powerful inference.
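The point-estimate side of the idea fits in a few lines. This sketches only the mean-estimation case; the framework's actual contribution is the valid confidence intervals built around such estimates, which are not shown here.

```python
def ppi_mean(preds_unlabeled, preds_labeled, labels):
    """Prediction-powered point estimate of a mean (sketch).

    Predictions on a large unlabeled set carry the statistical power;
    a "rectifier" measured on the small labeled set removes the
    model's systematic bias:
        theta = mean(f(X_unlabeled)) - mean(f(X_labeled) - Y)
    """
    rectifier = sum(f - y for f, y in zip(preds_labeled, labels)) / len(labels)
    return sum(preds_unlabeled) / len(preds_unlabeled) - rectifier

# A model that over-predicts by 0.5 everywhere is corrected exactly:
estimate = ppi_mean([1.5, 2.5, 3.5], [1.5, 2.5], [1.0, 2.0])  # 2.0
```

The more accurate the predictions, the smaller the variance of the rectifier, which is what tightens the resulting intervals.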
Jan 28, 2023 • 4 tweets • 2 min read
Active Learning from the Web

ai.papers.bar/paper/288f2912…

Labeling data is expensive; active learning is a standard approach to alleviating this cost. Pool-based active learning first builds a pool of unlabelled data and iteratively selects data...

🧵 👇 (2/4) .. to be labeled. We propose an efficient method, Seafaring, to retrieve data from the Web that is informative for active learning.
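One iteration of the pool-based loop looks like this. This is a generic sketch using uncertainty sampling for illustration, not the paper's Seafaring method (which targets a Web-scale virtual pool); `predict_proba` is a hypothetical model callable.

```python
import math

def entropy(probs):
    # Predictive entropy: higher means the model is less certain.
    return -sum(p * math.log(p) for p in probs if p > 0)

def pool_based_step(pool, predict_proba, k=1):
    """One iteration of pool-based active learning (generic sketch):
    score every unlabeled example by predictive entropy and send the
    top-k most uncertain examples to be labeled."""
    ranked = sorted(pool, key=lambda x: entropy(predict_proba(x)), reverse=True)
    return ranked[:k]
```

After the selected examples are labeled and added to the training set, the model is retrained and the loop repeats; scoring every example per round is what becomes infeasible at Web scale.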
Jan 28, 2023 • 4 tweets • 2 min read
The Impossibility of Parallelizing Boosting

ai.papers.bar/paper/4335e024…

The aim of boosting is to convert a sequence of weak learners into a strong learner. At their heart, these methods are fully sequential. We...

🧵 👇 (2/4) .. investigate the possibility of parallelizing boosting.
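A single AdaBoost round makes the sequential dependence concrete: round t+1's sample weights are a function of round t's weak learner, which is exactly what resists parallelization. (AdaBoost here is a standard illustration, not the paper's construction.)

```python
import math

def adaboost_round(weights, errors):
    """One AdaBoost round (sketch). `errors[i]` is 1 if the current
    weak learner misclassifies sample i, else 0.

    Returns the learner's vote weight alpha and the reweighted,
    renormalized sample distribution: misclassified samples are
    up-weighted, so the NEXT weak learner depends on this one.
    """
    eps = sum(w for w, e in zip(weights, errors) if e) / sum(weights)
    alpha = 0.5 * math.log((1 - eps) / eps)
    new_w = [w * math.exp(alpha if e else -alpha)
             for w, e in zip(weights, errors)]
    z = sum(new_w)
    return alpha, [w / z for w in new_w]
```

With uniform weights over four samples and one mistake, the misclassified sample's weight jumps from 0.25 to 0.5 after one round, reshaping the problem the next learner sees.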
Jan 28, 2023 • 4 tweets • 2 min read
simple diffusion: End-to-end diffusion for high resolution images

ai.papers.bar/paper/4f8b5ed4…

Currently, applying diffusion models in pixel space of high resolution images is difficult. This paper aims to improve...

🧵 👇 (2/4) .. denoising diffusion for high-resolution images while keeping the model as simple as possible. The four main findings are: 1) the noise schedule should be adjusted for high resolution images, 2..
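The first finding can be sketched as a shifted log-SNR schedule. This is my reading of the idea, with illustrative constants: reuse the cosine schedule tuned at a 64x64 base resolution, shifted so that larger images receive more noise at every timestep.

```python
import math

def cosine_log_snr(t):
    # Standard cosine schedule expressed as log-SNR at time t in (0, 1).
    return -2.0 * math.log(math.tan(math.pi * t / 2))

def shifted_log_snr(t, resolution, base=64):
    """Resolution-shifted noise schedule (sketch of the paper's first
    finding): shift the log-SNR by 2*log(base/resolution), lowering it
    at higher resolutions so more noise is added at each step."""
    return cosine_log_snr(t) + 2.0 * math.log(base / resolution)
```

At the base resolution the schedule is unchanged; at 256x256 the log-SNR drops by 2·log 4 everywhere, compensating for the redundancy among neighbouring pixels in large images.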
Jan 28, 2023 • 4 tweets • 2 min read
DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature

ai.papers.bar/paper/7b5bed58…

The fluency and factual knowledge of large language models (LLMs) heightens the need for corresponding systems...

🧵 👇 (2/4) .. to detect whether a piece of text is machine-written. For example, students may use LLMs to complete written assignments, leaving instructors unable to accurately assess student learning.
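The detection criterion itself is compact: machine-generated text tends to sit near a local maximum of the model's log-probability, so small rewrites lower log p(x) more than they do for human text. In the paper the perturbations come from a mask-filling model such as T5; here `log_p` and `perturb` are hypothetical callables.

```python
def detectgpt_score(log_p, text, perturb, n_perturbations=10):
    """DetectGPT's perturbation-discrepancy score (sketch).

    Compare the candidate text's log-probability under the source
    model with the average log-probability of slightly rewritten
    versions; a large positive gap suggests machine-written text.
    """
    perturbed = [perturb(text) for _ in range(n_perturbations)]
    mean_perturbed = sum(log_p(p) for p in perturbed) / len(perturbed)
    return log_p(text) - mean_perturbed  # larger => more likely machine-written
```

A threshold on this score yields a zero-shot detector: no classifier is trained, only the suspected source model is queried.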
Jan 28, 2023 • 4 tweets • 2 min read
ClimaX: A foundation model for weather and climate

ai.papers.bar/paper/74c51402…

Most state-of-the-art approaches for weather and climate modeling are based on physics-informed numerical models of the atmosphere. Recent...

🧵 👇 (2/4) .. data-driven approaches based on machine learning instead aim to directly solve a downstream forecasting or projection task. These networks are trained using curated and homogeneous climate datasets.
Jan 27, 2023 • 4 tweets • 2 min read
Text-To-4D Dynamic Scene Generation

ai.papers.bar/paper/83027fa4…

MAV3D (Make-A-Video3D) is a method for generating three-dimensional dynamic scenes from text descriptions. The dynamic video output generated from the...

🧵 👇 (2/4) .. provided text can be viewed from any camera location and angle, and can be composited into any 3D environment.
Jan 16, 2023 • 4 tweets • 2 min read
YOLOv6 v3.0: A Full-Scale Reloading

ai.papers.bar/paper/f29fb672…

YOLOv6 v3.0 is a new version of the popular YOLO object-detection framework, released for Chinese New Year 2023, the Year of the Rabbit. This release...

🧵 👇 (2/4) .. includes novel enhancements to the network architecture and the training scheme.
Jan 16, 2023 • 4 tweets • 2 min read
Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP

ai.papers.bar/paper/f700b0b6…

Demonstrate-Search-Predict (DSP) is a framework that relies on passing natural language texts in...

🧵 👇 (2/4) .. sophisticated pipelines between an LM and an RM. DSP can express high-level programs that bootstrap pipeline-aware demonstrations, search for relevant passages, and generate grounded predictions.
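A minimal sketch of such a pipeline, with `lm` and `rm` as hypothetical callables (the real framework additionally bootstraps demonstrations and supports multi-hop search):

```python
def dsp_pipeline(question, lm, rm, demonstrations):
    """Demonstrate-Search-Predict sketch: the LM and a retrieval
    model (RM) exchange natural-language text in a small pipeline.

    `lm(prompt)` returns generated text; `rm(query, k)` returns a
    list of retrieved passages. Both are assumed interfaces."""
    # Search: ask the LM to write a search query, then retrieve.
    query = lm(f"{demonstrations}\nQuestion: {question}\nSearch query:")
    passages = rm(query, k=3)
    # Predict: answer grounded in the retrieved passages.
    context = "\n".join(passages)
    return lm(f"{demonstrations}\nContext: {context}\n"
              f"Question: {question}\nAnswer:")
```

The point of the framework is that such control flow is expressed over plain text, so the same program works with any LM and RM that honour these interfaces.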
Jan 16, 2023 • 4 tweets • 2 min read
Myths and Legends in High-Performance Computing

ai.papers.bar/paper/6485825b…

We collected myths from conversations at conferences and meetings, product advertisements, papers, and other communications. We believe they...

🧵 👇 (2/4) .. represent the zeitgeist of the current era of massive change. These myths are rarely based on scientific facts but often on some evidence or argumentation.
Jan 15, 2023 • 4 tweets • 2 min read
TrojanPuzzle: Covertly Poisoning Code-Suggestion Models

ai.papers.bar/paper/7f890038…

With tools like GitHub Copilot, automatic code suggestion is no longer a dream in software engineering. These tools, based on large...

🧵 👇 (2/4) .. language models, are typically trained on massive corpora of code mined from unvetted public sources. As a result, these models are susceptible to data poisoning attacks.
Jan 15, 2023 • 4 tweets • 2 min read
Versatile Diffusion: Text, Images and Variations All in One Diffusion Model

ai.papers.bar/paper/919b5b0e…

Recent advances in diffusion models have set an impressive milestone in many generation tasks. Trending works such...

🧵 👇 (2/4) .. as DALL-E 2, Imagen, and Stable Diffusion have attracted great interest in academia and industry. Recent new approaches focus on extensions and performance rather than capacity.
Jan 14, 2023 • 4 tweets • 2 min read
Learning from Natural Language Feedback

ai.papers.bar/paper/4ae0e298…

Pretrained language models often do not perform tasks in line with our preferences. We propose to learn from natural language feedback, which conveys...

🧵 👇 (2/4) .. more information per human evaluation. We learn from language feedback on model outputs using a three-step learning algorithm.
Jan 14, 2023 • 4 tweets • 2 min read
Tracr: Compiled Transformers as a Laboratory for Interpretability

ai.papers.bar/paper/8d3484fe…

Tracr is a "compiler" for translating human-readable programs into weights of a transformer model. Tracr takes code written in...

🧵 👇 (2/4) .. RASP, a domain-specific language, and translates it into weights for a standard, decoder-only, GPT-like architecture.
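RASP's two core primitives can be mimicked in plain Python to show the kind of program Tracr compiles. This is a conceptual sketch, not Tracr's API: `select` builds a boolean attention pattern and `aggregate` averages values under it.

```python
def select(keys, queries, predicate):
    """RASP-style select: build a boolean attention matrix,
    one row per query position."""
    return [[predicate(k, q) for k in keys] for q in queries]

def aggregate(matrix, values):
    """RASP-style aggregate: average the selected values
    at each query position."""
    out = []
    for row in matrix:
        picked = [v for v, keep in zip(values, row) if keep]
        out.append(sum(picked) / len(picked) if picked else 0.0)
    return out

# Reversing a sequence: position q attends to position n-1-q.
tokens = [1, 2, 3]
idx = [0, 1, 2]
attn = select(idx, idx, lambda k, q: k == len(tokens) - 1 - q)
reversed_tokens = aggregate(attn, tokens)  # [3.0, 2.0, 1.0]
```

Tracr's contribution is compiling programs written in these primitives into actual transformer weights, so the resulting model's internals are known by construction, which is what makes it useful for interpretability research.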
Jan 14, 2022 • 9 tweets • 4 min read
A ConvNet for the 2020s by @MetaAI & @berkeley_ai

📎 papers.labml.ai/paper/9f4eeafc…

They modify the ResNet architecture to build a convolution-based model (ConvNeXt) that outperforms transformer-based vision models while retaining ResNet's simplicity.

1/9
Summary:
🧵👇 2/9

They first train a ResNet using Transformer training techniques (e.g. warmup, data augmentations, more epochs, ...). This alone improves the performance from 76.1% to 78.8%.

Then they make a series of modifications to the architecture that further improve performance.

👇
Jan 6, 2022 • 6 tweets • 3 min read
1/5 A Neural Network Solves and Generates Mathematics Problems by Program Synthesis: Calculus, Differential Equations, ...

📎 papers.labml.ai/paper/212cd37c…

Shows that @OpenAI Codex can generate code that solves math problems by rephrasing them as prompts.

🧵👇 2/5 They solve 200+ randomly chosen math problems. They appear to have tried different prompts until Codex got the solution right, and show that the transformed prompts remain quite similar to the original problems.
Dec 29, 2021 • 5 tweets • 3 min read
Improving language models by retrieving from trillions of tokens by @DeepMind

πŸ“ Annotated PDF github.com/labmlai/annota…
πŸ“Ž Paper papers.labml.ai/paper/324a7d2e…

The paper introduces Retrieval Enhanced Transformer (RETRO) - 25X smaller than GPT-3 with comparable performance.

🧵👇 2/ It retrieves chunks of similar text (kNN) based on a frozen BERT model from a massive dataset (~5T tokens). Some layers of RETRO then apply cross-attention to those chunks.

Similar text chunks are found based on mean BERT embeddings of the tokens in the chunks.

👇
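The retrieval side can be sketched as mean-pooled chunk embeddings plus a nearest-neighbour lookup. This toy version is exact and tiny; RETRO uses frozen BERT embeddings and an approximate-kNN index over a trillion-token-scale database.

```python
def mean_embedding(token_embeddings):
    """Mean-pool token embeddings into one chunk embedding
    (as with the frozen BERT embeddings used for retrieval)."""
    dim = len(token_embeddings[0])
    n = len(token_embeddings)
    return [sum(e[d] for e in token_embeddings) / n for d in range(dim)]

def knn_chunks(query_emb, database, k=2):
    """Retrieve the k nearest database chunks by squared L2 distance
    (exact search here; RETRO uses an approximate index at scale)."""
    def dist(emb):
        return sum((a - b) ** 2 for a, b in zip(query_emb, emb))
    return sorted(database, key=lambda item: dist(item["emb"]))[:k]
```

The retrieved chunks (and their continuations) are then fed to the cross-attention layers described above, which is how a 25x smaller model can match GPT-3: knowledge lives in the database rather than in the weights.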