Discover and read the best of Twitter Threads about #NeurIPS2022


🧵 $TAO Thread

Hopefully it gives newcomers to the project an idea of what @bittensor_ is and what they are trying to achieve

I will continue to add to it as time passes… It's a re-hash of my thoughts over the past 6 months

$TAO = The creation of the Neural Internet 🌐
Imagine Machine Learning problems were not solved by huge centralised supercomputers owned by a few mega-corporations..

Instead.. Leveraging every available computer around the world in a truly decentralised manner to train ML/AI models & allow them to connect & interact

$TAO
#Bitcoin rewards miners with every block… but a lot of energy is wasted by those who contribute compute power and fail to win the block

$TAO takes those extra compute resources & uses them to develop/train Machine Learning & AI models via the Bittensor network

Meaning very little energy is wasted
Read 23 tweets
#Neurips2022 is now over; here is what I found exciting this year. Interesting trends include creative ML, diffusion models, language models, LLMs + RL, and some interesting theoretical work on conformal prediction, optimization, and more.
Two best paper awards went to work in creative ML (Imagen and LAION), in addition to many papers on improving generation quality, extending generation beyond images (e.g., molecules), and more.
There was a lot of talk about ethics in creative ML (even an entire workshop on it), but I also saw fun applications in art, music & science (note: all workshops are recorded). Companies from Google to RunwayML had a big presence.
Read 11 tweets
How can deep learning be useful in causal inference?

In our #NeurIPS2022 paper, we argue that causal effect estimation can benefit from large amounts of unstructured "dark" data (images, sensor data) that can be leveraged via deep generative models to account for confounders.
Consider the task of estimating the effect of a medical treatment from observational data. The true effects are often confounded by unobserved factors (e.g., patient lifestyle). We argue that latent confounders can be discovered from unstructured data (e.g., clinical notes).
For example, suppose that we have access to raw data from wearable sensors for each patient. This data implicitly reveals whether each patient is active or sedentary—an important confounding factor affecting treatment and outcome. Thus, we can also correct for this confounder.
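The paper works with deep generative models; purely as a hedged toy of the adjustment logic, here is a sketch where a PCA stands in for the generative model, the data is simulated, and a hidden "active vs. sedentary" factor confounds treatment and outcome:

```python
# Toy sketch (not the paper's code): recover a proxy for a latent confounder
# from raw sensor data, then adjust for it when estimating a treatment effect.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 5000

u = rng.binomial(1, 0.5, size=n)                        # latent: active vs. sedentary
sensors = u[:, None] * 2.0 + rng.normal(size=(n, 50))   # "dark" sensor data driven by u

t = rng.binomial(1, 0.3 + 0.4 * u)                      # treatment depends on u
y = 1.0 * t + 2.0 * u + rng.normal(size=n)              # true effect of t is 1.0

naive = y[t == 1].mean() - y[t == 0].mean()             # biased upward by u

u_hat = PCA(n_components=1).fit_transform(sensors)      # proxy confounder from dark data
adjusted = LinearRegression().fit(np.column_stack([t, u_hat]), y).coef_[0]

print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}, truth: 1.00")
```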
Read 7 tweets
Introducing Dramatron, a new tool for writers to co-write theatre and film scripts with a language model. 🎭

Dramatron can interactively co-create new stories complete with title, characters, location descriptions and dialogue.

Try it yourself now: dpmd.ai/dramatron-gith…
✏️ We interviewed 15 industry experts including playwrights, screenwriters and actors who produced work using Dramatron.

Canadian company @TheatreSports edited co-written theatre scripts and performed them on stage in Plays By Bots to positive reviews. dpmd.ai/dramatron-tw
Want to find out more? The team will be presenting this research at #NeurIPS2022:

📅 December 9
⌚ 3pm CST

dpmd.ai/3YbA0nK @ml4cdworkshop
Read 4 tweets
You couldn't make it to #NeurIPS2022 this year?

Nothing to worry about: I curated a summary for you below, focusing on key papers, presentations and workshops in the buzzing space of ML in Biology and Healthcare 👇
Starting off with Keynote presentations:

Backprop has become the workhorse of ML.
@geoffreyhinton challenges the community to rethink learning, introducing the Forward-Forward Algorithm, where layers are trained to have high goodness on positive samples and low goodness on negative samples.
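As a rough sketch of the idea (my reading of the keynote, not Hinton's code; the threshold, loss form, and optimizer are assumptions), a single Forward-Forward layer might look like:

```python
# Sketch of one Forward-Forward layer: trained locally, no backprop between
# layers. "Goodness" = sum of squared activations; push it above a threshold
# for positive samples and below it for negative samples.
import torch
import torch.nn as nn

class FFLayer(nn.Module):
    def __init__(self, d_in, d_out, threshold=2.0, lr=0.03):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.threshold = threshold
        self.opt = torch.optim.Adam(self.linear.parameters(), lr=lr)

    def forward(self, x):
        # Length-normalize inputs so goodness from the previous layer
        # cannot leak into this layer's decision.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return torch.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)   # goodness of positives
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)   # goodness of negatives
        # Logistic loss: high goodness on positives, low on negatives.
        loss = torch.nn.functional.softplus(torch.cat(
            [self.threshold - g_pos, g_neg - self.threshold])).mean()
        self.opt.zero_grad(); loss.backward(); self.opt.step()
        # Detach outputs: the next layer trains on these without backprop here.
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()
```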
Giving models an understanding of what they do not know is, for many decision-making applications, as important as providing accurate predictions

E. Candès @Stanford gave a broad introduction to conformal prediction with quantile regression to filter out low-confidence predictions
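For readers new to the area, here is a minimal sketch of that recipe (conformalized quantile regression) using only scikit-learn; everything below is illustrative, not from the keynote:

```python
# Conformalized quantile regression (CQR) sketch: fit lower/upper quantile
# regressors, then widen the band by a calibration quantile so the interval
# covers the truth with probability 1 - alpha under exchangeability.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(2000, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3 + 0.2 * np.abs(X[:, 0]))

X_tr, y_tr = X[:1000], y[:1000]            # fit quantile regressors
X_cal, y_cal = X[1000:1500], y[1000:1500]  # calibration split
X_te, y_te = X[1500:], y[1500:]            # held-out evaluation

alpha = 0.1
lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_tr, y_tr)
hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_tr, y_tr)

# Conformity score: how far a calibration point falls outside the band.
s = np.maximum(lo.predict(X_cal) - y_cal, y_cal - hi.predict(X_cal))
q = np.sort(s)[int(np.ceil((1 - alpha) * (len(s) + 1))) - 1]  # finite-sample quantile

covered = (y_te >= lo.predict(X_te) - q) & (y_te <= hi.predict(X_te) + q)
print(f"empirical coverage: {covered.mean():.3f} (target {1 - alpha:.2f})")
```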
Read 19 tweets
Machine learning predictive uncertainty estimates are often unreliable—data shift makes things worse!

How can you audit the uncertainty of an ML prediction, even with biased data?

A 🧵 w/ @DrewPrinster on the JAWS approach in our #NeurIPS2022 paper w/ the fab @anqi_liu33
Why generate uncertainty intervals and enable real time audits?

- Build user trust: arxiv.org/pdf/1805.11783… proceedings.mlr.press/v89/schulam19a…
- In decision support apps, reduce false alerts: pubmed.ncbi.nlm.nih.gov/28841550/
- Enable safety assessment: https://www.nejm.org/doi/full/10.1056/NEJMc2104626
Background: #conformalprediction is becoming popular for predictive interval generation with a coverage guarantee

Coverage: Predictive interval contains true label with high probability (i.e., predictive confidence intervals are valid)

Assumption: Exchangeable (or, IID) data
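To ground the terminology, a hedged toy illustration (all distributions and models invented for the demo) of split conformal prediction, how coverage breaks under covariate shift, and the likelihood-ratio reweighting idea that shift-robust methods like JAWS build on:

```python
# Toy demo: split conformal intervals are valid under exchangeability, lose
# coverage under covariate shift, and can be (approximately) repaired by
# weighting calibration scores with the test/train density ratio.
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

def sample(n, loc):
    x = rng.normal(loc, 1.0, size=(n, 1))
    y = x[:, 0] ** 2 + rng.normal(scale=0.5, size=n)
    return x, y

X_fit, y_fit = sample(500, loc=0.0)        # fit the base model
X_cal, y_cal = sample(500, loc=0.0)        # calibration set (same distribution)
X_te, y_te = sample(1000, loc=1.5)         # test set under covariate shift

model = LinearRegression().fit(X_fit, y_fit)
scores = np.abs(y_cal - model.predict(X_cal))   # conformity: |residual|
alpha = 0.1

# Unweighted split conformal quantile: valid only without shift.
q_unweighted = np.quantile(scores, 1 - alpha)

# Reweight calibration scores by the (here known) density ratio p_test/p_train.
w = stats.norm.pdf(X_cal[:, 0], 1.5, 1.0) / stats.norm.pdf(X_cal[:, 0], 0.0, 1.0)
order = np.argsort(scores)
cum_w = np.cumsum(w[order]) / w.sum()
q_weighted = scores[order][min(np.searchsorted(cum_w, 1 - alpha), len(scores) - 1)]

def coverage(q):
    return np.mean(np.abs(y_te - model.predict(X_te)) <= q)

print(f"unweighted: {coverage(q_unweighted):.3f}, "
      f"weighted: {coverage(q_weighted):.3f} (target {1 - alpha})")
```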
Read 9 tweets
We're releasing an optimized implementation of GPT2/GPT3 with FlashAttention🚀!
This trains 3-5x faster than the Huggingface version, reaching up to 189 TFLOPs/sec per A100, i.e. 60.6% of the theoretical maximum model FLOPs utilization. 1/6
github.com/HazyResearch/f…
The main ingredient is FlashAttention, which computes attention fast (2-4x) and with less memory (10x), without any approximation. This means that we don't need to do any activation checkpointing.
2/6
We also provide optimized implementations of other layers:
- Fused matmul + bias + gelu for the MLP (based on Apex and cuBLASLt)
- Optimized cross entropy loss (based on Apex)
- Fused rotary embedding
- Fused dropout + residual + LayerNorm (building on Apex's FastLayerNorm)
3/6
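To make the trick concrete, here is a rough pure-PyTorch sketch of the online-softmax tiling that FlashAttention fuses into a single CUDA kernel; this is an illustration only, not the repo's code (non-causal, arbitrary block size):

```python
# Attention computed block-by-block over K/V with a running max and
# normalizer, so the full (seq x seq) score matrix is never materialized.
import torch

def tiled_attention(q, k, v, block_size=128):
    # q, k, v: (batch, heads, seq_len, head_dim)
    scale = q.shape[-1] ** -0.5
    out = torch.zeros_like(q)
    row_max = q.new_full((*q.shape[:-1], 1), float("-inf"))  # running max
    row_sum = torch.zeros_like(row_max)                      # running normalizer
    for start in range(0, k.shape[-2], block_size):
        kb = k[..., start:start + block_size, :]
        vb = v[..., start:start + block_size, :]
        scores = (q @ kb.transpose(-2, -1)) * scale          # (.., q_len, block)
        new_max = torch.maximum(row_max, scores.amax(-1, keepdim=True))
        corr = torch.exp(row_max - new_max)                  # rescale old stats
        p = torch.exp(scores - new_max)
        row_sum = row_sum * corr + p.sum(-1, keepdim=True)
        out = out * corr + p @ vb
        row_max = new_max
    return out / row_sum

# Sanity check against vanilla attention:
# q = k = v = torch.randn(1, 8, 256, 64)
# ref = torch.softmax(q @ k.transpose(-2, -1) * 64 ** -0.5, dim=-1) @ v
# assert torch.allclose(tiled_attention(q, k, v), ref, atol=1e-5)
```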
Read 6 tweets
OK, this website still seems to be working…so…time to share our latest preprint!

Very pleased to be able to share this one: is attention all you need to solve the Schrödinger equation? arxiv.org/abs/2211.13672
For the last several years, numerous groups have shown that neural networks can make calculations in quantum chemistry much more accurate - FermiNet, PauliNet, etc. We wrote a review article about it here:
Most work since then has only made small tweaks to these basic neural network ansatzes. Instead, we tried to reinvent neural network ansatzes from the ground up. The result is a model we call the Psiformer: basically, a Transformer encoder designed for quantum chemistry.
Read 9 tweets
Wondering how one can create a dataset of several TB of text data to train a language model?📚

With @BigscienceW, we have been through this exercise and shared everything in our #NeurIPS2022 paper "The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset"

🧵
🌸What is ROOTS?

The Responsible Open-science Open-collaboration Text Sources (ROOTS) corpus is a 1.6TB corpus including 46 natural languages and 13 programming languages 🌟
The languages were chosen based on the language expertise of the communities who participated in the effort
Read 23 tweets
[1/3] Excited to announce “Semantic uncertainty intervals for disentangled latent spaces”. Poster Session 6 at #Neurips2022!
We show how to construct uncertainty intervals with meaning, e.g. on a person’s hair color, or the amount they are smiling! @trustworthy_ml
[2/3] We use conformal prediction, quantile regression (QR), and GANs to go beyond pixel-level uncertainty, getting rigorous uncertainty quantification for image generation!

The idea is to perform QR on disentangled GAN latents, then calibrate it with conformal risk control.
[3/3] This was joint work with @ml_angelopoulos, @stats_stephen, @yaniv_romano, and @phillip_isola. If you want to come by and talk about rigorous statistics for generative models, please visit our poster at Poster Session 6 or DM me to chat!
Read 3 tweets
#NeurIPS2022
What are ideal representations for self-sup. learning (SSL)?

🤓We give simple optimality conditions and use them to improve/understand/derive SSL methods!

🔥outperform baselines on ImageNet

arxiv.org/abs/2011.10566
w. @tatsu_hashimoto @StefanoErmon @percyliang
🧵
Goal: ideally representations should allow linear probes to perfectly predict any task that is invariant to augmentations in the most sample-efficient way

Q: Which of the following representations is optimal?

2/8
A: last one.

More generally we show that representations are optimal if and only if:
1. *Predictability*: linear probes can predict equivalence classes
2. *High dimension*: representation dim d = (# equivalence classes) - 1
3. *Invariance*: representation of equivalent examples collapse

3/8
Read 11 tweets
Happy to advertise our paper, "Active Bayesian Causal Inference," to be presented at #NeurIPS2022, @NeurIPSConf.

openreview.net/forum?id=r0bjB…
Causal models are powerful reasoning tools, but usually we don't know which model is the correct one. Traditionally, one first aims to find the correct causal model from data, which is then used for causal reasoning.
How about a Bayesian approach? I.e., we put a prior over a class of causal models, and, when observing new data, we simply update the posterior using Bayes' rule?

Advantage 1. We maintain our uncertainty about the causal model in a rigorous way (because it's Bayesian 🥰😍).
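As a toy illustration of that update (my own sketch, not the paper's method), consider just two hand-specified candidate models and data generated by one of them:

```python
# Posterior over two causal hypotheses, updated with Bayes' rule.
# Model A: X causes Y with P(Y=1|X=1)=0.9, P(Y=1|X=0)=0.1.
# Model B: Y is independent of X with P(Y=1)=0.5.
import numpy as np

rng = np.random.default_rng(0)

def lik_A(x, y):
    p = 0.9 if x == 1 else 0.1
    return p if y == 1 else 1 - p

def lik_B(x, y):
    return 0.5

post = np.array([0.5, 0.5])                 # uniform prior over {A, B}
for _ in range(20):
    x = int(rng.integers(0, 2))
    y = int(rng.binomial(1, 0.9 if x == 1 else 0.1))  # world follows model A
    post *= np.array([lik_A(x, y), lik_B(x, y)])      # Bayes' rule (unnormalized)
    post /= post.sum()                                # renormalize

print(f"P(A | data) = {post[0]:.3f}, P(B | data) = {post[1]:.3f}")
```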
Read 10 tweets
Our group @ai4life_harvard is gearing up to showcase our recent research and connect with the #ML #TrustworthyML #XAI community at #NeurIPS2022. Here’s where you can find us at a glance. More details about our papers/talks/panels in the thread below 👇 [1/N]
@ai4life_harvard [Conference Paper] Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post Hoc Explanations (joint work with #TessaHan and @Suuraj) -- arxiv.org/abs/2206.01254. More details in this thread [2/N]
[Conference Paper] Efficient Training of Low-Curvature Neural Networks (joint work with @Suuraj, #KyleMatoba, @francoisfleuret) -- arxiv.org/abs/2206.07144. More details in this thread [3/N]
Read 10 tweets
Delighted to present the NetHack Learning Dataset (NLD) at #NeurIPS2022 next week!

NLD is a new large-scale dataset for NetHack and MiniHack, aimed at supercharging research in offline RL, learning from observations, and imitation learning.

1/
NLD has 3 parts. First, NLD-NAO, a dataset of 10 billion state transitions from 1.5 million games recorded on the alt.org/nethack server from 2009-20.

That's bigger than MineDojo in trajectories, with unprecedented coverage of NetHack, including 22,000 ascensions!

2/
Next up, NLD-AA, 3 billion state-action-score transitions from 100,000 trajectories recorded by the winning bot of the NetHack Challenge - AutoAscend.

This symbolic bot trounced the best neural agents in the challenge by 3x, but still falls well short of full ascension.

3/
Read 6 tweets
Excited to share the details of our work at @DeepMind on using reinforcement learning to help large-scale commercial cooling systems save energy and run more efficiently: arxiv.org/abs/2211.07357.

Here’s what we found 🧵
First, #RL can substantially outperform industry standard controllers.

📉 We reduced energy use by 9% and 13% at two separate sites, while satisfying all of the constraints at a level comparable with the baseline policy.
🔧 We built on the existing RL controller used for cooling Google’s data centers and extended it to a more challenging setup.

There’s a higher dimensional action space (jointly controlling multiple chillers), more complex constraints, and less data standardization.
Read 6 tweets
Paper alert: 📜
We will be presenting our paper "Conformal Off-Policy Prediction" at #NeurIPS2022.
arxiv.org/abs/2206.04405
We present COPP, a novel methodology for quantifying uncertainty in off-policy outcomes. 1/n
Given an untested policy and past observational data, how do you find the most likely outcome(s) under this policy without deploying it in the real world? We solve this problem for contextual bandits using Conformal Prediction, which comes with strong theoretical guarantees. 2/n
Existing OPE methodologies estimate the *average* reward under the new policy. This does not convey information about the distribution of the reward itself. To the best of our knowledge, COPP is the first work which estimates the uncertainty in the reward itself. 3/n
Read 7 tweets
Members of @UCL_DARK are excited to present 8 conference papers at #NeurIPS2022.

Here is the full schedule of all @UCL_DARK's activities (see bit.ly/DARK-NeurIPS-2… for all links).

We look forward to seeing you in New Orleans! 🇺🇸

Check out the 🧵 on individual papers below 👇
Read 9 tweets
Excited to present 3 #NeurIPS2022 papers on a trend I've been very excited about recently: blurring the boundaries between language models and RL agents

(+a bonus 4th paper on active learning!)

🧵(0/7)

PS: I'm on the industry job market! [image: The Office "They're the same picture" meme]
1️⃣ Improving Intrinsic Exploration with Language Abstractions

Using language abstractions to guide exploration in RL, e.g. by self-designing a curriculum of increasingly difficult language goals



Also see @ykilcher's review:

(1/7)
2️⃣ Improving Policy Learning with Language Dynamics Distillation (led by @hllo_wrld)

Increasing RL sample efficiency by pretraining agents to model env dynamics from language-annotated demonstrations



(2/7)
Read 8 tweets
1/ Our new preprint biorxiv.org/content/10.110… on when grid cells appear in trained path integrators (w/ Sorscher @meldefon @aran_nayebi @lisa_giocomo @dyamins) critically assesses claims made in a #NeurIPS2022 paper described below. Several corrections in our thread ->
2/ Our prior theory authors.elsevier.com/c/1f~Ze3BtfH1Z… quantitatively explains why few hexagonal grid cells were found in that work; many of the choices made are ones prior theory showed don't lead to hexagonal grids, and when two well-understood choices are made, grids appear robustly ~100% of the time
3/ Also, corrections: (1) difference-of-Gaussian place cells do lead to hexagonal grids; (2) so do multiple-bump place cells at one scale; (3) hexagonal grids are robust to place cell scale; (4) Gaussian interactions can yield periodic patterns;
Read 11 tweets
Very excited to announce our #NeurIPS2022 paper No Free Lunch from Deep Learning in Neuroscience: A Case Study through Models of the Entorhinal-Hippocampal Circuit.

It's a story about NeuroAI, told through a story about grid & place cells.

Joint w/ @KhonaMikail @FieteGroup 1/15
@KhonaMikail @FieteGroup The promises of deep learning-based models of the brain are that they (1) shed light on the brain’s fundamental optimization problems/solutions, and/or (2) make novel predictions. We show, using deep network models of the MEC-HPC circuit, that one may get neither! 2/15
@KhonaMikail @FieteGroup Prior work claims training networks to path integrate generically creates grid units (left). We empirically show & analytically explain why grid-like units only emerge in a small subset of biologically invalid hyperparameter space chosen post-hoc by the programmer (right). 3/15
Read 16 tweets
Spurious features are a major issue for deep learning. Our new #NeurIPS2022 paper w/ @pol_kirichenko, @gruver_nate and @andrewgwils explores representations learned from data with spurious features, with many surprising findings, and SOTA results.

arxiv.org/abs/2210.11369
🧵1/6
We use Deep Feature Reweighting (DFR) to evaluate feature representations: retrain the last layer of the model on group-balanced validation data. DFR worst group accuracy (WGA) tells us how much information about the core features is learned.



2/6
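A hedged sketch of that evaluation procedure (the function and its inputs are hypothetical stand-ins; the paper's implementation surely differs):

```python
# Deep Feature Reweighting (DFR) sketch: freeze the backbone's features,
# retrain only the last linear layer on group-balanced held-out data, and
# report worst-group accuracy (WGA).
import numpy as np
from sklearn.linear_model import LogisticRegression

def dfr_worst_group_acc(f_val, y_val, g_val, f_te, y_te, g_te, seed=0):
    rng = np.random.default_rng(seed)
    # Subsample the held-out set so every group is equally represented.
    n_min = min((g_val == g).sum() for g in np.unique(g_val))
    idx = np.concatenate([
        rng.choice(np.where(g_val == g)[0], n_min, replace=False)
        for g in np.unique(g_val)])
    # "Last layer" = a linear classifier on the frozen features.
    clf = LogisticRegression(max_iter=1000).fit(f_val[idx], y_val[idx])
    preds = clf.predict(f_te)
    # Worst-group accuracy: the minimum accuracy over test groups.
    return min((preds[g_te == g] == y_te[g_te == g]).mean()
               for g in np.unique(g_te))
```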
While group robustness methods such as group DRO can improve WGA a lot, they don’t typically improve the features! With DFR, we recover the same performance for ERM and Group DRO. The improvement in these methods comes from the last layer, not features!

3/6
Read 6 tweets
📣📄 Introducing "Generalised Implicit Neural Representations"!

We study INRs on arbitrary domains discretized by graphs.
Applications in biology, dynamical systems, meteorology, and DEs on manifolds!

#NeurIPS2022 paper with @trekkinglemon
arxiv.org/abs/2205.15674

1/n 🧵
First, what is an INR? It's just a neural network that approximates a signal on some domain.

Typically, the domain is a hypercube and the signal is an image or 3D scene.

We observe samples of the signal on a lattice (e.g., pixels), and we train the INR to map x -> f(x).
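For readers new to INRs, a minimal sketch of that standard setup (my illustration, not the paper's model; the Fourier-feature scale and layer sizes are arbitrary):

```python
# Minimal INR: an MLP with random Fourier features that maps coordinates x
# to signal values f(x), trained on the observed samples (e.g., pixels).
import torch
import torch.nn as nn

class INR(nn.Module):
    def __init__(self, in_dim=2, out_dim=3, n_feats=256, hidden=256, scale=10.0):
        super().__init__()
        # Random Fourier features let the MLP fit high-frequency content.
        self.register_buffer("B", torch.randn(in_dim, n_feats) * scale)
        self.mlp = nn.Sequential(
            nn.Linear(2 * n_feats, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim))

    def forward(self, x):                       # x: (N, in_dim) coordinates
        h = x @ self.B
        return self.mlp(torch.cat([torch.sin(h), torch.cos(h)], dim=-1))

# Fit on observed (coords, values) samples:
#   model = INR()
#   opt = torch.optim.Adam(model.parameters(), lr=1e-3)
#   loss = ((model(coords) - values) ** 2).mean(); loss.backward(); opt.step()
```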
Here we study the setting where, instead of samples on a lattice, we observe samples on a graph.

This means that the domain can be any topological space, but we generally don't know what that looks like.
Read 11 tweets
Excited to share our #NeurIPS2022 @NVIDIAAI work GET3D, a generative model that directly produces explicit textured 3D meshes with complex topology, rich geometric details, and high fidelity textures. #3D

Project page: nv-tlabs.github.io/GET3D/
Our method builds on the success in differentiable surface modeling (nv-tlabs.github.io/DMTet/), differentiable rendering (nvlabs.github.io/nvdiffrec/ ) and 2D GANs, allowing it to learn 3D from 2D image collections.
GET3D can populate a crowd of objects with diversity in geometry and texture, including: (1) lights & wheels for the cars; (2) mirrors & tires for the motorbike; (3) mouths, ears & horns for the animals; (4) wheels on the legs of the chairs; (5) shoes & clothes for humans, etc.
Read 9 tweets
One of the biggest criticisms of the field of post hoc #XAI is that each method "does its own thing": it is unclear how these methods relate to each other & which methods are effective under what conditions. Our #NeurIPS2022 paper provides (some) answers to these questions. [1/N]
In our #NeurIPS2022 paper, we unify eight different state-of-the-art local post hoc explanation methods, and show that they are all performing local linear approximations of the underlying models, albeit with different loss functions and notions of local neighborhoods. [2/N]
By doing so, we are able to explain the similarities & differences between these methods. These methods are similar in the sense that they all perform local linear approximations of models, but they differ considerably in "how" they perform these approximations [3/N]
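As a hedged illustration of that common core (my sketch, not the paper's framework), a LIME-style local linear approximation of a black-box model might look like:

```python
# LIME-style local linear approximation: explain a black-box f at a point x0
# by fitting a weighted linear model on perturbations near x0. Different
# post hoc methods = different neighborhoods, kernels, and loss functions.
import numpy as np
from sklearn.linear_model import LinearRegression

def local_linear_explanation(f, x0, n_samples=2000, sigma=0.5, seed=0):
    rng = np.random.default_rng(seed)
    X = x0 + rng.normal(scale=sigma, size=(n_samples, x0.shape[0]))
    w = np.exp(-np.sum((X - x0) ** 2, axis=1) / (2 * sigma ** 2))  # proximity kernel
    lin = LinearRegression().fit(X, f(X), sample_weight=w)
    return lin.coef_                       # local feature attributions

# Example: near x0, the attributions approximate the gradient of f.
f = lambda X: np.sin(X[:, 0]) + X[:, 1] ** 2
print(local_linear_explanation(f, np.array([0.5, 1.0])))  # ~ [cos(0.5), 2.0]
```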
Read 13 tweets
