AI Researcher. Prof @Cornell & @Cornell_Tech. Co-Founder @afreshai. PhD @Stanford.
May 11, 2023 • 6 tweets • 2 min read
Do you know what's cooler than running LLMs on consumer GPUs? Finetuning large 65B+ LLMs on consumer GPUs! 🤖
Check out my new side project: LLMTune. It can finetune 30B/65B LLAMA models on 24GB/48GB GPUs.
github.com/kuleshov-group…
Here is a demo of the largest LLAMA-65B model, quantized to 4 bits and finetuned on one A6000 GPU, writing the abstract of a machine learning paper:
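A rough back-of-the-envelope check of why 4-bit quantization makes these sizes plausible. This is only a sketch: it uses the nominal parameter counts (30e9 and 65e9) and counts weight storage alone, ignoring activations, any trainable finetuning parameters, optimizer state, and quantization metadata. The helper name `weight_memory_gb` is made up for illustration and is not part of LLMTune.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Nominal parameter counts (assumption: the "30B"/"65B" labels taken at face value).
for name, n_params in [("30B", 30e9), ("65B", 65e9)]:
    fp16 = weight_memory_gb(n_params, 16)  # 16-bit weights
    int4 = weight_memory_gb(n_params, 4)   # 4-bit quantized weights
    print(f"{name}: 16-bit ~{fp16:.1f} GB, 4-bit ~{int4:.1f} GB")
```

Under these assumptions the 4-bit weights come to about 15 GB and 32.5 GB, which leaves headroom on 24GB and 48GB cards respectively, while the 16-bit weights (60 GB and 130 GB) would not fit.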
Jan 30, 2023 • 20 tweets • 7 min read
Here is an experiment: using ChatGPT to emulate a Jupyter notebook. You can even get it to run GPT inside ChatGPT.
And you can also train neural networks from scratch inside ChatGPT.🤯
Here's a walkthrough of how it works.
We start with a clever prompt that asks ChatGPT to be a Jupyter notebook.
It correctly prints out "hello", and can do basic arithmetic. So far, so good!
Let's see if it can run some numpy.
Dec 16, 2022 • 11 tweets • 4 min read
#Neurips2022 is now over---here is what I found exciting this year. Interesting trends include creative ML, diffusion models, language models, LLMs + RL, and some interesting theoretical work on conformal prediction, optimization, and more.
Two best paper awards went to work in creative ML---Imagen and LAION---in addition to many papers on improving generation quality, extending generation beyond images (e.g., molecules), and more.
Dec 11, 2022 • 7 tweets • 3 min read
How can deep learning be useful in causal inference?
In our #NeurIPS2022 paper, we argue that causal effect estimation can benefit from large amounts of unstructured "dark" data (images, sensor data) that can be leveraged via deep generative models to account for confounders.
Consider the task of estimating the effect of a medical treatment from observational data. The true effects are often confounded by unobserved factors (e.g., patient lifestyle). We argue that latent confounders can be discovered from unstructured data (e.g., clinical notes).
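To make the confounding problem concrete, here is a toy simulation. It is a sketch, not the method from the paper: it assumes a single binary confounder `u` (think "healthy lifestyle") that has already been recovered, and it adjusts by simple stratification rather than with a deep generative model. All numbers (effect size 1.0, treatment probabilities 0.8/0.2, confounder effect 2.0) are made up for illustration.

```python
import random


def simulate(n=20000, true_effect=1.0, seed=0):
    """Generate (u, t, y) rows where u influences both treatment t and outcome y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.random() < 0.5                      # confounder
        t = rng.random() < (0.8 if u else 0.2)      # u makes treatment more likely
        y = true_effect * t + 2.0 * u + rng.gauss(0, 1)
        rows.append((u, t, y))
    return rows


def mean(xs):
    return sum(xs) / len(xs)


def naive_estimate(rows):
    """Difference in mean outcome between treated and untreated, ignoring u."""
    treated = [y for _, t, y in rows if t]
    control = [y for _, t, y in rows if not t]
    return mean(treated) - mean(control)


def adjusted_estimate(rows):
    """Compare treated vs. untreated within each level of u, then average by stratum size."""
    total = 0.0
    for level in (False, True):
        stratum = [r for r in rows if r[0] == level]
        treated = [y for _, t, y in stratum if t]
        control = [y for _, t, y in stratum if not t]
        total += (mean(treated) - mean(control)) * len(stratum) / len(rows)
    return total


rows = simulate()
print(f"naive estimate:    {naive_estimate(rows):.2f}")
print(f"adjusted estimate: {adjusted_estimate(rows):.2f}")
```

In this setup the naive comparison overstates the effect because treated patients are also more likely to have `u`, while the stratified estimate lands close to the true value of 1.0. The hard part in practice is that `u` is not observed, which is where the unstructured data comes in.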
Dec 26, 2021 • 7 tweets • 5 min read
Imagine you build an ML model with 80% accuracy. There are many things you can try next: collect data, create new features, increase dropout, tune the optimizer. How do you decide what to try next in a principled way?
Here is an iterative process for developing ML models that can help you obtain good performance even in domains where you have little expertise (e.g., classifying bird songs). These ideas are compiled from my Applied ML class at Cornell.
Jan 24, 2021 • 6 tweets • 3 min read
Did you ever want to learn more about machine learning in 2021? I'm excited to share the lecture videos and materials from my Applied Machine Learning course at @Cornell_Tech! We have 20+ lectures on ML algorithms and how to use them in practice. [1/5]
One new idea we tried in this course was to make all the materials executable. Each set of slides is also a Jupyter notebook with programmatically generated figures. Readers can tweak parameters and generate the course materials from scratch. [2/5]