Bojan Tunguz
Machine Learning at Nvidia. Kaggle Quadruple Grandmaster. Data Scientist. Physicist. Catholic. Husband. Father. Stanford Alum. Opinions my own. e/xgb. AMDG.
Sep 9, 2023 7 tweets 2 min read
.@NVIDIA has just announced TensorRT-LLM, open source software designed to accelerate Large Language Model inference on H100s.

1/7

This software has been developed through close collaboration with many leading AI companies, such as @Meta, @anyscalecompute, @cohere, Deci AI, @Grammarly, @databricks and many others.

2/7
Aug 28, 2023 6 tweets 2 min read
By now most of us are well aware of transformer-based large language model capabilities and, in many instances, failures. The failures in particular can seem extremely head-scratching, as they often involve the kind of mental reasoning that even a young schoolboy could do.

1/6

A new paper tries to investigate the nature of these failures, and understand the limits of LLM-based reasoning. It seems that the failures primarily arise from the tasks with low in-domain knowledge and high compositional complexity.

2/6
Aug 23, 2023 4 tweets 1 min read
Very exciting news - Python is now available in the official version of Excel! Excel is the most widely used analytics tool in the World, and Python has become the most popular programming language for Data Science and Machine Learning tasks.

1/4

It is a very intuitive and easy to learn programming language. The merger of these two tools will open new opportunities and use cases.

This merger is the culmination of years-long effort and collaboration between Microsoft and the open source Python community.

2/4
Jun 5, 2023 7 tweets 3 min read
Large Language Models (LLMs) have emerged as the cornerstone of the current Generative AI revolution. The big problem with LLMs is that they are, well, large. Really, really, large.

1/7

They require an enormous amount of high-quality data to train and an even more unfathomably large amount of computational power.

2/7
Mar 14, 2023 7 tweets 5 min read
Would you like to win an RTX 4080? You are in luck, because at @nvidia we are giving away one (1) for GTC 2023. All you have to do is:

1. Like and share this tweet

2. Register for GTC: nvda.ws/3j6gw41

3. Post a screenshot of you in a session as a response below

1/7

A few points:

1. I am working with the NVIDIA marketing team to promote one giveaway; there are other influencers who are giving away more GPUs in their own giveaways.

2. GTC registration is completely free and open to the general public. All sessions are online.

2/7
Feb 8, 2023 4 tweets 3 min read
Things seem to be moving at a breakneck speed in the world of generative AI and large language models. In a surprise press event yesterday, @Microsoft announced a wide integration of @OpenAI tools into a couple of their major products,

1/4

Bing search engine and Edge web browser. In particular, this seems to be the first public use anywhere of OpenAI's next generation LLM, GPT4. Most of the new features are still relatively limited, and you'll need to join the waitlist for full access.

2/4
Feb 7, 2023 5 tweets 3 min read
In a highly anticipated move, @Google yesterday announced that they are launching Bard, a conversational AI app that is based on their LaMDA model.

1/5

LaMDA - Language Model for Dialogue Applications - has been around for at least a year, but due to a variety of considerations it has never been accessible to the public.

2/5
Jan 30, 2023 5 tweets 3 min read
Deep Learning and Neural Networks have become the default approaches to Machine Learning in recent years. However, despite their spectacular success in certain domains (vision and NLP in particular),

1/5

their use across the board for all ML problems and with all datasets is problematic, to say the least. Oftentimes better and more robust results can be obtained with simpler, easier to train and deploy, classical ML algorithms.

2/5
Jan 29, 2023 5 tweets 1 min read
There was nothing that shocked me more when I entered the industry from academia than this kind of attitude. I came from an environment where teaching and learning were the norm, to one where giving help to “underperformers” was viewed with disdain as a liability.

1/5

Fortunately not all organizations and managers are this cutthroat, but this kind of mindset is pervasive, especially at startups. There is a widespread attitude that *it’s someone else’s responsibility to do the educating*: yours, your previous job’s, your college’s etc.

2/5
Dec 12, 2022 4 tweets 4 min read
Last week @DeepMind’s research on AlphaCode - a competitive programming system - was published in Science. AlphaCode has been able to beat 54% of humans in competitive coding challenges, putting it on par with many junior-level developers.

1/4

The original announcement from DeepMind came out in February, which in the fast-paced world of AI is already ancient history.

2/4
Dec 5, 2022 7 tweets 3 min read
Last week @OpenAI released ChatGPT - a Large Language AI Model that interacts with users in a natural conversational way. The chatbot is able to answer complex questions, even in highly technically demanding categories.

1/7

It is also able to answer follow-up questions, backtrack on wrong assumptions, and provide other detailed resources, including code fragments.

2/7
Dec 3, 2022 6 tweets 3 min read
PyTorch 2.0 is out! This major release upgrade brings about many new features, but the main improvements are under the hood.

1/6

The three main principles behind PyTorch:

1. High-Performance eager execution
2. Pythonic internals
3. Good abstractions for Distributed, Autodiff, Data loading, Accelerators, etc.

PyTorch 2.0 is fully backward compatible with the previous versions of PyTorch.

2/6
Oct 10, 2022 4 tweets 3 min read
Decision-tree-based Machine Learning models are some of the best-performing algorithms in terms of predictive capability, especially on small and heterogeneous datasets.

1/4

They also provide an unparalleled level of interpretability compared to all other non-linear algorithms. However, they are very hard to optimize on von Neumann architecture machines due to their non-uniform memory access patterns.

2/4
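
One way to picture the memory access point above is a rough sketch, not taken from the thread: a small hypothetical tree stored as flat arrays, where every name and number is invented for illustration. Each sample follows its own data-dependent path, so the sequence of node indices that gets read differs from sample to sample rather than following a regular stride.

```python
# Flat-array encoding of a small, hypothetical decision tree.
# A node is a leaf when its feature index is -1.
feature   = [0,   1,   2,   -1,   -1,   -1,   -1]
threshold = [0.5, 0.3, 0.7, 0.0,  0.0,  0.0,  0.0]
left      = [1,   3,   5,   -1,   -1,   -1,   -1]
right     = [2,   4,   6,   -1,   -1,   -1,   -1]
value     = [0.0, 0.0, 0.0, 10.0, 20.0, 30.0, 40.0]


def predict_with_trace(x):
    """Return the leaf value and the list of node indices read along the way."""
    node = 0
    visited = [node]
    while feature[node] != -1:
        # The next index to read depends on the sample itself.
        if x[feature[node]] <= threshold[node]:
            node = left[node]
        else:
            node = right[node]
        visited.append(node)
    return value[node], visited


samples = [
    [0.2, 0.1, 0.9],
    [0.2, 0.8, 0.9],
    [0.9, 0.1, 0.2],
    [0.9, 0.1, 0.9],
]
traces = [predict_with_trace(s) for s in samples]
for sample, (prediction, visited) in zip(samples, traces):
    print(sample, "->", prediction, "via nodes", visited)
```

With real ensembles of hundreds of deeper trees the same effect is multiplied, which is presumably what makes caching and vectorization awkward.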
Oct 8, 2022 6 tweets 3 min read
This past week I came across another paper that purports to get the SOTA for NNs for tabular data. Due to the extreme penchant for exaggeration in this community, I have given up on checking most of these claims, but decided to take a look at this particular work.

1/6

I decided to check how XGBoost *really* performs on the datasets used in the paper, and the results were not pretty.

2/6
Oct 1, 2022 4 tweets 3 min read
This week @NVIDIA open sourced the 3D object generation AI model, GET3D. GET3D is a generative model of high quality 3D textured shapes learned from images.

1/4

Trained using only 2D images, GET3D generates 3D shapes with high-fidelity textures and complex geometric details.

2/4
Sep 29, 2022 5 tweets 2 min read
I have just done something really cool - I've managed to *train* XGBoost in the browser, entirely within an HTML file! This has been possible thanks to the PyScript project that allows running Python inside of HTML, similar to how JavaScript works.

trainxgb.com

1/5

The example below is very simple - the script loads the small Iris dataset from sklearn. With a slider you are able to adjust the number of XGBoost trees, and the script will train different XGBoost models accordingly and print out accuracy.

2/5
Sep 20, 2022 5 tweets 2 min read
All right, here is one trick for using XGBoost for *data analysis*.

1/5

First, you create a simple model with XGBoost. It doesn't have to be fancy, or even too accurate, it's just for reference purposes. Use that model to calculate the Shapley values for your training set. Here is an example:

kaggle.com/code/tunguz/tp…

2/5
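
For readers unfamiliar with Shapley values, here is a minimal sketch of what the numbers mean. It is not the linked notebook and it does not use XGBoost: `toy_model`, the sample, and the baseline are all hypothetical stand-ins, and the values are computed by brute force over every feature ordering. In practice XGBoost can return per-feature contributions directly (for example via `pred_contribs=True` on `Booster.predict`), which is far cheaper than this enumeration.

```python
from itertools import permutations


def toy_model(x):
    """Hypothetical stand-in for a trained model: two additive terms plus one interaction."""
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]


def shapley_values(model, x, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over all orderings. Features not yet revealed stay at the baseline."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = model(current)
        for i in order:
            current[i] = x[i]          # reveal feature i
            new = model(current)
            phi[i] += new - prev       # its marginal contribution in this ordering
            prev = new
    return [p / len(orderings) for p in phi]


x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)  # [3.0, 2.0, 1.0]
# The contributions add up to the difference between the prediction and the baseline output.
print(sum(phi), toy_model(x) - toy_model(baseline))
```

The interaction term is split evenly between features 0 and 2, which is why feature 2 gets credit even though it has no standalone effect in this toy model.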
Sep 19, 2022 4 tweets 5 min read
NVIDIA GTC starts today! There are tons of exciting topics and webinars covered. This year again the whole conference is online and free, so go and register if you have not done so already.

Here are a few special highlight sessions:

1/4

GTC 2022 Keynote - September: lnkd.in/gYNqxsnr

How CUDA Programming Works: lnkd.in/gKmdjZub

Building the Future of Work with AI-powered Digital Humans: lnkd.in/gXJWk6vz

Building Future-Ready Intelligence for Cars: lnkd.in/gJ9BJMGM

2/4
Aug 5, 2022 5 tweets 3 min read
A very good paper I came across this morning by the @DeepMind researchers. For the past five years Transformers have been one of the most dominant approaches to Deep Learning problems, especially in the #NLP domain.

1/5

However, despite many interesting papers on the topic, and lots of good open code, there has been a noticeable lack of *formal* definition of what transformers are, especially on the level of pseudocode.

2/5
May 16, 2022 7 tweets 1 min read
It seems that quite a few people in the ML community - even those who ought to know better - think that transformers are some kind of “memorization” machines. I have no idea how this impression has been formed; it could have something to do with their use for LLMs.

1/
The idea behind transformers is simple, albeit it can be technically tricky to demonstrate - they are just the latest development in representation learning (RL). RL aims to find a better *representation* of data, which can indeed lead to better ML modeling.

2/
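
To make the "representation" framing concrete, here is a heavily simplified sketch of a single self-attention step. It is an illustration under loud assumptions, not how any production transformer is written: the query, key and value projections are taken to be the identity (real models learn them), there is one head, and the token vectors are made up. The point is only that each token's vector is replaced by a context-dependent mix of all the token vectors, i.e. a new representation.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.
    Simplification: queries, keys and values are the raw token vectors."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # New representation: weighted average of all token vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical 2-d token embeddings
new_repr = self_attention(tokens)
for before, after in zip(tokens, new_repr):
    print(before, "->", [round(v, 3) for v in after])
```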
May 3, 2022 18 tweets 3 min read
"A fox knows many things, but a hedgehog knows one big thing." - Archilochus

This was a quote that Isaiah Berlin expanded into one of his essays, first published in 1953. The quote was used to classify writers and thinkers into two big categories - those who view the world

1/18

in terms of a single large idea (Plato, Pascal, Hegel, Ibsen), and those who draw from a variety of experience (Aristotle, Shakespeare, Balzac, Joyce) and are not in the process of constructing a singular worldview.

2/18