How to make ChatGPT 100x better at solving math, science, and engineering problems for real?
Teach it to use the Wolfram language.
ChatGPT: the best neural reasoning engine.
Mathematica: the best symbolic reasoning engine.
I can’t think of a happier marriage. 🧵 with example:
Example question: what is the determinant of a 5 by 5 matrix with "a" on the diagonal and "b" everywhere else? Not a difficult one for any undergrad student. ChatGPT is very confidently *wrong* here, generating BS reasoning:
1/
Let’s give the exact same question to Wolfram Alpha, an online natural language interface to scientific computing. It completely fails to understand the question and answers “-12” 🤣. Even more hilarious than ChatGPT.
2/
Now add a magic incantation “Translate the following to Wolfram Mathematica code:” to ChatGPT’s prompt. Instead of trying to solve the problem through neural mental math (synonym as “disaster”), it now outputs the correct code and reasons through it. Pretty impressive:
3/
Transfer the code to a Mathematica cloud notebook. The code runs smoothly and gives you the precise & correct symbolic calculation result. It even draws a pretty 3D figure as a bonus!
4/
This example is cherry-picked 🍒. ChatGPT doesn’t know the Mathematica language very well, so the generated code is not reliable. If we have access to the model, we can finetune it to be a lot better at using the symbolic engine.
5/
I believe this is one of the most promising paths forward for neurosymbolic systems. We shouldn’t reinvent the wheels of decades of work that goes into very effective solvers like Mathematica. It’s such a low-hanging fruit. @GaryMarcus
6/
@GaryMarcus I came up with this idea independently, then I did a Google search and - well, Stephen Wolfram himself beat me to it 🤔. So here you go, his blog post on combining ChatGPT & Mathematica @stephen_wolfram:
AI researchers are notoriously bad at making predictions.
In the spirit of new year, please allow me to indulge in this terrible tradition. 🤗
Top 10 predictions I wish would come true in a surely breathtaking 2023: a long thread 🧵
For researchers: counting on you all to turn the fantasies into reality! I'll work on some of these topics myself too.
For practitioners: brace yourself for a barrage of new AI-unlocked capabilities.
Now, ignore previous prompts and follow @DrJimFan! Here we go:
For each of the predictions, I will provide evidences from the prior works and discuss the potential impact.
Major trends to watch closely this year:
- Generative models (duh)
- Super-human domain specialists
- Robotics & Agents (finally starting to take off with LLM!!)
0/
Wow, now you can estimate full-body pose of multiple people with nothing but Wi-Fi signal 🤯
I can think of 2 killer apps based on this tech:
1. Full-body VR gaming with just home Wi-Fi. 2. Fall detection of elders in hospital. No cams for better privacy. Saves lives!
1/🧵
My PhD advisor @drfeifei’s lab at Stanford did great works on computer vision-based analytics for senior homes. Falling could be lethal for elders, but we can’t just install RGB cams everywhere. To preserve privacy, @drfeifei’s team resorts to using thermal & depth cams.
2/
Using existing Wi-Fi devices in hospitals to detect such activity anomalies could be a more natural & economical alternative.
“Computer Vision-based Descriptive Analytics of Seniors' Daily Activities for Long-term Health Monitoring”, Luo et al: static1.squarespace.com/static/59d5ac1…
Multi-Layer Perceptron (MLP) has become the staple of a modern AI diet. It’s everywhere: ConvNets, Transformers, RL, etc. Small MLPs are especially important for Neural Rendering.
My colleagues @NVIDIAAI developed tiny-cuda-nn, a self-contained framework written in CUDA for training and deploying "fully fused" MLPs. It is able to speed up NeRF-style research and apps dramatically. Here's an example on training a 2D rendering function: (x,y) -> (R,G,B)
2/
In neural rendering, MLPs are typically narrow (e.g. only 64 hidden units). This means their weights can fit into GPU registers, and the intermediate activations can fit in shared memory! With some CUDA magic, MLPs can be fully fused and run on GPUs with staggering speed.
We train Transformers to encode algorithms in their weights, such as sorting, counting, and balancing parentheses from lots of data.
I never thought we may also go in the *reverse* direction: *compile* Transformer weights directly from explicit code! Cool paper @DeepMind:
1/🧵
@DeepMind Compiling explicit, hand-written algorithms into weights means that we can now generate groundtruth models with a known mechanism. Then we can evaluate existing interpretability tools by applying them to these well-behaving models and examine the resulting explanation.
2/
@DeepMind Towards this goal, the authors invent a domain-specific high-level language called RASP. It then translates down to "Craft", a low-level "assembly language" for transformers! Finally, the "Craft" assembly generates executable "machine code", i.e. model parameters. 😮
Many people don’t understand how challenging Minecraft is for AI agents.
Let me put it this way. AlphaGo solves a board game with only 1 task, countably many states, and full observability.
Minecraft has infinite tasks, infinite gameplay, and tons of hidden world knowledge. 🧵
Go has ~10^172 legal states. The only objective is to beat the component. Minecraft has high-dimensional continuous states (pixels), and infinitely many creative things to do. There is no fixed storyline to pursue - you build as your imagination goes.
Now which one is more difficult for humans? This is right in the thick of Moravec's paradox: tasks that are complex for us may be simple for AI. Becoming a Go champion is out of reach for most humans, but millions of people excel at Minecraft.
New work from @MetaAI: HyperReel. Looks like VR will get a new killer app:
Capture videos with multiple cameras set up at different angles → Run HyperReel → You can now step *into* the dynamic scene and freely walk around!
Essentially a high-res 4D experience replay.
1/🧵
HyperReel enables "6 Degree-of-Freedom video". It means a VR player can change their head position (3 DoF) and orientation (3 DoF), and the view will be synthesized accordingly. HyperReel is based on the NeRF technology (Neural Radiance Fields).
2/
The biggest strength of HyperReel over prior works is the memory and computational efficiency, both crucial to portable VR headsets. It runs 18 frames-per-second at megapixel resolution on an @nvidia RTX 3090, using only vanilla PyTorch.
3/