Mar 31, 2023 • 10 tweets • 3 min read
An anonymous donor named L has pledged $200k for llama-dl’s legal defense against Meta. I intend to initiate a DMCA counterclaim on the basis that neural network weights are not copyrightable.
It may seem obvious that NN weights should be subject to copyright. It’s anything but:
The US copyright office recently upheld a decision that ML outputs cannot be copyrighted. There are several reasons for this, and as far as I can tell, all of them apply to NN weights as well: reuters.com/world/us/us-co…
Mar 6, 2023 • 9 tweets • 3 min read
Fixed the llama sampler. After turning off top_p, adding top_k 40, setting temp to 0.7, and adding a repetition penalty of 1/0.85, llama 7B is looking nice.
I'll post 65B next, along with (hopefully) some big text files with lots of outputs.
That's more like it. Hello, 65B.
It's always remarkable to see just how important the settings are.
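Those settings in one place, as a minimal numpy sketch of a single sampling step. This is an illustration, not the code I actually ran; in particular, the penalty convention (dividing positive logits by the penalty, llama.cpp-style) is an assumption:

```python
import numpy as np

def sample(logits, prev_tokens, temp=0.7, top_k=40, repeat_penalty=1 / 0.85):
    """One sampling step with the thread's settings:
    top_p off, top_k=40, temperature 0.7, repetition penalty 1/0.85."""
    logits = np.asarray(logits, dtype=np.float64).copy()
    # Repetition penalty (assumed llama.cpp-style convention):
    # shrink the logit of every previously emitted token.
    for t in set(prev_tokens):
        logits[t] = logits[t] / repeat_penalty if logits[t] > 0 else logits[t] * repeat_penalty
    logits /= temp
    # Top-k: drop everything below the k-th largest logit.
    if top_k < len(logits):
        kth = np.sort(logits)[-top_k]
        logits[logits < kth] = -np.inf
    # Softmax (shifted for numerical stability), then sample.
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return int(np.random.default_rng().choice(len(p), p=p))
```

Note how the penalty is applied before temperature and top-k; ordering these stages differently changes the distribution, which is part of why the settings matter so much.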
Taking Pip for a walk, then I'll be back to post more.
Feb 13, 2023 • 22 tweets • 5 min read
@AlexSheng13 proposed an interesting idea in DMs which I think is worth studying. Suppose you could train every layer of your network simultaneously, without waiting for gradients. No backprop. In fact, no forward prop.
This sounds crazy, but there’s a clever way it could work.
First, let’s ignore that this sounds impossible, and look at the benefits. What does this get us?
Our scale becomes infinite, because we can place every layer on a different device. In fact, they can be on different continents, and it wouldn’t harm training time.
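The thread doesn't spell out the mechanism, but one real technique in this spirit is greedy local-loss training: each layer is an encoder/decoder pair that descends its own reconstruction loss, so no gradient ever crosses a layer boundary and layers could, in principle, live on different devices. A toy numpy sketch (sizes and learning rate are illustrative, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)

n, d0, d1 = 64, 16, 8
X = rng.normal(size=(n, d0))
W1 = rng.normal(size=(d0, d1)) * 0.1   # layer 1 encoder
D1 = rng.normal(size=(d1, d0)) * 0.1   # layer 1's private decoder

def local_step(Xin, W, D, lr=0.01):
    # One update of an (encoder, decoder) pair against its own
    # reconstruction loss; all gradients stay inside this layer.
    H = relu(Xin @ W)
    E = H @ D - Xin                  # local reconstruction error
    G = (2.0 / E.size) * E           # d(mean sq err)/d(reconstruction)
    dD = H.T @ G
    dW = Xin.T @ ((G @ D.T) * (H > 0))
    return W - lr * dW, D - lr * dD, np.mean(E ** 2), H

loss0 = None
for step in range(200):
    W1, D1, loss, H1 = local_step(X, W1, D1)
    if loss0 is None:
        loss0 = loss
# A second layer would train the same way on H1, treated as a
# detached input -- it never sends a gradient back to layer 1.
```

Because each layer only needs its input activations, not downstream gradients, the layers can update concurrently, which is what makes the "every layer on a different device" benefit plausible.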
Feb 6, 2022 • 12 tweets • 3 min read
Being forced to learn Haskell had an upside: I’m able to reason about the type signatures of the functions I use, even in Python. I didn’t think that way before.
I’m less enthusiastic about Haskell than other languages, but I was surprised there was any benefit at all.
Why is this useful? And how is it different from my prior mental model?
Before, I thought of functions as little machines. So if I pass a function into map, it was similar to telling a Roomba to clean your house. Whether you ask a Roomba or a maid or clean it yourself, there's …
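For instance, Haskell's `map :: (a -> b) -> [a] -> [b]` translates directly into a Python signature you can reason about the same way (a small illustration; `my_map` is a hypothetical name, not stdlib):

```python
from typing import Callable, Iterable, TypeVar

A = TypeVar("A")
B = TypeVar("B")

# Haskell's  map :: (a -> b) -> [a] -> [b]  as a Python type signature:
def my_map(f: Callable[[A], B], xs: Iterable[A]) -> Iterable[B]:
    return (f(x) for x in xs)

list(my_map(len, ["a", "bb"]))  # [1, 2]
```

Reading the signature alone tells you everything `my_map` can do with `f` and `xs`, which is exactly the habit Haskell drills in.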
Jun 5, 2021 • 9 tweets • 5 min read
So, I'm a huge fan of FF7 speedrunning. There's a certain boss that has an 8% chance of killing you at the start of the fight. But speedrunner Caleb seems to die much more than 8% of the time.
To my delight, @AceZephyr1 made a *fully automated testing harness*. Incredible!
The goal is to statistically verify whether Caleb's luck is worse than 8%. There might be something else going on. For example, FF7 uses a separate RNG for enemy encounter rate, and you can manipulate it by walking a certain number of steps in certain rooms.
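The check itself is a one-sided binomial tail: given n recorded attempts and k deaths, how likely is seeing k or more deaths if the true rate really is 8%? (The counts below are hypothetical placeholders, not Caleb's actual numbers.)

```python
from math import comb

def p_at_least(k, n, p):
    # P(X >= k) for X ~ Binomial(n, p): the chance of seeing k or
    # more deaths in n attempts if the true death rate is p.
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical tally: 20 deaths in 100 recorded attempts at a
# nominal 8% rate. A tiny tail probability would suggest the
# effective rate is worse than 8%.
print(p_at_least(20, 100, 0.08))
```

A small p-value here wouldn't say *why* the rate differs, only that it does; the step-based RNG manipulation above is exactly the kind of confound it could be pointing at.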
Jun 4, 2021 • 6 tweets • 4 min read
Wow. I'm SSH'd into a TPU v3-8. It has 96 vCPUs and 335GB of RAM. Incredible. I installed npm:
They've been working on this for quite some time. And holy moly, it was worth the wait.
Jun 3, 2021 • 7 tweets • 4 min read
Discovery for my notes: I came up with a variant of FFT I call "FST" (for Fast Shawn Transform, ha)
- FST is its own inverse: fst(fst(x)) = x
- FST of an NxM signal returns NxM real numbers. No phase!
- FST is frequency space, just like FFT. Multiplication is convolution.
Code:
import numpy as np; from numpy.fft import fft, fft2
So this is incredibly strange and cool. For my notes:
It's well-known that if you take the FFT of an NxN image, you only need NxN floats to recover the original image. But usually those are (NxN)/2 complex numbers, e.g. rfft2 is complex.
I've discovered a real-only alternative:
Here's how it works. Suppose you have a picture of a cat. First, you multiply the cat by (1 + 1j), so that you end up with a complex number where both the .real and the .imag parts are the cat image. Then you take the FFT of that.
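Putting the whole trick together, a sketch reconstructed from the description above (the "ortho" normalization is my assumption; the original code's scaling is unknown). Since the real part of fft(x*(1+1j)) is Re(F) - Im(F), this is exactly the discrete Hartley transform (cos+sin "cas" kernel), which is why the output is real and the transform is its own inverse:

```python
import numpy as np

def fst(x):
    # Multiply by (1+1j), take the 2D FFT, keep only the real part.
    # For real input this equals the discrete Hartley transform,
    # which is self-inverse under orthonormal scaling.
    return np.fft.fft2(np.asarray(x) * (1 + 1j), norm="ortho").real

cat = np.random.default_rng(0).random((8, 8))   # stand-in "cat" image
assert np.allclose(fst(fst(cat)), cat)          # its own inverse
```

An NxM input gives NxM real outputs, so unlike rfft2 there's no complex phase to carry around, matching the properties listed above.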
Oct 25, 2020 • 12 tweets • 5 min read
Suppose you wanted to train a world-class GPT model, just like OpenAI. How? You have no data.
In OpenAI's papers on GPT-2 and 3, you'll notice references to datasets named "books1" and "books2".
books1 appears to be bookcorpus, or similar.
But OpenAI has never released details about books2, which remains a crucial mystery.
May 28, 2020 • 14 tweets • 4 min read
lol. So, we're doing some image processing with TPUs. We want to save the results directly to our cloud bucket, rather than having the results be transmitted to our VM, saved locally, then uploaded to our cloud bucket. Got a funny idea...
I guess this will be a ramble:
TPUs support only a limited set of operations. But what you get in exchange is blazing speed.
A TPU consists of 8 cores, plus a CPU. (Yes, the TPU has a CPU -- weird concept, but think of it like a big computer with 8 GPUs. Obviously, a computer with GPUs has a CPU.)
Jan 31, 2020 • 10 tweets • 3 min read
Success: I trained ResNet-50 on ImageNet to 75.9% top-1 accuracy in 3.51 minutes using a 512-core TPUv3.
(480,000 images per second. 224x224 res JPG.)
Before you think highly of me, all I did was run Google’s code. It was hard though.