Shawn Presser
25 Oct 20
Suppose you wanted to train a world-class GPT model, just like OpenAI. How? You have no data.

Now you do. Now everyone does.

Presenting "books3", aka "all of bibliotik"

- 196,640 books
- in plain .txt
- reliable, direct download, for years: the-eye.eu/public/AI/pile…

thread 👇
I wrote up some details here: github.com/soskek/bookcor…

In OpenAI's papers on GPT-2 and GPT-3, you'll notice references to datasets named "books1" and "books2".

books1 appears to be bookcorpus, or similar.

But OpenAI will not release information about books2; a crucial mystery.
We suspect OpenAI's books2 dataset might be "all of libgen", but no one knows. It's all pure conjecture.

Nonetheless, books3, released above, is "all of bibliotik", which I imagine will be of interest to anyone doing NLP work. Or anyone who wants to read 196,640 books. :)
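If you want to poke at it, here's a minimal loading sketch. The directory layout and path below are assumptions, not part of the release; adjust them to wherever you extract the archive.

import glob

# Assumed layout: the archive unpacked into a directory tree of plain
# .txt files, one file per book. Point this at your extraction path.
paths = sorted(glob.glob('books3/**/*.txt', recursive=True))
print(len(paths), 'books')  # 196,640 if you grabbed everything

def iter_books(paths):
    # Yield one book at a time as a plain Python string.
    for path in paths:
        with open(path, encoding='utf-8', errors='ignore') as f:
            yield f.read()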
28 May 20
lol. So, we're doing some image processing with TPUs. We want to save the results directly to our cloud bucket, rather than having the results be transmitted to our VM, saved locally, then uploaded to our cloud bucket. Got a funny idea...

I guess this will be a ramble:
TPUs support only a limited set of operations. But in exchange for those restrictions, you get blazing speed.

A TPU consists of 8 cores, plus a CPU. (Yes, the TPU has a CPU -- weird concept, but think of it like a big computer with 8 GPUs. Obviously, a computer with GPUs has a CPU.)
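You can see that topology for yourself. A minimal sketch in TensorFlow 2.x; the TPU name 'my-tpu' is a placeholder for your own:

import tensorflow as tf

# Connect to the TPU and initialize it. 'my-tpu' is a placeholder name.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='my-tpu')
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

print(tf.config.list_logical_devices('TPU'))  # the 8 TPU cores
print(tf.config.list_logical_devices('CPU'))  # includes the TPU host's own CPU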
In the same way that GPUs are much more restrictive than CPUs – it's a lot easier to write programs for CPUs than GPUs! – the TPU cores are much more restrictive than the TPU's CPU.

But that's meant as a positive: it means you get some nice flexibility with the TPU's CPU.
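To make that concrete, here's one way you might exploit the host CPU for the "funny idea" above: pin the post-processing and the file write to the TPU host's CPU device, so results go straight to the cloud bucket. The device string and bucket path below are illustrative assumptions, not something from the thread.

import tensorflow as tf

# Both the device string and the bucket path are made-up examples.
TPU_HOST_CPU = '/job:worker/replica:0/task:0/device:CPU:0'

def save_to_bucket(image, index):
    # Pin the work to the TPU host's CPU, which supports arbitrary ops
    # (JPEG encoding, filesystem writes) that the TPU cores do not.
    with tf.device(TPU_HOST_CPU):
        jpeg = tf.io.encode_jpeg(tf.cast(image, tf.uint8))
        # TensorFlow's filesystem layer understands gs:// paths, so the
        # bytes go from the TPU host straight to the bucket, never
        # passing through the VM.
        tf.io.write_file('gs://my-bucket/out/%d.jpg' % index, jpeg)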
31 Jan 20
Success: I trained ResNet-50 on imagenet to 75.9% top-1 accuracy in 3.51 minutes using a 512-core TPUv3.

(480,000 images per second. 224x224 res JPG.)

Before you think highly of me, all I did was run Google’s code. It was hard though.

Logs: tensorboard.dev/experiment/jsD…
It uses the code from their official MLPerf imagenet benchmark. mlperf.org/training-resul…

(3.51 minutes for v3-512 is slightly faster than their posted results of 3.85min, too!)
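A quick back-of-the-envelope check of those figures; the only number not from the tweets is ImageNet's standard train-set size of 1,281,167 images:

# Sanity-checking the reported throughput.
total_rate = 480_000           # images/sec on the v3-512
cores = 512
print(total_rate / cores)      # 937.5 -> "about 930 examples/sec per core"

imagenet = 1_281_167           # standard ImageNet training images
seconds = 3.51 * 60
print(total_rate * seconds / imagenet)  # ~78.9 epochs in 3.51 minutes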
This raises a question: *why* is the official benchmark so blazingly fast? That’s about 930 examples/sec per core. When I tried to write my own code, I could only get 250ex/sec per core. Are they cheating? *gasp*

Spoiler: nope! It’s legit. It’s faster because: