When continuous-valued probability distributions are taught, the density function (pdf) is typically introduced first, with some handwaving, and the cdf is then defined as an integral of the pdf.
This is backwards! A pdf ONLY EXISTS as the DERIVATIVE of the cdf.
I'm not saying that the Radon-Nikodym theorem or Lebesgue measure should be explicitly introduced before we can talk about Gaussians, but I think people comfortable with calculus would rather see d/dx P(X<x) than the usual handwaving about densities.
A prime example of the kind of handwaving I'm whinging about, from Wikipedia. What is this nonsense?
very tempted to add a "[by whom?]" tag after "can be interpreted"
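The "right way round" is easy to demonstrate numerically: define the pdf as the derivative of the cdf and check that it matches the usual Gaussian closed form. A minimal sketch (the function names `Phi`, `pdf_from_cdf`, and `gaussian_pdf` are mine, just for illustration):

```python
import math

def Phi(x):
    # standard normal cdf P(X < x), via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

def pdf_from_cdf(x, h=1e-5):
    # the pdf *defined* as d/dx P(X < x), here approximated
    # by a central finite difference
    return (Phi(x + h) - Phi(x - h)) / (2 * h)

def gaussian_pdf(x):
    # the usual closed form that textbooks start from
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

for x in (-1.0, 0.0, 0.5, 2.0):
    assert abs(pdf_from_cdf(x) - gaussian_pdf(x)) < 1e-8
```

No handwaving required: the density is just what you get by differentiating the cdf.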
Deep neural networks, as you probably know, are sandwiches of linear regressions with elementwise nonlinearities between each layer.
The core contribution of “Attention is All You Need,” which led directly to the LLM/GPT explosion,
is to throw some *logistic* regressions in there
Credit is also due to @geoffreyhinton for dropout, @ChrSzegedy for activation normalization, and @dpkingma for gradient normalization (Adam). The rest is commentary
@ylecun is commonly credited with the initial stacked-linear-regression idea (and with using gradient descent to handle the learning), and the logistic-regression layer was distilled from Bengio's bag of tricks (which also includes much of the commentary).
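The "throw some logistic regressions in there" framing is literal, loosely speaking: in scaled dot-product attention, the softmax over keys is exactly a multinomial logistic regression computed per query. A minimal single-head sketch in NumPy (unbatched, no learned projections, just to show the shape of the idea):

```python
import numpy as np

def softmax(z):
    # subtract the row max for numerical stability
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    # scaled dot-product attention: for each query row, the
    # softmax over key scores is a multinomial logistic
    # regression; its output mixes the value rows
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores) @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out = attention(Q, K, V)          # shape (4, 8)

# each query's attention weights form a probability distribution
weights = softmax(Q @ K.T / np.sqrt(8))
assert np.allclose(weights.sum(axis=-1), 1.0)
```

Sandwich that between linear layers with elementwise nonlinearities and you have, roughly, a transformer block.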
with GPT-4 code interpreter, it finally became worthwhile for me to run the numbers myself on that lead-poisoning theory—that the 1971-2012 technological stagnation is a function of environmental cognitive impairment of the grad student and postdoc population—and uh:
be careful with that lead-apatite out there folks
@BenjaminDEKR quantum Hall effect, HTML, email, Web, search, LED displays, smartphone form factor… not nothing, but all kind of underwhelmingly derivative by comparison, no? anyway the 1971 date is due to @tylercowen. not sure if he’d agree that it ended in 2012, right after he pointed it out
I often find myself disclaiming that I do *not* propose to formally verify question-answering or assistant-type models, because I don’t think the specifications can be written down in a language with formal semantics.
But what if… 🧵
Scott Viteri suggested I consider the premise that LLMs “know what we mean” if we express specifications in natural language. I’m not convinced this premise is true, but if it is, we can go somewhere pretty interesting with it. 1/
Imagine taking two instances of the LLM and stitching them together into a cascade, where the 2nd copy checks whether a trajectory/transcript satisfies certain natural-language spec(s), and ultimately concludes its answer with YES or NO. (This is not unlike step 1 of RLAIF.) 2/
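The cascade's shape is simple enough to sketch. Everything here is hypothetical: `ask` stands in for a query to the second LLM copy, and `toy_llm` is a stub just to show the plumbing, not a real checker.

```python
def check_trajectory(ask, transcript, specs):
    """Second-copy checker: prompt the (hypothetical) LLM `ask`
    to judge whether `transcript` satisfies every natural-language
    spec, ending its answer with YES or NO."""
    prompt = (
        "Does the following transcript satisfy ALL of these "
        "specifications? End your answer with YES or NO.\n\n"
        "Specifications:\n"
        + "\n".join(f"- {s}" for s in specs)
        + "\n\nTranscript:\n" + transcript
    )
    answer = ask(prompt).strip().upper()
    return answer.endswith("YES")

# toy stand-in for the checker copy, just to exercise the shape
def toy_llm(prompt):
    return "The transcript looks consistent with the specs. YES"

ok = check_trajectory(toy_llm, "user: hi\nassistant: hello!", ["be polite"])
assert ok is True
```

Whether the premise holds — that the checker copy "knows what we mean" by the specs — is exactly the open question; the cascade only reduces verification to that premise.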
2020s Earth has an acutely unprecedented concentration of technological “dry powder”: existing machines & infrastructure, controlled by easily reprogrammable devices.
This broadly offense-dominant technology base is a critical factor in the extinction risk posed by AI. 🧵
If GPT-4’s Azure datacenter were plonked in 1820s Earth, it wouldn’t do much. After a few hours, the uninterruptible power supplies and other backup power sources would drain, and it *really* wouldn’t do much. The same is true of GPT-n for any n. Intelligence ⇏ causal power!
Suppose you bring GPT-99 to 1823 along with a self-contained nuclear power station. And suppose for the sake of argument that it’s prompted to design a successor AI that causes as much total damage to human life as possible (a prompt which surely no human would ever give, right?)
What this argument misses is that it's not (currently!) scalable to build a world-model that grounds legal entities in physical dynamics detailed enough to support enforcement, nor to verifiably plan within such a rich model.
As a matter of praxis, Yoshua Bengio suggests that the AI R&D community focus mostly on scientific-modeling AI and not deploy any autonomous agents until they can be proven safe to a high standard, which seems very sensible to me.