
1. The difference between discovery and invention is a matter of degree, not a difference in kind.

2. The degree in question is what @DavidDeutschOxf calls “being hard-to-vary”, not “having been there already”.

For example, when Wadler says “Most programming languages were invented, not discovered—*and you can tell*”, I think he means that the set of core features in, say, C++ is not hard-to-vary (at least not compared to the set of core features in simply-typed λ-calculus).

When Harper says that when you see the same structure in categories, type theories, and logics, that's a "legitimate scientific discovery," he means that demanding a triple coincidence makes details harder to vary.

In the words of Donald Knuth (on this very topic), "But no, no, ten thousand times no!" (source: math.uni-bielefeld.de/~sillke/NEWS/0…)

Let's talk about how primitive functions on ℝ even get defined. I'm using Terry Tao's Analysis I as a reference. 1/

ℝ is constructed step-by-step:

1. Obtain the naturals ℕ somehow. Popular choices include (a) taking the Peano axioms as fundamental, (b) using the Zermelo–Fraenkel axioms to construct the finite ordinals, (c) defining an inductive type, or (d) the classic, accepting ℕ from God. (I kid.) 2/

2. Construct ℤ as a quotient of ℕ×ℕ: specifically, pairs (a,b) where a∈ℕ and b∈ℕ, interpreted as purely formal expressions "a – b", with "a – b" and "c – d" considered equal iff a+d = c+b (where the + here is Peano addition, of type ℕ×ℕ→ℕ). 3/
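The quotient construction in step 2 can be sketched directly in code. This is my own illustration, not Tao's notation: pairs of naturals stand for formal differences, and the equivalence relation is stated using only addition on ℕ (no subtraction needed).

```python
class FormalDifference:
    """An integer represented as a pair of naturals (a, b), read as "a - b"."""

    def __init__(self, a: int, b: int):
        assert a >= 0 and b >= 0, "components must be naturals"
        self.a, self.b = a, b

    def __eq__(self, other: "FormalDifference") -> bool:
        # The quotient relation: "a - b" = "c - d"  iff  a + d = c + b,
        # phrased entirely in terms of Peano addition ℕ×ℕ→ℕ.
        return self.a + other.b == other.a + self.b

    def __add__(self, other: "FormalDifference") -> "FormalDifference":
        # (a - b) + (c - d) = (a + c) - (b + d), again using only ℕ-addition.
        return FormalDifference(self.a + other.a, self.b + other.b)

# "3 - 5" and "0 - 2" are the same integer, namely -2:
assert FormalDifference(3, 5) == FormalDifference(0, 2)
```

Note that `__eq__` is where the "quotient" lives: many pairs represent one integer, and the representation never needs a subtraction operator on ℕ.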

Comparing the Cont & Dist monads: My operational interpretation of a continuation-passing-style value ∀R.(X→R)→R is “for any result type R, if you tell me how to compute one from an X, I’ll give you one,” and the only way to return an R is to obtain one from the continuation…

…because there are no other operations that return this arbitrary type R. Meanwhile Dist(X) is more like ∀(R:Convex).(X→R)→R. Now I can give the continuation a bunch of values in X and weighted-average the return values. So a Dist(X) is a bunch of X’s associated with weights.

This algebraic approach, where I equip R with a “convex combination” operator, only works for finitely supported distributions. To get to fully general measures, we swap our synthetic “R:Convex” for the analytic reals ℝ, but impose severe conditions on the function (X→ℝ)→ℝ…
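The finitely supported case above is easy to sketch in code. This is my own toy illustration (names are mine): a Cont-style value can only call the continuation once and return its answer, while a Dist-style value may call it on several points of X and take a convex combination of the results.

```python
def cont_pure(x):
    """∀R. (X→R)→R : the only way to produce an R is to ask the continuation."""
    return lambda k: k(x)

def dist(weighted_points):
    """Dist(X) ≈ ∀(R:Convex). (X→R)→R for finitely supported distributions:
    feed the continuation each point of X and weighted-average the results
    (so R must support convex combinations)."""
    assert abs(sum(w for _, w in weighted_points) - 1.0) < 1e-9, "weights must sum to 1"
    return lambda k: sum(w * k(x) for x, w in weighted_points)

# Expectation of f(X) for a fair coin X ∈ {0, 1} is just "run the continuation f":
coin = dist([(0, 0.5), (1, 0.5)])
assert coin(lambda x: x) == 0.5           # E[X]
assert cont_pure(3)(lambda x: x * x) == 9  # Cont can only pass 3 along
```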

When continuous-valued probability distributions are taught, typically the density function (pdf) is introduced first, with some handwaving, and then the cdf is defined as an integral of the pdf.

This is backwards! A pdf ONLY EXISTS as the DERIVATIVE of the cdf.

I'm not saying that the Radon–Nikodym theorem or Lebesgue measure should be explicitly introduced before we can talk about Gaussians, but I think people comfortable with calculus would rather see d/dx P(X&lt;x) than the usual handwaving about densities.
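The pdf-as-derivative-of-cdf picture can be checked numerically. A minimal sketch (my own, using only the standard library): the standard normal cdf via `math.erf`, differentiated by central differences, agrees with the textbook density.

```python
import math

def gaussian_cdf(x: float) -> float:
    """Standard normal cdf P(X < x), expressed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gaussian_pdf(x: float) -> float:
    """The textbook density exp(-x²/2)/√(2π)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

# d/dx P(X < x), computed by central difference, matches the pdf pointwise:
h = 1e-5
for x in (-1.3, 0.0, 0.7, 2.1):
    numeric = (gaussian_cdf(x + h) - gaussian_cdf(x - h)) / (2.0 * h)
    assert abs(numeric - gaussian_pdf(x)) < 1e-6
```

Here the cdf is the primitive object and the density is recovered by differentiation, which is the direction of definition the thread is advocating.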

In ML, the term "manifold" generally refers to the image of a differentiable map from ℝᵏ→ℝⁿ. Sometimes assumed (explicitly or implicitly) to have a diff'ble inverse back to ℝᵏ (i.e. to be a diff'ble embedding). Either way, not the same idea of "manifold" used in math—(1/n)

With the invertibility assumption, this *is* a manifold, but an extremely special type of manifold: while manifold theory offers a lot of complexity around gluing together local Euclidean "charts", typical representation-learning learns just one *global* Euclidean chart! (2/n)

To boot, an ML "manifold" is an isometrically embedded submanifold of ℝⁿ. By the Nash embedding theorem, isometric embedding into some ℝⁿ is possible for any Riemannian manifold, but it's not usually so trivial. (3/n)
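A toy version of the "one global chart" situation (my own example, not from the thread): a differentiable map ℝ¹→ℝ² whose image is a parabola, together with a differentiable left inverse, so a single chart covers the whole "manifold".

```python
def decode(t: float) -> tuple:
    """Differentiable map ℝ → ℝ²; its image (a parabola) is the "manifold"."""
    return (t, t * t)

def encode(p: tuple) -> float:
    """Differentiable left inverse: project an on-manifold point back to its
    single global chart coordinate. No gluing of local charts is needed."""
    x, _ = p
    return x

# Round-tripping through the embedding recovers the chart coordinate:
for t in (-2.0, 0.0, 1.5):
    assert encode(decode(t)) == t
```

A representation-learning encoder/decoder pair plays exactly these two roles, just with learned maps in higher dimensions.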

Suppose I have a functor H : C^op × D → V (a profunctor; fine to assume V=Set), and endofunctors L : C→C and R : D→D, with

H(L(c),d) ≅ H(c,R(d))

natural in c,d.

Looks like an adjunction, but isn't directly—not even a relative adjunction. Does this situation have a name?
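For comparison (my own remark, to make the "looks like an adjunction" precise): in the special case C = D with H the hom profunctor, the condition literally is an adjunction.

```latex
% Special case: C = D and H = \mathrm{Hom}_C(-,-) : C^{\mathrm{op}} \times C \to \mathbf{Set}.
% The stated isomorphism H(L(c), d) \cong H(c, R(d)) becomes
\mathrm{Hom}_C(L(c), d) \;\cong\; \mathrm{Hom}_C(c, R(d)),
% natural in c and d, i.e. an ordinary adjunction L \dashv R.
% The general question is what to call this when C \neq D and H is an
% arbitrary profunctor.
```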

One characterization: endomorphisms in Chu(Cat,V). Explicitly, (L,R) : Endo_{Chu(Cat,V)}(C,D,H).

(Er, by C there I meant C^op, and I guess technically also L^op.)