Simon DeDeo @SimonDeDeo
Kullback-Leibler divergence has an enormous number of interpretations and uses: psychological, epistemic, thermodynamic, statistical, computational, geometrical... I am pretty sure I could teach an entire graduate seminar on it.
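For reference, the standard definition for discrete distributions P and Q over the same outcomes (the thread never writes it out; the symbols are the usual textbook ones, not the author's):

$$ D_{\mathrm{KL}}(P \,\|\, Q) = \sum_x p(x) \log \frac{p(x)}{q(x)} $$

It is never negative, it is zero exactly when P = Q, and it is not symmetric in P and Q.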
Psychological: an excellent predictor of where attention is directed. ilab.usc.edu/surprise/
Epistemic: a normative measure of where you ought to direct your experimental efforts (maximize expected model-breaking) jstor.org/stable/4623265
Thermodynamic: a measure of work you can extract from an out-of-equilibrium system as it relaxes to equilibrium.
Statistical: too many to count, but (e.g.) a measure of the failure of an approximation method. countbayesie.com/blog/2017/5/9/…
Computational (machine learning): a measure of model inefficiency—the extent to which it retains useless information. arxiv.org/abs/1203.3271
Computational (compression): the extent to which a compression algorithm designed for one system fails when applied to another.
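The compression reading can be checked numerically. A minimal sketch, assuming an ideal code that spends -log2 q(x) bits on symbol x; the two distributions are made-up toy values, not taken from any of the linked papers:

```python
import math

def entropy_bits(p):
    """Best achievable average code length (bits/symbol) for a source p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy_bits(p, q):
    """Average code length when data from p is coded with ideal lengths -log2 q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_bits(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy example: the true source p, and a code designed for a uniform source q.
p = [0.5, 0.25, 0.125, 0.125]
q = [0.25, 0.25, 0.25, 0.25]

optimal = entropy_bits(p)            # 1.75 bits/symbol
mismatched = cross_entropy_bits(p, q)  # 2.0 bits/symbol
overhead = mismatched - optimal      # extra bits/symbol paid for the wrong code

print(overhead, kl_bits(p, q))       # the two numbers agree: 0.25 bits
```

The wasted bits per symbol are exactly D(p || q), which is the sense in which a compressor built for one system "fails" on another.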
Geometrical: the (non-metric!) connection when one extends differential geometry to the probability simplex.
Ooh, biological: the extent to which subsystems co-compute. mdpi.com/1099-4300/17/4…
Another machine learning application: the basic loss function for autoencoders, deep learning, etc. (people call it the "cross-entropy", which differs from KL only by a term that doesn't depend on the model)
Wait, there's more: algorithmic fairness. How to optimally constrain a prediction algorithm when ensuring compliance with laws on equitable treatment. arxiv.org/abs/1412.4643
Cultural evolution: a metric (we believe) for the study of individual exploration and innovation tasks... sciencedirect.com/science/articl…
and competitive and collaborative creation and sharing of ideas... pnas.org/content/115/18…
Someone can explain to me how Kullback-Leibler generalizes to the quantum case—apparently in the case of commuting operators? pdfs.semanticscholar.org/30a7/6a44a4f0f…
And if you want to work with generalized entropies and superstatistics (i.e., for coupled systems), it's the α → 1 special case of the α-Rényi divergence... pnas.org/content/108/16…
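A quick numerical check of the Rényi connection. This is a sketch using the standard discrete formula D_α(P||Q) = log(Σ p^α q^(1-α)) / (α - 1) in nats; the distributions are toy values chosen for illustration:

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def renyi(p, q, alpha):
    """alpha-Renyi divergence in nats (alpha > 0, alpha != 1, all q_i > 0)."""
    total = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(total) / (alpha - 1)

p = [0.5, 0.25, 0.125, 0.125]
q = [0.25, 0.25, 0.25, 0.25]

for alpha in (0.5, 0.9, 0.999, 1.001, 1.1, 2.0):
    print(alpha, renyi(p, q, alpha))
print("KL", kl(p, q))
```

The printed values increase with α and close in on the KL value as α approaches 1 from either side.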
For the digital humanists, Kullback-Leibler divergence is related to TFIDF, but with much nicer properties when it comes to coarse-graining. (The most distinctive words have the highest partial-KL when teasing apart documents; stopwords have the lowest) mdpi.com/1099-4300/15/6…
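To make "partial-KL" concrete: each word's contribution is its own term p(w) log(p(w)/q(w)) in the KL sum. A minimal sketch with invented word counts (hypothetical data, and not necessarily the exact estimator used in the linked paper); it assumes every word in the first document also occurs in the second, otherwise the term is infinite:

```python
import math

def normalize(counts, vocab):
    """Turn raw word counts into a probability for each word in vocab."""
    total = sum(counts.get(w, 0) for w in vocab)
    return {w: counts.get(w, 0) / total for w in vocab}

def partial_kl(p, q):
    """Per-word terms of D(p || q) in bits; words absent from p contribute 0."""
    return {w: (p[w] * math.log2(p[w] / q[w]) if p[w] > 0 else 0.0) for w in p}

# Invented counts for two documents.
doc_a = {"the": 50, "whale": 30, "sea": 20}
doc_b = {"the": 50, "whale": 2, "sea": 20, "love": 28}

vocab = sorted(set(doc_a) | set(doc_b))
p = normalize(doc_a, vocab)
q = normalize(doc_b, vocab)

terms = partial_kl(p, q)
most_distinctive = max(terms, key=terms.get)

for word in sorted(terms, key=terms.get, reverse=True):
    print(word, round(terms[word], 3))
```

Here "whale" carries essentially all of the divergence, while "the" (equally common in both documents) contributes nothing.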
In a very basic sense, Kullback-Leibler divergence is to probability spaces (and thus epistemic states) what the cross-product is to vector spaces.
Oh, I bet you like mutual information, huh? Well, it's a special case of Kullback-Leibler—the extent to which you're surprised by (arbitrary) correlations between a pair of variables if you believe they're independent.
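In code, that means mutual information is the KL divergence from the joint distribution to the product of its marginals. A small sketch with two toy 2x2 joint tables (illustrative values only):

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_information(joint):
    """I(X;Y) = D( p(x,y) || p(x) p(y) ) for a joint table given as rows."""
    px = [sum(row) for row in joint]           # marginal of X (row sums)
    py = [sum(col) for col in zip(*joint)]     # marginal of Y (column sums)
    flat_joint = [pxy for row in joint for pxy in row]
    flat_indep = [a * b for a in px for b in py]  # what independence predicts
    return kl(flat_joint, flat_indep)

independent = [[0.25, 0.25], [0.25, 0.25]]
correlated = [[0.5, 0.0], [0.0, 0.5]]

print(mutual_information(independent))  # 0: nothing to be surprised by
print(mutual_information(correlated))   # log(2) nats, i.e. one full bit
```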
It's also (unlike entropy) measure-independent: it behaves well (and in a non-arbitrary fashion) for continuous distributions. (A naive calculation of the entropy of a uniform distribution, done by integrating, can come out negative!)
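Both halves of that claim can be illustrated with closed forms. A sketch using the textbook formulas for the differential entropy of uniform and Gaussian densities and for the KL divergence between two Gaussians; the change of units by a factor of 1000 is an arbitrary choice for the example:

```python
import math

def uniform_entropy(width):
    """Differential entropy (nats) of a uniform density on an interval of this width."""
    return math.log(width)

def gaussian_entropy(sigma):
    """Differential entropy (nats) of a Gaussian with standard deviation sigma."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

def gaussian_kl(m1, s1, m2, s2):
    """D( N(m1, s1^2) || N(m2, s2^2) ) in nats."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5

c = 1000.0  # rescale the variable, e.g. metres -> millimetres

narrow = uniform_entropy(0.5)  # negative, because the interval is narrower than 1
entropy_shift = gaussian_entropy(c * 1.0) - gaussian_entropy(1.0)  # shifts by log(c)
kl_before = gaussian_kl(0.0, 1.0, 1.0, 2.0)
kl_after = gaussian_kl(0.0, c * 1.0, c * 1.0, c * 2.0)  # same two densities, new units

print(narrow, entropy_shift, kl_before, kl_after)
```

Entropy depends on the units you measure in; the KL divergence between the same pair of distributions does not.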
More statistics: it's the underlying justification for the Akaike Information Criterion, used for model selection.
If people actually want this seminar, we could probably tape it this June through @ComplexExplorer at @sfiscience. Non-trivial effort but I’d do it sans honorarium if people funded the production costs. Tell them (through the donation button?) complexityexplorer.org/about/donate
MORE KL—philosophy of mind. It’s the “free energy” term in the predictive brain account of perception and consciousness. See Andy Clark’s new book or link.springer.com/article/10.100…
Yet more, via @postquantum, on the quantum case: "if you have some restricted class of operations, then the KL divergence tells you how much of the resource you need (work, entanglement, information) and this measure is unique". arxiv.org/abs/quant-ph/0…
Economists have been neglected in this thread, so here's @itsaguytalking on KL for studying trade under heterogeneous beliefs. columbia.edu/~ez2197/HowToM…
This is a beautiful paper. A reference to Aumann makes me wonder if KL could be used to measure convergence in "complexity of agreement" problems—arxiv.org/abs/cs/0406061
here, via @ilmarb, is a talk by @johncarlosbaez on KL in biology—