François Fleuret
Prof. @Unige_en, Adjunct Prof. @EPFL_en, Research Fellow @idiap_ch, co-founder @nc_shape. AI and machine learning since 1994. I like reality.
Feb 11 11 tweets 3 min read
We often see people using the term "random variable" (RV), but its mathematical definition is unclear to most.

Here is an attempt at a TL;DR to give an intuition.

1/11

P.S. Okay, now that I have written it, I fear it won't help.

If you want to define the notion of something "random", the natural strategy is to define a distribution, that is, in the finite case, a list of value / probability pairs.

So for instance, the head / tail result of a coin flip would be (H, 0.5), (T, 0.5).

2/11
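As a small illustration (not from the thread), here is a minimal Python sketch of a finite distribution written as a list of value / probability pairs, with a `sample` helper that is purely illustrative:

```python
import random

# A finite distribution as a list of (value, probability) pairs,
# here the fair coin from the example above.
coin = [("H", 0.5), ("T", 0.5)]

def sample(dist):
    # Draw one value according to the listed probabilities.
    values, probs = zip(*dist)
    return random.choices(values, weights=probs, k=1)[0]

flips = [sample(coin) for _ in range(10)]
print(flips)
```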
Jan 18 19 tweets 3 min read
Information Theory is awesome so here is a TL;DR about Shannon's entropy.

This field is about quantifying the amount "of information" contained in a signal and how much can be transmitted under certain conditions.

1/11

What makes it awesome IMO is that it is very intuitive, and like thermodynamics in physics it gives exact bounds on what is possible or not.

The key concept is Shannon entropy.

2/11
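For concreteness, a quick sketch (my addition, not from the thread) computing Shannon entropy in bits for a few toy distributions:

```python
import math

def entropy(probs):
    # Shannon entropy in bits: H = -sum_i p_i log2 p_i, with 0 * log 0 = 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # fair coin: 1 bit
print(entropy([0.9, 0.1]))    # biased coin: ~0.47 bits
print(entropy([0.25] * 4))    # uniform over 4 symbols: 2 bits
```

The more spread out the distribution, the larger the entropy, which matches the intuition of "more information per observed symbol".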
Jan 13 18 tweets 5 min read
Since these experiments have been popular, here is a recap that will from now on be the thread for updates.

The motivation for all this came from discussions at @neurips_conf with @tri_dao, @_albertgu, and @srush_nlp. What I took away from them was that the reason RNNs have been replaced with transformers is purely computational. The latter are more "GPU friendly" since with enough cores, the O(T) operations can be done in O(1).
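A rough sketch of that contrast, under my own simplified assumptions (a plain tanh recurrence vs. unmasked dot-product mixing), not taken from the thread:

```python
import torch

T, D = 16, 8
x = torch.randn(T, D)

# Recurrent update: each step depends on the previous hidden state,
# so the T steps must run one after the other (sequential depth O(T)).
W = torch.randn(D, D) * 0.1
h = torch.zeros(D)
for t in range(T):
    h = torch.tanh(x[t] + h @ W)

# Attention-style mixing: all T x T interactions are computed at once,
# so with enough parallel cores the sequential depth is O(1).
A = torch.softmax(x @ x.t() / D ** 0.5, dim=1)
y = A @ x
```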
Apr 24, 2022 12 tweets 4 min read
To investigate the ability of a GPT-like model to "understand geometrical composition" I made a minimalist CLEVR-like task on which I tested my own minimal GPT.

A thread!

The task consists of a random arrangement of up to five colored pixels in a 6x8 image, from which I generate a bunch of boolean geometrical properties.

Here are a few train samples.
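To make the setup concrete, here is a hypothetical generator in the same spirit (the grid size and pixel count follow the thread, but the specific boolean properties below are my own illustrative examples, not the ones actually used):

```python
import random

H, W, NB_COLORS = 6, 8, 5

def generate_sample():
    # Place up to five colored pixels at distinct random locations.
    nb = random.randint(1, 5)
    cells = random.sample([(r, c) for r in range(H) for c in range(W)], nb)
    pixels = [(r, c, random.randint(1, NB_COLORS)) for r, c in cells]
    # Illustrative boolean geometric properties (assumed, not from the thread).
    props = {
        "some_pixel_left_of_another": any(
            c1 < c2 for (_, c1, _) in pixels for (_, c2, _) in pixels
        ),
        "two_pixels_share_a_row": len({r for (r, _, _) in pixels}) < len(pixels),
    }
    return pixels, props

print(generate_sample())
```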
Jun 6, 2020 5 tweets 3 min read
One more toyish example in @pytorch: double descent with polynomial regression. (thread)

If we fit polynomials to samples of a piece-wise function, at first increasing the degree makes the polynomial pass "more and more" through the samples, but results in a very irregular function. With 8 samples, degree 7 reaches train error ~0.
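A minimal sketch of that experiment (my reconstruction, not the thread's code): a least-squares polynomial fit via the Vandermonde matrix and the pseudo-inverse, which gives the minimum-norm solution once the degree exceeds the number of samples.

```python
import torch

# 8 noisy samples of a piece-wise (step) function on [-1, 1]; the target
# function and noise level are assumptions for illustration.
torch.manual_seed(0)
x = torch.linspace(-1, 1, 8)
y = (x > 0).float() + 0.1 * torch.randn(8)

def fit_poly(x, y, degree):
    # Vandermonde design matrix; the pseudo-inverse gives the least-squares
    # coefficients, and the minimum-norm ones when the system is
    # under-determined, which is what smooths the fit past interpolation.
    X = torch.stack([x ** k for k in range(degree + 1)], dim=1)
    return torch.linalg.pinv(X) @ y

for d in (1, 3, 7, 15):
    c = fit_poly(x, y, d)
    X = torch.stack([x ** k for k in range(d + 1)], dim=1)
    train_err = ((X @ c - y) ** 2).mean()
    print(f"degree {d:2d}  train MSE {train_err:.2e}")
```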
May 19, 2020 6 tweets 4 min read
To illustrate attention mechanisms, I made a toy seq2seq task and implemented an attention layer from scratch. It worked beautifully. (thread)

The toy task is to translate a 1d time series composed of two triangular impulses and two rectangular impulses so that, within each shape group, their heights are equalized to their average.
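For reference, a from-scratch single-head dot-product attention layer in PyTorch, in the spirit of the thread (a generic sketch with assumed dimensions, not the author's exact implementation):

```python
import math
import torch

class AttentionLayer(torch.nn.Module):
    # Single-head dot-product attention written from scratch.
    def __init__(self, dim_in, dim_qk, dim_v):
        super().__init__()
        self.to_q = torch.nn.Linear(dim_in, dim_qk)
        self.to_k = torch.nn.Linear(dim_in, dim_qk)
        self.to_v = torch.nn.Linear(dim_in, dim_v)

    def forward(self, x):
        # x: (batch, time, dim_in)
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)
        # Attention weights over time, then weighted sum of values.
        a = torch.softmax(q @ k.transpose(1, 2) / math.sqrt(q.size(-1)), dim=2)
        return a @ v  # (batch, time, dim_v)

layer = AttentionLayer(dim_in=16, dim_qk=16, dim_v=16)
y = layer(torch.randn(2, 100, 16))
print(y.shape)
```

The attention weights are what let each output position look up the other impulse of its shape group anywhere in the sequence, which is exactly what the toy task requires.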