https://twitter.com/LotfiSanae/status/1549764850703892481
To recap, the marginal likelihood answers the question "how likely is my prior to generate the training data?", which is fundamentally different from "will my trained model provide good generalization?", leading to many discrepancies. See
https://twitter.com/andrewgwils/status/1496663760131563524?s=20
https://twitter.com/Pavel_Izmailov/status/1407522681516331016
Suppose for instance there are dead pixels in an image. The weights attached to these pixels don’t affect the predictions, and so MAP (regularized optimization) drives them to zero. A BMA instead samples these weights from the prior... 2/5
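The dead-pixel effect can be checked numerically. Below is a minimal sketch (not from the thread) using toy linear regression with a conjugate Gaussian prior and likelihood: the feature for the "dead pixel" is always zero, so the data term ignores its weight, MAP with an L2 penalty pins that weight at zero, while the exact posterior over it reduces to the prior — which is what a BMA samples from. All names and constants (`lam`, `sigma = 0.1`, etc.) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: feature 0 is a "dead pixel" (always zero), so the
# likelihood is flat in w[0]; only the prior/regularizer acts on it.
X = np.zeros((50, 2))
X[:, 1] = rng.normal(size=50)            # the live pixel
w_true = np.array([0.0, 2.0])
y = X @ w_true + 0.1 * rng.normal(size=50)

# MAP with an L2 penalty (Gaussian prior): since the data term ignores
# w[0], the regularizer drives the dead-pixel weight to zero.
lam = 1.0
w_map = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

# Conjugate Gaussian posterior covariance: for w[0] it equals the prior
# variance 1/lam — a BMA draws w[0] from the prior, not from a point at 0.
posterior_cov = np.linalg.inv(X.T @ X / 0.1**2 + lam * np.eye(2))
print(w_map[0])             # ≈ 0
print(posterior_cov[0, 0])  # ≈ 1/lam: the prior variance survives
```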
https://twitter.com/dabeaz/status/1398625259708993538
From a probabilistic perspective, whether we are doing supervised, semi-supervised, or unsupervised learning, forming our training objective involves starting with an observation model, turning it into a likelihood, introducing a prior, and then taking our log posterior. 2/8
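That recipe can be sketched concretely for the simplest case. Assuming (my choice, not the thread's) a Gaussian observation model y = Xw + ε with noise scale `sigma` and a Gaussian prior N(0, tau² I) on the weights, the log posterior is the log likelihood plus the log prior, up to a constant:

```python
import numpy as np

def log_posterior(w, X, y, sigma=0.1, tau=1.0):
    """Observation model y = Xw + eps, eps ~ N(0, sigma^2)  -> likelihood;
    prior w ~ N(0, tau^2 I)                                 -> log posterior
    log p(w|D) = log p(y|X,w) + log p(w) + const."""
    resid = y - X @ w
    log_lik = -0.5 * np.sum(resid**2) / sigma**2   # from the observation model
    log_prior = -0.5 * np.sum(w**2) / tau**2       # from the introduced prior
    return log_lik + log_prior
```

Maximizing this function gives the MAP/regularized-optimization solution; sampling from it gives the posterior that a Bayesian model average integrates over.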
https://twitter.com/carlesgelada/status/1208618401729568768
In particular, the predictive distribution we often want to find is p(y|x,D) = \int p(y|x,w) p(w|D) dw. 'y' is an output, 'x' an input, 'w' the weights, and D the data. This is not a controversial equation, it is simply the sum and product rules of probability. 2/18
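In practice that integral is usually approximated by Monte Carlo: draw samples w_s from the posterior p(w|D) and average the per-sample predictions. A minimal sketch for a linear model, assuming the posterior samples are already available (the helper name `predictive_mean` is mine, not from the thread):

```python
import numpy as np

def predictive_mean(x, posterior_samples):
    """Monte Carlo estimate of the BMA predictive
    p(y|x,D) = integral of p(y|x,w) p(w|D) dw:
    average predictions over draws w_s ~ p(w|D)."""
    preds = np.array([x @ w for w in posterior_samples])
    return preds.mean(), preds.std()   # BMA mean and epistemic spread
```

Contrast with MAP, which plugs in a single point estimate w_hat instead of averaging — collapsing the spread that the posterior samples carry.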