Surprised #deeplearning methods were not massively used to detect Covid? Some tried, but most failed: e.g. they rely on patients' age instead of 'truly' analyzing the medical scans. This is because networks learn correlation and not causation. Our preprint Fishr tackles this. 1/4
When dealing with multiple datasets, we intuited that the learning should unfold consistently across those: i.e., the gradient distributions should be independent of the dataset. After a few approximations, we simply bring closer the dataset-level gradient variances! 2/4
Our approach is closely related to the Fisher Information and the Hessian. This explains the name of our work - Fishr - and why this aligns the loss landscapes per dataset. This works beyond expectations, outperforming all previous methods on the public DomainBed benchmark 3/4