2/ A year ago I was approached with a unique and exciting opportunity: I was asked to help set up the Kaggle Open Vaccine competition, where the goal was to build a Machine Learning model that predicts the stability of RNA molecules.
3/ This is of pressing importance for the development of mRNA vaccines. The task seemed a bit daunting, since I had no prior experience with RNA or biophysics, but I wanted to help out in any way I could.
4/ The competition was a great success, and it gave us many interesting new insights and useful ML models. Now, a year later, we have finally condensed all the important takeaways from this project into a paper and a code repo.
5/ This project is a collaborative effort between two major crowdsourced problem-solving platforms: @Kaggle and @EternaGame. It is definitive proof that such platforms and such collaborations can have a major real-world impact.
6/ I want to thank and congratulate all the competition winners and contributors who have made this work possible.
7/ I would especially like to thank the members of the Rhiju Das Lab at Stanford who spearheaded this project and made it possible, in particular @HWaymentSteele, @kimds91, @amw_stanford and Rhiju Das.
8/ I am grateful for their guidance, support, patience, and immense knowledge of this very consequential field.
One of the unfortunate consequences of Kaggle's inability to host tabular data competitions anymore is that the fine art of feature engineering will slowly fade away. Feature engineering is rarely, if ever, covered in ML courses and textbooks. 1/
There is very little formal research on it, especially on how to come up with domain-specific nontrivial features. These features are often far more important for all aspects of the modeling pipeline than improved algorithms. 2/
I would certainly never have realized any of this were it not for tabular Kaggle competitions. There, over many years, the community accumulated a treasure trove of incredible tricks and insights, most of them unique. 3/