We're excited to introduce NequIP, an equivariant Machine Learning Interatomic Potential that not only obtains SOTA on MD-17, but also outperforms existing potentials with up to 1000x less data! w/ @tesssmidt @Materials_Intel @bkoz37 #compchem 👇🧵 1/N
NequIP (short for Neural Equivariant Interatomic Potentials) extends Graph Neural Network Interatomic Potentials that use invariant convolutions over scalar feature vectors to instead utilize rotation-equivariant convolutions over tensor features (i.e. scalars, vectors, ...). 2/N
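To make the invariant vs. equivariant distinction concrete, here is a toy sketch in NumPy. It is not NequIP's actual convolution: the functions `invariant_feature` and `equivariant_feature` and the `exp(-r)` radial weight are hypothetical stand-ins chosen for illustration. The point is only that a scalar feature stays the same when the atoms are rotated, while a vector feature rotates along with them.

```python
import numpy as np

def random_rotation(rng):
    # QR decomposition of a random matrix gives an orthogonal matrix;
    # flip one column if needed so that det = +1 (a proper rotation).
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def invariant_feature(pos):
    # Toy scalar per atom: a radial function summed over all pair distances.
    diff = pos[None, :, :] - pos[:, None, :]      # diff[i, j] = r_j - r_i
    dist = np.linalg.norm(diff, axis=-1)
    return np.exp(-dist).sum(axis=1)

def equivariant_feature(pos):
    # Toy vector per atom: radial weight times the relative-position vector.
    diff = pos[None, :, :] - pos[:, None, :]
    dist = np.linalg.norm(diff, axis=-1, keepdims=True)
    return (np.exp(-dist) * diff).sum(axis=1)

rng = np.random.default_rng(0)
pos = rng.normal(size=(5, 3))      # 5 atoms at random positions (made-up data)
R = random_rotation(rng)
pos_rot = pos @ R.T                # rotate every atom

# Invariant: f(R x) == f(x).  Equivariant: f(R x) == R f(x).
scalars_match = np.allclose(invariant_feature(pos_rot), invariant_feature(pos))
vectors_match = np.allclose(equivariant_feature(pos_rot),
                            equivariant_feature(pos) @ R.T)
print(scalars_match, vectors_match)
```

Both checks should hold for any rotation, since distances are unchanged and relative-position vectors rotate with the structure.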
We benchmark NequIP on a wide variety of molecules+materials: we start with atomic forces from MD-17 with 1,000 training configurations and find that we not only outperform other deep neural networks, but also perform better than, or sometimes on par with, kernel-based methods. 3/N
This is surprising since kernel methods tend to generalize better from small numbers of training samples. However, they scale poorly at run time (linearly with the number of training samples), which has until now led to a fundamental trade-off in ML potentials. 4/N
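A rough sketch of where the linear run-time scaling comes from, assuming a generic Gaussian-kernel regressor (not any specific kernel potential): each prediction needs one kernel evaluation per training sample. The function `kernel_predict` is hypothetical, and the weights `alpha` are random placeholders rather than fitted values.

```python
import numpy as np

def kernel_predict(x, X_train, alpha, sigma=1.0):
    # One Gaussian kernel value per training sample, then a weighted sum.
    sq_dist = ((X_train - x) ** 2).sum(axis=1)
    k = np.exp(-sq_dist / (2.0 * sigma**2))
    return k @ alpha, k.size       # prediction, number of kernel evaluations

rng = np.random.default_rng(0)
x = rng.normal(size=3)             # a single query point (made-up data)
evals = {}
for n_train in (100, 1000, 10000):
    X_train = rng.normal(size=(n_train, 3))
    alpha = rng.normal(size=n_train)   # placeholder for fitted weights
    _, evals[n_train] = kernel_predict(x, X_train, alpha)

print(evals)   # kernel evaluations per prediction grow with n_train
```

A neural network, by contrast, has a fixed number of parameters, so its cost per prediction does not depend on how many samples it was trained on.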
The high sample efficiency allows us to use more accurate reference data beyond DFT accuracy, which is often desired but has so far been hindered by the need for large training sets in NN-based ML potentials. 5/N
To this end, we train NequIP on quantum-chemical CCSD(T) data and find that again it significantly outperforms kernel-based methods. This is a step towards long time-scale simulations at virtually perfect accuracy. 6/N
We next test NequIP on liquid bulk water + a series of ice structures. We train it on as few as 133 structures, 1000x less data than a competing model (DeepMD), and find to our surprise that NequIP does better than DeepMD with so much less training data! 7/N
We further extend our tests with a series of challenging benchmarks including a catalytic surface reaction, a lithium phosphate glass, and a superionic conductor, finding in all cases that small training sets are enough to get very good models for the atomic forces. 8/N
Next, we recognize that ML Potentials need to be tested in actual Molecular Dynamics simulations: accuracy tables showing errors in atomic forces are not enough! 9/N
To this end, we test NequIP in a series of Molecular Dynamics simulations: we first find that we can recover the structure of a quenched lithium phosphate glass (we measure the RDF + ADF) with high accuracy using only 1,000 training structures. 10/N
Finally, we demonstrate that NequIP can also model kinetic properties: in particular, we study Li diffusion in a superionic conductor. This is a highly challenging test case: modeling diffusion with ML potentials has in the past required massive and diverse data sets. 11/N
Again we find that NequIP can predict the Li diffusivity with very high fidelity with respect to AIMD! The video below shows a visualization of a Molecular Dynamics simulation with this system. 12/N
To shed light on why NequIP is so data efficient, we perform experiments in which we explicitly turn off any interactions+features beyond scalars (reducing it to a conventional invariant GNN) and find that the equivariant network consistently outperforms the scalar network. 13/N
Finally, NequIP is FAST, even on CPUs! For the example of the toluene molecule (15 atoms), we compare to CCSD(T) and see an improvement of approx. 6 orders of magnitude. That means what NequIP simulates in 1 hour would take roughly 1 century with CCSD(T). 14/N
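A quick back-of-the-envelope check of the hour-vs-century comparison, taking "approx. 6 orders of magnitude" as exactly 1e6 (an assumption for the arithmetic only):

```python
speedup = 1e6                      # assumed: exactly 6 orders of magnitude
nequip_hours = 1.0                 # 1 hour of NequIP simulation

ccsdt_hours = nequip_hours * speedup
hours_per_year = 24 * 365.25
ccsdt_years = ccsdt_hours / hours_per_year

print(f"{ccsdt_years:.0f} years")  # roughly 114 years, i.e. about a century
```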
If you have questions or are interested in using NequIP, please don't hesitate to reach out to me at batzner@g.harvard.edu 15/N
First and foremost: this was joint work with my co-first author and good friend Albert Musaelian with equal first-author contribution as well as with lab members Anders Johansson, Lixin Sun, Cameron Owen + Mordechai Kornbluth and of course @BKoz / Boris Kozinsky
Message Passing Neural Networks have taken molecular ML by storm and over the past few years, a lot of progress in Machine Learning for molecules and materials has been variations on this theme.
Learning curves of error vs training set size typically follow a power law: error = a * N^b, where N is the number of training samples and the exponent b determines how fast a method learns as new data become available.
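As a minimal sketch of how one might extract a and b in practice: taking logs gives log(error) = log(a) + b * log(N), a straight line whose slope is the exponent b. The helper `fit_power_law` is hypothetical and the data below are synthetic (generated with a = 5, b = -0.5), not results from any model.

```python
import numpy as np

def fit_power_law(n_train, error):
    # Linear least squares in log-log space:
    # log(error) = log(a) + b * log(N); polyfit returns [slope, intercept].
    b, log_a = np.polyfit(np.log(n_train), np.log(error), deg=1)
    return np.exp(log_a), b

# Synthetic learning curve, for illustration only.
n_train = np.array([100.0, 200.0, 400.0, 800.0, 1600.0])
error = 5.0 * n_train ** -0.5

a, b = fit_power_law(n_train, error)
print(f"a = {a:.3f}, b = {b:.3f}")
```

On a log-log plot, changing a shifts the line up or down, while changing b changes its slope.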
Interestingly, it has been found that different models on the same data set usually only shift the learning curve, but do not change the power-law exponent, see e.g. [1]