Matrix Multiplication (MatMul) is one of the root-node problems: any speedup carries over to many areas, including matrix inversion, factorization, and neural networks.
In industry, MatMul is used for image processing, speech recognition, computer graphics, etc. 2/
We formulated the problem of finding MatMul algorithms as a single-player game. In each episode, the agent starts from a tensor representing a MatMul operator and has to find the shortest path to an all-zero tensor. The length of this path corresponds to the # of multiplications. 3/
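To make the game concrete, here's a minimal sketch for the 2x2 case (illustrative only, not AlphaTensor's code): each "move" subtracts a rank-1 tensor u⊗v⊗w from the MatMul tensor, and Strassen's classic 7 factors form a 7-move winning episode that reaches the all-zero tensor.

```python
# Minimal sketch of the tensor game for 2x2 MatMul (not AlphaTensor's code).
# T[p][q][r] = 1 iff output entry c_r receives the product a_p * b_q,
# with matrix entries flattened row-major: a = (a11,a12,a21,a22), etc.
def matmul_tensor():
    T = [[[0] * 4 for _ in range(4)] for _ in range(4)]
    for i in range(2):
        for j in range(2):
            for k in range(2):
                # c_{ij} += a_{ik} * b_{kj}
                T[2 * i + k][2 * k + j][2 * i + j] = 1
    return T

def play(T, factors):
    """Each move subtracts the rank-1 tensor u⊗v⊗w from T."""
    for u, v, w in factors:
        for p in range(4):
            for q in range(4):
                for r in range(4):
                    T[p][q][r] -= u[p] * v[q] * w[r]
    return T

# Strassen's 7 factors: a 7-move winning episode of the game.
STRASSEN = [
    ([1, 0, 0, 1],  [1, 0, 0, 1],  [1, 0, 0, 1]),   # m1 = (a11+a22)(b11+b22)
    ([0, 0, 1, 1],  [1, 0, 0, 0],  [0, 0, 1, -1]),  # m2 = (a21+a22)b11
    ([1, 0, 0, 0],  [0, 1, 0, -1], [0, 1, 0, 1]),   # m3 = a11(b12-b22)
    ([0, 0, 0, 1],  [-1, 0, 1, 0], [1, 0, 1, 0]),   # m4 = a22(b21-b11)
    ([1, 1, 0, 0],  [0, 0, 0, 1],  [-1, 1, 0, 0]),  # m5 = (a11+a12)b22
    ([-1, 0, 1, 0], [1, 1, 0, 0],  [0, 0, 0, 1]),   # m6 = (a21-a11)(b11+b12)
    ([0, 1, 0, -1], [0, 0, 1, 1],  [1, 0, 0, 0]),   # m7 = (a12-a22)(b21+b22)
]

residual = play(matmul_tensor(), STRASSEN)
assert all(x == 0 for plane in residual for row in plane for x in row)
print("reached the all-zero tensor in", len(STRASSEN), "moves")
```

Any sequence of rank-1 factors that zeroes the tensor is an exact MatMul algorithm, and a shorter sequence means fewer multiplications.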
The main RL challenges are:
1) Gigantic action space, e.g., 10^23 for 4x4 MatMul.
2) Many symmetries in observations & actions, e.g., actions are order-invariant.
3) A single wrong move results in a sub-optimal algorithm.
4/
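The order-invariance symmetry is easy to see in code: each move subtracts a rank-1 tensor, and subtractions commute, so every permutation of a winning move sequence is also winning. A toy illustration with small random tensors (dimensions and values here are arbitrary, just for the demo):

```python
import itertools
import random

def apply_moves(T, moves):
    # Each move subtracts the rank-1 tensor u⊗v⊗w; because subtraction
    # commutes, the final tensor is independent of the order of the moves.
    T = [[row[:] for row in plane] for plane in T]  # deep copy
    for u, v, w in moves:
        for p in range(len(u)):
            for q in range(len(v)):
                for r in range(len(w)):
                    T[p][q][r] -= u[p] * v[q] * w[r]
    return T

random.seed(0)
n = 2
T0 = [[[random.randint(-2, 2) for _ in range(n)] for _ in range(n)] for _ in range(n)]
moves = [tuple([random.randint(-1, 1) for _ in range(n)] for _ in range(3))
         for _ in range(3)]

# All 3! = 6 orderings of the same moves leave the same residual tensor.
results = {str(apply_moves(T0, list(p))) for p in itertools.permutations(moves)}
assert len(results) == 1
```

For the agent this means the number of *distinct* decompositions is far smaller than the number of move sequences, which a naive search would not exploit.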
To tackle the problem, we developed #AlphaTensor, an #AlphaZero-based agent with novel capabilities for handling the above RL challenges. For example, we employed a risk-seeking distributional value head to improve exploration. 5/
For many matrix sizes, #AlphaTensor discovered MatMul algorithms that are faster than the state of the art. These algorithms can be applied recursively to larger matrix sizes, effectively resulting in a bank of new, exact, and faster MatMul algorithms! 6/
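Recursion is what makes a small base case pay off at scale. A minimal sketch of the standard divide-and-conquer scheme using Strassen's classic 2x2 base case on plain Python lists (a discovered base algorithm would plug into the same scheme with different factors):

```python
def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def strassen(A, B):
    """Multiply n x n matrices (n a power of 2) with 7 recursive products."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    A11 = [r[:h] for r in A[:h]]; A12 = [r[h:] for r in A[:h]]
    A21 = [r[:h] for r in A[h:]]; A22 = [r[h:] for r in A[h:]]
    B11 = [r[:h] for r in B[:h]]; B12 = [r[h:] for r in B[:h]]
    B21 = [r[:h] for r in B[h:]]; B22 = [r[h:] for r in B[h:]]
    m1 = strassen(add(A11, A22), add(B11, B22))
    m2 = strassen(add(A21, A22), B11)
    m3 = strassen(A11, sub(B12, B22))
    m4 = strassen(A22, sub(B21, B11))
    m5 = strassen(add(A11, A12), B22)
    m6 = strassen(sub(A21, A11), add(B11, B12))
    m7 = strassen(sub(A12, A22), add(B21, B22))
    C11 = add(sub(add(m1, m4), m5), m7)
    C12 = add(m3, m5)
    C21 = add(m2, m4)
    C22 = add(sub(add(m1, m3), m2), m6)
    return ([r1 + r2 for r1, r2 in zip(C11, C12)] +
            [r1 + r2 for r1, r2 in zip(C21, C22)])

# Sanity check against the naive O(n^3) definition on a random 4x4 case.
import random
random.seed(1)
A = [[random.randint(-5, 5) for _ in range(4)] for _ in range(4)]
B = [[random.randint(-5, 5) for _ in range(4)] for _ in range(4)]
expected = [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]
assert strassen(A, B) == expected
```

Because the base case is exact, the recursion is exact too; the speedup comes purely from spending 7 recursive products instead of 8 at every level.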
#AlphaTensor invented a beautiful algorithm for 4x4 modular arithmetic MatMul, which surpassed two-level Strassen’s algorithm for the first time in 50 years!
This result was so significant that at first we thought there was a bug 😅 7/
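To put the improvement in context (using the published figure of 47 multiplications for 4x4 in modular arithmetic, versus 7^2 = 49 for two-level Strassen and 4^3 = 64 for the standard method), here is how recursive application compounds a small saving in the base case:

```python
import math

def recursive_mults(n, block, rank):
    """Scalar multiplications when a block x block base algorithm using
    `rank` multiplications is applied recursively to an n x n matrix
    (n must be a power of `block`)."""
    levels = round(math.log(n, block))
    assert block ** levels == n
    return rank ** levels

n = 1024
naive_mults = n ** 3                              # standard algorithm: 8 per 2x2 level
strassen_mults = recursive_mults(n, 2, 7)         # Strassen's 2x2, 7-mult base
alphatensor_mults = recursive_mults(n, 4, 47)     # reported 4x4, 47-mult base (mod 2)
assert alphatensor_mults < strassen_mults < naive_mults
```

Saving 2 multiplications out of 49 at the base level turns into a constant-factor gap at every recursion depth, which is why beating two-level Strassen on 4x4 matters well beyond the 4x4 case.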