Awni Hannun
Dec 5, 2023
Just in time for the holidays, we are releasing some new software today from Apple machine learning research.

MLX is an efficient machine learning framework specifically designed for Apple silicon (i.e. your laptop!)

Code: github.com/ml-explore/mlx
Docs: ml-explore.github.io/mlx/build/html…

The video is a Llama v1 7B model implemented in MLX and running on an M2 Ultra.

More here:

* Train a Transformer LM or fine-tune with LoRA
* Text generation with Mistral
* Image generation with Stable Diffusion
* Speech recognition with Whisper

github.com/ml-explore/mlx…
Jul 1, 2022
Read a bit about grokking recently. Here's what I learned:

"Grokking" is a curious neural net behavior observed ~1 year ago (arxiv.org/abs/2201.02177).

Keep optimizing a model long after it reaches perfect training accuracy, and it suddenly generalizes.

Figure: What's especially surprising is that generalization happens SO LONG after perfect accuracy on train.

The sudden generalization is interesting, but we've seen this type of rapid concept learning in NNs before.
Jun 4, 2022
A short thread on forward and reverse mode autograd:

A great way to internalize the complexity difference between forward and reverse mode automatic differentiation is through the lens of Jacobian-vector products. First: the Jacobian of a function is the matrix of derivatives with inputs on rows and outputs on columns.

The (i, j) entry is the derivative of the j-th output with respect to the i-th input.
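To make the layout concrete, here is a small sketch in plain Python using the convention from the thread (inputs on rows, outputs on columns, so `J[i][j]` is the derivative of output `j` with respect to input `i`). The function `f` and the helper names are illustrative, not from the thread:

```python
def f(x, y):
    # Two inputs (x, y), two outputs (x*y, x+y).
    return (x * y, x + y)

def jacobian_at(x, y):
    # Hand-written partial derivatives of f at (x, y), with
    # inputs on rows and outputs on columns as in the thread:
    # row 0 = d/dx, row 1 = d/dy; col 0 = x*y, col 1 = x+y.
    return [
        [y, 1.0],  # d(x*y)/dx, d(x+y)/dx
        [x, 1.0],  # d(x*y)/dy, d(x+y)/dy
    ]

def jvp(J, v):
    # With this layout, a Jacobian-vector product that pushes an
    # input tangent v forward through f is v @ J:
    # out[j] = sum_i v[i] * J[i][j].
    n_in, n_out = len(J), len(J[0])
    return [sum(v[i] * J[i][j] for i in range(n_in)) for j in range(n_out)]

J = jacobian_at(2.0, 3.0)   # [[3.0, 1.0], [2.0, 1.0]]
t = jvp(J, [1.0, 0.0])      # directional derivative along x: [3.0, 1.0]
```

Forward mode builds outputs like `t` one input direction at a time, while reverse mode pulls a single output sensitivity back through the transposed Jacobian — which is why reverse mode wins when a function has many inputs and few outputs (e.g. a scalar loss).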