The video is a Llama v1 7B model implemented in MLX and running on an M2 Ultra.
More here:
* Train a Transformer LM or fine-tune with LoRA
* Text generation with Mistral
* Image generation with Stable Diffusion
* Speech recognition with Whisper github.com/ml-explore/mlx…
MLX Data is a framework-agnostic, efficient, and flexible package for data loading.
A short thread on forward and reverse mode autograd:
A great way to internalize the complexity difference between forward and reverse mode automatic differentiation is through the lens of Jacobian-vector products.
First: the Jacobian of a function is the matrix of derivatives with inputs on rows and outputs on columns.
The (i, j) entry is the derivative of the j-th output with respect to the i-th input.
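As a quick sanity check of this convention (a hypothetical two-input, two-output function, sketched in plain NumPy with finite differences rather than any MLX API):

```python
import numpy as np

def f(x):
    # Hypothetical example: 2 inputs -> 2 outputs
    return np.array([x[0] * x[1], x[0] + x[1]])

def jacobian(f, x, eps=1e-6):
    # Per the convention above: rows indexed by inputs, columns by outputs,
    # so J[i, j] = d(output j) / d(input i), estimated by finite differences.
    n = len(x)
    m = len(f(x))
    J = np.zeros((n, m))
    for i in range(n):
        dx = np.zeros(n)
        dx[i] = eps
        J[i] = (f(x + dx) - f(x)) / eps
    return J

x = np.array([2.0, 3.0])
J = jacobian(f, x)
# Analytically: row 0 is [d(x0*x1)/dx0, d(x0+x1)/dx0] = [x1, 1] = [3, 1]
#               row 1 is [d(x0*x1)/dx1, d(x0+x1)/dx1] = [x0, 1] = [2, 1]
```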
Reverse mode lets you compute a Jacobian-vector product for a given vector in a single pass.
Forward mode lets you compute a (row) vector-Jacobian product for any vector in a single pass.
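A minimal NumPy sketch of the two products under the convention above (hypothetical function with a hand-written Jacobian; this is not MLX's actual autograd API):

```python
import numpy as np

def f(x):
    # Hypothetical example: 2 inputs -> 2 outputs
    return np.array([x[0] * x[1], x[0] + x[1]])

x = np.array([2.0, 3.0])

# Jacobian with inputs on rows and outputs on columns (convention above)
J = np.array([[x[1], 1.0],
              [x[0], 1.0]])

v = np.array([1.0, 0.0])   # a direction in output space
Jv = J @ v                 # what one reverse-mode pass produces

u = np.array([1.0, 0.0])   # a direction in input space
uJ = u @ J                 # what one forward-mode pass produces

# Sanity check: u @ J is the directional derivative of f along u
eps = 1e-6
assert np.allclose(uJ, (f(x + eps * u) - f(x)) / eps, atol=1e-4)
```

Note the asymmetry this makes visible: one reverse-mode pass recovers a whole column's worth of sensitivities per output direction, which is why reverse mode is the right tool when a function has many inputs and few outputs (e.g. a scalar loss).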