Sasha Rush
Aug 1 · 7 tweets · 3 min read
Lots of folks reached out to me yesterday about the Rust ML and LLM community. It seems like a supportive and intellectually-curious community, so I wanted to highlight some of the projects that you should check out 🧵
dfdx is a static shape-typed tensor library. Uses lots of Rust features and supports full backprop.

github.com/coreylowman/df…
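To give a flavor of what "static shape-typed" buys you, here's a toy sketch (my own illustration, not dfdx's actual API) where shapes live in the types via const generics, so a mismatched matmul fails at compile time:

```rust
// Toy illustration of shape-typed tensors (not dfdx's API): dimensions are
// const generic parameters, so shape errors are caught by the compiler.
struct Matrix<const R: usize, const C: usize> {
    data: [[f32; C]; R],
}

impl<const R: usize, const C: usize> Matrix<R, C> {
    fn zeros() -> Self {
        Matrix { data: [[0.0; C]; R] }
    }

    // The inner dimension C must match between self and rhs by construction.
    fn matmul<const K: usize>(&self, rhs: &Matrix<C, K>) -> Matrix<R, K> {
        let mut out = Matrix::<R, K>::zeros();
        for i in 0..R {
            for k in 0..K {
                for j in 0..C {
                    out.data[i][k] += self.data[i][j] * rhs.data[j][k];
                }
            }
        }
        out
    }
}

fn main() {
    let a = Matrix::<2, 3>::zeros();
    let b = Matrix::<3, 4>::zeros();
    let _c = a.matmul(&b); // OK: (2x3) @ (3x4) -> (2x4)
    // a.matmul(&a) would not compile: the inner dimensions don't match.
}
```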
candle is an inference-time tensor library with similar numpy/pytorch syntax. Check out their full LLM inference example.

github.com/LaurentMazare/…
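A hedged sketch of what the pytorch-like syntax looks like (crate and method names as I recall them from candle's README at the time; check the repo for the current API):

```rust
// Sketch of candle-style usage; names are my recollection of the README,
// so treat this as illustrative rather than authoritative.
use candle_core::{Device, Tensor};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let device = Device::Cpu;
    let a = Tensor::randn(0f32, 1.0, (2, 3), &device)?; // 2x3 random tensor
    let b = Tensor::randn(0f32, 1.0, (3, 4), &device)?; // 3x4 random tensor
    let c = a.matmul(&b)?;                              // 2x4 result
    println!("{c}");
    Ok(())
}
```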
smelte is a low-dependency NN model inference library for Rust (think a more flexible ggml).

github.com/Narsil/smelte-…
faer-rs and its gemm crates are an entire linear-algebra implementation (think BLAS) in Rust.

github.com/sarah-ek/faer-…
github.com/sarah-ek/gemm
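If "gemm" is unfamiliar: it's the single matrix-multiply kernel (C ← αAB + βC) that most of deep learning bottoms out in. Here is the naive reference version, just to pin down the operation these crates spend their effort making fast (my illustration, not faer's code):

```rust
// Reference (naive) gemm: C <- alpha * A * B + beta * C with row-major
// m x k, k x n, and m x n matrices. Real implementations like faer/gemm add
// blocking, SIMD, and multithreading on top of exactly this arithmetic.
fn gemm_ref(m: usize, n: usize, k: usize,
            alpha: f32, a: &[f32], b: &[f32],
            beta: f32, c: &mut [f32]) {
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0;
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = alpha * acc + beta * c[i * n + j];
        }
    }
}

fn main() {
    let (m, n, k) = (2, 2, 3);
    let a = vec![1.0f32; m * k];
    let b = vec![1.0f32; k * n];
    let mut c = vec![0.0f32; m * n];
    gemm_ref(m, n, k, 1.0, &a, &b, 0.0, &mut c);
    assert_eq!(c, vec![3.0; m * n]); // each entry is a sum of k ones
}
```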
Several others have also ported llama2 over to Rust and reached out to give tips for the process.

github.com/gaxler/llama2.…
github.com/leo-du/llama2.…
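The shared core of these ports is llama2.c's autoregressive loop: run the model one position at a time and feed the sampled token back in. A generic sketch (not the code from either repo), with greedy decoding for brevity:

```rust
// Generic llama2.c-style generation loop (illustrative, not from the ports):
// `forward` maps (current token, position) to next-token logits.
fn generate(
    forward: impl Fn(usize, usize) -> Vec<f32>,
    bos: usize,
    steps: usize,
) -> Vec<usize> {
    let mut tokens = vec![bos];
    for pos in 0..steps {
        let logits = forward(*tokens.last().unwrap(), pos);
        // Greedy decoding; the real ports also support temperature / top-p.
        let next = logits
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
            .map(|(i, _)| i)
            .unwrap();
        tokens.push(next);
    }
    tokens
}

fn main() {
    // Stand-in "model" that always favors token 7.
    let fake_forward = |_tok: usize, _pos: usize| {
        let mut logits = vec![0.0f32; 32];
        logits[7] = 1.0;
        logits
    };
    println!("{:?}", generate(fake_forward, 1, 5)); // [1, 7, 7, 7, 7, 7]
}
```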
This was a really interesting tip about Rust internals from gaxler.


More from @srush_nlp

May 10
Pretraining without Attention (arxiv.org/abs/2212.10544) - BiGS is an alternative to BERT, trained on up to 4096 tokens.

Attention can be overkill. The figure shows *every* word-word interaction for every sentence over 23 layers of BiGS (no heads, no n^2).
The core architecture is a state-space model. But that's just a fancy way of parameterizing a 1D CNN. This is the whole thing that replaces attention.
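To make the "SSM is just a parameterized 1D CNN" point concrete, here's a scalar toy version (my sketch; BiGS itself uses richer, per-channel state spaces): the state-space parameters define a kernel k[t] = c·aᵗ·b, and the layer is a causal convolution with that kernel.

```rust
// Scalar toy state-space model as a long 1D convolution (illustrative only).
// The parameters (a, b, c) define the kernel k[t] = c * a^t * b; applying the
// SSM to a sequence is then just a causal convolution with k.
fn ssm_kernel(a: f64, b: f64, c: f64, len: usize) -> Vec<f64> {
    let mut kernel = Vec::with_capacity(len);
    let mut a_pow = 1.0;
    for _ in 0..len {
        kernel.push(c * a_pow * b);
        a_pow *= a;
    }
    kernel
}

fn causal_conv1d(x: &[f64], kernel: &[f64]) -> Vec<f64> {
    (0..x.len())
        .map(|t| (0..=t).map(|s| kernel[t - s] * x[s]).sum())
        .collect()
}

fn main() {
    let x = vec![1.0, 0.0, 0.0, 2.0, 0.0, 0.0];
    let kernel = ssm_kernel(0.9, 1.0, 0.5, x.len());
    println!("{:?}", causal_conv1d(&x, &kernel)); // exponentially decaying responses
}
```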
To make this simple model work, we push the complexity into the per-position feed-forward networks. We follow recent work by rearranging transformer components with more aggressive gating.

Code: github.com/jxiw/BiGS
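For readers who haven't seen gated blocks: the pattern is that one linear branch is squashed into a gate that multiplies the other branch elementwise. A generic GLU-style sketch (the exact BiGS block layout is in the paper and code; this only shows the general shape):

```rust
// Generic gated feed-forward unit (GLU family), shown per position.
// One projection produces values, the other a sigmoid gate; the output is
// their elementwise product. Illustrative only, not the exact BiGS block.
fn gated_ffn(x: &[f32], w_value: &[Vec<f32>], w_gate: &[Vec<f32>]) -> Vec<f32> {
    let proj = |w: &[Vec<f32>]| -> Vec<f32> {
        w.iter()
            .map(|row| row.iter().zip(x).map(|(wi, xi)| wi * xi).sum())
            .collect()
    };
    let value = proj(w_value);
    let gate: Vec<f32> = proj(w_gate)
        .iter()
        .map(|v| 1.0 / (1.0 + (-v).exp())) // sigmoid gate
        .collect();
    value.iter().zip(&gate).map(|(v, g)| v * g).collect()
}

fn main() {
    let x = vec![1.0, -1.0];
    let w_value = vec![vec![0.5, 0.5], vec![1.0, 0.0]];
    let w_gate = vec![vec![1.0, 1.0], vec![0.0, 1.0]];
    println!("{:?}", gated_ffn(&x, &w_value, &w_gate));
}
```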
Apr 19
MiniChain (v0.3, github.com/srush/MiniChain) - a small library for prompt chaining.

Adds examples with agents, tools, streaming, and more Gradio autovis.

There are about 10 examples of popular prompts here at srush-minichain.hf.space
This is mostly an experiment in API design. Trying to keep things explicit and minimal. For example, there is no explicit "Agent" or "Tool" abstraction. You build the ReAct agent by just calling functions.
One of the main challenges in this version was supporting streaming in the visualizations. Mainly because it is just cool.
Feb 27
minichain (v0.1): github.com/srush/MiniChain
Tiny library for LLM apps.

Thanks for all the feedback! Added full code examples for chat, retrieval QA, and information extraction. 🧵
Full "ChatGPT" example with memory

srush.github.io/MiniChain/exam…
Implementation of the full OpenAI retrieval notebook QA using Hugging Face Datasets

(it seems like there are 100 "AI" companies just running this prompt.)

srush.github.io/MiniChain/exam…
Dec 22, 2022
Named Tensor Notation (TMLR version, arxiv.org/abs/2102.13196 w/ @davidweichiang + @boazbaraktcs)

A rigorous description, opinionated style guide, and gentle polemic for named tensors in math notation.

* Macros: ctan.org/tex-archive/ma…
Named Tensor Notation is an attempt to define a mathematical notation with named axes. The central conceit is that deep learning is not linear algebra, and that by using linear algebra we leave many technical details ambiguous to readers.
The biggest change in this version is more complete coverage of differential calculus, including worked examples of derivatives of attention.
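As a taste of the notation (paraphrased from memory, so check the paper for the exact axis names and operator definitions): every contraction and softmax names the axis it acts over, so attention reads roughly as

```latex
% Paraphrase of attention in named-tensor style (axis names illustrative).
\[
  \operatorname{Attention}(Q, K, V) \;=\;
  \underset{\mathsf{seq}}{\operatorname{softmax}}
  \left( \frac{Q \underset{\mathsf{key}}{\cdot} K}{\sqrt{|\mathsf{key}|}} \right)
  \underset{\mathsf{seq}}{\cdot} V
\]
```

with no reliance on a memorized ordering of dimensions.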
Dec 20, 2022
Blog Post (w/ @gail_w): On "Thinking Like Transformers"

In which I get a bit obsessed with learning how to code in Transformer lang 🤖.

github.com/srush/raspy

(You can follow along or do the exercises yourself in a colab notebook.)
The blog post walks through the constructs used to build a computational model reflecting the transformer architecture, and how constructs like attention can be represented abstractly.
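A toy rendering of the model's two core constructs, in plain code rather than the raspy DSL (illustrative; the blog post defines them properly): `select` builds an attention-like boolean matrix from a predicate on (key, query) pairs, and `aggregate` averages values over the selected positions.

```rust
// Toy versions of RASP-style select/aggregate (not the raspy API).
fn select<T: Copy>(keys: &[T], queries: &[T], pred: impl Fn(T, T) -> bool) -> Vec<Vec<bool>> {
    queries
        .iter()
        .map(|&q| keys.iter().map(|&k| pred(k, q)).collect())
        .collect()
}

fn aggregate(sel: &[Vec<bool>], values: &[f64]) -> Vec<f64> {
    sel.iter()
        .map(|row| {
            let (mut sum, mut count) = (0.0, 0.0);
            for (i, &keep) in row.iter().enumerate() {
                if keep {
                    sum += values[i];
                    count += 1.0;
                }
            }
            if count > 0.0 { sum / count } else { 0.0 }
        })
        .collect()
}

fn main() {
    // "Average of all earlier values" = causal uniform attention.
    let pos: Vec<usize> = (0..4).collect();
    let sel = select(&pos, &pos, |k, q| k <= q);
    let vals = vec![1.0, 2.0, 3.0, 4.0];
    println!("{:?}", aggregate(&sel, &vals)); // [1.0, 1.5, 2.0, 2.5]
}
```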
Oct 13, 2022
It's a joke that all NLP talks must include this graph.

But if you are a student it is a bit intimidating. How can you become an expert in where we are going if you can barely run BERT?

I asked Twitter for specific advice on what you might focus on:
1) Know the scaling details of the models

2) Practice extrapolating out bottlenecks from first principles (see the back-of-envelope sketch below)
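As an example of the kind of back-of-envelope reasoning meant here (the numbers are illustrative, not from this thread): a common rule of thumb is that training compute is roughly 6 × parameters × tokens FLOPs.

```rust
// Back-of-envelope training-compute estimate using the ~6 * N * D rule of
// thumb (N = parameters, D = tokens). GPT-3-scale numbers for illustration.
fn main() {
    let params: f64 = 175e9; // N
    let tokens: f64 = 300e9; // D
    let flops = 6.0 * params * tokens;
    // ~3.15e23 FLOPs; divide by sustained device FLOP/s to estimate GPU-days.
    println!("approx training FLOPs: {flops:e}");
}
```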

