Lots of folks reached out to me yesterday about the Rust ML and LLM community. Seems like a supportive and intellectually curious community, so I wanted to highlight some of the projects that you should check out 🧵
dfdx is a statically shape-typed tensor library. It uses lots of Rust features and supports full backprop. github.com/coreylowman/df…
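A rough sketch of what that looks like — shapes live in the type, so mismatches fail at compile time. Method names here assume the dfdx prelude and may differ across versions:

```rust
use dfdx::prelude::*;

fn main() {
    let dev: Cpu = Default::default();

    // The shape is part of the type: a 2x3 and a 3x4 matrix of f32.
    let x: Tensor<Rank2<2, 3>, f32, _> = dev.sample_normal();
    let w: Tensor<Rank2<3, 4>, f32, _> = dev.sample_normal();

    // A 4x4 weight here would be a *compile-time* shape error, not a runtime one.
    let y = x.leaky_trace().matmul(w.clone());
    let loss = y.square().mean();

    // Full backprop: gradients keep their statically-typed shapes too.
    let grads = loss.backward();
    let gx = grads.get(&x);
    println!("{:?}", gx.array());
}
```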
candle is an inference-time tensor library with numpy/pytorch-like syntax. Check out their full LLM inference example.
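And a minimal candle sketch (using the candle_core crate; dynamic shapes, deliberately pytorch-flavored — exact names may shift between versions):

```rust
use candle_core::{Device, Tensor};

fn main() -> candle_core::Result<()> {
    let device = Device::Cpu;

    // Dynamic shapes, numpy/pytorch-style construction.
    let a = Tensor::randn(0f32, 1f32, (2, 3), &device)?;
    let b = Tensor::randn(0f32, 1f32, (3, 4), &device)?;

    // Reads like the torch equivalent: c = a @ b
    let c = a.matmul(&b)?;
    println!("{c}");
    Ok(())
}
```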
Pretraining without Attention (arxiv.org/abs/2212.10544) - BiGS is an alternative to BERT trained on sequences of up to 4096 tokens.
Attention can be overkill. The figure below shows *every* word-word interaction for every sentence over 23 layers of BiGS (no heads, no n^2).
The core architecture is a state-space model, but that's just a fancy way of parameterizing a 1D CNN. This is the whole thing that replaces attention.
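To see the equivalence, here's a toy scalar SSM in plain Rust (BiGS uses vector-valued states and computes the kernel efficiently, so this is only a sketch of the idea): unrolling the recurrence gives exactly a causal 1D convolution.

```rust
// Toy scalar state-space model:
//   h[k] = a * h[k-1] + b * u[k],   y[k] = c * h[k]
fn ssm_recurrent(a: f64, b: f64, c: f64, u: &[f64]) -> Vec<f64> {
    let mut h = 0.0;
    u.iter()
        .map(|&uk| {
            h = a * h + b * uk;
            c * h
        })
        .collect()
}

// Unrolling the recurrence gives y = K * u, a causal 1D convolution
// with kernel K[j] = c * a^j * b.
fn ssm_as_conv(a: f64, b: f64, c: f64, u: &[f64]) -> Vec<f64> {
    let kernel: Vec<f64> = (0..u.len()).map(|j| c * a.powi(j as i32) * b).collect();
    (0..u.len())
        .map(|k| (0..=k).map(|j| kernel[k - j] * u[j]).sum())
        .collect()
}

fn main() {
    let u = [1.0, -2.0, 0.5, 3.0];
    println!("{:?}", ssm_recurrent(0.9, 1.0, 0.5, &u));
    println!("{:?}", ssm_as_conv(0.9, 1.0, 0.5, &u));
    // Both print the same sequence: the SSM *is* a (long) 1D convolution.
}
```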
To make this simple model work, we push the complexity into the per-position feed-forward networks. We follow recent work by rearranging transformer components with more aggressive gating.
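For intuition, the "more aggressive gating" is in the spirit of GLU-style multiplicative units, roughly of the form below (a sketch of the general idea, not the exact BiGS block):

```latex
% GLU-style multiplicative gate (sketch, not the exact BiGS layer)
\[
  \mathrm{GatedFFN}(x) \;=\; W_o\,\bigl(\sigma(W_g x)\,\odot\,\phi(W_v x)\bigr)
\]
```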
This is mostly an experiment in API design, trying to keep things explicit and minimal. For example, there is no explicit "Agent" or "Tool" abstraction. You build the ReAct agent just by calling functions.
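To illustrate the pattern (a hypothetical sketch of the design idea, not the library's actual API; the LLM call is stubbed and all names are made up):

```rust
// ReAct as plain functions: no Agent or Tool types, just a loop.

// Stand-in for an LLM call; a real version would hit an API here.
fn llm(prompt: &str) -> String {
    if prompt.contains("Observation:") {
        "Final Answer: 4".to_string()
    } else {
        "Action: calculate[2 + 2]".to_string()
    }
}

// The only "tool": evaluate a tiny "a + b" expression.
fn calculate(expr: &str) -> String {
    let total: f64 = expr.split('+').filter_map(|s| s.trim().parse().ok()).sum();
    total.to_string()
}

// The "agent" is just a loop over function calls.
fn react(question: &str) -> String {
    let mut transcript = format!("Question: {question}\n");
    loop {
        let step = llm(&transcript);
        transcript.push_str(&step);
        transcript.push('\n');
        if let Some(answer) = step.strip_prefix("Final Answer: ") {
            return answer.to_string();
        }
        if let Some(expr) = step
            .strip_prefix("Action: calculate[")
            .and_then(|s| s.strip_suffix(']'))
        {
            transcript.push_str(&format!("Observation: {}\n", calculate(expr)));
        }
    }
}

fn main() {
    println!("{}", react("What is 2 + 2?"));
}
```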
One of the main challenges in this version was supporting streaming in the visualizations, mainly because it is just cool.
Named Tensor Notation is an attempt to define a mathematical notation with named axes. The central conceit is that deep learning is not linear algebra, and that by writing it as linear algebra we leave many technical details ambiguous for readers.
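The canonical example is attention. Standard notation leaves the softmax axis and the two sequence axes implicit; with named axes, every contraction and reduction names the axis it acts over (paraphrasing the notation here, not quoting the paper's exact macros):

```latex
% Standard:  Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
% Named axes, with Q : {seq, key}, K : {seq', key}, V : {seq', val}:
\[
  \mathrm{Attention}(Q, K, V) \;=\;
  \operatorname*{softmax}_{\mathsf{seq'}}\!\left(
    \frac{Q \mathbin{\underset{\mathsf{key}}{\cdot}} K}{\sqrt{|\mathsf{key}|}}
  \right)
  \mathbin{\underset{\mathsf{seq'}}{\cdot}} V
\]
```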
The biggest change in this version is more complete coverage of differential calculus, including worked examples of derivatives of attention.
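For a flavor of what those worked examples involve (this is the standard identity, not necessarily the draft's exact presentation), the key inner step is the derivative of softmax:

```latex
% With s = softmax(x):
\[
  \frac{\partial s_i}{\partial x_j} = s_i\,(\delta_{ij} - s_j),
  \qquad\text{equivalently}\qquad
  \mathrm{d}s_i = s_i\Bigl(\mathrm{d}x_i - \sum_j s_j\,\mathrm{d}x_j\Bigr).
\]
```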