trust me, i've trained models and deployed production applications that serve >350M requests a day.
just need `pip install` and some good naming conventions
1. jinja - prompting frameworks (sketch below)
2. numpy - vector search
3. sqlite - evals, one row per exp
4. boto3 - data management, s3 and some folder structure
5. google sheets ;) - experiment tracking w/ a link to the artifacts saved in S3/GCS.
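concretely, the glue is tiny. a minimal sketch of items 1 and 3 (the table schema, the `exp-001` id, and the `s3://my-bucket` path are made-up examples):

```python
import sqlite3
from jinja2 import Template

# jinja as the prompting framework: a prompt is just a template string
prompt = Template(
    "You are a classifier. Label the text as one of: {{ labels | join(', ') }}.\n\n{{ text }}"
).render(labels=["spam", "ham"], text="win a free cruise!!!")

# sqlite as the eval store: one row per experiment
conn = sqlite3.connect("evals.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS evals "
    "(exp_id TEXT, prompt TEXT, prediction TEXT, correct INTEGER, artifact_uri TEXT)"
)
conn.execute(
    "INSERT INTO evals VALUES (?, ?, ?, ?, ?)",
    ("exp-001", prompt, "spam", 1, "s3://my-bucket/exp-001/"),  # artifacts live in s3
)
conn.commit()
```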
Disagree?
I've been training models in @PyTorch and deploying them via @FastAPI since the library came out!
we did large-scale image classification where the folder structure reflected the class labels, with a config.json in each directory.
our early a/b tests exported to google sheets, and we served similar-item recommendations via numpy brute force over 3M SKUs with 40 dimensions per vector (UMAP over ResNet and matrix factorization machines).
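that brute-force search is less code than people expect. a minimal sketch (random vectors standing in for the real UMAP/ResNet embeddings; 3M x 40 float32 is only ~480MB):

```python
import numpy as np

# one 40-dim embedding per SKU; normalize once so dot product = cosine similarity
embeddings = np.random.rand(3_000_000, 40).astype(np.float32)
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

def similar_items(sku_idx: int, k: int = 10) -> np.ndarray:
    # brute force: one matrix-vector product scores every SKU
    scores = embeddings @ embeddings[sku_idx]
    # argpartition is O(n); only the top k+1 candidates get fully sorted
    top = np.argpartition(-scores, k + 1)[: k + 1]
    top = top[np.argsort(-scores[top])]
    return top[top != sku_idx][:k]  # drop the query SKU itself

print(similar_items(42))
```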
I have nothing to sell you, and sometimes I tweet about applied machine learning with @arnaudai (ex Google Brain), so give us a follow!
0/ Any real AI engineer knows that streaming REALLY improves the UX.
Today, I'm landing a change that defines a reliable way to stream out multiple @pydantic objects from @OpenAI.
Take a look! By the end, you'll know how to do streaming extraction and why it matters.
1/ Streaming is critical when building applications where the UI is generated by the AI.
Notice in the screenshot that the first item was returned in 560ms but the last one in almost 2000ms! That's nearly a 4x difference in time to first content.
How do we do this?
2/ By using `pip install openai_function_call` we can do the following:
1) Use MultiTask to dynamically build a schema from a `cls` we define
2) Set stream=True to unlock the latency win
3) Use `from_streaming_response` to parse the completion into a `Generator[cls]` (sketch below)
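put together, it looks something like this (a sketch assuming the pre-1.0 openai SDK and the API names from the steps above; the `User` model and the messages are made up, so check the repo for the exact signatures):

```python
import openai
from pydantic import BaseModel
from openai_function_call import MultiTask

class User(BaseModel):
    name: str
    age: int

# 1) MultiTask dynamically builds a "return a list of User" schema from our cls
MultiUser = MultiTask(User)

# 2) stream=True is what unlocks the time-to-first-content win
completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    stream=True,
    functions=[MultiUser.openai_schema],
    function_call={"name": MultiUser.openai_schema["name"]},
    messages=[
        {"role": "user", "content": "Extract all users: Jason is 25 and Sarah is 30."},
    ],
)

# 3) from_streaming_response yields each User as soon as its tokens arrive
for user in MultiUser.from_streaming_response(completion):
    print(user)
```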
If you followed me from the last @LangChainAI webinar, I wanted to share the repo that contains the code examples. Contributions of other ideas, evals, or examples are totally welcome. If you want to help, check the issues!