AndriyMulyar
building tech that enables humans to interact with latent spaces 🗺️ founder / cto @ https://t.co/NbsLHLWfy8 prev. ML Ph.D. Student at NYU Courant
Jun 1, 2023 6 tweets 2 min read
Chat with your data privately on CPU with GPT4All! 💥💬

- Open source
- Drag and drop files into a directory that GPT4All will query for context when answering questions.
- GPT4All cites its sources.

Install the chat client from gpt4all.io and go!
How it works:

To make an LLM use your private data, you prompt the model with a technique known as retrieval augmented generation.

Based on your question, a retrieval algorithm selects snippets from your private data to use in the prompt context for your model.
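A minimal sketch of that retrieve-then-prompt flow, for illustration only. It is not GPT4All's implementation: the word-count cosine scoring stands in for whatever retrieval algorithm the client really uses, and the names `retrieve`, `build_prompt`, and `SNIPPETS` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, snippets, k=2):
    """Return up to k snippets that share vocabulary with the question."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(s))), s) for s in snippets]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in scored[:k] if score > 0]


def build_prompt(question, snippets, k=2):
    """Place the retrieved snippets in the prompt context, numbered so
    the model can cite them."""
    context = retrieve(question, snippets, k)
    lines = ["Use the context below to answer the question.", "", "Context:"]
    lines += [f"[{i + 1}] {s}" for i, s in enumerate(context)]
    lines += ["", f"Question: {question}", "Answer:"]
    return "\n".join(lines)


# Hypothetical private data, standing in for files dropped into a directory.
SNIPPETS = [
    "The quarterly report is due on the first Friday of April.",
    "Our office cat is named Biscuit.",
    "Expense reports must be filed within 30 days.",
]

if __name__ == "__main__":
    print(build_prompt("When is the quarterly report due?", SNIPPETS, k=1))
```

The resulting prompt string is what you would hand to the local model. Numbering the snippets is one simple way to let the model point back at its sources.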
May 30, 2023 7 tweets 3 min read
Google used Atlas to visualize its LLM embeddings 🗺️
- Find out what you can learn by interactively exploring 8M embeddings.

Embedding space maps teach you how an LLM represents your data. Each point on the map is an embedding of a data point. Two points are close together if your LLM considers them to be similar.
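A small sketch of what "close together" means, using cosine similarity on toy vectors. The three-dimensional embeddings below are made up for illustration (real LLM embeddings have hundreds or thousands of dimensions), and the helper `nearest` is hypothetical, not part of Atlas.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up embeddings: two related items and one unrelated item.
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}


def nearest(name, embeddings):
    """Return the other item whose embedding is most similar to `name`."""
    query = embeddings[name]
    scored = [
        (cosine_similarity(query, vec), other)
        for other, vec in embeddings.items()
        if other != name
    ]
    return max(scored)[1]


if __name__ == "__main__":
    print(nearest("cat", EMBEDDINGS))
```

On a map, "cat" and "kitten" would land next to each other while "invoice" sits far away. A mislabeled or corrupted data point tends to show up in a neighborhood where it does not belong.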

Embedding space maps let you detect causes of LLM hallucination such as dirty data.
Apr 13, 2023 13 tweets 5 min read
Announcing GPT4All-J: The First Apache-2 Licensed Chatbot That Runs Locally on Your Machine💥
github.com/nomic-ai/gpt4a…

Large Language Models must be democratized and decentralized. We improve on GPT4All by:
- increasing the number of clean training data points
- removing the GPL-licensed LLaMa from the stack
- releasing easy installers for OSX/Windows/Ubuntu
Details in the technical report: s3.amazonaws.com/static.nomic.a…
Apr 11, 2023 4 tweets 1 min read
GPT4All does not support or subvert specific political ideologies or choose winners.

google.com/amp/s/news.yah…

open source the data
open source the models
#gpt4all

As governments realize this foundational technology challenges their power, we can expect more of these types of rulings.

On the research side, this indicates that agendas centered around controllable LLMs will explode. Controlling LLMs is about controlling their training data.
Mar 28, 2023 9 tweets 3 min read
I'm excited to announce the release of GPT4All, a 7B param language model finetuned from a curated set of 400k GPT-Turbo-3.5 assistant-style generations.
We release💰800k data samples💰 for anyone to build upon and a model you can run on your laptop!
Real-time sampling on M1 Mac.

Inspired by learnings from Alpaca, we carefully curated ~800k prompt-response samples to produce 430k high-quality assistant-style prompt/generation training pairs including code, dialogue, and stories.

Detailed procedure for replication and data: github.com/nomic-ai/gpt4a…