Building explainable and accessible AI
Feb 12 • 6 tweets • 3 min read
Nomic Embed Text V2 is now available
- First general purpose Mixture-of-Experts (MoE) embedding model
- SOTA performance on the multilingual MIRACL benchmark for its size
- Support for 100+ languages
- Truly open source - open training data, weights, & code
- Apache 2.0 License
Download the model on @huggingface:
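Nomic embed models distinguish retrieval roles with task prefixes prepended to the input text. A minimal sketch of preparing inputs this way — the exact prefix strings are an assumption here and should be verified against the model card for the version you deploy:

```python
# Nomic embedding models route queries and documents through task
# prefixes. The prefix strings below follow the nomic-embed model
# cards; verify them against the card for your model version.

def with_prefix(texts, task):
    """Prepend the task prefix expected by nomic-embed models."""
    prefixes = {
        "query": "search_query: ",
        "document": "search_document: ",
    }
    return [prefixes[task] + t for t in texts]

queries = with_prefix(["what is a mixture of experts?"], "query")
docs = with_prefix(["MoE models route tokens to expert subnetworks."], "document")
```

The prefixed strings are then passed to the model as-is; using the wrong prefix (or none) typically degrades retrieval quality.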
Today, every Nomic-Embed-Text embedding becomes multimodal. Introducing Nomic-Embed-Vision:
- a high quality, unified embedding space for image, text, and multimodal tasks
- outperforms both OpenAI CLIP and text-embedding-3-small
- open weights and code to enable indie hacking, research, and experimentation
- released in collaboration with @MongoDB, @llama_index, @LangChainAI, @huggingface, @awscloud, @digitalocean, @LambdaAPI
Existing text-image embedding models, including OpenAI’s CLIP, dramatically underperform specialized text encoders on text retrieval tasks. This forces developers to deploy several embedding models and store several vector indices for multimodal applications. With Nomic-Embed-Vision, developers can use a single vector space to power both their text-text and text-image retrieval tasks.
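The single-vector-space idea can be illustrated with a toy example: once text and image embeddings live in the same space, one similarity routine (and one vector index) serves both text-text and text-image retrieval. The vectors below are made up; in practice they would come from nomic-embed-text and nomic-embed-vision.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for embeddings in one shared space (real vectors would
# come from nomic-embed-text / nomic-embed-vision).
text_query = np.array([0.9, 0.1, 0.0])
text_doc   = np.array([0.8, 0.2, 0.1])
image_doc  = np.array([0.7, 0.1, 0.2])

# The same scoring function ranks both modalities -- no second model,
# no second index.
scores = {"text_doc": cosine(text_query, text_doc),
          "image_doc": cosine(text_query, image_doc)}
best = max(scores, key=scores.get)
```

With separate embedding models, the text-text and text-image branches above would each need their own index and their own scoring path.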
- Deduplicate your text, image and embedding datasets in your web browser.
- Scales to millions of datapoints (e.g. English Wikipedia)
- Cross-correlate with real-time regex search and semantic lassos.
Duplicate detection is a critical component to curating datasets for AI training.
Atlas is the only dataset platform that lets you perform this operation both in your web browser and programmatically.
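Embedding-based near-duplicate detection reduces to thresholded cosine similarity over the vectors. A toy sketch — the threshold and vectors here are illustrative, not Atlas's actual algorithm:

```python
import numpy as np

def near_duplicates(embeddings, threshold=0.95):
    """Return index pairs whose cosine similarity exceeds the threshold."""
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize rows
    sims = X @ X.T                                    # pairwise cosine matrix
    pairs = []
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            if sims[i, j] > threshold:
                pairs.append((i, j))
    return pairs

vectors = [[1.0, 0.0], [0.999, 0.01], [0.0, 1.0]]
dupes = near_duplicates(vectors)  # the first two rows are near-duplicates
```

The quadratic pairwise loop is fine for a sketch; at Atlas scale (millions of points) this step would use an approximate nearest-neighbor index instead.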
- Variable-sized embeddings with Matryoshka learning and an 8192-token context.
- Outperforms OpenAI text-embedding-3-small across output sizes.
- Open source, open training code, open data.
Day 0 in @LangChainAI, @llama_index and @MongoDB
Performance is critical for the production use of embeddings but what about the memory, storage, and bandwidth footprint of the vectors?
Nomic Embed v1.5 allows you to trade off memory footprint for performance, all in one model.
Blog: blog.nomic.ai/posts/nomic-em…
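The trade-off works because Matryoshka-trained embeddings pack the most important information into their leading dimensions: you can truncate a vector to a smaller size and re-normalize, with graceful quality degradation. A sketch with illustrative dimensions:

```python
import numpy as np

def resize_embedding(vec, dim):
    """Truncate a Matryoshka-trained embedding and re-normalize.

    Matryoshka representation learning concentrates information in the
    leading dimensions, so truncation degrades quality gracefully.
    Dimension choices here are illustrative.
    """
    v = np.asarray(vec, dtype=float)[:dim]
    return v / np.linalg.norm(v)

full = np.random.default_rng(0).normal(size=768)
small = resize_embedding(full, 256)  # 3x smaller storage/bandwidth footprint
```

Because the result is re-normalized, cosine similarity on the truncated vectors remains well-defined, so the same index and scoring code work at every output size.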
Feb 1, 2024 • 9 tweets • 5 min read
Introducing Nomic Embed - the first fully open long context text embedder to beat OpenAI
- Open source, open weights, open data
- Beats OpenAI text-embedding-3-small and Ada on short and long context benchmarks
- Day 1 integrations with @LangChainAI, @llama_index, @MongoDB
Open source models are not replicable unless you have access to their training data.
We release our training dataset of 235M curated text pairs to enable anyone to replicate Nomic Embed from scratch.
Local LLMs in GPT4All are now 2x faster on Apple Silicon ⚡
- Supports all LLaMa models
- Exclusive support of the Replit model for 23 tok/s code generation enabling local Copilot!
Watch the 13B parameter Hermes model run at 15 tok/s locally! gpt4all.io
To make this possible, GPT4All hackers had to implement several custom Apple Metal kernels for LLM ops (e.g. ALiBi) and support a custom fork of llama.cpp!
Excited to get these changes upstream! github.com/nomic-ai/llama…
Jun 1, 2023 • 6 tweets • 3 min read
Local LLMs now have plugins! 💥
GPT4All LocalDocs allows you to chat with your private data!
- Drag and drop files into a directory that GPT4All will query for context when answering questions.
- Supports 40+ filetypes
- Cites sources. gpt4all.io
LocalDocs enables any GPT4All model to cite its sources.
When GPT4All decides that it can improve response factuality by using your documents, it does so and tells you which documents it used.
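The retrieval step behind a LocalDocs-style plugin can be sketched as nearest-neighbor search over document-chunk embeddings, where keeping the source chunk alongside its score is what enables citation. This is an illustrative outline, not GPT4All's implementation:

```python
import numpy as np

def top_k_chunks(query_vec, chunk_vecs, chunks, k=2):
    """Rank document chunks by cosine similarity to the query embedding."""
    Q = np.asarray(query_vec, dtype=float)
    C = np.asarray(chunk_vecs, dtype=float)
    sims = C @ Q / (np.linalg.norm(C, axis=1) * np.linalg.norm(Q))
    order = np.argsort(-sims)[:k]
    # Returning the source chunk with its score is what lets the model
    # cite which document a piece of context came from.
    return [(chunks[i], float(sims[i])) for i in order]

chunks = ["cats purr", "dogs bark", "llamas hum"]
vecs = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # toy chunk embeddings
hits = top_k_chunks([0.9, 0.1, 0.0], vecs, chunks, k=1)
```

The retrieved chunks are then prepended to the prompt as context, and the chunk metadata becomes the citation shown to the user.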
May 16, 2023 • 5 tweets • 3 min read
Big New Release of GPT4All📶
You can now use local CPU-powered LLMs through a familiar API!
Building with a local LLM is as easy as a one-line code change! Simply spin up the chat app at gpt4all.io and place it in server mode!
Documentation: docs.gpt4all.io/gpt4all_chat.h…
Your data never leaves your machine! The HTTP server runs on port 4981 (1984 in reverse)!
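Talking to the local server looks like any OpenAI-style HTTP call. A sketch that builds the request — the model name is a placeholder and the endpoint path is an assumption; check the GPT4All docs for your version:

```python
import json
import urllib.request

# GPT4All's chat app in server mode listens on localhost:4981 and
# speaks an OpenAI-style completions API. The model name below is a
# placeholder for whichever model is loaded in the chat app.
payload = {
    "model": "ggml-model-placeholder",
    "prompt": "Why should LLMs run locally?",
    "max_tokens": 64,
    "temperature": 0.7,
}
req = urllib.request.Request(
    "http://localhost:4981/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would send it; either way no data leaves
# your machine, since the server is local.
```

Because the API shape mirrors OpenAI's, existing client code can often be pointed at the local server just by swapping the base URL.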
See it in action and own your large language models!
Nov 23, 2022 • 9 tweets • 3 min read
What are the latest research trends in AI?
Explore all NeurIPS submissions from 1987 to 2022 in Atlas. atlas.nomic.ai/map/neurips
Learn how it works and how to make it yourself👇
Each point is an accepted abstract at NeurIPS between 1987 and 2022.
Clusters of points represent research topics. For example, all papers about graph neural networks are here:
Nov 1, 2022 • 5 tweets • 4 min read
How do people use #stablediffusion ?
Explore 6.4 million AI generated images from @krea_ai in Atlas. atlas.nomic.ai/map/809ef16a-5…
How it works 👇
Every point is a user-generated image and its prompt.
Points are close together if an AI considers their images similar.
For example, Billionaires Row is a region containing co-located generations of @elonmusk, Jeff Bezos, Mark Zuckerberg and US dollars.