Post

How to get URL link on X (Twitter) App

On the Twitter thread, click on or icon on the bottom
Click again on or Share Via icon
Click on Copy Link to Tweet
Paste it above and click "Unroll Thread"!
More info at Twitter Help

Nomic AI

@nomic_ai

Feb 1, 2024 • 9 tweets • 5 min read • Read on X

Scrolly

Introducing Nomic Embed - the first fully open long context text embedder to beat OpenAI

- Open source, open weights, open data
- Beats OpenAI text-embeding-3-small and Ada on short and long context benchmarks
- Day 1 integrations with @langchain, @llama-index, @MongoDB

Open source models are not replicable unless you have access to their training data.

We release our training dataset of 235M curated text pairs to enable anyone to replicate Nomic Embed from scratch.

Blog: blog.nomic.ai/posts/nomic-em…

You can find the model on @huggingface:

The easiest way to use Nomic Embed in a managed service is through the Nomic Embedding API:

huggingface.co/nomic-ai/nomic…
docs.nomic.ai/reference/endp…

Model details:
- 137M parameters for easy deployment
- 5 days of 8xH100 time to train
- Code and data:
- Detailed Technical Report:
github.com/nomic-ai/contr…
static.nomic.ai/reports/2024_N…

Embedding evaluation is broken. Benchmarks like MTEB are not sufficient for capturing all aspects of model behavior.

You can discover systematic differences in model embedding spaces using Nomic Atlas

Comparing nomic-embed-text-v1 and OpenAI Ada 002 embeddings.
blog.nomic.ai/posts/nomic-em…

Day 1 Integrations:
- Build a RAG app with Nomic Embed, @MongoDB and @NousResearch:
- Build a fully open retriever with Nomic Embed and @llamaindex:

- Integrated with @langchain
blog.nomic.ai/posts/nomic-em…
medium.com/@llama_index
python.langchain.com/docs/integrati…

Native GPT4All Integration
Chat with your data locally powered by Nomic Embed.
gpt4all.io

We also launch the Nomic Embedding API
- 1M Free tokens!
- Production ready embedding inference API including task specific embedding customizations.
- Deep integration with Atlas Datasets
- New models incoming 👀

Sign up at atlas.nomic.ai

Integrations:

@LangChainAI integration:

@llama_index integration:

@MongoDB :
python.langchain.com/docs/integrati…
medium.com/@llama_index
blog.nomic.ai/posts/nomic-em…

• • •

Missing some Tweet in this thread? You can try to force a refresh

This Thread may be Removed Anytime!

Twitter may remove this content at anytime! Save it as PDF for later use!

More from @nomic_ai

Nomic AI

@nomic_ai

Feb 12

Nomic Embed Text V2 is now available

- First general purpose Mixture-of-Experts (MoE) embedding model
- SOTA performance on the multilingual MIRACL benchmark for its size
- Support for 100+ languages
- Truly open source - open training data, weights, & code
- Apache 2.0 License

Download the model on @huggingface:

huggingface.co/nomic-ai/nomic…

Check out the training code and data:

github.com/nomic-ai/contr…

Read the blog post:

nomic.ai/blog/posts/nom…

Why Mixture-of-Experts? It activates only a subset of model parameters during training and inference, encouraging only the most relevant model parameters to be used on inputs. This maintains strong performance on downstream tasks while cutting costs and memory usage.

Read 6 tweets

Nomic AI

@nomic_ai

Jun 5, 2024

Today, every Nomic-Embed-Text embedding becomes multimodal. Introducing Nomic-Embed-Vision:

- a high quality, unified embedding space for image, text, and multimodal tasks
- outperforms both OpenAI CLIP and text-embedding-3-small
- open weights and code to enable indie hacking, research, and experimentation
- released in collaboration with @MongoDB, @llama_index, @LangChainAI, @huggingface, @awscloud, @digitalocean, @LambdaAPI

Existing text-image embedding models, including OpenAI’s CLIP, dramatically underperform specialized text encoders on text retrieval tasks. This forces developers to deploy several embedding models and store several vector indices for multimodal applications. With Nomic-Embed-Vision, developers can use a single vector space to power both their text-text and text-image retrieval tasks.

blog.nomic.ai/posts/nomic-em…

We’ve been honored by the reception of Nomic-Embed-Text, which has grown into one of the most downloaded models on @huggingface.

We designed Nomic-Embed-Vision to be compatible with Nomic-Embed-Text out of the box, making it easy for developers using Nomic-Embed-Text to extend their applications with multimodal features.

Put simply, any vector created using Nomic-Embed-Text can be used to query vectors created by Nomic-Embed-Vision, and vice versa.

Read 9 tweets

Nomic AI

@nomic_ai

Feb 26, 2024

Atlas Capability Announcement: Scalable Duplicate Detection 🍡

- Deduplicate your text, image and embedding datasets in your web browser.
- Scales to millions of datapoints (e.g. English Wikipedia)
- Cross correlate with real-time regex search and semantic lasso's.

Duplicate detection is a critical component to curating datasets for AI training.

Atlas is the only dataset platform that let's you perform this operation both in your web browser and programmatically.

Documentation: docs.nomic.ai/atlas/capabili…

When you store you data in an Atlas Dataset, it gets enhanced with attributes like duplicate clusters to support data curation workflows.

Atlas:
1. Embeds your text for your.
2. Stores your text, metadata and embeddings.
3. Enables you to export subsets that are de-duplicated.

Read 5 tweets

Nomic AI

@nomic_ai

Feb 14, 2024

Announcing Nomic Embed v1.5 🪆🪆🪆

- Variable sized embeddings with matryoshka learning and an 8192 context.
- Outperforms OpenAI text-embedding-3-small across output sizes.
- Open source, open training code, open data.

Day 0 in @LangChainAI, @llama_index and @MongoDB

Performance is critical for the production use of embeddings but what about the memory, storage, and bandwidth footprint of the vectors?

Nomic Embed v1.5 allows you to trade off memory footprint for performance all-in-one-model.
Blog:
blog.nomic.ai/posts/nomic-em…

You can use Nomic Embed v1.5 in production through the Nomic Embedding API or run the open-weights yourself.
docs.nomic.ai/reference/endp…

Read 10 tweets

Nomic AI

@nomic_ai

Jun 20, 2023

Local LLMs in GPT4All are now 2x faster on Apple Silicone ⚡
- Supports all LLaMa models
- Exclusive support of the Replit model for 23 tok/s code generation enabling local Copilot!

Watch the 13B parameter Hermes model run at 15 tok/s locally!
gpt4all.io

To make this possible, GPT4All hackers had to implement several custom Apple Metal kernels for LLM ops (e.g. Alibi) and support a custom fork llama.cpp!
Excited to get these changes upstream!
github.com/nomic-ai/llama…

Silicone support is ready for all GPT4All bindings:
- Python
- Typescript
- Golang
- Java
Start building with powerful, open-source and *fast* local LLMs!
Documentation: docs.gpt4all.io

Read 4 tweets

Nomic AI

@nomic_ai

Jun 1, 2023

Local LLMs now have plugins! 💥

GPT4All LocalDocs allows you chat with your private data!

- Drag and drop files into a directory that GPT4All will query for context when answering questions.
- Supports 40+ filetypes
- Cites sources.
gpt4all.io

LocalDocs enables any GPT4All model to cite its sources.

When GPT4All decides that it can improve response factuality by using your documents it does so and tells you which documents it used.

Install the universal local LLM client from gpt4all.io, go to settings and enable the plugin!

Documentation: docs.gpt4all.io/gpt4all_chat.h…

You will soon be able to use LocalDocs in server mode allowing you to easily augment any LLM with your private data.

Read 6 tweets

Support us! We are indie developers!

This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Share this page!

Enter URL or ID to Unroll

Nomic AI

Try unrolling a thread yourself!

More from @nomic_ai

Nomic AI

Nomic AI

Nomic AI

Nomic AI

Nomic AI

Nomic AI

Did Thread Reader help you today?

Don't want to be a Premium member but still want to support us?

Send Email!