Building explainable and accessible AI
Feb 12 • 6 tweets • 3 min read
Nomic Embed Text V2 is now available
- First general purpose Mixture-of-Experts (MoE) embedding model
- SOTA performance on the multilingual MIRACL benchmark for its size
- Support for 100+ languages
- Truly open source - open training data, weights, & code
- Apache 2.0 License
Download the model on @huggingface:
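Nomic embed models distinguish retrieval roles with task prefixes prepended to the input text. A minimal sketch of preparing inputs this way — the exact prefix strings are an assumption here and should be verified against the model card for the version you deploy:

```python
# Nomic embedding models route queries and documents through task
# prefixes. The prefix strings below follow the nomic-embed model
# cards; verify them against the card for your model version.

def with_prefix(texts, task):
    """Prepend the task prefix expected by nomic-embed models."""
    prefixes = {
        "query": "search_query: ",
        "document": "search_document: ",
    }
    return [prefixes[task] + t for t in texts]

queries = with_prefix(["what is a mixture of experts?"], "query")
docs = with_prefix(["MoE models route tokens to expert subnetworks."], "document")
```

The prefixed strings are then passed to the model as-is; using the wrong prefix (or none) typically degrades retrieval quality.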
Today, every Nomic-Embed-Text embedding becomes multimodal. Introducing Nomic-Embed-Vision:
- a high quality, unified embedding space for image, text, and multimodal tasks
- outperforms both OpenAI CLIP and text-embedding-3-small
- open weights and code to enable indie hacking, research, and experimentation
- released in collaboration with @MongoDB, @llama_index, @LangChainAI, @huggingface, @awscloud, @digitalocean, @LambdaAPI
Existing text-image embedding models, including OpenAI’s CLIP, dramatically underperform specialized text encoders on text retrieval tasks. This forces developers to deploy several embedding models and store several vector indices for multimodal applications. With Nomic-Embed-Vision, developers can use a single vector space to power both their text-text and text-image retrieval tasks.
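The single-vector-space idea can be illustrated with a toy example: once text and image embeddings live in the same space, one similarity routine (and one vector index) serves both text-text and text-image retrieval. The vectors below are made up; in practice they would come from nomic-embed-text and nomic-embed-vision.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for embeddings in one shared space (real vectors would
# come from nomic-embed-text / nomic-embed-vision).
text_query = np.array([0.9, 0.1, 0.0])
text_doc   = np.array([0.8, 0.2, 0.1])
image_doc  = np.array([0.7, 0.1, 0.2])

# The same scoring function ranks both modalities -- no second model,
# no second index.
scores = {"text_doc": cosine(text_query, text_doc),
          "image_doc": cosine(text_query, image_doc)}
best = max(scores, key=scores.get)
```

With separate embedding models, the text-text and text-image branches above would each need their own index and their own scoring path.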
- Deduplicate your text, image and embedding datasets in your web browser.
- Scales to millions of datapoints (e.g. English Wikipedia)
- Cross-correlate with real-time regex search and semantic lassos.
Duplicate detection is a critical component to curating datasets for AI training.
Atlas is the only dataset platform that lets you perform this operation both in your web browser and programmatically.
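Embedding-based near-duplicate detection reduces to thresholded cosine similarity over the vectors. A toy sketch — the threshold and vectors here are illustrative, not Atlas's actual algorithm:

```python
import numpy as np

def near_duplicates(embeddings, threshold=0.95):
    """Return index pairs whose cosine similarity exceeds the threshold."""
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize rows
    sims = X @ X.T                                    # pairwise cosine matrix
    pairs = []
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            if sims[i, j] > threshold:
                pairs.append((i, j))
    return pairs

vectors = [[1.0, 0.0], [0.999, 0.01], [0.0, 1.0]]
dupes = near_duplicates(vectors)  # the first two rows are near-duplicates
```

The quadratic pairwise loop is fine for a sketch; at Atlas scale (millions of points) this step would use an approximate nearest-neighbor index instead.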
- Variable-sized embeddings with Matryoshka learning and an 8192-token context.
- Outperforms OpenAI text-embedding-3-small across output sizes.
- Open source, open training code, open data.
Day 0 in @LangChainAI, @llama_index and @MongoDB
Performance is critical for the production use of embeddings but what about the memory, storage, and bandwidth footprint of the vectors?
Nomic Embed v1.5 allows you to trade off memory footprint for performance, all in one model.
Blog: blog.nomic.ai/posts/nomic-em…
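The trade-off works because Matryoshka-trained embeddings pack the most important information into their leading dimensions: you can truncate a vector to a smaller size and re-normalize, with graceful quality degradation. A sketch with illustrative dimensions:

```python
import numpy as np

def resize_embedding(vec, dim):
    """Truncate a Matryoshka-trained embedding and re-normalize.

    Matryoshka representation learning concentrates information in the
    leading dimensions, so truncation degrades quality gracefully.
    Dimension choices here are illustrative.
    """
    v = np.asarray(vec, dtype=float)[:dim]
    return v / np.linalg.norm(v)

full = np.random.default_rng(0).normal(size=768)
small = resize_embedding(full, 256)  # 3x smaller storage/bandwidth footprint
```

Because the result is re-normalized, cosine similarity on the truncated vectors remains well-defined, so the same index and scoring code work at every output size.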
Feb 1, 2024 • 9 tweets • 5 min read
Introducing Nomic Embed - the first fully open long context text embedder to beat OpenAI
- Open source, open weights, open data
- Beats OpenAI text-embedding-3-small and Ada on short and long context benchmarks
- Day 1 integrations with @LangChainAI, @llama_index, @MongoDB
Open source models are not replicable unless you have access to their training data.
We release our training dataset of 235M curated text pairs to enable anyone to replicate Nomic Embed from scratch.
Local LLMs in GPT4All are now 2x faster on Apple Silicon ⚡
- Supports all LLaMa models
- Exclusive support of the Replit model for 23 tok/s code generation enabling local Copilot!
Watch the 13B parameter Hermes model run at 15 tok/s locally! gpt4all.io
To make this possible, GPT4All hackers had to implement several custom Apple Metal kernels for LLM ops (e.g. ALiBi) and support a custom fork of llama.cpp!
Excited to get these changes upstream! github.com/nomic-ai/llama…
Jun 1, 2023 • 6 tweets • 3 min read
Local LLMs now have plugins! 💥
GPT4All LocalDocs allows you to chat with your private data!
- Drag and drop files into a directory that GPT4All will query for context when answering questions.
- Supports 40+ filetypes
- Cites sources. gpt4all.io
LocalDocs enables any GPT4All model to cite its sources.
When GPT4All decides that it can improve response factuality by using your documents, it does so and tells you which documents it used.
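The retrieval step behind a LocalDocs-style plugin can be sketched as nearest-neighbor search over document-chunk embeddings, where keeping the source chunk alongside its score is what enables citation. This is an illustrative outline, not GPT4All's implementation:

```python
import numpy as np

def top_k_chunks(query_vec, chunk_vecs, chunks, k=2):
    """Rank document chunks by cosine similarity to the query embedding."""
    Q = np.asarray(query_vec, dtype=float)
    C = np.asarray(chunk_vecs, dtype=float)
    sims = C @ Q / (np.linalg.norm(C, axis=1) * np.linalg.norm(Q))
    order = np.argsort(-sims)[:k]
    # Returning the source chunk with its score is what lets the model
    # cite which document a piece of context came from.
    return [(chunks[i], float(sims[i])) for i in order]

chunks = ["cats purr", "dogs bark", "llamas hum"]
vecs = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # toy chunk embeddings
hits = top_k_chunks([0.9, 0.1, 0.0], vecs, chunks, k=1)
```

The retrieved chunks are then prepended to the prompt as context, and the chunk metadata becomes the citation shown to the user.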
May 16, 2023 • 5 tweets • 3 min read
Big New Release of GPT4All📶
You can now use local CPU-powered LLMs through a familiar API!
Building with a local LLM is as easy as a one-line code change! Simply spin up the chat app at gpt4all.io and place it in server mode!
Documentation: docs.gpt4all.io/gpt4all_chat.h…
Your data never leaves your machine! The HTTP server runs on port 4981 (1984 in reverse)!
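Talking to the local server looks like any OpenAI-style HTTP call. A sketch that builds the request — the model name is a placeholder and the endpoint path is an assumption; check the GPT4All docs for your version:

```python
import json
import urllib.request

# GPT4All's chat app in server mode listens on localhost:4981 and
# speaks an OpenAI-style completions API. The model name below is a
# placeholder for whichever model is loaded in the chat app.
payload = {
    "model": "ggml-model-placeholder",
    "prompt": "Why should LLMs run locally?",
    "max_tokens": 64,
    "temperature": 0.7,
}
req = urllib.request.Request(
    "http://localhost:4981/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would send it; either way no data leaves
# your machine, since the server is local.
```

Because the API shape mirrors OpenAI's, existing client code can often be pointed at the local server just by swapping the base URL.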
See it in action and own your large language models!
Nov 23, 2022 • 9 tweets • 3 min read
What are the latest research trends in AI?
Explore all NeurIPS submissions from 1987 to 2022 in Atlas. atlas.nomic.ai/map/neurips
Learn how it works and how to make it yourself👇
Each point is an accepted abstract at NeurIPS between 1987 and 2022.
Clusters of points represent research topics. For example, all papers about graph neural networks are here:
Nov 1, 2022 • 5 tweets • 4 min read
How do people use #stablediffusion ?
Explore 6.4 million AI generated images from @krea_ai in Atlas. atlas.nomic.ai/map/809ef16a-5…
How it works 👇
Every point is a user-generated image and its prompt.
Points are close together if an AI considers their images similar.
For example, Billionaires Row is a region containing co-located generations of @elonmusk, Jeff Bezos, Mark Zuckerberg and US dollars.