CalCo
Nov 23, 2022
What are the latest research trends in AI?
Explore all NeurIPS submissions from 1987 to 2022 in Atlas.
atlas.nomic.ai/map/neurips
Learn how it works and how to make it yourself👇
Each point is an accepted abstract at NeurIPS between 1987 and 2022.
Clusters of points represent research topics. For example, all papers about graph neural networks are here:
Atlas lets you interact with unstructured datasets over time. Filtering by submission year shows us how submissions to NeurIPS evolve:
'80s and early '90s: Kernels, Speech Recognition, and models of the brain.
'90s–'00s: RL, Clustering, and Active Learning become popular.
2010s: Theory of DL, ConvNets, Causal Inference, and Adversarial Attacks.
2018–2022: Self-supervised learning, Pruning, Bandit problems, 3D Deep Learning.
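In spirit, the year filter is just a predicate over each point's metadata. A toy sketch with hypothetical data (not the Atlas API — Atlas does this interactively in the browser):

```python
# Toy sketch: filtering map points by submission year.
# The data below is made up for illustration; in Atlas each point would be
# a real accepted abstract with its metadata.
points = [
    {"year": 1989, "topic": "speech recognition"},
    {"year": 1996, "topic": "reinforcement learning"},
    {"year": 2015, "topic": "adversarial attacks"},
    {"year": 2021, "topic": "self-supervised learning"},
]

def filter_by_year(points, start, end):
    """Keep only points submitted within [start, end]."""
    return [p for p in points if start <= p["year"] <= end]

nineties = filter_by_year(points, 1990, 1999)
print([p["topic"] for p in nineties])  # ['reinforcement learning']
```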
Searching the map for `transformer` surfaces the prevalence of the architecture across research topics:
Language models, Vision, Speech, 3D modeling, RL, EEG, Pruning and compression.
Make this map and others with 30 lines of Python:
github.com/nomic-ai/maps
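The linked repo has the full recipe. Roughly, the client code takes the shape below — note that the function and parameter names (`atlas.map_data`, `indexed_field`) are assumptions from one version of the `nomic` client and may differ; check the repo for the exact calls:

```python
import os

# Rough sketch of building an Atlas map from abstracts with the `nomic`
# Python client. `atlas.map_data` and `indexed_field` are assumed names;
# see github.com/nomic-ai/maps for the real recipe.
documents = [
    {"title": "A Paper on GNNs", "abstract": "We study graph neural networks...", "year": 2020},
    {"title": "Kernel Methods", "abstract": "We analyze kernel machines...", "year": 1998},
]

def upload(documents):
    """Upload documents to Atlas; the `abstract` field of each row is embedded."""
    from nomic import atlas  # requires `pip install nomic` and a Nomic API key
    return atlas.map_data(data=documents, indexed_field="abstract")

# Only attempt the upload when credentials are configured.
if os.environ.get("NOMIC_API_KEY"):
    upload(documents)
```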
Bonus: A map of just 2022 accepted submissions:
atlas.nomic.ai/map/neurips_20…
@hen_str Would love to see this as the paper explorer for the NeurIPS 2023 Virtual Conference :)
Huge thanks to @Yuvaaa___ for facilitating the data collection and cleaning this past summer while interning at Nomic!

• • •

More from @calco_io

Jun 20, 2025
so exciting to get a chance to collaborate with @Wikipedia & @Wikimedia on the first full multilingual wikipedia map! even more excited that the entire pipeline (encoder, article vectors, and visualization method) is open source 🧵

enterprise.wikimedia.com/blog/nomic-ai-…
@Wikipedia is an incredible resource for both machine and human learning, but lacked the infrastructure to be fully utilized in open source. we wanted to change that.
@cohere was the first to make strides in this area, with their open dataset of simple-wiki embeddings. unfortunately, this dataset was neither comprehensive nor openly reproducible.

huggingface.co/datasets/Coher…
Feb 12, 2025
Nomic Embed Text V2 is now available

- First general purpose Mixture-of-Experts (MoE) embedding model
- SOTA performance on the multilingual MIRACL benchmark for its size
- Support for 100+ languages
- Truly open source - open training data, weights, & code
- Apache 2.0 License
Download the model on @huggingface:

huggingface.co/nomic-ai/nomic…

Check out the training code and data:

github.com/nomic-ai/contr…

Read the blog post:

nomic.ai/blog/posts/nom…
Why Mixture-of-Experts? A gating network activates only a subset of the model's parameters during training and inference, routing each input to its most relevant experts. This maintains strong performance on downstream tasks while cutting compute cost and memory usage.
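A toy picture of that routing, illustrative only (not Nomic's implementation): the gate scores every expert per input, but only the top-k experts actually run.

```python
import math

# Toy Mixture-of-Experts forward pass: score all experts, run only the
# top-k, and mix their outputs with renormalized gate weights.
# Purely illustrative -- not Nomic Embed's actual architecture.

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_scores, k=2):
    """Route input x to the k highest-scoring experts; return (output, active ids)."""
    top = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:k]
    weights = softmax([gate_scores[i] for i in top])  # renormalize over chosen experts
    return sum(w * experts[i](x) for w, i in zip(weights, top)), top

# Four "experts", each just scaling the input differently.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
out, active = moe_forward(10.0, experts, gate_scores=[0.1, 3.0, 0.2, 2.0], k=2)
print(active)  # [1, 3] -- only 2 of the 4 experts ran
```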
Jun 5, 2024
Today, every Nomic-Embed-Text embedding becomes multimodal. Introducing Nomic-Embed-Vision:

- a high quality, unified embedding space for image, text, and multimodal tasks
- outperforms both OpenAI CLIP and text-embedding-3-small
- open weights and code to enable indie hacking, research, and experimentation
- released in collaboration with @MongoDB, @llama_index, @LangChainAI, @huggingface, @awscloud, @digitalocean, @LambdaAPI
Existing text-image embedding models, including OpenAI’s CLIP, dramatically underperform specialized text encoders on text retrieval tasks. This forces developers to deploy several embedding models and store several vector indices for multimodal applications. With Nomic-Embed-Vision, developers can use a single vector space to power both their text-text and text-image retrieval tasks.

blog.nomic.ai/posts/nomic-em…
We’ve been honored by the reception of Nomic-Embed-Text, which has grown into one of the most downloaded models on @huggingface.

We designed Nomic-Embed-Vision to be compatible with Nomic-Embed-Text out of the box, making it easy for developers using Nomic-Embed-Text to extend their applications with multimodal features.

Put simply, any vector created using Nomic-Embed-Text can be used to query vectors created by Nomic-Embed-Vision, and vice versa.
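In a shared embedding space, cross-modal retrieval is just nearest-neighbor search by cosine similarity. A toy sketch with made-up 3-d vectors (real embeddings are much higher-dimensional):

```python
import math

# Toy cross-modal retrieval: a "text" query vector ranks "image" vectors
# directly because both live in the same space. Vectors are fabricated
# 3-d examples, not real Nomic-Embed outputs.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

text_query = [0.9, 0.1, 0.0]        # stand-in for a text embedding
image_vectors = {                   # stand-ins for image embeddings
    "cat.jpg": [0.8, 0.2, 0.1],
    "car.jpg": [0.0, 0.1, 0.9],
}

best = max(image_vectors, key=lambda name: cosine(text_query, image_vectors[name]))
print(best)  # cat.jpg
```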
Feb 26, 2024
Atlas Capability Announcement: Scalable Duplicate Detection 🍡

- Deduplicate your text, image and embedding datasets in your web browser.
- Scales to millions of datapoints (e.g. English Wikipedia)
- Cross-correlate with real-time regex search and semantic lassos.
Duplicate detection is a critical component to curating datasets for AI training.

Atlas is the only dataset platform that lets you perform this operation both in your web browser and programmatically.

Documentation: docs.nomic.ai/atlas/capabili…
When you store your data in an Atlas Dataset, it gets enhanced with attributes like duplicate clusters to support data curation workflows.

Atlas:
1. Embeds your text for you.
2. Stores your text, metadata, and embeddings.
3. Enables you to export de-duplicated subsets.
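The simplest flavor of the idea is exact-duplicate clustering over normalized text; a toy sketch (Atlas additionally finds semantic near-duplicates over embeddings, which this does not show):

```python
from collections import defaultdict

# Toy exact-duplicate detection: group row indices whose whitespace- and
# case-normalized text matches. Illustrative only -- Atlas also clusters
# semantic near-duplicates via embeddings.

def duplicate_clusters(texts):
    """Return groups of row indices (size > 1) with identical normalized text."""
    clusters = defaultdict(list)
    for i, t in enumerate(texts):
        key = " ".join(t.lower().split())  # collapse case and whitespace
        clusters[key].append(i)
    return [ids for ids in clusters.values() if len(ids) > 1]

rows = ["Deep Learning", "deep  learning", "Graph Neural Nets", "GANs"]
print(duplicate_clusters(rows))  # [[0, 1]]
```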
Feb 14, 2024
Announcing Nomic Embed v1.5 🪆🪆🪆

- Variable sized embeddings with matryoshka learning and an 8192 context.
- Outperforms OpenAI text-embedding-3-small across output sizes.
- Open source, open training code, open data.

Day 0 in @LangChainAI, @llama_index and @MongoDB
Performance is critical for the production use of embeddings but what about the memory, storage, and bandwidth footprint of the vectors?

Nomic Embed v1.5 lets you trade off memory footprint against performance, all in one model.
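Mechanically, matryoshka resizing amounts to keeping the first k dimensions and re-normalizing, because those leading dimensions were trained to stand alone. A sketch with a toy vector (real usage would slice actual model outputs):

```python
import math

# Matryoshka-style resizing (sketch): truncate an embedding to its first
# k dimensions and L2-normalize. The toy 4-d vector below stands in for
# a real model output.

def resize_embedding(vec, k):
    """Keep the first k dims and L2-normalize the result."""
    head = vec[:k]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [0.5, 0.5, 0.5, 0.5]        # toy "full-size" embedding
small = resize_embedding(full, 2)  # smaller vector, same leading content
print(small)  # [0.7071..., 0.7071...]
```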
Blog:
blog.nomic.ai/posts/nomic-em…
You can use Nomic Embed v1.5 in production through the Nomic Embedding API or run the open-weights yourself.
docs.nomic.ai/reference/endp…
Feb 1, 2024
Introducing Nomic Embed - the first fully open long context text embedder to beat OpenAI

- Open source, open weights, open data
- Beats OpenAI text-embedding-3-small and Ada on short and long context benchmarks
- Day 1 integrations with @LangChainAI, @llama_index, @MongoDB
Open source models are not replicable unless you have access to their training data.

We release our training dataset of 235M curated text pairs to enable anyone to replicate Nomic Embed from scratch.
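Training on curated text pairs generally means a contrastive objective: each anchor should score its paired positive above the other in-batch candidates. A generic InfoNCE sketch (the standard recipe, not Nomic's exact training code):

```python
import math

# Generic InfoNCE loss over one anchor's similarity row: cross-entropy of
# the true positive against all candidates. Illustrative of contrastive
# pair training in general, not Nomic Embed's exact implementation.

def info_nce(sim_row, positive_idx, temperature=0.05):
    """Loss is low when sim_row[positive_idx] dominates the row."""
    logits = [s / temperature for s in sim_row]
    m = max(logits)  # stabilize the log-sum-exp
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[positive_idx]

# Similarities of one anchor vs. 3 candidates; index 0 is its true pair.
good = info_nce([0.9, 0.1, 0.2], positive_idx=0)  # positive ranked first
bad = info_nce([0.2, 0.9, 0.1], positive_idx=0)   # positive ranked second
print(good < bad)  # True: the loss rewards ranking the true pair on top
```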

Blog: blog.nomic.ai/posts/nomic-em…
You can find the model on @huggingface:

huggingface.co/nomic-ai/nomic…

The easiest way to use Nomic Embed in a managed service is through the Nomic Embedding API:

docs.nomic.ai/reference/endp…
