A mathematician dabbling in the world of data science. Researcher at the Tutte Institute for Mathematics and Computing. UMAP, HDBSCAN, PyNNDescent. He / Him.
Jun 23 • 8 tweets • 3 min read
Explore Wikipedia through a data map. Pages are grouped by semantic similarity into topic clusters.
Hover to see details, zoom to explore fine-grained topics, and click to go to a page. Search by page name to find interesting starting points for exploration. 🧵
lmcinnes.github.io/datamapplot_ex…
All of this is really just a tech demo for the tools behind it: Toponymy for creating topics and topic labels, and DataMapPlot for creating the interactive visualizations.
DataMapPlot 0.4 is out now, with far more powerful and effective interactive plots.
Here is an example: a data map of 2.4 million arXiv papers, ready to be explored.
Performance on large datasets has had a major overhaul, so million-scale datasets are handled with ease.
DataMapPlot 0.4 also introduces new filtering and selection tools that integrate with the existing search functionality.
Feb 22, 2024 • 9 tweets • 4 min read
A major update for DataMapPlot adds interactive plots.
See lmcinnes.github.io/datamapplot_ex… for an example.
Let's dig into what you can do with DataMapPlot 0.2 ... 🧵
Given a data map and labels, making rich interactive plots is easy. The arXiv example above can be generated as follows:
Nov 10, 2022 • 8 tweets • 3 min read
Ever needed a few more colours than the standard colour cycle for your plot? Ever wanted a categorical colour palette built around your own custom colours? With glasbey, you can create and extend custom categorical colour palettes with ease. 🧵
The glasbey library is on github: github.com/lmcinnes/glasb…
Documentation can be found on readthedocs: glasbey.readthedocs.org
And you can pip install it:
$ pip install glasbey
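Glasbey-style palettes are typically built greedily: starting from some seed colours, repeatedly add the candidate colour that is farthest (in a perceptual colour space) from every colour already chosen. The sketch below illustrates that idea using only the Python standard library. It is not the glasbey library's API or implementation; the helper names `srgb_to_lab` and `extend_palette`, the coarse RGB candidate grid, and the use of plain CIELAB distance are all simplifying assumptions for illustration.

```python
import math


def srgb_to_lab(rgb):
    """Convert an (r, g, b) colour with 0-255 channels to CIELAB (D65 white point)."""

    def linearize(c):
        # Undo the sRGB gamma curve
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)
    # Linear sRGB -> CIE XYZ
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b

    def f(t):
        # CIELAB companding function
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    # Normalize by the D65 reference white
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def extend_palette(seed, size, step=51):
    """Greedily extend `seed` (a list of RGB tuples) to `size` colours.

    Each new colour is the grid candidate whose smallest CIELAB distance
    to the colours already in the palette is largest.
    """
    palette = list(seed)
    levels = range(0, 256, step)
    candidates = [(r, g, b) for r in levels for g in levels for b in levels]
    if size > len(palette) + len(candidates):
        raise ValueError("not enough candidate colours; use a smaller step")
    lab = {c: srgb_to_lab(c) for c in candidates}
    palette_lab = [srgb_to_lab(p) for p in palette]
    # Distance from each candidate to its nearest palette colour so far
    nearest = {
        c: min((math.dist(lab[c], p) for p in palette_lab), default=math.inf)
        for c in candidates
    }
    while len(palette) < size:
        best = max(candidates, key=nearest.get)
        palette.append(best)
        # Only the newly added colour can shrink each candidate's nearest distance
        for c in candidates:
            nearest[c] = min(nearest[c], math.dist(lab[c], lab[best]))
    return palette


seed = [(31, 119, 180), (255, 127, 14)]
palette = extend_palette(seed, 6)
print(["#%02x%02x%02x" % c for c in palette])
```

The seed colours are kept in order and four new colours are appended. A smaller `step` gives a finer candidate grid and better-separated colours, at the cost of more distance computations.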
Jan 12, 2021 • 14 tweets • 6 min read
The latest version of umap-learn is now out. Version 0.5 adds several major new features: ParametricUMAP, DensMAP, AlignedUMAP, model composition, and model updating. Thank you to everyone who contributed! 1/14
ParametricUMAP uses a neural network to learn a UMAP embedding, which brings a number of significant advantages. 2/14
Jan 5, 2020 • 6 tweets • 4 min read
PyNNDescent, an approximate nearest neighbor search library, got a major update recently. Index construction is now multicore by default, and querying is much faster: competitive with some of the fastest ANN libraries around.
(1/4)
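PyNNDescent is built around the NN-descent idea: a neighbor of a neighbor is likely to also be a neighbor. The toy sketch below, assuming small in-memory lists of points and Euclidean distance, starts from random neighbor lists and repeatedly checks neighbors-of-neighbors (including reverse neighbors) for closer candidates until nothing changes. It is single-threaded and unoptimized, and it is not PyNNDescent's API or implementation; `nn_descent` and its parameters are hypothetical.

```python
import math
import random


def nn_descent(points, k, max_iters=10, seed=0):
    """Approximate k-nearest-neighbor lists for `points` via a simplified NN-descent.

    Returns a list whose i-th entry holds the indices of the (approximate)
    k nearest neighbors of points[i], closest first.
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(points)")
    rng = random.Random(seed)

    def dist(i, j):
        return math.dist(points[i], points[j])

    # Start from random neighbor lists: graph[i] maps neighbor index -> distance
    graph = []
    for i in range(n):
        others = rng.sample([j for j in range(n) if j != i], k)
        graph.append({j: dist(i, j) for j in others})

    for _ in range(max_iters):
        # Reverse neighbors: reverse[j] holds every i that lists j as a neighbor
        reverse = [set() for _ in range(n)]
        for i in range(n):
            for j in graph[i]:
                reverse[j].add(i)

        updates = 0
        for i in range(n):
            # Neighbors (forward and reverse) of i, then their neighbors
            local = set(graph[i]) | reverse[i]
            candidates = set()
            for j in local:
                candidates |= set(graph[j]) | reverse[j]
            candidates -= set(graph[i])
            candidates.discard(i)
            for c in candidates:
                d = dist(i, c)
                worst = max(graph[i], key=graph[i].get)
                if d < graph[i][worst]:
                    # Replace the current farthest neighbor with the closer candidate
                    del graph[i][worst]
                    graph[i][c] = d
                    updates += 1
        if updates == 0:
            break  # converged: no neighbor list changed in this pass

    return [sorted(graph[i], key=graph[i].get) for i in range(n)]


rng = random.Random(42)
pts = [(rng.random(), rng.random()) for _ in range(200)]
neighbors = nn_descent(pts, k=5)
print(neighbors[0])
```

Each pass only compares a point against a local neighborhood rather than the whole dataset, which is what lets this family of methods avoid brute-force all-pairs comparisons on large data.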
Performance is particularly strong for higher-accuracy (>90%) queries.
(2/4)
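A common way to measure ANN query accuracy is recall@k: the fraction of each query's true k nearest neighbors that the approximate search returns, averaged over all queries. Assuming that is the sense of "accuracy" here, a minimal stdlib sketch of brute-force ground truth and recall might look like this (`exact_knn` and `recall_at_k` are hypothetical helper names, and the "approximate" result is hand-written for illustration).

```python
import math


def exact_knn(data, queries, k):
    """Brute-force ground truth: indices of the k nearest data points to each query."""
    return [
        sorted(range(len(data)), key=lambda j: math.dist(q, data[j]))[:k]
        for q in queries
    ]


def recall_at_k(approx, exact):
    """Mean fraction of each query's true neighbors contained in the approximate result."""
    if len(approx) != len(exact) or not exact:
        raise ValueError("need one approximate and one exact result per query")
    return sum(len(set(a) & set(e)) / len(e) for a, e in zip(approx, exact)) / len(exact)


data = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (0.0, 2.0)]
queries = [(0.1, 0.0), (4.0, 4.0)]
truth = exact_knn(data, queries, k=2)  # [[0, 1], [2, 3]]
# Pretend an ANN index returned these: the second query misses index 3
approx = [[0, 1], [2, 0]]
print(recall_at_k(approx, truth))  # → 0.75
```

A recall of 0.9 would mean that, on average, 9 out of every 10 true neighbors are found.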