Leland McInnes
A mathematician dabbling in the world of data science. Researcher at the Tutte Institute for Mathematics and Computing. UMAP, HDBSCAN, PyNNDescent. He / Him.
Jun 23 8 tweets 3 min read
Explore Wikipedia through a data map. Pages are grouped by semantic similarity, forming topic clusters.

Hover to see details, zoom to explore fine-grained topics, click to go to a page. Search page names to find interesting starting points for exploration.🧵

lmcinnes.github.io/datamapplot_ex…

All of this is really just a tech demo for the tools backing it: Toponymy for creating topics and topic labels, and DataMapPlot for creating the interactive visualizations.

github.com/TutteInstitute…
github.com/TutteInstitute…
Oct 9, 2024 8 tweets 3 min read
DataMapPlot 0.4 is out now, with far more powerful and effective interactive plots.
Here is an example of a Data Map of 2.4 million papers on ArXiv, ready to be explored. Performance on large datasets got a major overhaul, supporting million-scale datasets with ease.
DataMapPlot 0.4 also introduces new filtering and selection tools that integrate with the existing search functionality.
Feb 22, 2024 9 tweets 4 min read
A major update for DataMapPlot adds interactive plots.
See lmcinnes.github.io/datamapplot_ex… for an example.
Let's dig into what you can do with DataMapPlot 0.2 ... 🧵


Given a data map and labels, making rich interactive plots is easy. The ArXiv example above can be generated as follows:
[Image]
Nov 10, 2022 8 tweets 3 min read
Ever needed a few more colours than the standard colour cycle for your plot? Ever wanted a categorical colour palette based around your own custom colours? With glasbey you can create and extend custom categorical colour palettes with ease.🧵

The glasbey library is on GitHub: github.com/lmcinnes/glasb…
Documentation can be found on readthedocs: glasbey.readthedocs.org
And you can pip install it:

$ pip install glasbey
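
The idea of extending a palette with maximally distinct colours can be sketched without the library. The code below is not glasbey's implementation or API: it is a minimal greedy sketch that repeatedly adds the candidate colour farthest from everything already chosen, using plain squared RGB distance over a coarse grid as a stand-in for a proper perceptual colour distance. The name `extend_palette_sketch` and its parameters are hypothetical.

```python
from itertools import product


def hex_to_rgb(h):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints."""
    h = h.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of ints to '#rrggbb'."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def sq_dist(a, b):
    """Squared Euclidean distance in RGB (an assumption, not perceptual)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def extend_palette_sketch(base_hex, palette_size, grid_step=51):
    """Greedily extend a non-empty base palette to palette_size colours."""
    palette = [hex_to_rgb(h) for h in base_hex]
    levels = range(0, 256, grid_step)  # 0, 51, ..., 255 when grid_step=51
    candidates = list(product(levels, repeat=3))
    # Distance from each candidate to its nearest colour in the palette.
    min_d = [min(sq_dist(c, p) for p in palette) for c in candidates]
    while len(palette) < palette_size:
        # Pick the candidate whose nearest palette colour is farthest away.
        best = max(range(len(candidates)), key=min_d.__getitem__)
        chosen = candidates[best]
        palette.append(chosen)
        # Update nearest-colour distances to account for the new colour.
        min_d = [min(d, sq_dist(c, chosen)) for d, c in zip(min_d, candidates)]
    return [rgb_to_hex(c) for c in palette]


extended = extend_palette_sketch(["#ff0000", "#0000ff"], palette_size=5)
print(extended)
```

The base colours are kept at the front of the result, so an existing colour assignment in a plot stays stable when more categories are added.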
Jan 12, 2021 14 tweets 6 min read
The latest version of umap-learn is now out. Version 0.5 includes some major new features, including ParametricUMAP, DensMAP, AlignedUMAP, model composition, and model updating. Thank you to everyone who contributed! 1/14

ParametricUMAP uses a neural network to learn a UMAP embedding. This allows for a number of significant advantages. 2/14
Jan 5, 2020 6 tweets 4 min read
PyNNDescent, an approximate nearest neighbor search library, got a major update recently. Index construction is now multicore by default. Querying is now much faster -- competitive with some of the fastest ANN libraries around.
(1/4)

Performance is particularly strong for higher-accuracy (>90%) queries.
(2/4)
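
Accuracy for approximate nearest neighbor search is commonly reported as recall: the fraction of the true k nearest neighbors that the approximate search returns. The sketch below shows that measurement, assuming recall is the sense of accuracy meant here. Brute-force search stands in for ground truth, and the function names are hypothetical, not part of PyNNDescent.

```python
import math


def brute_force_knn(data, query, k):
    """Exact k nearest neighbors of query: indices into data, nearest first."""
    order = sorted(range(len(data)), key=lambda i: math.dist(data[i], query))
    return order[:k]


def recall_at_k(true_ids, approx_ids):
    """Fraction of the true neighbors that the approximate result recovered."""
    true_set = set(true_ids)
    return len(true_set & set(approx_ids)) / len(true_set)


data = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
true_ids = brute_force_knn(data, query=(0.0, 0.0), k=3)

# A made-up approximate result that misses one true neighbor.
approx_ids = [0, 1, 3]
print(recall_at_k(true_ids, approx_ids))
```

Under this reading, a ">90%" query is one where at least nine in ten of the true neighbors are returned.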