An interactive semantic map of the latest 10 million published papers [P]

Reddit r/MachineLearning / 4/29/2026

Key Points

  • The article describes an interactive semantic map designed to help users navigate the scientific literature via spatial exploration.
  • It uses the latest 10 million papers from OpenAlex and creates embeddings with SPECTER 2 from paper titles and abstracts.
  • The pipeline reduces embedding dimensionality with UMAP and then uses density-peak Voronoi partitioning to form semantic neighborhoods.
  • The map includes floating topic labels generated by custom labeling algorithms (not yet finalized) and supports both keyword and semantic queries.
  • An analytics layer ranks or summarizes institutions, authors, and topics, and the author offers free access through “The Global Research Space.”

I built a map to help navigate the complex scientific landscape through spatial exploration.

How it works:

Sourced the latest 10M papers from OpenAlex and generated embeddings using SPECTER 2 on titles and abstracts.
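As a rough illustration of the input-preparation step (not the actual pipeline code): OpenAlex distributes abstracts as an inverted index that maps each word to its positions, so the text has to be rebuilt before it can be embedded. SPECTER-style encoders are typically fed the title and abstract joined by the tokenizer's separator token. The function names and the literal `"[SEP]"` default below are assumptions for this sketch; the embedding call itself is omitted.

```python
def abstract_from_inverted_index(inverted_index):
    """Rebuild plain abstract text from an OpenAlex-style inverted index.

    The index maps each word to the list of positions where it occurs.
    """
    if not inverted_index:
        return ""
    positioned = [
        (pos, word)
        for word, positions in inverted_index.items()
        for pos in positions
    ]
    # Sorting by position restores the original word order.
    return " ".join(word for _, word in sorted(positioned))


def build_model_input(title, inverted_index, sep_token="[SEP]"):
    """Join title and abstract into one string for the encoder.

    sep_token is an assumption; in practice it would be read from the
    tokenizer that ships with the embedding model.
    """
    abstract = abstract_from_inverted_index(inverted_index)
    title = (title or "").strip()
    # Papers without an abstract fall back to the title alone.
    return f"{title}{sep_token}{abstract}" if abstract else title
```

Papers with no abstract are embedded from the title only in this sketch; whether the real pipeline keeps or drops such papers is not stated in the post.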

Reduced dimensionality with UMAP, then applied Voronoi partitioning on density peaks to create distinct semantic neighborhoods.
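One way to read "Voronoi partitioning on density peaks" is: find points that sit in dense regions of the 2D layout, keep a well-separated subset of them as seeds, and assign every paper to its nearest seed. The sketch below does that in plain Python on toy coordinates that stand in for UMAP output. The density estimate (neighbour counts within a radius), the greedy peak selection, and all parameter names are assumptions for illustration, not the method actually used for the map, and the quadratic loops would not scale to 10M points.

```python
import math


def local_density(points, radius):
    """Count neighbours within `radius` of each point (O(n^2), toy scale only)."""
    return [
        sum(1 for q in points if math.dist(p, q) <= radius) - 1  # exclude self
        for p in points
    ]


def pick_density_peaks(points, radius, min_separation, min_density=1):
    """Greedily keep the densest points that are at least `min_separation` apart."""
    density = local_density(points, radius)
    order = sorted(range(len(points)), key=lambda i: -density[i])
    peaks = []
    for i in order:
        if density[i] < min_density:
            break  # remaining points are too sparse to seed a region
        if all(math.dist(points[i], points[j]) >= min_separation for j in peaks):
            peaks.append(i)
    return peaks


def voronoi_assign(points, peak_indices):
    """Label each point with its nearest peak, i.e. the Voronoi cell it falls in."""
    return [
        min(
            range(len(peak_indices)),
            key=lambda k: math.dist(p, points[peak_indices[k]]),
        )
        for p in points
    ]


# Toy 2D layout with two obvious clusters.
coords = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
peaks = pick_density_peaks(coords, radius=0.5, min_separation=2.0)
labels = voronoi_assign(coords, peaks)
```

Because every point goes to its nearest seed, the regions tile the whole plane with no unassigned "noise" points, which is probably what makes this kind of partition convenient for a map.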

The floating topic labels are generated via custom labelling algorithms (definitely still a work in progress!).

The map also supports both keyword and semantic queries, and an analytics layer ranks institutions, authors, topics, and more.
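To make the difference between the two query types concrete, here is a minimal sketch: a keyword query matches literal text, while a semantic query ranks papers by cosine similarity between a query embedding and each paper's embedding. The record layout (`id`, `title`, `vector`) and the function names are hypothetical, and a real index over 10M papers would need an approximate nearest-neighbour structure instead of a full sort.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_search(papers, term):
    """Return ids of papers whose title contains the term (case-insensitive)."""
    term = term.lower()
    return [p["id"] for p in papers if term in p["title"].lower()]


def semantic_search(papers, query_vector, k=2):
    """Return ids of the k papers whose embeddings are closest to the query."""
    ranked = sorted(
        papers, key=lambda p: cosine(p["vector"], query_vector), reverse=True
    )
    return [p["id"] for p in ranked[:k]]
```

A semantic query can surface papers that share no words with the query text, which a keyword match cannot do.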

For anyone who wants to try it, the interactive map is free to use at The Global Research Space.

Any feedback or suggestions are welcome!

submitted by /u/icannotchangethename