[P] citracer: a small CLI tool to trace where a concept comes from in a citation graph

Reddit r/MachineLearning / 4/9/2026

Key Points

The citracer CLI tool helps trace where an idea in a research PDF originates by parsing the bibliography with GROBID, finding references cited near keyword occurrences in the text, and recursively expanding the citation graph.
It can download and include papers from arXiv or OpenReview (when available) and outputs an interactive HTML visualization of the citation chains.
A “reverse” mode uses Semantic Scholar citation contexts to locate papers that cite a given work in connection with a specific keyword, without downloading PDFs.
The author notes limitations including GROBID domain coverage (stronger for ML/CS), incomplete Semantic Scholar citation-context coverage, and slower runs without a free Semantic Scholar API key due to rate limits.
The project is positioned as a narrowly focused alternative to broader literature-graph tools, and the author invites contributions via bug reports, parser fixes, edge-case handling, and documentation improvements.

Hi all, I made a small tool that I've been using for my own literature reviews and figured I'd share in case it's useful to anyone else.

It takes a research PDF and a keyword, parses the bibliography with GROBID, finds the references that are cited near each occurrence of the keyword in the text, downloads those papers when they're on arXiv or OpenReview, and recursively walks the resulting graph. The output is an interactive HTML visualization.
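To make the forward pass concrete, here is a minimal Python sketch of the two core steps: collecting the references cited near a keyword, then walking the graph recursively. It is not citracer's actual code. It assumes plain text with numeric bracket citations such as `[12]`, a fixed character window, and two hypothetical callbacks, `get_text` and `resolve_ref`, standing in for the PDF download and GROBID bibliography lookup.

```python
import re

# Matches numeric citation markers such as [3] or [1, 2] (an assumed citation style).
CITATION = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")


def refs_near_keyword(text, keyword, window=200):
    """Return sorted reference numbers cited within `window` characters of the keyword."""
    found = set()
    for kw in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        lo = max(0, kw.start() - window)
        hi = min(len(text), kw.end() + window)
        # Only citation markers lying fully inside [lo, hi) are considered.
        for cite in CITATION.finditer(text, lo, hi):
            found.update(int(n) for n in cite.group(1).split(","))
    return sorted(found)


def trace(paper_id, keyword, get_text, resolve_ref, max_depth=3):
    """Walk the citation graph from `paper_id`, returning (citing, cited) edges.

    get_text(paper_id) -> str or None       (hypothetical: fetch and extract the paper text)
    resolve_ref(paper_id, n) -> id or None  (hypothetical: map reference n to a paper id)
    """
    edges = []
    seen = {paper_id}
    stack = [(paper_id, 0)]
    while stack:
        current, depth = stack.pop()
        text = get_text(current)
        if text is None or depth >= max_depth:
            continue  # paper unavailable, or depth limit reached
        for number in refs_near_keyword(text, keyword):
            target = resolve_ref(current, number)
            if target is None:
                continue
            edges.append((current, target))
            if target not in seen:
                seen.add(target)
                stack.append((target, depth + 1))
    return edges
```

The `seen` set keeps the walk from looping when two papers cite each other, and `max_depth` bounds how many downloads a single run can trigger.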

There's also a "reverse" mode that uses Semantic Scholar's citation contexts endpoint to find papers citing a given work specifically about a keyword, without downloading any PDFs.
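The reverse lookup can be sketched in a few lines of Python. This is an illustration rather than citracer's implementation: the endpoint path, the `contexts` field, and the `x-api-key` header reflect my understanding of the public Semantic Scholar Graph API and should be checked against its documentation. The function names `fetch_citations` and `citing_papers_about` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint shape for the Semantic Scholar Graph API citations route.
API = "https://api.semanticscholar.org/graph/v1/paper/{}/citations"


def fetch_citations(paper_id, api_key=None, limit=100):
    """Fetch one page of citations, with their contexts, for `paper_id`."""
    query = urllib.parse.urlencode({"fields": "contexts,title,year", "limit": limit})
    url = API.format(urllib.parse.quote(paper_id, safe=":")) + "?" + query
    headers = {"x-api-key": api_key} if api_key else {}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response).get("data", [])


def citing_papers_about(citations, keyword):
    """Keep only citing papers whose citation context mentions the keyword."""
    needle = keyword.lower()
    hits = []
    for entry in citations:
        contexts = [c for c in (entry.get("contexts") or []) if needle in c.lower()]
        if contexts:
            title = (entry.get("citingPaper") or {}).get("title")
            hits.append({"title": title, "contexts": contexts})
    return hits
```

A call such as `citing_papers_about(fetch_citations("ARXIV:1706.03762"), "positional encoding")` would then return only the citing papers that mention the keyword in the sentence where they cite the work. No PDFs are involved, which is why this mode depends entirely on how complete the stored contexts are.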

Short demo (2 min): https://youtu.be/0VxWgaKixSI

I built it because I was spending too much time clicking through Google Scholar to figure out which paper introduced a particular idea I'd seen mentioned in passing. It's not a replacement for tools like Connected Papers or Inspire HEP — those answer different questions. This one is narrowly focused on "show me the citations of this PDF that mention X".

Some honest caveats:

- It depends on GROBID for parsing, which works well on ML/CS papers but can struggle on other domains.
- The reverse mode relies entirely on Semantic Scholar's coverage and citation contexts, which aren't always complete.
- Without a free Semantic Scholar API key, things get noticeably slower due to rate limiting.
- It's a personal project, so expect rough edges.
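On the rate-limiting caveat: one common way for a client to stay under a request budget is to enforce a minimum gap between calls. The sketch below shows that general pattern in Python. It is an assumption about how such throttling could be done, not a description of what citracer does internally, and the interval to use depends on the limits that apply to your key.

```python
import time


class Throttle:
    """Enforce a minimum interval (in seconds) between successive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested without real delays
        self._sleep = sleep
        self._last = None      # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` before each API request spaces the requests out. A longer interval is the price of running without a key, which is where the slowdown comes from.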

The project is still very young and I'm pretty sure it'll only get more useful as it evolves. If anyone is interested in contributing — bug reports, edge cases, parser fixes, new features, doc improvements, anything — it would genuinely be welcome. PRs and issues open.

Repo: https://github.com/marcpinet/citracer
PyPI: https://pypi.org/project/citracer/

If you try it on a paper you care about, I'd love to hear whether the chains it produces make sense.

submitted by /u/Roux55