Finding Duplicates in 1.1M BDD Steps: cukereuse, a Paraphrase-Robust Static Detector for Cucumber and Gherkin
arXiv cs.CL / 4/23/2026
Key Points
- The paper introduces cukereuse, an open-source, purely static (no test execution) Python CLI that detects duplicate Cucumber/Gherkin steps in any repository and is robust to paraphrase.
- cukereuse uses a layered approach that combines exact hashing, Levenshtein-ratio matching, and sentence-transformer embeddings. The release also includes a large corpus (347 GitHub repos, 23,667 .feature files, 1,113,616 Gherkin steps).
- The authors quantify how prevalent duplication is: the step-weighted exact-duplicate rate is 80.2%, the median per-repository exact-duplicate rate is 58.6%, and the largest hybrid cluster contains 20.7k occurrences across 2.2k files.
- Evaluation against 1,020 manually labeled step pairs (high inter-annotator agreement, Fleiss’ kappa = 0.84) shows strong pair-level performance: under the primary rubric, the best near-exact F1 is about 0.822 and the semantic F1 about 0.906, although the authors note an inflation artefact that affects recall.
- Beyond detection, the paper critiques Gherkin using the Cognitive Dimensions of Notations (CDN) framework, arguing that most dimensions are either problematic or unsupported. It releases the tool, corpus, labels, rubric, and pipeline under permissive licenses.
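The layered detection and the two duplicate-rate figures can be illustrated with a small standard-library sketch. It is not cukereuse's code or CLI. The keyword-stripping normalization, the 0.9 near-exact threshold, and the definition of "exact duplicate" (normalized text occurring at least twice in the same repository) are all assumptions made for illustration. The third, embedding-based tier is only marked, not implemented. Under this reading, the step-weighted rate pools duplicated steps over all repositories, so large, heavily duplicated repositories pull it above the per-repository median.

```python
import hashlib
import re
from collections import Counter
from statistics import median

# Hypothetical normalization (assumption, not necessarily cukereuse's):
# lowercase, drop a leading Gherkin keyword, collapse whitespace.
KEYWORD = re.compile(r"^(given|when|then|and|but|\*)\s+")


def normalize(step: str) -> str:
    text = KEYWORD.sub("", step.strip().lower())
    return re.sub(r"\s+", " ", text)


def exact_key(step: str) -> str:
    # Tier 1: hash of the normalized text; equal keys mean exact duplicates.
    return hashlib.sha1(normalize(step).encode("utf-8")).hexdigest()


def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance, keeping one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def levenshtein_ratio(a: str, b: str) -> float:
    # Similarity in [0, 1], normalized by the longer string (one of several conventions).
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def classify_pair(a: str, b: str, near_threshold: float = 0.9) -> str:
    # near_threshold is a hypothetical cut-off, not a value taken from the paper.
    if exact_key(a) == exact_key(b):
        return "exact"
    if levenshtein_ratio(normalize(a), normalize(b)) >= near_threshold:
        return "near-exact"
    # Tier 3 (sentence-transformer embeddings in the paper) is not sketched here.
    return "semantic-candidate"


def duplicate_rates(repos: dict[str, list[str]]) -> tuple[float, float]:
    # Assumed definition: a step is an exact duplicate if its normalized text
    # occurs at least twice within the same repository.
    dup_total, step_total, per_repo = 0, 0, []
    for steps in repos.values():
        if not steps:
            continue  # skip repositories without steps to avoid dividing by zero
        keys = [exact_key(s) for s in steps]
        counts = Counter(keys)
        dups = sum(1 for k in keys if counts[k] > 1)
        dup_total += dups
        step_total += len(steps)
        per_repo.append(dups / len(steps))
    # Step-weighted (pooled) rate vs. median of per-repository rates.
    return dup_total / step_total, median(per_repo)


if __name__ == "__main__":
    print(classify_pair("Then I see 2 items", "Then I see 3 items"))  # near-exact
    repos = {
        "big": ["Given I am logged in", "And I am logged in", "When I open the cart",
                "When  I open the cart", "Then I see 2 items", "Then I see 2 items",
                "Then I see 3 items", "Given I am an admin"],
        "mid": ["Given a user", "Given a user", "When they log in",
                "Then they see a dashboard"],
        "small": ["Given a user", "When they log out", "Then the session ends"],
    }
    weighted, med = duplicate_rates(repos)
    print(f"{weighted:.3f} {med:.3f}")  # 0.533 0.500
```

In this toy corpus the largest repository is also the most duplicated (6 of 8 steps), so the pooled rate (8 of 15 steps) exceeds the median of the per-repository rates. This is one plausible way a step-weighted figure can sit well above a median, not a reconstruction of the paper's 80.2% and 58.6%.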
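The evaluation bullet cites two standard metrics: Fleiss' kappa for agreement among annotators and F1 for pair-level detection. Their textbook formulas can be sketched as follows. The ratings table and labels below are invented toy data; the paper's number of annotators, label categories, and rubric are not assumed, and the sketch does not reproduce the reported 0.84, 0.822, or 0.906.

```python
def fleiss_kappa(table: list[list[int]]) -> float:
    """table[i][j] = number of raters who put item i in category j.

    Assumes every item is rated by the same number of raters (at least two).
    """
    n_items = len(table)
    n_raters = sum(table[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in table):
        raise ValueError("need a constant number of raters (>= 2) per item")
    # Observed agreement: per-item agreement P_i, averaged over items.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ) / n_items
    # Chance agreement from the overall category proportions p_j.
    n_categories = len(table[0])
    p_j = [sum(row[j] for row in table) / (n_items * n_raters)
           for j in range(n_categories)]
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)


def pair_f1(gold: list[bool], predicted: list[bool]) -> float:
    # True = "these two steps are duplicates" (gold label or detector output).
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum(p and not g for g, p in zip(gold, predicted))
    fn = sum(g and not p for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data: 4 step pairs, 3 raters, categories [duplicate, not duplicate].
    ratings = [[3, 0], [0, 3], [3, 0], [2, 1]]
    print(round(fleiss_kappa(ratings), 3))  # 0.625
    gold = [True, True, True, False, False]
    pred = [True, True, False, True, False]
    print(round(pair_f1(gold, pred), 3))  # 0.667
```

Because recall's denominator depends on how many gold-positive pairs exist, anything that inflates or deflates that count shifts F1. This is one general reason a labeling artefact can distort recall, though the sketch makes no claim about the specific artefact the authors describe.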