Generating Concept Lexicalizations via Dictionary-Based Cross-Lingual Sense Projection

arXiv cs.CL / 4/17/2026


Key Points

  • The paper proposes a method to expand WordNet-style lexical resources to new languages by generating senses through dictionary-based cross-lingual semantic projection.
  • It uses a sense-tagged English corpus plus aligned translations to project English synsets onto aligned tokens in the target language and assign the corresponding lemmas.
  • To produce high-quality alignments, the approach augments a pre-trained base aligner with a bilingual dictionary, which is also used to filter out incorrect sense projections.
  • Experiments on multiple languages compare the method with prior approaches and with dictionary-based and large language model baselines; the project-and-filter strategy improves precision while remaining interpretable and requiring few external resources.
  • The authors plan to release the code, documentation, and generated sense inventories to enable reuse and further evaluation.
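
The project-and-filter idea in the points above can be illustrated with a minimal sketch. This is not the authors' code: the function name `project_senses`, the data layout, the synset labels, and the toy English-Spanish sentence pair are all assumptions made for illustration.

```python
from collections import defaultdict


def project_senses(src_tokens, tgt_lemmas, alignment, bilingual_dict):
    """Project synset labels from sense-tagged source tokens onto target lemmas.

    src_tokens:     list of (english_lemma, synset_label or None)
    tgt_lemmas:     list of target-language lemmas, one per token
    alignment:      iterable of (source_index, target_index) links
    bilingual_dict: dict mapping an English lemma to a set of target lemmas
    """
    inventory = defaultdict(set)
    for src_idx, tgt_idx in alignment:
        en_lemma, synset = src_tokens[src_idx]
        if synset is None:
            continue  # untagged source token: nothing to project
        tgt_lemma = tgt_lemmas[tgt_idx]
        # Filter step: keep the projection only if the dictionary
        # lists the target lemma as a translation of the English lemma.
        if tgt_lemma in bilingual_dict.get(en_lemma, set()):
            inventory[synset].add(tgt_lemma)
    return dict(inventory)


# Hypothetical toy data: "the dog sleeps" / "el perro duerme" (lemmatized).
src_tokens = [("the", None), ("dog", "dog.n.01"), ("sleep", "sleep.v.01")]
tgt_lemmas = ["el", "perro", "dormir"]
# The link (1, 0) is a deliberately wrong alignment (dog -> el).
alignment = [(0, 0), (1, 1), (2, 2), (1, 0)]
bilingual_dict = {"dog": {"perro"}, "sleep": {"dormir"}}

inventory = project_senses(src_tokens, tgt_lemmas, alignment, bilingual_dict)
print(inventory)
```

In this toy case the wrong link from "dog" to "el" is discarded because the dictionary does not license the pair, which is the sense in which filtering trades coverage for precision.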

Abstract

We study the task of automatically expanding WordNet-style lexical resources to new languages through sense generation. We generate senses by associating target-language lemmas with existing lexical concepts via semantic projection. Given a sense-tagged English corpus and its translation, our method projects English synsets onto aligned target-language tokens and assigns the corresponding lemmas to those synsets. To generate these alignments and ensure their quality, we augment a pre-trained base aligner with a bilingual dictionary, which is also used to filter out incorrect sense projections. We evaluate the method on multiple languages, comparing it to prior methods, as well as dictionary-based and large language model baselines. Results show that the proposed project-and-filter strategy improves precision while remaining interpretable and requiring few external resources. We plan to make our code, documentation, and generated sense inventories accessible.
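
The abstract does not say how the dictionary is combined with the base aligner. One plausible reading, sketched below purely as an assumption, is to add a link for a source token the aligner left unaligned when the dictionary points to exactly one unaligned target token. The function name `augment_alignment` and the toy data are hypothetical.

```python
def augment_alignment(src_lemmas, tgt_lemmas, base_links, bilingual_dict):
    """Add dictionary-licensed links for source tokens the base aligner missed.

    base_links:     set of (source_index, target_index) pairs from a base aligner
    bilingual_dict: dict mapping a source lemma to a set of target lemmas
    """
    links = set(base_links)
    aligned_src = {i for i, _ in links}
    aligned_tgt = {j for _, j in links}
    for i, src_lemma in enumerate(src_lemmas):
        if i in aligned_src:
            continue  # the base aligner already linked this token
        translations = bilingual_dict.get(src_lemma, set())
        candidates = [
            j for j, tgt_lemma in enumerate(tgt_lemmas)
            if j not in aligned_tgt and tgt_lemma in translations
        ]
        # Assumption: only add a link when the dictionary match is unambiguous.
        if len(candidates) == 1:
            links.add((i, candidates[0]))
            aligned_tgt.add(candidates[0])
    return links


# Hypothetical toy data: the base aligner missed the link dog -> perro.
src_lemmas = ["the", "dog", "sleep"]
tgt_lemmas = ["el", "perro", "dormir"]
base_links = {(0, 0), (2, 2)}
bilingual_dict = {"dog": {"perro"}, "sleep": {"dormir"}}

links = augment_alignment(src_lemmas, tgt_lemmas, base_links, bilingual_dict)
print(sorted(links))
```

Requiring an unambiguous match is a conservative design choice for this sketch; the paper's actual augmentation procedure may differ.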