Generating Concept Lexicalizations via Dictionary-Based Cross-Lingual Sense Projection

arXiv cs.CL / 4/17/2026

Key Points

The paper proposes a method to expand WordNet-style lexical resources to new languages by generating senses through dictionary-based cross-lingual semantic projection.
It uses a sense-tagged English corpus plus aligned translations to project English synsets onto aligned tokens in the target language and assign the corresponding lemmas.
To create high-quality alignments and reduce errors, the approach enhances a pre-trained base aligner with a bilingual dictionary and uses the dictionary to filter incorrect sense projections.
Experiments on multiple languages compare the method with prior methods and with dictionary-based and large language model baselines; the project-and-filter strategy improves precision while remaining interpretable and requiring few external resources.
The authors plan to release the code, documentation, and generated sense inventories to enable reuse and further evaluation.

Abstract

We study the task of automatically expanding WordNet-style lexical resources to new languages through sense generation. We generate senses by associating target-language lemmas with existing lexical concepts via semantic projection. Given a sense-tagged English corpus and its translation, our method projects English synsets onto aligned target-language tokens and assigns the corresponding lemmas to those synsets. To generate these alignments and ensure their quality, we augment a pre-trained base aligner with a bilingual dictionary, which is also used to filter out incorrect sense projections. We evaluate the method on multiple languages, comparing it to prior methods, as well as dictionary-based and large language model baselines. Results show that the proposed project-and-filter strategy improves precision while remaining interpretable and requiring few external resources. We plan to make our code, documentation, and generated sense inventories accessible.
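
The project-and-filter idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `project_and_filter`, the data layout, the synset identifiers, and the toy English-Spanish sentence pair are all assumptions made for illustration, and the word alignment is supplied by hand rather than produced by a dictionary-augmented aligner.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical data layout; none of these names come from the paper.
# An alignment is a list of (english_token_index, target_token_index) links.
Alignment = List[Tuple[int, int]]


def project_and_filter(
    en_lemmas: List[str],
    en_synsets: List[Optional[str]],
    tgt_lemmas: List[str],
    alignment: Alignment,
    bilingual_dict: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Map each synset id to the target lemmas that survive the dictionary filter."""
    senses: Dict[str, Set[str]] = defaultdict(set)
    for en_i, tgt_i in alignment:
        synset = en_synsets[en_i]
        if synset is None:
            # Untagged English token (e.g. a function word): nothing to project.
            continue
        tgt_lemma = tgt_lemmas[tgt_i]
        # Filter step: keep the projection only if the dictionary lists the
        # target lemma as a translation of the aligned English lemma.
        if tgt_lemma in bilingual_dict.get(en_lemmas[en_i], set()):
            senses[synset].add(tgt_lemma)
    return dict(senses)


# Toy sense-tagged English sentence and its Spanish translation (illustrative).
en_lemmas = ["the", "bank", "approve", "the", "loan"]
en_synsets = [None, "syn:bank_institution", "syn:approve", None, "syn:loan"]
tgt_lemmas = ["el", "banco", "aprobar", "el", "préstamo"]

# The last link is a deliberate misalignment ("loan" -> "aprobar").
alignment = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 2)]

bilingual_dict = {
    "bank": {"banco", "orilla"},
    "approve": {"aprobar"},
    "loan": {"préstamo"},
}

senses = project_and_filter(
    en_lemmas, en_synsets, tgt_lemmas, alignment, bilingual_dict
)
print(senses)
```

In this toy run the misaligned link from "loan" to "aprobar" is discarded because the dictionary does not list "aprobar" as a translation of "loan", so no lemma is assigned to that synset. This mirrors the precision-over-coverage trade-off the abstract describes: a projection is kept only when the dictionary confirms it.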
