Bangla Key2Text: Text Generation from Keywords for a Low Resource Language

arXiv cs.CL / 4/22/2026


Key Points

  • The paper presents Bangla Key2Text, a large-scale dataset containing 2.6M Bangla keyword–text pairs for keyword-driven text generation in a low-resource setting.
  • The dataset is built from millions of Bangla news articles using a BERT-based keyword extraction pipeline to convert raw articles into supervised training examples.
  • The authors fine-tune two sequence-to-sequence models, mT5 and BanglaT5, to create baselines on this new benchmark.
  • Results indicate that task-specific fine-tuning significantly improves keyword-conditioned generation in Bangla versus zero-shot large language models.
  • The dataset, trained models, and code are released publicly to enable further research on Bangla NLG and keyword-to-text generation.
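The dataset-construction step described above can be sketched end to end: extract keywords from a raw article, then emit a supervised (keywords → text) training pair. The paper uses a BERT-based extractor over Bangla news; the toy frequency-based extractor below is only a stand-in to illustrate the pair format, and all function names here are illustrative, not from the paper's released code.

```python
import re
from collections import Counter

STOPWORDS = frozenset({"the", "a", "of", "is", "for", "in", "and", "to", "each"})

def extract_keywords(text, k=3):
    """Toy frequency-based keyword extractor (stand-in for the paper's
    BERT-based pipeline): rank content words by frequency, keep top-k."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(k)]

def make_pair(article):
    """Convert one raw article into a supervised (keywords -> text) example,
    the format a seq2seq model like mT5 or BanglaT5 would be fine-tuned on."""
    keywords = extract_keywords(article)
    return {"input": ", ".join(keywords), "target": article}

article = ("The dataset contains keyword and text pairs. Each keyword list "
           "conditions a model to generate the full text of the article.")
pair = make_pair(article)
print(pair["input"])   # top keywords, e.g. "keyword, text, ..."
```

Repeating this over millions of articles yields the keyword–text corpus; the real pipeline's BERT-based scoring replaces the frequency ranking shown here.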

Abstract

This paper introduces Bangla Key2Text, a large-scale dataset of 2.6 million Bangla keyword–text pairs designed for keyword-driven text generation in a low-resource language. The dataset is constructed using a BERT-based keyword extraction pipeline applied to millions of Bangla news texts, transforming raw articles into structured keyword–text pairs suitable for supervised learning. To establish baseline performance on this new benchmark, we fine-tune two sequence-to-sequence models, mT5 and BanglaT5, and evaluate them using multiple automatic metrics and human judgments. Experimental results show that task-specific fine-tuning substantially improves keyword-conditioned text generation in Bangla compared to zero-shot large language models. The dataset, trained models, and code are publicly released to support future research in Bangla natural language generation and keyword-to-text generation tasks.