IWLV-Ramayana: A Sarga-Aligned Parallel Corpus of Valmiki's Ramayana Across Indian Languages
arXiv cs.CL / 4/16/2026
Key Points
- The paper introduces the IWLV Ramayana Corpus, a sarga (chapter)-aligned parallel dataset of Valmiki’s Ramayana across multiple Indian languages.
- The English and Malayalam layers are complete; Hindi, Tamil, Kannada, and Telugu layers are in production.
- The corpus is released in structured JSONL and includes explicit provenance metadata to support traceability and scholarly reuse.
- The authors position the dataset for comparative literature, corpus linguistics, digital humanities, and multilingual NLP applications.
- They claim it is the first sarga-aligned multilingual parallel corpus of the Valmiki Ramayana to be released in a machine-readable format with provenance metadata.
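
To make the sarga-level alignment and JSONL release concrete, here is a minimal sketch of how such records might be grouped into parallel text. The field names (`kanda`, `sarga`, `lang`, `text`, `source`) and the sample records are assumptions made for illustration; the summary does not give the corpus's actual schema, so check the released files before relying on any of these names.

```python
import json
from collections import defaultdict

# Hypothetical sample records. The field names and values are assumptions,
# not the corpus's documented schema.
SAMPLE_JSONL = """\
{"kanda": "Bala", "sarga": 1, "lang": "en", "text": "English text of sarga 1", "source": "hypothetical-edition-A"}
{"kanda": "Bala", "sarga": 1, "lang": "ml", "text": "Malayalam text of sarga 1", "source": "hypothetical-edition-B"}
{"kanda": "Bala", "sarga": 2, "lang": "en", "text": "English text of sarga 2", "source": "hypothetical-edition-A"}
"""


def align_by_sarga(lines):
    """Group JSONL records by (kanda, sarga), keyed by language code."""
    aligned = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        key = (record["kanda"], record["sarga"])
        aligned[key][record["lang"]] = record
    return dict(aligned)


def parallel_pairs(aligned, src, tgt):
    """Return (key, src_text, tgt_text) for sargas present in both layers."""
    pairs = []
    for key in sorted(aligned):
        layers = aligned[key]
        if src in layers and tgt in layers:
            pairs.append((key, layers[src]["text"], layers[tgt]["text"]))
    return pairs


aligned = align_by_sarga(SAMPLE_JSONL.splitlines())
pairs = parallel_pairs(aligned, "en", "ml")
for key, en_text, ml_text in pairs:
    print(key, "|", en_text, "|", ml_text)
```

Because each record carries its own provenance field in this sketch, a sarga that is missing from one language layer is simply skipped when pairing, and every retained pair can still be traced back to its stated source.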