IndoBERT-Relevancy: A Context-Conditioned Relevancy Classifier for Indonesian Text
arXiv cs.CL / 3/30/2026
📰 News · Ideas & Deep Analysis · Models & Research
Key Points
- The paper introduces IndoBERT-Relevancy, a context-conditioned classifier designed to judge whether a candidate Indonesian text is relevant to a given topical context.
- It is built on IndoBERT Large (335M parameters) and trained on a newly created dataset of 31,360 labeled (topic, text) pairs across 188 topics.
- The authors use an iterative, failure-driven dataset construction approach and find that no single data source provides sufficient coverage for robust relevancy classification.
- They add targeted synthetic data to address specific weaknesses, achieving an F1 score of 0.948 and 96.5% accuracy on both formal and informal Indonesian.
- The resulting model is released publicly on HuggingFace for reuse in relevancy-filtering and related NLP pipelines.
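The relevancy-filtering use case in the last point can be sketched as a small pipeline step: score each candidate text against a topic, then keep only candidates above a threshold. The sketch below is illustrative only; `score_relevance` is a toy stand-in (token overlap) for the real classifier, which with the released checkpoint would instead run a (topic, text) sentence-pair forward pass through IndoBERT. The function names and the 0.5 threshold are assumptions, not from the paper.

```python
def score_relevance(topic: str, text: str) -> float:
    """Toy stand-in for the model: fraction of topic tokens found in the text.

    The real IndoBERT-Relevancy model would produce this score from a
    sentence-pair classification head; this overlap heuristic only
    demonstrates the filtering interface.
    """
    topic_tokens = set(topic.lower().split())
    text_tokens = set(text.lower().split())
    if not topic_tokens:
        return 0.0
    return len(topic_tokens & text_tokens) / len(topic_tokens)


def filter_relevant(topic: str, candidates: list[str], threshold: float = 0.5) -> list[str]:
    """Keep only candidates whose relevancy score meets the threshold."""
    return [text for text in candidates if score_relevance(topic, text) >= threshold]


# Example: filter Indonesian candidates against a topic.
topic = "sepak bola indonesia"
candidates = [
    "timnas sepak bola indonesia menang dramatis",  # on-topic
    "resep masakan padang yang mudah dibuat",       # off-topic
]
print(filter_relevant(topic, candidates))
```

Swapping the heuristic for the published model would leave the surrounding filtering logic unchanged, which is the kind of drop-in reuse the release seems aimed at.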