AI Navigate

Adaptive Engram Memory System for Indonesian Language Model: Generative AI Based on TOBA LM for Batak and Minang Language

arXiv cs.CL / 3/12/2026

📰 News · Developer Stack & Infrastructure · Models & Research

Key Points

  • TOBA-LM is a 1.2B-parameter trilingual language model for Indonesian, Batak, and Minangkabau based on GPT-2 with syllabic-agglutinative tokenization.
  • It introduces an Engram Memory mechanism, an adaptive n-gram-based external memory with a 500,000 × 768 embedding table that captures morphological dependencies via bigram and trigram pathways.
  • The model demonstrates substantial training efficiency, reporting a roughly 80% reduction in training steps: the loss drops from 6.4 to 1.7996 in 12,973 steps, versus the more than 70,000 steps a conventional transformer needed for comparable convergence.
  • These results indicate that external statistical memory can substantially reduce computational needs for developing regional language models under resource constraints.
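The paper does not publish the Engram Memory implementation, but the description above (a shared 500,000 × 768 embedding table queried along bigram and trigram pathways and fused with the transformer's hidden states) can be sketched as follows. The hashing scheme, the additive fusion, and all names here are assumptions for illustration, not the authors' code:

```python
import torch
import torch.nn as nn

class EngramMemory(nn.Module):
    """Hedged sketch of an adaptive n-gram external memory.

    Assumption: each bigram and trigram of input token ids is hashed
    into a shared embedding table (500,000 x 768 in TOBA-LM), and the
    looked-up vectors are added to the transformer hidden states.
    """

    def __init__(self, table_size: int = 500_000, dim: int = 768):
        super().__init__()
        self.table = nn.Embedding(table_size, dim)
        self.table_size = table_size

    def _hash(self, ngrams: torch.Tensor) -> torch.Tensor:
        # Simple polynomial rolling hash over the last axis (the n-gram);
        # the real model may use a learned or exact n-gram index instead.
        h = torch.zeros_like(ngrams[..., 0])
        for i in range(ngrams.shape[-1]):
            h = (h * 1_000_003 + ngrams[..., i]) % self.table_size
        return h

    def forward(self, token_ids: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq) long; hidden: (batch, seq, dim)
        out = hidden.clone()
        # Bigram pathway: position t >= 1 reads memory for (t-1, t).
        bi = torch.stack([token_ids[:, :-1], token_ids[:, 1:]], dim=-1)
        out[:, 1:] = out[:, 1:] + self.table(self._hash(bi))
        # Trigram pathway: position t >= 2 reads memory for (t-2, t-1, t).
        tri = torch.stack(
            [token_ids[:, :-2], token_ids[:, 1:-1], token_ids[:, 2:]], dim=-1
        )
        out[:, 2:] = out[:, 2:] + self.table(self._hash(tri))
        return out
```

Because the table is a plain embedding lookup rather than extra attention layers, the memory adds capacity for frequent morphological patterns at near-constant compute per token, which is consistent with the reported convergence speed-up.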

Abstract

This study presents TOBA-LM, a trilingual language model based on the GPT-2 architecture with 1.2 billion parameters, trained on a corpus encompassing Indonesian, Batak, and Minangkabau using syllabic-agglutinative tokenization. The architecture integrates an Engram Memory mechanism, an adaptive n-gram-based memory system with a 500,000 × 768 embedding table that captures morphological dependencies through bigram and trigram pathways. Empirical results demonstrate a training efficiency of 80%, with the loss value dropping from 6.4 to 1.7996 in only 12,973 steps, significantly faster than the conventional transformer architecture, which required over 70,000 steps to achieve comparable convergence. These findings confirm that the integration of external statistical memory substantially reduces computational requirements for developing regional language models under limited resources.
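The paper does not specify its syllabic-agglutinative tokenizer, but the idea of splitting words into syllable units (which suits the mostly (C)V(C) syllable shapes of Indonesian, Batak, and Minangkabau) can be illustrated with a toy greedy rule. The regex, function name, and examples below are assumptions, and the sketch ignores digraphs such as "ng" and consonant-only tails:

```python
import re

# Greedy (C)V(C) syllabifier: optional onset consonants, one vowel,
# and a coda consonant only when it is not the onset of the next syllable.
# A real tokenizer would also handle digraphs (ng, ny, kh) and diphthongs.
SYLLABLE = re.compile(r"[^aeiou]*[aeiou](?:[^aeiou](?![aeiou]))?", re.IGNORECASE)

def syllabify(word: str) -> list[str]:
    """Split a word into rough syllable tokens (illustrative only)."""
    return SYLLABLE.findall(word)
```

For example, `syllabify("makan")` yields `["ma", "kan"]` and `syllabify("berjalan")` yields `["ber", "ja", "lan"]`. Tokenizing at the syllable level keeps the vocabulary small while aligning token boundaries with the affix structure of agglutinative morphology, which is the motivation the abstract gives for this tokenization choice.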