LLM-Guided Semantic Bootstrapping for Interpretable Text Classification with Tsetlin Machines

arXiv cs.CL · April 15, 2026


Key Points

  • The paper addresses the tradeoff between semantic generalization and transparency in text classification by combining the semantic strength of pretrained language models (PLMs) with the interpretability of Tsetlin Machines (TMs).
  • It introduces an LLM-guided semantic bootstrapping pipeline where, for each class label, an LLM generates sub-intents that drive synthetic data creation via a three-stage curriculum (seed, core, enriched).
  • A Non-Negated Tsetlin Machine (NTM) is trained to extract high-confidence, interpretable literals that serve as semantic cues derived from the LLM.
  • By injecting these learned cues into real data, the TM can better align clause-level logic with LLM-inferred semantics without needing embeddings or runtime LLM calls.
  • Experiments across multiple text classification tasks show improved interpretability and accuracy over vanilla TMs, reaching performance comparable to BERT while remaining fully symbolic and efficient.
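
The generation stage of the pipeline above can be sketched as plain prompt construction: for each class, the LLM-proposed sub-intents are crossed with the three curriculum stages to yield one generation request per pair. This is a minimal illustration, not the paper's exact prompts; `build_prompts`, the stage hints, and the example label are all hypothetical, and the actual LLM call is left abstract.

```python
# Hypothetical sketch of the sub-intent -> three-stage curriculum step.
# The actual chat-completion call is omitted; only prompt assembly is shown.

CURRICULUM = ("seed", "core", "enriched")

# Illustrative stage descriptions (not taken from the paper).
STAGE_HINTS = {
    "seed":     "short, prototypical phrases",
    "core":     "full natural sentences covering the sub-intent",
    "enriched": "diverse paraphrases with varied vocabulary",
}

def build_prompts(label: str, sub_intents: list[str]) -> list[dict]:
    """Build one synthetic-data generation prompt per (sub-intent, stage)."""
    prompts = []
    for intent in sub_intents:
        for stage in CURRICULUM:
            prompts.append({
                "label": label,
                "sub_intent": intent,
                "stage": stage,
                "prompt": (
                    f"Write 10 examples of the class '{label}', "
                    f"sub-intent '{intent}', as {STAGE_HINTS[stage]}."
                ),
            })
    return prompts

# Example: 2 sub-intents x 3 stages = 6 generation requests.
prompts = build_prompts("complaint", ["billing error", "late delivery"])
```

Keeping the stages as separate requests makes it easy to control how much of each curriculum tier enters the synthetic training set.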

Abstract

Pretrained language models (PLMs) like BERT provide strong semantic representations but are costly and opaque, while symbolic models such as the Tsetlin Machine (TM) offer transparency but lack semantic generalization. We propose a semantic bootstrapping framework that transfers LLM knowledge into symbolic form, combining interpretability with semantic capacity. Given a class label, an LLM generates sub-intents that guide synthetic data creation through a three-stage curriculum (seed, core, enriched), expanding semantic diversity. A Non-Negated TM (NTM) learns from these examples to extract high-confidence literals as interpretable semantic cues. Injecting these cues into real data enables a TM to align clause logic with LLM-inferred semantics. Our method requires no embeddings or runtime LLM calls, yet equips symbolic models with pretrained semantic priors. Across multiple text classification tasks, it improves interpretability and accuracy over vanilla TM, achieving performance comparable to BERT while remaining fully symbolic and efficient.
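
The cue-extraction and injection steps described in the abstract can be sketched as two small functions: keep literals from clauses whose confidence exceeds a threshold, then append matching cue tokens to real documents so downstream TM clauses can condition on them. The clause representation (a set of literals plus a scalar weight), the threshold, and the `CUE_` marker are illustrative assumptions, not the paper's exact format.

```python
# Hedged sketch of cue extraction from an NTM and cue injection into
# real data. Clause/weight representation below is hypothetical.

def extract_cues(clauses: list[tuple[set[str], float]],
                 min_weight: float = 0.8) -> set[str]:
    """Collect literals from high-confidence clauses as semantic cues."""
    cues: set[str] = set()
    for literals, weight in clauses:
        if weight >= min_weight:  # keep only confident clauses
            cues.update(literals)
    return cues

def inject_cues(tokens: list[str], cues: set[str]) -> list[str]:
    """Append marker features for any cue token present in the document,
    giving the downstream TM explicit LLM-derived literals to match on."""
    return tokens + [f"CUE_{t}" for t in tokens if t in cues]

# Example: two clauses, one above the confidence threshold.
clauses = [({"refund", "angry"}, 0.92), ({"hello"}, 0.31)]
cues = extract_cues(clauses)
augmented = inject_cues(["i", "want", "a", "refund"], cues)
```

Because the cues are plain tokens rather than embeddings, the augmented input stays fully symbolic and the final TM's clauses remain human-readable.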