Structured Exploration and Exploitation of Label Functions for Automated Data Annotation

arXiv cs.AI / 4/13/2026


Key Points

The paper addresses the challenge of costly, error-prone manual annotation by using label functions (LFs), i.e., heuristic rules, to generate weak labels automatically for ML training.
It argues that prior automated label-function generation methods can suffer from limited coverage and unreliable quality, especially when relying on surface-level LLM heuristics or constrained primitive-based synthesis.
The proposed EXPONA framework treats LF generation as a structured process that balances diversity (exploring multi-level LFs across surface, structural, and semantic views) with reliability (suppressing noisy or redundant heuristics).
Experiments on eleven classification datasets show EXPONA achieves up to 98.9% label coverage, improves weak label quality by up to 87%, and improves downstream weighted F1 by up to 46% relative to state-of-the-art automated LF generation methods.
Overall, the results suggest that multi-level exploration plus reliability-aware filtering can produce more consistent weak-label sets and better downstream task performance across diverse domains.
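
To make the terms "label function", "weak label", and "label coverage" concrete, here is a minimal Python sketch of programmatic labeling in general. It is not EXPONA's implementation: the spam/ham task, the three surface-level heuristics, the `ABSTAIN` convention, and the majority-vote aggregation are all illustrative assumptions chosen for brevity.

```python
from collections import Counter

ABSTAIN = -1  # assumed convention: an LF returns this when its heuristic does not apply
HAM, SPAM = 0, 1  # hypothetical binary task used only for illustration

# Three hypothetical surface-level label functions.
def lf_contains_link(text):
    return SPAM if "http" in text.lower() else ABSTAIN

def lf_many_exclamations(text):
    return SPAM if text.count("!") >= 3 else ABSTAIN

def lf_short_greeting(text):
    words = text.lower().split()
    if words and len(words) <= 5 and words[0].strip(",.!?") in {"hi", "hello", "thanks"}:
        return HAM
    return ABSTAIN

LABEL_FUNCTIONS = [lf_contains_link, lf_many_exclamations, lf_short_greeting]

def apply_lfs(texts, lfs):
    """Build a label matrix: one row per example, one column per LF."""
    return [[lf(text) for lf in lfs] for text in texts]

def majority_vote(votes):
    """Aggregate one row of LF votes into a weak label; abstain on no votes or a tie."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    top = counts.most_common(2)
    if len(top) == 2 and top[0][1] == top[1][1]:
        return ABSTAIN
    return top[0][0]

def coverage(label_matrix):
    """Fraction of examples that receive a vote from at least one LF."""
    covered = sum(1 for row in label_matrix if any(v != ABSTAIN for v in row))
    return covered / len(label_matrix)

texts = [
    "Win a prize now!!! Visit http://example.com",
    "Hello, see you tomorrow",
    "Meeting moved to room 4 on the second floor",
    "Thanks for the update",
]
label_matrix = apply_lfs(texts, LABEL_FUNCTIONS)
weak_labels = [majority_vote(row) for row in label_matrix]
print(weak_labels)
print(coverage(label_matrix))
```

The third text matches none of the heuristics, so it stays unlabeled. Gaps of this kind are what a coverage figure measures, and they suggest why LFs that look beyond surface cues could raise coverage.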

Abstract

High-quality labeled data is critical for training reliable machine learning and deep learning models, yet manual annotation remains costly and error-prone. Programmatic labeling addresses this challenge by using label functions (LFs), i.e., heuristic rules that automatically generate weak labels for training datasets. However, existing automated LF generation methods either rely on large language models (LLMs) to synthesize surface-level heuristics or employ model-based synthesis over hand-crafted primitives. These approaches often result in limited coverage and unreliable label quality. In this paper, we introduce EXPONA, an automated framework for programmatic labeling that formulates LF generation as a principled process balancing diversity and reliability. EXPONA systematically explores multi-level LFs, spanning surface, structural, and semantic perspectives. EXPONA further applies reliability-aware mechanisms to suppress noisy or redundant heuristics while preserving complementary signals. To evaluate EXPONA, we conducted extensive experiments on eleven classification datasets across diverse domains. Experimental results show that EXPONA consistently outperformed state-of-the-art automated LF generation methods. Specifically, EXPONA achieved nearly complete label coverage (up to 98.9%), improved weak label quality by up to 87%, and yielded downstream performance gains of up to 46% in weighted F1. These results indicate that EXPONA's combination of multi-level LF exploration and reliability-aware filtering enabled more consistent label quality and downstream performance across diverse tasks by balancing coverage and precision in the generated LF set.
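
The abstract does not describe how EXPONA's reliability-aware mechanisms work, so the following Python sketch only illustrates the general idea of suppressing noisy or redundant LFs while keeping complementary ones. Everything in it is an assumption made for illustration: the small gold-labeled development set, the accuracy and redundancy thresholds, the greedy selection order, and the LF names.

```python
ABSTAIN = -1  # assumed convention for "this LF does not vote on this example"

def lf_accuracy(votes, gold):
    """Accuracy of one LF on the examples where it does not abstain (None if it never votes)."""
    pairs = [(v, g) for v, g in zip(votes, gold) if v != ABSTAIN]
    if not pairs:
        return None
    return sum(v == g for v, g in pairs) / len(pairs)

def overlap_agreement(votes_a, votes_b):
    """Share of examples, among those where either LF votes, on which both give the same vote."""
    either = [(a, b) for a, b in zip(votes_a, votes_b) if a != ABSTAIN or b != ABSTAIN]
    if not either:
        return 0.0
    return sum(a == b for a, b in either) / len(either)

def filter_lfs(lf_votes, gold, min_accuracy=0.7, max_redundancy=0.9):
    """Greedy filter: drop inaccurate LFs, then drop LFs that nearly duplicate a kept one."""
    scored = []
    for name, votes in lf_votes.items():
        acc = lf_accuracy(votes, gold)
        if acc is not None and acc >= min_accuracy:
            scored.append((acc, name))
    # Most accurate first; ties broken by name so the result is deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))
    kept = []
    for _, name in scored:
        if all(overlap_agreement(lf_votes[name], lf_votes[k]) < max_redundancy for k in kept):
            kept.append(name)
    return kept

# Hypothetical votes of four LFs on a six-example development set.
gold = [1, 1, 0, 0, 1, 0]
lf_votes = {
    "keyword":     [1, 1, -1, -1, 1, -1],   # accurate on the positive class
    "keyword_dup": [1, 1, -1, -1, 1, -1],   # exact duplicate of "keyword"
    "length":      [-1, -1, 0, 0, -1, 0],   # accurate and complementary
    "noisy":       [0, 1, 1, -1, 0, 1],     # mostly wrong
}
kept = filter_lfs(lf_votes, gold)
print(kept)
```

In this toy setup the duplicate and the noisy LF are dropped, while the two LFs that cover different examples are both kept. That is the kind of balance between coverage and precision the abstract refers to.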