Three Models of RLHF Annotation: Extension, Evidence, and Authority
arXiv cs.CL / 4/29/2026
Key Points
- The paper analyzes Reinforcement Learning from Human Feedback (RLHF) by making explicit the usually-implicit normative role of human annotators’ judgments.
- It proposes three conceptual models for how annotators influence system outputs: extension (standing in for and extending designers’ judgments), evidence (treating annotations as evidence about independent facts), and authority (acting with independent standing to decide outputs).
- The author argues that RLHF pipeline design, including how judgments are solicited, validated, and aggregated, should differ depending on which model best fits each annotation dimension.
- By surveying key RLHF-related literature, the paper shows that many approaches implicitly combine these models and can fail when they conflate them.
- The central recommendation is to decompose RLHF annotation into separable dimensions and build tailored, dimension-specific pipelines instead of using one unified annotation pipeline (see the sketch after this list).
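
The recommendation to build dimension-specific pipelines is easiest to see in the aggregation step. The sketch below is not from the paper; it is a minimal illustration, assuming hypothetical dimension names ("factuality", "tone"), a made-up `DimensionSpec` type, and an arbitrary 0.75 agreement threshold, of how the same annotator labels might be aggregated differently depending on which of the three models is taken to govern a dimension.

```python
# Hypothetical sketch (not from the paper): making the annotators' role
# explicit per annotation dimension and aggregating labels accordingly.
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class AnnotatorRole(Enum):
    EXTENSION = "extension"   # annotators stand in for the designers' judgment
    EVIDENCE = "evidence"     # annotations are noisy evidence of an independent fact
    AUTHORITY = "authority"   # annotators have independent standing to decide the output


@dataclass
class DimensionSpec:
    name: str
    role: AnnotatorRole


def aggregate(labels: list[str], spec: DimensionSpec,
              designer_reference: str | None = None) -> str:
    """Aggregate one dimension's labels under its assumed role model.

    - evidence:  treat labels as noisy observations and take the majority;
    - authority: defer to annotators, with no designer override (majority here);
    - extension: trust a clear annotator majority, but fall back to the
      designers' reference label when agreement is weak.
    """
    majority, votes = Counter(labels).most_common(1)[0]
    if spec.role is AnnotatorRole.EXTENSION and designer_reference is not None:
        if votes / len(labels) < 0.75:
            # Weak agreement: resolve toward the designers' judgment,
            # since annotators are only extending it on this dimension.
            return designer_reference
    return majority


# Example: factuality treated as evidence, tone as authority.
specs = [DimensionSpec("factuality", AnnotatorRole.EVIDENCE),
         DimensionSpec("tone", AnnotatorRole.AUTHORITY)]
for spec in specs:
    print(spec.name, aggregate(["good", "good", "bad"], spec))
```

The point of the sketch is only structural: the aggregation rule becomes a property of each dimension rather than of the pipeline as a whole, which is one way to avoid conflating the three models.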