Aligning Large Language Models with Searcher Preferences
arXiv cs.CL / 3/12/2026
📰 News · Models & Research
Key Points
- The paper introduces SearchLLM, the first large language model designed for open-ended generative search, shifting from item-centric ranking to answer-centric synthesis.
- It proposes a hierarchical, multi-dimensional reward system that separates factual grounding, answer quality, and format compliance from behavior objectives like robustness to noisy retrieval and user alignment, yielding an interpretable score vector.
- A Gated Aggregation Strategy derives the scalar training reward from this score vector, which is then used to optimize SearchLLM with Group Relative Policy Optimization (GRPO); a minimal sketch of both steps follows this list.
- Deployment at RedNote’s AI search entry point shows both offline gains and online A/B-test improvements in user engagement, with a 1.03% increase in Valid Consumption Rate and a 2.81% reduction in Re-search Rate, while maintaining strict safety standards.
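
This summary does not spell out how the gate works or how GRPO consumes the reward, but the overall shape of the pipeline can be sketched: a per-dimension score vector is collapsed by a gate into a scalar reward, which is then standardized within the group of answers sampled for the same prompt, as GRPO prescribes. The Python sketch below assumes, hypothetically, that factual grounding and format compliance act as hard gates with a fixed threshold; all function names, dimension names, and values are illustrative, not the paper's definitions.

```python
import numpy as np

# Dimension names mirror the reward axes named in the key points above;
# the choice of gate dimensions and the threshold are assumptions.
GATE_KEYS = ("factual_grounding", "format_compliance")

def gated_reward(scores: dict, gate_threshold: float = 0.5) -> float:
    """Collapse a multi-dimensional score vector into one scalar reward.
    Hard constraints act as gates: if any gate dimension fails, the
    reward is zeroed so quality scores cannot compensate for it."""
    if any(scores[k] < gate_threshold for k in GATE_KEYS):
        return 0.0
    soft = [v for k, v in scores.items() if k not in GATE_KEYS]
    return float(np.mean(soft))

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """GRPO's group-relative advantage: standardize each completion's
    reward against the group sampled for the same prompt, which removes
    the need for a learned value baseline."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# One prompt, three sampled answers scored along the reward dimensions.
group = [
    {"factual_grounding": 0.9, "format_compliance": 1.0,
     "answer_quality": 0.8, "noise_robustness": 0.7, "user_alignment": 0.6},
    {"factual_grounding": 0.2, "format_compliance": 1.0,  # fails the gate
     "answer_quality": 0.9, "noise_robustness": 0.9, "user_alignment": 0.9},
    {"factual_grounding": 0.8, "format_compliance": 1.0,
     "answer_quality": 0.6, "noise_robustness": 0.5, "user_alignment": 0.7},
]
rewards = np.array([gated_reward(s) for s in group])
print(rewards)                   # [0.7 0.  0.6]
print(grpo_advantages(rewards))  # standardized within the group
```

If the gate does zero the reward on a failed hard constraint, the score vector stays interpretable in the sense the key points describe: a fluent but ungrounded answer cannot buy back its reward through the soft quality dimensions, and the group-relative standardization is what lets GRPO train without a learned critic.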