Aligning Large Language Models with Searcher Preferences
arXiv cs.CL / 3/12/2026
📰 News · Models & Research
Key Points
- The paper introduces SearchLLM, the first large language model designed for open-ended generative search, shifting from item-centric ranking to answer-centric synthesis.
- It proposes a hierarchical, multi-dimensional reward system that separates factual grounding, answer quality, and format compliance from behavior objectives like robustness to noisy retrieval and user alignment, yielding an interpretable score vector.
- A Gated Aggregation Strategy is presented to derive training rewards for optimizing SearchLLM using Group Relative Policy Optimization (GRPO).
- Deployment in RedNote’s AI search entry shows gains in both offline evaluation and online A/B tests of user engagement, including a 1.03% increase in Valid Consumption Rate and a 2.81% reduction in Re-search Rate, while maintaining strict safety standards.



