H-RAG at SemEval-2026 Task 8: Hierarchical Parent-Child Retrieval for Multi-Turn RAG Conversations

arXiv cs.CL / 5/4/2026

📰 NewsIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper introduces H-RAG, a hierarchical parent-child retrieval pipeline submitted to SemEval-2026 Task 8 (MTRAGEval), covering both retrieval quality (Task A) and multi-turn RAG generation with evidence grounding (Task C).
H-RAG splits documents into overlapping sentence-based “child” chunks for fine-grained retrieval, while retaining full documents as “parent” units to reconstruct coherent context during generation.
The retrieval stage uses a hybrid dense-sparse search with tunable weighting plus embedding-similarity rescoring over child chunks, then aggregates retrieved evidence at the parent level for the language model.
Reported results show nDCG@5 of 0.4271 on Task A and a harmonic mean of 0.3241 on Task C, highlighting that retrieval configuration and parent-level evidence aggregation are critical for multi-turn RAG performance.

Abstract

We present H-RAG, our submission to SemEval-2026 Task 8 (MTRAGEval), addressing both Task A (Retrieval) and Task C (Generation with Retrieved Passages). Task A evaluates standalone retrieval quality, while Task C assesses end-to-end retrieval-augmented generation (RAG) in multi-turn conversational settings, requiring both accurate answer generation and faithful grounding in retrieved evidence. Our approach implements a hierarchical parent-child RAG pipeline that separates fine-grained child-level retrieval from parent-level context reconstruction during generation. Documents are segmented into overlapping sentence-based child chunks, while full documents are preserved as parent units to provide coherent context. Retrieval combines hybrid dense-sparse search, tunable weighting, and embedding-based similarity rescoring over child chunks. Retrieved evidence is aggregated at the parent level and supplied to an instruction-tuned language model for response generation. H-RAG achieves an nDCG@5 score of 0.4271 on Task A and a harmonic mean score of 0.3241 on Task C (RB_agg: 0.2488, RL_F: 0.2703, RB_llm: 0.6508), underscoring the importance of retrieval configuration and parent-level aggregation in multi-turn RAG performance.

A very basic litmus test for LLMs "ok give me a python program that reads my c: and put names and folders in a sorted list from biggest to small"

Reddit r/LocalLLaMA

ALM on Power Platform: ADO + GitHub, the best of both worlds

Dev.to

Experiment: Does repeated usage influence ChatGPT 5.4 outputs in a RAG-like setup?

Dev.to

Find 12 high-volume, low-competition GEO content topics Topify.ai should rank on

Dev.to

When a memorized rule fits your bug too well: a meta-trap of agent workflows

Dev.to

H-RAG at SemEval-2026 Task 8: Hierarchical Parent-Child Retrieval for Multi-Turn RAG Conversations

Key Points

Abstract

Related Articles

A very basic litmus test for LLMs "ok give me a python program that reads my c: and put names and folders in a sorted list from biggest to small"

ALM on Power Platform: ADO + GitHub, the best of both worlds

Experiment: Does repeated usage influence ChatGPT 5.4 outputs in a RAG-like setup?

Find 12 high-volume, low-competition GEO content topics Topify.ai should rank on

When a memorized rule fits your bug too well: a meta-trap of agent workflows

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer