QU-NLP at ArchEHR-QA 2026: Two-Stage QLoRA Fine-Tuning of Qwen3-4B for Patient-Oriented Clinical Question Answering and Evidence Sentence Alignment
arXiv cs.CL / April 17, 2026
Key Points
- The QU-NLP team proposes a unified, end-to-end system for ArchEHR-QA 2026 that tackles both answer generation and evidence sentence alignment.
- For answer generation (Subtask 3), they fine-tune Qwen3-4B with a two-stage quantized LoRA (QLoRA) pipeline: first on 30,000 emrQA-MedSQuAD samples for clinical-domain adaptation, then on the 20 annotated development cases to learn the task-specific output style.
- The resulting system scores 32.87 overall on the official test-2026 split for Subtask 3, with reported metrics including BLEU 9.42, ROUGE-L 27.04, and BERTScore 43.00.
- For evidence alignment (Subtask 4), they combine three retrieval approaches (BM25 with relative thresholding, TF-IDF cosine similarity, and a fine-tuned cross-encoder) into a weighted ensemble, reaching micro-F1 67.16 on a 100-case test set.
- Their experiments suggest the core limitation is that 20 annotated training cases are too few to reliably separate relevant from irrelevant clinical sentences, making data augmentation the most promising next step.
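The two-stage QLoRA pipeline above can be sketched as follows. This is an illustrative configuration, not the authors' code: the model identifier, LoRA rank, target modules, and other hyperparameters are assumptions, and the actual training loops are elided.

```python
# Sketch of two-stage QLoRA fine-tuning of Qwen3-4B (assumed setup,
# not the authors' reported hyperparameters).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantization of the frozen base weights (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B", quantization_config=bnb_config
)

# Trainable low-rank adapters on the attention projections
# (rank and scaling are assumed values for the sketch).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Stage 1: train the adapters on the ~30k emrQA-MedSQuAD samples
# for clinical-domain adaptation.
# Stage 2: continue training the same adapters on the 20 annotated
# development cases to pick up the task-specific output style.
```

Keeping the same adapters across both stages is what makes the pipeline "two-stage" rather than two independent fine-tunes: stage 2 only nudges the domain-adapted weights toward the required answer format.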
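The weighted ensemble with relative thresholding for Subtask 4 can be illustrated with a small sketch. All names, weights, and the threshold ratio here are assumptions; the BM25, TF-IDF, and cross-encoder scores are taken as already computed per candidate sentence.

```python
# Hypothetical sketch of the weighted retrieval ensemble for evidence
# sentence alignment. Weights and the 0.5 ratio are illustrative, not
# the authors' reported values.

def normalize(scores):
    """Min-max normalize one retriever's scores so scales are comparable."""
    lo, hi = min(scores), max(scores)
    return [0.0 if hi == lo else (s - lo) / (hi - lo) for s in scores]

def weighted_ensemble(bm25, tfidf, cross, weights=(0.3, 0.2, 0.5)):
    """Combine per-sentence scores from the three retrievers."""
    nb, nt, nc = normalize(bm25), normalize(tfidf), normalize(cross)
    wb, wt, wc = weights
    return [wb * b + wt * t + wc * c for b, t, c in zip(nb, nt, nc)]

def relative_threshold(scores, ratio=0.5):
    """Keep sentence indices scoring at least `ratio` of the top score,
    rather than using a fixed absolute cutoff."""
    top = max(scores)
    return [i for i, s in enumerate(scores) if top > 0 and s >= ratio * top]

# Toy example: three candidate note sentences.
bm25 = [2.1, 0.3, 1.8]
tfidf = [0.6, 0.1, 0.5]
cross = [0.9, 0.2, 0.7]
combined = weighted_ensemble(bm25, tfidf, cross)
evidence_idx = relative_threshold(combined)  # sentences 0 and 2 kept
```

A relative threshold adapts to each case's score distribution, which matters when some patient questions have many supporting sentences and others only one.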