ReHARK: Refined Hybrid Adaptive RBF Kernels for Robust One-Shot Vision-Language Adaptation
arXiv cs.CV / 3/13/2026
Key Points
- The paper addresses the stability-plasticity trade-off in adapting large-scale vision-language models to downstream tasks with extremely limited data, highlighting limitations of prior training-free methods that rely on local estimators.
- ReHARK reinterprets few-shot adaptation as global proximal regularization in a reproducing kernel Hilbert space (RKHS) and introduces a training-free, multistage refinement pipeline to improve robustness.
- The pipeline includes Hybrid Prior Construction (fusing zero-shot textual knowledge from CLIP and GPT-3 with visual class prototypes), Support Set Augmentation (bridging), Adaptive Distribution Rectification, and Multi-Scale RBF Kernels.
- Across 11 benchmarks, ReHARK reports an average accuracy of 65.83%, which the authors present as a new state of the art for one-shot vision-language adaptation. The code is released on GitHub.
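To make the "global proximal regularization in an RKHS" idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation: it uses the standard closed form for kernel ridge regression toward a prior, f(x) = f0(x) + k(x, X)(K + λI)^(-1)(Y - f0(X)), and every function name, the bandwidths in `gammas`, the mixing weight `alpha`, and the strength `lam` are illustrative assumptions. The augmentation and rectification stages are omitted.

```python
import numpy as np

def multi_scale_rbf(A, B, gammas=(0.5, 2.0, 8.0)):
    """Average of RBF kernels exp(-gamma * ||a - b||^2) over several bandwidths.

    The bandwidth values are assumptions chosen for illustration only.
    """
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    sq = np.maximum(sq, 0.0)  # guard against tiny negative values from rounding
    return np.mean([np.exp(-g * sq) for g in gammas], axis=0)

def hybrid_prior(text_emb, visual_protos, alpha=0.5):
    """Fuse per-class text embeddings with visual prototypes (assumed convex mix)."""
    W = alpha * text_emb + (1.0 - alpha) * visual_protos
    return W / np.linalg.norm(W, axis=1, keepdims=True)

def proximal_kernel_predict(X_sup, Y_sup, X_qry, W, lam=1.0):
    """Class scores for queries: prior logits plus a kernel correction.

    Solves min_f sum_i ||f(x_i) - y_i||^2 + lam * ||f - f0||_H^2,
    where f0(x) = x @ W.T is the prior classifier.
    """
    f0_sup = X_sup @ W.T
    f0_qry = X_qry @ W.T
    K = multi_scale_rbf(X_sup, X_sup)
    # Dual coefficients fitted on the residual between labels and the prior.
    coef = np.linalg.solve(K + lam * np.eye(len(K)), Y_sup - f0_sup)
    return f0_qry + multi_scale_rbf(X_qry, X_sup) @ coef
```

In this sketch a large `lam` keeps predictions close to the zero-shot prior (stability), while a small `lam` lets the support set dominate (plasticity). In the one-shot setting, the single support feature per class can stand in for the visual prototype.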