Prime Once, then Reprogram Locally: An Efficient Alternative to Black-Box Service Model Adaptation

arXiv cs.LG / 4/3/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper argues that adapting closed-box service APIs via Zeroth-Order Optimization (ZOO) is inefficient: it requires many costly API calls, often optimizes slowly or unstably, and faces a new obstacle with modern APIs such as GPT-4o, which can be insensitive to the input perturbations ZOO relies on.
  • It introduces AReS (Alternative efficient Reprogramming for Service models), which replaces continuous black-box optimization with a single-pass “priming” step that trains only a lightweight adapter on a local encoder.
  • After priming, AReS switches to a glass-box (white-box) reprogramming stage on the local proxy model, so adaptation and inference run locally and incur essentially no further API usage.
  • Experiments report strong improvements on GPT-4o, including a +27.8% gain over the zero-shot baseline on a task where ZOO-based methods largely fail, plus gains across ten datasets (+2.5% for vision-language models (VLMs) and +15.6% for standard vision models (VMs)).
  • The method reduces API calls by over 99.99% while achieving state-of-the-art or better performance relative to prior approaches, positioning it as a practical alternative for API-based model adaptation.
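The API-call asymmetry in the points above becomes concrete with a minimal sketch of the two-point randomized gradient estimator that ZOO-style methods rely on. Everything here is illustrative, not from the paper: a toy quadratic stands in for the paid service API, and each loss evaluation represents one billable request.

```python
import numpy as np

np.random.seed(0)

def black_box_loss(prompt_params):
    """Stand-in for one billable service-API call (hypothetical toy
    quadratic; a real deployment would query the closed-box model)."""
    return float(np.sum((prompt_params - 0.5) ** 2))

def zoo_gradient(params, mu=1e-2, n_dirs=8):
    """Two-point randomized zeroth-order gradient estimate.
    Each call to this function costs 2 * n_dirs API queries."""
    grad = np.zeros_like(params)
    for _ in range(n_dirs):
        u = np.random.randn(*params.shape)
        delta = (black_box_loss(params + mu * u)
                 - black_box_loss(params - mu * u)) / (2 * mu)
        grad += delta * u
    return grad / n_dirs

params = np.random.randn(16)
for _ in range(200):                      # 200 steps -> 3,200 API calls
    params -= 0.05 * zoo_gradient(params)
```

Even this 16-dimensional toy run spends thousands of queries before converging, and the estimate degrades when the API's output barely responds to small perturbations — the two failure modes the paper attributes to ZOO. AReS's single-pass priming is positioned as replacing this entire query loop with one batch of API calls.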

Abstract

Adapting closed-box service models (i.e., APIs) for target tasks typically relies on reprogramming via Zeroth-Order Optimization (ZOO). However, this standard strategy is known for extensive, costly API calls and often suffers from slow, unstable optimization. Furthermore, we observe that this paradigm faces new challenges with modern APIs (e.g., GPT-4o). These models can be less sensitive to the input perturbations ZOO relies on, thereby hindering performance gains. To address these limitations, we propose an Alternative efficient Reprogramming approach for Service models (AReS). Instead of direct, continuous closed-box optimization, AReS initiates a single-pass interaction with the service API to prime an amenable local pre-trained encoder. This priming stage trains only a lightweight layer on top of the local encoder, making it highly receptive to the subsequent glass-box (white-box) reprogramming stage performed directly on the local model. Consequently, all subsequent adaptation and inference rely solely on this local proxy, eliminating all further API costs. Experiments demonstrate AReS's effectiveness where prior ZOO-based methods struggle: on GPT-4o, AReS achieves a +27.8% gain over the zero-shot baseline, a task where ZOO-based methods provide little to no improvement. Broadly, across ten diverse datasets, AReS outperforms state-of-the-art methods (+2.5% for VLMs, +15.6% for standard VMs) while reducing API calls by over 99.99%. AReS thus provides a robust and practical solution for adapting modern closed-box models.
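The two-stage recipe in the abstract can be illustrated with a minimal sketch, assuming a linear probe as the "lightweight layer" and an additive input perturbation as the reprogramming — both common choices, but my assumptions rather than details confirmed by the abstract. The encoder, the "service API" (a hidden teacher queried once per example), and the data are synthetic stand-ins.

```python
import numpy as np

np.random.seed(0)
D, H, C, N = 20, 32, 3, 150   # input dim, encoder dim, classes, samples

# Frozen local pre-trained encoder (stand-in: fixed random projection).
E = np.random.randn(H, D) / np.sqrt(D)

# Synthetic data; the "service API" is simulated by a hidden teacher.
X = np.random.randn(N, D)
teacher_W = np.random.randn(C, D)

def service_api(x):
    """Hypothetical closed-box API: returns a hard label per query."""
    return int(np.argmax(teacher_W @ x))

Y = np.array([service_api(x) for x in X])   # N API calls, made exactly once

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Stage 1 (priming): train ONLY the lightweight adapter W on the frozen
# encoder, supervised by the one-time API outputs.
W = np.zeros((C, H))
for _ in range(300):
    for x, y in zip(X, Y):
        p = softmax(W @ (E @ x))
        p[y] -= 1.0                        # dL/dlogits for cross-entropy
        W -= 0.05 * np.outer(p, E @ x)     # adapter-only SGD step

# Stage 2 (glass-box reprogramming): learn an input perturbation delta
# with exact local gradients -- zero further API calls.
delta = np.zeros(D)
for _ in range(300):
    g = np.zeros(D)
    for x, y in zip(X, Y):
        p = softmax(W @ (E @ (x + delta)))
        p[y] -= 1.0
        g += E.T @ (W.T @ p)               # chain rule through encoder
    delta -= 0.05 * g / N

acc = np.mean([np.argmax(W @ (E @ (x + delta))) == y for x, y in zip(X, Y)])
```

The point of the design is visible in the loop structure: the API appears only in the single pass that builds `Y`, while both optimization stages run against the local proxy with true gradients — which is how the reported >99.99% reduction in API calls becomes possible.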