FLARE: Task-agnostic embedding model evaluation through normalizing flows

arXiv cs.LG / 4/21/2026


Key Points

  • The paper introduces FLARE, a task-agnostic embedding model evaluation method for cases where task-specific labels are unavailable.
  • FLARE estimates “information sufficiency” directly from log-likelihoods using normalizing flows, avoiding the distance-based density estimators that become unstable in high-dimensional settings.
  • The authors provide a finite-sample error bound showing that estimation error is driven by the intrinsic dimension of the data manifold rather than the embedding dimension.
  • Experiments across 11 datasets and 8 embedders show FLARE achieving a Spearman correlation of ρ = 0.90 against a supervised benchmark and remaining stable even for very high-dimensional embeddings (d ≥ 3,584), where prior label-free baselines collapse.
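
For readers unfamiliar with the reported metric, the sketch below shows how a Spearman rank correlation between a label-free score and a supervised benchmark score could be computed for a set of embedders. The two score lists are made-up illustrative numbers, not results from the paper, and the function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores for 8 embedders (illustrative only, not from the paper).
label_free_scores = [0.62, 0.71, 0.55, 0.80, 0.67, 0.74, 0.59, 0.77]
supervised_scores = [61.3, 66.0, 58.2, 70.1, 66.8, 68.4, 60.0, 69.5]

rho = spearman_rho(label_free_scores, supervised_scores)
print(f"Spearman rho = {rho:.3f}")
```

A value near 1 means the label-free score orders the embedders almost exactly as the supervised benchmark does, which is the sense in which a ranking metric is judged here.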

Abstract

When task-specific labels are unavailable, selecting an embedding model for a given target corpus is difficult. Existing label-free metrics based on kernel estimators or Gaussian mixtures break down in high-dimensional spaces and produce unstable rankings. We propose Flow-based Label-free Representation Embedding Evaluation (FLARE), which uses normalizing flows to estimate information sufficiency directly from log-likelihoods, avoiding distance-based density estimation. We give a finite-sample bound showing that the estimation error depends on the intrinsic dimension of the data manifold rather than on the raw embedding dimension. Across 11 datasets and 8 embedders, FLARE reaches a Spearman's ρ of 0.90 against supervised benchmarks and remains stable on high-dimensional embeddings (d ≥ 3,584), where existing label-free baselines collapse.
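
To make the idea of reading a density off a flow's log-likelihood concrete, here is a minimal sketch of the change-of-variables computation for the simplest possible flow, a single invertible affine map fitted in closed form. This is an illustrative stand-in, not the FLARE method: the summary does not describe the flow architecture or how information sufficiency is derived from the log-likelihoods, and neither is reproduced here. The function names and the random data are assumptions.

```python
import numpy as np


def fit_affine_flow(x, jitter=1e-6):
    """Fit z = L^{-1}(x - mu) so that z is approximately standard normal.

    L is the Cholesky factor of the (jittered) sample covariance.
    """
    mu = x.mean(axis=0)
    cov = np.cov(x, rowvar=False) + jitter * np.eye(x.shape[1])
    chol = np.linalg.cholesky(cov)
    return mu, chol


def flow_log_likelihood(x, mu, chol):
    """Per-sample log p(x) = log N(z; 0, I) + log|det dz/dx|."""
    d = x.shape[1]
    # Map data to the base space: z = L^{-1}(x - mu).
    z = np.linalg.solve(chol, (x - mu).T).T
    log_base = -0.5 * (z ** 2).sum(axis=1) - 0.5 * d * np.log(2 * np.pi)
    # dz/dx = L^{-1}, so log|det dz/dx| = -sum(log diag(L)).
    log_det = -np.log(np.diag(chol)).sum()
    return log_base + log_det


# Stand-in "embeddings": 500 random vectors in 16 dimensions (assumption).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 16)) @ rng.normal(size=(16, 16))

mu, chol = fit_affine_flow(embeddings)
avg_ll = flow_log_likelihood(embeddings, mu, chol).mean()
print(f"mean log-likelihood: {avg_ll:.2f}")
```

No pairwise distances between samples are computed: the density comes from the transformed point and the Jacobian determinant alone. A learned deep flow would replace the closed-form affine map with trained invertible layers while keeping the same log-likelihood formula.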