Controlled Paraphrase Geometry in Sentence Embedding Space: Local Manifold Modeling and Latent Probing
arXiv cs.CL / 5/5/2026
Key Points
- The paper examines how controlled, paraphrase-like semantic variations are organized locally within sentence embedding space, focusing on the geometry of embedding “clouds.”
- It proposes local geometric modeling using fitted low-degree carrier functions (affine, quadratic, and cubic), and introduces a surface-based latent probing method that generates synthetic latent points in a reduced local PCA space.
- The generated points are evaluated for fidelity to the fitted surface, preservation of neighborhood structure, agreement with the empirical embedding distribution, and stability of second-order (Hessian) shape descriptors and fitted coefficients.
- Experiments indicate that nonlinear local models (quadratic/cubic) capture embedding clouds more accurately than affine models, with surface-based generation achieving strong geometric consistency (including Hessian and coefficient consistency).
- However, downstream experiments show that geometrically valid synthetic latent points do not necessarily improve classification performance, which underlines the difference between “geometry-aware validity” and “discriminative utility.” The work also releases the CoPaGE-300K dataset.
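
The summary does not give the paper's exact procedure, so the following is only a minimal sketch of the general idea behind the key points: project a local embedding cloud into a reduced PCA space, fit affine and quadratic carrier surfaces by least squares, read off the constant Hessian of the quadratic fit, and generate synthetic latent points on the fitted surface. Everything specific here is an assumption for illustration: the toy 16-dimensional cloud, the choice of two PCA coordinates plus one "height" coordinate, and the helper names `local_pca`, `poly_features`, `fit_carrier`, `quadratic_hessian`, and `sample_on_surface`.

```python
import numpy as np

def local_pca(X, k):
    """Center X (n, d) and project it onto its top-k principal directions."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions of the centered cloud.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    basis = Vt[:k]
    return Xc @ basis.T, mean, basis

def poly_features(U, degree):
    """Monomials u^i * v^j with i + j <= degree for 2-D coordinates U.
    For degree 2 the column order is [1, v, v^2, u, u*v, u^2]."""
    u, v = U[:, 0], U[:, 1]
    cols = [u**i * v**j for i in range(degree + 1) for j in range(degree + 1 - i)]
    return np.stack(cols, axis=1)

def fit_carrier(U, w, degree):
    """Least-squares fit of w ~ p(u, v); returns coefficients and RMSE."""
    A = poly_features(U, degree)
    coef, *_ = np.linalg.lstsq(A, w, rcond=None)
    rmse = float(np.sqrt(np.mean((A @ coef - w) ** 2)))
    return coef, rmse

def quadratic_hessian(coef):
    """Constant Hessian of a degree-2 fit, using the column order above."""
    return np.array([[2 * coef[5], coef[4]],
                     [coef[4], 2 * coef[2]]])

def sample_on_surface(U, coef, degree, n, rng):
    """Draw (u, v) uniformly in the observed bounding box and lift to the surface."""
    lo, hi = U.min(axis=0), U.max(axis=0)
    U_new = rng.uniform(lo, hi, size=(n, 2))
    w_new = poly_features(U_new, degree) @ coef
    return np.column_stack([U_new, w_new])

# Toy stand-in for a paraphrase cloud (assumption): a curved 2-D sheet
# embedded in 16 dimensions with a small amount of noise.
rng = np.random.default_rng(0)
uv = rng.uniform(-1.0, 1.0, size=(500, 2))
height = 0.3 * uv[:, 0] ** 2 - 0.2 * uv[:, 0] * uv[:, 1]
height = height + rng.normal(scale=0.01, size=500)
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))   # random orthonormal frame
X = np.column_stack([uv, height]) @ Q[:3]        # (500, 16) "embeddings"

Z, mean, basis = local_pca(X, k=3)               # reduced local PCA space
U, w = Z[:, :2], Z[:, 2]                         # two coordinates + height
coef_aff, rmse_aff = fit_carrier(U, w, degree=1)
coef_quad, rmse_quad = fit_carrier(U, w, degree=2)
H = quadratic_hessian(coef_quad)

Z_new = sample_on_surface(U, coef_quad, degree=2, n=100, rng=rng)
X_new = Z_new @ basis + mean                     # back to embedding space

print("affine RMSE:", rmse_aff, "quadratic RMSE:", rmse_quad)
print("Hessian of quadratic fit:\n", H)
```

On this toy cloud the quadratic carrier fits markedly better than the affine one, and the synthetic points lie on the fitted surface by construction. That kind of geometric fidelity says nothing about whether the points help a downstream classifier, which is the distinction the last key point draws.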