What Do Your Logits Know? (The Answer May Surprise You!)
Apple Machine Learning Journal / 4/20/2026
Key Points
- Recent research shows that probing neural network internals can expose information that isn’t visible from the model’s outputs, creating risks of unintentional or malicious information leakage.
- Using vision-language models as a testbed, the paper systematically compares how much information survives at different “representational levels” as it is compressed, starting from the information-rich residual stream.
- The analysis examines information passing through two natural bottlenecks: low-dimensional projections and attention-based pooling/aggregation mechanisms.
- The results suggest that seemingly “compressed” representations (e.g., logits-related signals) can still preserve substantial recoverable information, potentially surprising model owners and evaluators.
- The work highlights a need for more conservative privacy and security assumptions when deploying models, since information can leak through internal features even when the generated outputs look safe.
Recent work has shown that probing model internals can reveal a wealth of information not apparent from the model generations. This poses the risk of unintentional or malicious information leakage, where model users are able to learn information that the model owner assumed was inaccessible. Using vision-language models as a testbed, we present the first systematic comparison of information retained at different “representational levels” as it is compressed from the rich information encoded in the residual stream through two natural bottlenecks: low-dimensional projections of the residual…
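To make the idea of probing at different "representational levels" concrete, here is a minimal sketch on synthetic data. It is not the paper's method or data: the activations are random vectors with one binary attribute planted along a hidden direction, the low-dimensional bottleneck is assumed to be a random linear projection, and the probe is a closed-form ridge regression. All function names and parameters (`make_synthetic_activations`, `fit_linear_probe`, `compare_levels`, `signal`, `k`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def make_synthetic_activations(n=2000, d_model=256, signal=4.0, seed=0):
    """Stand-in for residual-stream activations (synthetic, not real model data).

    A binary attribute is planted along one hidden unit direction,
    on top of isotropic Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n)
    direction = rng.normal(size=d_model)
    direction /= np.linalg.norm(direction)
    noise = rng.normal(size=(n, d_model))
    acts = noise + signal * (2 * labels - 1)[:, None] * direction
    return acts, labels


def _with_bias(x):
    # Append a constant column so the probe can learn an offset.
    return np.hstack([x, np.ones((len(x), 1))])


def fit_linear_probe(x, y, ridge=1e-2):
    """Closed-form ridge regression onto targets in {-1, +1}."""
    xb = _with_bias(x)
    targets = 2.0 * y - 1.0
    gram = xb.T @ xb + ridge * np.eye(xb.shape[1])
    return np.linalg.solve(gram, xb.T @ targets)


def probe_accuracy(w, x, y):
    """Fraction of examples whose attribute the probe recovers."""
    preds = (_with_bias(x) @ w > 0).astype(int)
    return float((preds == y).mean())


def compare_levels(k=32, seed=0):
    """Probe the full activations and a k-dimensional projection of them."""
    acts, labels = make_synthetic_activations(seed=seed)
    rng = np.random.default_rng(seed + 1)
    # Assumed bottleneck: a random linear map from d_model down to k dims.
    proj = rng.normal(size=(acts.shape[1], k)) / np.sqrt(k)
    levels = {"residual": acts, "projected": acts @ proj}

    split = len(acts) // 2  # first half trains the probe, second half tests it
    results = {}
    for name, feats in levels.items():
        w = fit_linear_probe(feats[:split], labels[:split])
        results[name] = probe_accuracy(w, feats[split:], labels[split:])
    return results


if __name__ == "__main__":
    for level, acc in compare_levels().items():
        print(f"{level:>10}: held-out probe accuracy = {acc:.3f}")
```

In this toy setup the probe on the projected features loses some accuracy relative to the full activations but stays well above the 50% chance level, which is the kind of gap such a comparison is meant to measure. How large that gap is for real models and real bottlenecks is an empirical question that this sketch does not answer.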