Neural architectures for resolving references in program code

arXiv cs.LG / April 16, 2026


Key Points

  • The paper studies reference resolution and rewriting in programming code by modeling it as direct and indirect indexing by permutation, motivated by a real-world decompilation task.
  • It introduces synthetic benchmarks and finds that existing sequence-to-sequence architectures perform poorly on these indexing-focused tasks.
  • The authors propose new sequence-to-sequence neural architectures for both direct and indirect indexing, demonstrating improved robustness and scalability.
  • Experiments show the new models can process inputs about 10x longer than the strongest baseline while maintaining better performance.
  • In a real decompilation setting (switch-statement decompilation with an indexing subtask), the extended model reduces the error rate by 42%, and ablation studies confirm the necessity of all key components.
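The paper's exact task definitions are not reproduced in this summary, but "direct and indirect indexing by permutation" admits a natural reading that the following hypothetical sketch illustrates: direct indexing reads input positions through a permutation, while indirect indexing writes through it (i.e., applies the inverse permutation).

```python
# Hypothetical sketch of the two abstractions; the paper's precise
# formalization may differ.

def direct_index(values, perm):
    # Direct indexing: output[i] = values[perm[i]]
    # (read position perm[i] of the input).
    return [values[p] for p in perm]

def indirect_index(values, perm):
    # Indirect indexing: output[perm[i]] = values[i]
    # (write input i to position perm[i]); this applies
    # the inverse of the permutation used by direct_index.
    out = [None] * len(values)
    for i, p in enumerate(perm):
        out[p] = values[i]
    return out

values = ["a", "b", "c", "d"]
perm = [2, 0, 3, 1]
print(direct_index(values, perm))    # ['c', 'a', 'd', 'b']
print(indirect_index(values, perm))  # ['b', 'd', 'a', 'c']
```

A sequence-to-sequence model solving these tasks must copy tokens between positions determined by other tokens in the input, which is the capability the paper's synthetic benchmarks are designed to probe.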

Abstract

Resolving and rewriting references is fundamental in programming languages. Motivated by a real-world decompilation task, we abstract reference rewriting into the problems of direct and indirect indexing by permutation. We create synthetic benchmarks for these tasks and show that well-known sequence-to-sequence machine learning architectures struggle on them. We introduce new sequence-to-sequence architectures for both problems. Our measurements show that our architectures outperform the baselines in both robustness and scalability: our models can handle examples ten times longer than the best baseline can. We measure the impact of our architecture on the real-world task of decompiling switch statements, which contains an indexing subtask. According to our measurements, the extended model decreases the error rate by 42%. Multiple ablation studies show that all components of our architectures are essential.
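To see why switch-statement decompilation contains an indexing subtask, consider how a compiler typically lowers a dense switch: it builds a jump table mapping the switch scrutinee to a code-block index, and the decompiler must resolve each table slot back to the case labels that reference it. The following is an illustrative model under that assumption, not the paper's actual pipeline:

```python
# Illustrative model of a dense switch lowered to a jump table.
# Names (blocks, jump_table, dispatch) are hypothetical.

blocks = ["return 'zero'", "return 'one'", "return 'other'"]

# Jump table: case value -> block index; several case values can
# fold onto the same (e.g. default) block.
jump_table = [0, 1, 2, 2, 2]

def dispatch(x):
    # The compiled code indexes the table with the scrutinee
    # and jumps to the selected block (out-of-range -> default).
    idx = jump_table[x] if 0 <= x < len(jump_table) else 2
    return blocks[idx]

# Decompilation inverts this indexing: group case values by the
# block their table slot refers to.
cases = {}
for case_value, block_idx in enumerate(jump_table):
    cases.setdefault(block_idx, []).append(case_value)
print(cases)  # {0: [0], 1: [1], 2: [2, 3, 4]}
```

Recovering this case-to-block mapping from the table is exactly the kind of permutation-style indexing the synthetic benchmarks abstract.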