Neural architectures for resolving references in program code

arXiv cs.LG / April 16, 2026


Key Points

  • The paper studies reference resolution and rewriting in programming code by modeling it as direct and indirect indexing by permutation, motivated by a real-world decompilation task.
  • It introduces synthetic benchmarks and finds that existing sequence-to-sequence architectures perform poorly on these indexing-focused tasks.
  • The authors propose new sequence-to-sequence neural architectures for both direct and indirect indexing, demonstrating improved robustness and scalability.
  • Experiments show the new models can process inputs about 10x longer than the strongest baseline while maintaining better performance.
  • In a real decompilation setting (switch-statement decompilation with an indexing subtask), the extended model reduces the error rate by 42%, and ablation studies confirm the necessity of all key components.
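The paper's exact task definitions are not reproduced in this summary, but "direct and indirect indexing by permutation" admits a natural reading that the following hypothetical sketch illustrates: direct indexing reads input positions through a permutation, while indirect indexing writes through it (i.e., applies the inverse permutation).

```python
# Hypothetical sketch of the two abstractions; the paper's precise
# formalization may differ.

def direct_index(values, perm):
    # Direct indexing: output[i] = values[perm[i]]
    # (read position perm[i] of the input).
    return [values[p] for p in perm]

def indirect_index(values, perm):
    # Indirect indexing: output[perm[i]] = values[i]
    # (write input i to position perm[i]); this applies
    # the inverse of the permutation used by direct_index.
    out = [None] * len(values)
    for i, p in enumerate(perm):
        out[p] = values[i]
    return out

values = ["a", "b", "c", "d"]
perm = [2, 0, 3, 1]
print(direct_index(values, perm))    # ['c', 'a', 'd', 'b']
print(indirect_index(values, perm))  # ['b', 'd', 'a', 'c']
```

A sequence-to-sequence model solving these tasks must copy tokens between positions determined by other tokens in the input, which is the capability the paper's synthetic benchmarks are designed to probe.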

Abstract

Resolving and rewriting references is fundamental in programming languages. Motivated by a real-world decompilation task, we abstract reference rewriting into the problems of direct and indirect indexing by permutation. We create synthetic benchmarks for these tasks and show that well-known sequence-to-sequence machine learning architectures struggle on them. We introduce new sequence-to-sequence architectures for both problems. Our measurements show that our architectures outperform the baselines in both robustness and scalability: our models can handle examples ten times longer than the best baseline can. We measure the impact of our architecture on the real-world task of decompiling switch statements, which contains an indexing subtask. According to our measurements, the extended model decreases the error rate by 42%. Multiple ablation studies show that all components of our architectures are essential.
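To see why switch-statement decompilation contains an indexing subtask, consider how a compiler typically lowers a dense switch: it builds a jump table mapping the switch scrutinee to a code-block index, and the decompiler must resolve each table slot back to the case labels that reference it. The following is an illustrative model under that assumption, not the paper's actual pipeline:

```python
# Illustrative model of a dense switch lowered to a jump table.
# Names (blocks, jump_table, dispatch) are hypothetical.

blocks = ["return 'zero'", "return 'one'", "return 'other'"]

# Jump table: case value -> block index; several case values can
# fold onto the same (e.g. default) block.
jump_table = [0, 1, 2, 2, 2]

def dispatch(x):
    # The compiled code indexes the table with the scrutinee
    # and jumps to the selected block (out-of-range -> default).
    idx = jump_table[x] if 0 <= x < len(jump_table) else 2
    return blocks[idx]

# Decompilation inverts this indexing: group case values by the
# block their table slot refers to.
cases = {}
for case_value, block_idx in enumerate(jump_table):
    cases.setdefault(block_idx, []).append(case_value)
print(cases)  # {0: [0], 1: [1], 2: [2, 3, 4]}
```

Recovering this case-to-block mapping from the table is exactly the kind of permutation-style indexing the synthetic benchmarks abstract.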