eDySec: A Deep Learning-based Explainable Dynamic Analysis Framework for Detecting Malicious Packages in PyPI Ecosystem

arXiv cs.LG / 4/30/2026


Key Points

  • The paper introduces eDySec, a deep learning-based, explainable framework for dynamically analyzing PyPI packages to detect next-generation supply chain malware behaviors.
  • It targets the difficulties faced by traditional ML detectors, where high-dimensional and sparse dynamic signals (e.g., system calls, network traffic, directory access, and dependency logs) reduce accuracy, stability, and interpretability.
  • Using the QUT-DV25 dataset covering both install-time and post-installation behaviors, the authors evaluate DL models and feature sets to find the most discriminative attributes for efficient detection.
  • eDySec is designed for operational reliability and transparency, incorporating model stability analysis and explainable AI to produce more stable, interpretable decisions.
  • Experimental results report strong gains over prior work: feature dimensionality is halved, false positives fall by 82% and false negatives by 79%, and accuracy improves by 3%, with an inference latency of 170 ms per package and near-perfect stability.
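
The 82% and 79% figures above are relative reductions against a baseline, so they say nothing by themselves about the absolute error counts. As a minimal sketch of that arithmetic, the snippet below uses hypothetical counts chosen only to reproduce the percentages; the counts and the `relative_reduction` helper are illustrative assumptions, not values or code from the paper.

```python
def relative_reduction(baseline: int, improved: int) -> float:
    """Fractional drop from baseline to improved (0.82 means 82% fewer)."""
    if baseline <= 0:
        raise ValueError("baseline count must be positive")
    return (baseline - improved) / baseline


# Hypothetical error counts for illustration only; NOT taken from the paper.
baseline_fp, new_fp = 100, 18  # false positives: baseline detector vs. new detector
baseline_fn, new_fn = 100, 21  # false negatives: baseline detector vs. new detector

fp_drop = relative_reduction(baseline_fp, new_fp)
fn_drop = relative_reduction(baseline_fn, new_fn)

print(f"false positives reduced by {fp_drop:.0%}")
print(f"false negatives reduced by {fn_drop:.0%}")
```

Under these assumed counts, a detector that raised 100 false alarms would raise 18 after an 82% reduction. The same percentage applied to a baseline of 10 false alarms would leave about 2, which is why the underlying confusion matrices matter when comparing frameworks.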

Abstract

The security of open-source software repositories is increasingly threatened by next-generation software supply chain attacks, which include multiphase malware execution, remote access activation, and dynamic payload generation. Traditional Machine Learning (ML) detectors struggle to detect these attacks because the dynamic behavioral data involved, including system calls, network traffic, directory access patterns, and dependency logs, is high-dimensional and sparse. These data characteristics degrade the performance, stability, and explainability of ML models. Deep Learning (DL) is a promising alternative, given its success across various domains and its potential for modeling complex patterns. This paper presents eDySec, an efficient, stable, and explainable DL-based framework for dynamic behavioral analysis to detect malicious packages. Using the QUT-DV25 dataset, which captures both install-time and post-installation behaviors of packages, we evaluate DL models and investigate feature sets to identify the most discriminative attributes for efficient malicious package detection. Additionally, model stability analysis and explainable AI techniques are incorporated into the detection pipeline to enable stable and transparent interpretations of model decisions. Experimental results demonstrate that eDySec significantly outperforms state-of-the-art frameworks. Specifically, it halves feature dimensionality while lowering false positives by 82% and false negatives by 79%. It also improves accuracy by 3%, achieves near-perfect stability, and maintains an inference latency of 170 ms per package. Further analysis reveals that feature and model selection play a critical role, as certain combinations degrade performance. Ultimately, this study advances the understanding of the strengths and limitations of dynamic analysis against next-generation attacks.
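
The abstract does not say how the most discriminative attributes are chosen, so the following is only a rough sketch of the general idea of halving feature dimensionality on sparse behavioral counts. It assumes a simple score (the absolute gap between class means) and keeps the top-scoring half of the columns. The scoring rule, the function names `class_mean_gap` and `top_half_features`, and the toy data are all hypothetical and are not drawn from eDySec or QUT-DV25.

```python
from statistics import mean


def class_mean_gap(rows, labels, j):
    """Absolute difference between the malicious and benign mean of feature j."""
    malicious = [r[j] for r, y in zip(rows, labels) if y == 1]
    benign = [r[j] for r, y in zip(rows, labels) if y == 0]
    return abs(mean(malicious) - mean(benign))


def top_half_features(rows, labels):
    """Return the sorted column indices of the highest-scoring half of the features."""
    n_features = len(rows[0])
    ranked = sorted(
        range(n_features),
        key=lambda j: class_mean_gap(rows, labels, j),
        reverse=True,
    )
    return sorted(ranked[: n_features // 2])


# Toy data (hypothetical): 4 packages x 4 behavioral counters, mostly zeros.
rows = [
    [0, 5, 0, 1],
    [0, 4, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 3, 0],
]
labels = [1, 1, 0, 0]  # 1 = malicious, 0 = benign

kept = top_half_features(rows, labels)
reduced = [[r[j] for j in kept] for r in rows]  # same packages, half the columns
print(kept)
print(reduced)
```

A univariate score like this ignores interactions between features. The abstract's remark that certain feature and model combinations degrade performance suggests that, in practice, selection should be validated together with the model rather than in isolation.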