Visual graph classification for blockchain security: Experiences fine-tuning Qwen2-VL on AMD MI300X [D]

Reddit r/MachineLearning / 5/5/2026

💬 OpinionDeveloper Stack & InfrastructureIdeas & Deep AnalysisTools & Practical UsageModels & Research

共有:

Key Points

The post describes a computer-vision/security method for identifying malicious blockchain transaction patterns that are mathematically obfuscated but topologically distinct when rendered as 2D graphs.
Instead of conventional graph embeddings or standard GNNs, the author fine-tunes the vision-language model Qwen2-VL-2B-Instruct to classify visual graph topologies linked to different attack types.
The model is fine-tuned with LoRA (r=16, alpha=32) by targeting attention projection layers (q, k, v, o) and trained on a synthetic Dogon-10K dataset of 10,000 NetworkX/Matplotlib-generated graph images across four classes.
Training and experimentation are run on AMD MI300X using the ROCm stack, specifically to stress-test PEFT/TRL workflows for vision-centric tasks on AMD hardware.
The author shares LoRA weights on Hugging Face and plans to publish the full inference engine and dataset generator code later, while inviting others to explore VLM-based visual anomaly detection for abstract structures.

Hi everyone,

I’ve been working on a computer vision approach to a specific security problem in the "Agentic Economy": identifying malicious transaction patterns that are mathematically obfuscated but topologically distinct.

The Problem

Traditional rule-based security engines and even standard GNNs often struggle with "splitting attacks"—where a high-value transaction is fragmented into thousands of micro-transactions to bypass statistical thresholds. However, when these flows are projected as a 2D graph topology, they exhibit very specific adversarial signatures (Star patterns, centralized hubs, mixing chains).

The Approach: VLM for Graph Classification

Instead of relying on graph embeddings, I’ve experimented with a Vision-Language approach using Qwen2-VL-2B-Instruct. The intuition is that VLMs are increasingly efficient at recognizing structural relationships in 2D layouts.

Technical Specs:

Base Model: Qwen2-VL-2B-Instruct.
Fine-tuning: LoRA (r=16, alpha=32) targeting attention projections (q, k, v, o).
Dataset (Dogon-10K): I generated 10,000 synthetic transaction graph images using NetworkX and Matplotlib. The dataset covers four classes: NORMAL, DRAIN_STAR, MIXING_CHAIN, and COORDINATED_CLUSTER.
Hardware / Stack: Trained on an AMD MI300X using the ROCm stack. This was a great opportunity to stress-test PEFT/TRL on AMD hardware for vision-centric tasks.

Why VLM over GNN?

While GNNs are the standard for graph data, the "image-based" approach allowed for faster prototyping of adversarial pattern recognition without the complexity of building a custom graph auto-encoder for every new chain's schema. The VLM’s ability to interpret "visual intent" proved highly effective at distinguishing a decentralized organic ecosystem from a coordinated sybil attack.

Model & Code

The LoRA weights are available on Hugging Face for anyone interested in testing visual graph classification: 🔗 Hugging Face: https://huggingface.co/Ibonon/imina_na_lora

The full source code for the inference engine and the Dogon dataset generator is currently being cleaned up. 🔗 GitHub: [Under Construction]

I’m particularly interested in hearing if anyone else is using VLMs for visual anomaly detection in abstract data structures (like graphs or network logs).

submitted by /u/Any_Good_2682
[link] [comments]