Hi everyone,
I’ve been working on a computer vision approach to a specific security problem in the "Agentic Economy": identifying malicious transaction patterns that are mathematically obfuscated but topologically distinct.
The Problem
Traditional rule-based security engines and even standard GNNs often struggle with "splitting attacks"—where a high-value transaction is fragmented into thousands of micro-transactions to bypass statistical thresholds. However, when these flows are projected as a 2D graph topology, they exhibit very specific adversarial signatures (Star patterns, centralized hubs, mixing chains).
The Approach: VLM for Graph Classification
Instead of relying on graph embeddings, I’ve experimented with a Vision-Language approach using Qwen2-VL-2B-Instruct. The intuition is that VLMs are increasingly efficient at recognizing structural relationships in 2D layouts.
Technical Specs:
- Base Model: Qwen2-VL-2B-Instruct.
- Fine-tuning: LoRA (r=16, alpha=32) targeting attention projections (q, k, v, o).
- Dataset (Dogon-10K): I generated 10,000 synthetic transaction graph images using NetworkX and Matplotlib. The dataset covers four classes:
NORMAL,DRAIN_STAR,MIXING_CHAIN, andCOORDINATED_CLUSTER. - Hardware / Stack: Trained on an AMD MI300X using the ROCm stack. This was a great opportunity to stress-test PEFT/TRL on AMD hardware for vision-centric tasks.
Why VLM over GNN?
While GNNs are the standard for graph data, the "image-based" approach allowed for faster prototyping of adversarial pattern recognition without the complexity of building a custom graph auto-encoder for every new chain's schema. The VLM’s ability to interpret "visual intent" proved highly effective at distinguishing a decentralized organic ecosystem from a coordinated sybil attack.
Model & Code
The LoRA weights are available on Hugging Face for anyone interested in testing visual graph classification: 🔗 Hugging Face: https://huggingface.co/Ibonon/imina_na_lora
The full source code for the inference engine and the Dogon dataset generator is currently being cleaned up. 🔗 GitHub: [Under Construction]
I’m particularly interested in hearing if anyone else is using VLMs for visual anomaly detection in abstract data structures (like graphs or network logs).
[link] [comments]




