Abstract
Graph-based anti-money laundering (AML) systems on blockchain networks can score suspicious activity at two granularity levels -- transactions or actor addresses -- yet compliance action is conducted per actor. This paper contributes an evaluation methodology for measuring how scoring granularity affects investigation queue composition under fixed review budgets. We formalize the evaluation through a projection framework mapping transaction-level scores to the actor-level action unit via four aggregation operators, and introduce budgeted investigation metrics -- yield@budget, burden decomposition, and case fragmentation. Using the public Elliptic++ Bitcoin dataset (203,769 transactions; 822,942 address occurrences), we train independent random forest classifiers at each level under a causal temporal protocol and compare review queues through Jaccard overlap, burden decomposition, and feature-matching ablations. At one-percent budget, temporal evaluation yields mean Jaccard of 0.374 (SD 0.171); static pooled evaluation yields 0.087 (95% CI [0.079, 0.094]). An enriched address model receiving all 237 features produces even lower overlap (Jaccard=0.051), with 4.3% illicit per 100 reviews versus 30.2% for the transaction-projected queue. Address-level detection value is temporally concentrated: two timesteps exceed 91% illicit per 100 reviews while the static burden is only 3.4%. A fixed hybrid policy underperforms the best single-level queue by 5.05pp (CI [-10.2pp, -0.9pp]). These findings establish that scoring granularity is a consequential design variable for AML investigation systems -- same data, same budget, different queues, different addresses investigated.