Hey r/LocalLLaMA,
Quick request: I’m submitting my first arXiv paper and need one endorser.
Key results:
• 95.2% detection rate across 2,067 held-out payloads (110 attack categories)
• 14× fewer false positives than single-feature scoring
• Uses Gemma Scope SAEs (layers 6/12/18) + conjunctive co-activation patterns mined via FP-Growth
• Trust boundary + BOS token exclusion
• p95 latency of 8.6 ms on a consumer GPU
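For anyone wondering what "conjunctive co-activation patterns" means in practice, here is a toy sketch of the general idea. It is not the paper's code. The feature IDs and samples are made up, and a brute-force itemset counter stands in for FP-Growth, so this only works at toy scale. Each sample is assumed to be reduced to the set of SAE feature IDs that fired on it.

```python
from collections import Counter
from itertools import combinations


def mine_patterns(samples, min_support, max_size=3):
    """Count every feature combination of size 2..max_size and keep the frequent ones.

    Brute-force stand-in for FP-Growth: same kind of output (frequent itemsets),
    but exponential in the number of active features per sample.
    """
    counts = Counter()
    for active in samples:
        items = sorted(active)
        for size in range(2, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {pattern for pattern, n in counts.items() if n >= min_support}


def drop_benign_patterns(patterns, benign_samples):
    """Discard any pattern that fully appears in at least one benign sample."""
    return {p for p in patterns if not any(p <= b for b in benign_samples)}


def is_flagged(active, patterns):
    """Conjunctive rule: flag only if ALL features of some pattern fired together."""
    return any(p <= active for p in patterns)


# Hypothetical feature IDs, purely for illustration.
attack_samples = [{101, 202, 303}, {101, 202, 404}, {101, 202, 505}]
benign_samples = [{101, 606}, {202, 707}, {101, 303}]

patterns = drop_benign_patterns(
    mine_patterns(attack_samples, min_support=3), benign_samples
)

flagged_attack = is_flagged({101, 202, 999}, patterns)
flagged_benign = is_flagged({101, 606}, patterns)
```

In this toy data, feature 101 on its own also fires on benign samples, so a single-feature rule would flag them. Requiring 101 and 202 together does not. That is the intuition behind conjunctive scoring, shown here on invented numbers only.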
PDF (full paper): https://drive.google.com/file/d/1GTQpR0o1Uz_conkQJexlQLR5FCvE3QNs/view
Endorsement link: https://arxiv.org/auth/endorse?x=BPLUNM
Super quick to endorse (takes 30 seconds). Happy to answer any questions about the method, results, or implementation.
Thanks so much, I really appreciate the help from this community! 🚀
