Intrinsic Fingerprint of LLMs: Continue Training is NOT All You Need to Steal A Model!
arXiv cs.CL / 4/27/2026
Key Points
- The paper argues that commonly proposed watermarking approaches may not withstand continued training, leaving LLM attribution and copyright protection vulnerable.
- It proposes a robust LLM fingerprinting method based on intrinsic model characteristics, specifically the per-layer standard deviation distributions of the attention parameter matrices (a rough sketch follows this list).
- The authors report that these distribution “signatures” remain stable even after extensive continued training and can be used to identify model lineage and detect potential infringement.
- Experiments across multiple model families validate the method’s effectiveness for model authentication.
- The study presents evidence that Huawei’s recently released Pangu Pro MoE model may have been upcycled from Qwen-2.5 14B rather than trained from scratch, suggesting possible plagiarism and IP/copyright violations.
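For readers who want to experiment with the idea, here is a minimal sketch of the per-layer standard-deviation fingerprint, assuming Llama/Qwen-style Hugging Face checkpoints where the attention projections are named `self_attn.q_proj/k_proj/v_proj/o_proj`. The paper's exact matrix selection, normalization, and similarity measure may differ; `attention_std_fingerprint` and `lineage_similarity` are illustrative names, not the authors' code.

```python
# Sketch of an intrinsic fingerprint: per-layer standard deviations of
# attention projection weights, compared across two models. Assumes
# Llama/Qwen-style parameter naming; not the paper's exact procedure.
import torch
from transformers import AutoModelForCausalLM

PROJECTIONS = ("q_proj", "k_proj", "v_proj", "o_proj")

def attention_std_fingerprint(model_name: str) -> dict[str, list[float]]:
    """Per-layer standard deviations of the attention projection weights."""
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.float32, low_cpu_mem_usage=True
    )
    fingerprint: dict[str, list[float]] = {p: [] for p in PROJECTIONS}
    # named_parameters() yields layers in depth order, so each list is
    # the std-vs-depth curve for one projection type.
    for name, param in model.named_parameters():
        for proj in PROJECTIONS:
            if name.endswith(f"self_attn.{proj}.weight"):
                fingerprint[proj].append(param.detach().std().item())
    return fingerprint

def lineage_similarity(fp_a: dict, fp_b: dict) -> float:
    """Pearson correlation of the concatenated std curves.

    Assumes both models have the same layer count. Values near 1.0
    suggest shared lineage; independently trained models tend to
    produce uncorrelated curves.
    """
    a = torch.tensor([v for p in PROJECTIONS for v in fp_a[p]])
    b = torch.tensor([v for p in PROJECTIONS for v in fp_b[p]])
    return torch.corrcoef(torch.stack([a, b]))[0, 1].item()

fp_base = attention_std_fingerprint("Qwen/Qwen2.5-14B")
fp_suspect = attention_std_fingerprint("some-org/suspect-model")  # hypothetical model ID
print(f"lineage similarity: {lineage_similarity(fp_base, fp_suspect):.3f}")
```

The intuition behind the comparison step: continued training perturbs individual weights but, per the paper's claim, barely shifts the shape of the std-vs-depth curve, so a high correlation between a suspect model and a candidate base model is evidence of upcycling rather than training from scratch.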