LLM attribution analysis across different fine-tuning strategies and model scales for automated code compliance

arXiv cs.AI / 4/20/2026

Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper tackles a gap in automated code compliance research by analyzing how training choices influence LLM interpretive behavior rather than treating the models as black boxes.
  • Using a perturbation-based attribution method, it compares interpretive behaviors across full fine-tuning (FFT), LoRA, and quantized LoRA fine-tuning, and across different model scales (parameter sizes).
  • It finds that FFT yields attribution patterns that are statistically distinct and more focused than those from parameter-efficient fine-tuning approaches.
  • As model scale increases, the models adopt more specific interpretive strategies (e.g., emphasizing numerical constraints and rule identifiers), but semantic similarity performance plateaus for models larger than 7B.
  • The findings aim to improve explainability for critical, regulation-based applications in the AEC (Architecture, Engineering, and Construction) industry.
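The perturbation-based attribution method mentioned above can be illustrated with a minimal occlusion-style sketch: mask each input token in turn and record how much the model's output score drops. Everything here is hypothetical — `toy_score` is a stand-in for a fine-tuned LLM's compliance scorer, not the paper's actual model; it simply rewards tokens containing digits, echoing the finding that larger models prioritize numerical constraints and rule identifiers.

```python
# Occlusion-style perturbation attribution: the importance of each
# token is the drop in the model's score when that token is masked.
def occlusion_attribution(tokens, score_fn, mask_token="[MASK]"):
    """Return one attribution score per token (higher = more influential)."""
    base = score_fn(tokens)
    scores = []
    for i in range(len(tokens)):
        perturbed = tokens[:i] + [mask_token] + tokens[i + 1:]
        scores.append(base - score_fn(perturbed))
    return scores

# Hypothetical stand-in for an LLM compliance score: counts tokens
# that contain digits (numerical constraints, rule identifiers).
def toy_score(tokens):
    return sum(1.0 for t in tokens if any(c.isdigit() for c in t))

clause = "Stair riser height shall not exceed 180 mm per rule R311.7".split()
attr = occlusion_attribution(clause, toy_score)
top = max(range(len(clause)), key=lambda i: attr[i])
# With this toy scorer, masking "180" or "R311.7" lowers the score,
# so attribution concentrates on those tokens.
```

The same loop works with any black-box scoring function, which is what makes perturbation-based attribution model-agnostic and applicable across fine-tuning strategies.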

Abstract

Existing research on large language models (LLMs) for automated code compliance has primarily focused on performance, treating the models as black boxes and overlooking how training decisions affect their interpretive behavior. This paper addresses that gap by employing a perturbation-based attribution analysis to compare the interpretive behaviors of LLMs across fine-tuning strategies, including full fine-tuning (FFT), low-rank adaptation (LoRA), and quantized LoRA, as well as across model scales (varying parameter sizes). Our results show that FFT produces attribution patterns that are statistically distinct and more focused than those from parameter-efficient fine-tuning methods. Furthermore, we found that as model scale increases, LLMs develop specific interpretive strategies, such as prioritizing numerical constraints and rule identifiers in the building code text, although gains in semantic similarity between the generated and reference computer-processable rules plateau for models larger than 7B. This paper provides crucial insights into the explainability of these models, taking a step toward building more transparent LLMs for critical, regulation-based tasks in the Architecture, Engineering, and Construction industry.
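The claim that FFT attributions are "more focused" than those of parameter-efficient methods can be quantified, for example, by the entropy of the normalized attribution distribution: lower entropy means importance is concentrated on fewer tokens. The sketch below uses made-up attribution vectors purely for illustration; the entropy measure is one plausible focus metric, not necessarily the statistic used in the paper.

```python
import math

def attribution_entropy(scores):
    """Shannon entropy (bits) of normalized non-negative attribution
    scores; lower entropy indicates a more focused attribution pattern."""
    pos = [max(s, 0.0) for s in scores]
    total = sum(pos)
    probs = [p / total for p in pos if p > 0]
    return -sum(p * math.log2(p) for p in probs)

# Hypothetical attribution vectors over the same six tokens:
fft_attr  = [0.05, 0.05, 0.70, 0.10, 0.05, 0.05]  # concentrated (FFT-like)
lora_attr = [0.15, 0.20, 0.25, 0.15, 0.15, 0.10]  # diffuse (PEFT-like)

# The concentrated pattern yields lower entropy than the diffuse one.
focused = attribution_entropy(fft_attr) < attribution_entropy(lora_attr)
```

In practice, such a focus metric would be computed per test clause and the two distributions compared with a statistical test to support a claim like the paper's.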