Efficient Fine-Tuning Methods for Portuguese Question Answering: A Comparative Study of PEFT on BERTimbau and Exploratory Evaluation of Generative LLMs

arXiv cs.CL / 3/24/2026


Key Points

  • The paper presents a systematic comparative study of parameter-efficient fine-tuning (PEFT) and quantization for Brazilian Portuguese extractive question answering, using BERTimbau on SQuAD-BR.
  • Across 40 experimental configurations, LoRA is reported to achieve 95.8% of baseline performance on BERTimbau-Large while cutting training time by 73.5%, with F1 dropping from 84.86 to 81.32.
  • Higher learning rates (2e-4) are found to substantially improve PEFT results, with F1 gains reported up to +19.71 points versus standard learning rates.
  • The study finds that larger models are more resilient to quantization: BERTimbau-Large loses 4.83 F1 points when quantized, versus 9.56 for BERTimbau-Base.
  • An exploratory comparison with generative LLMs (Tucano and Sabiá) suggests competitive F1 can be reached via LoRA, but at the cost of up to 4.2× more GPU memory and 3× more training time than BERTimbau-Base, supporting encoder-based efficiency and “Green AI” goals.
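
The headline ratios in the key points can be reproduced from the quoted F1 figures alone. The short Python check below is only a sanity-check sketch: the constant and function names are illustrative, and it uses no numbers beyond those stated above.

```python
# Figures quoted in the key points above (F1 on SQuAD-BR).
BASELINE_F1_LARGE = 84.86  # full fine-tuning baseline, BERTimbau-Large
LORA_F1_LARGE = 81.32      # LoRA, BERTimbau-Large
QUANT_LOSS_LARGE = 4.83    # F1 points lost under quantization, Large
QUANT_LOSS_BASE = 9.56     # F1 points lost under quantization, Base


def retained_fraction(peft_f1: float, baseline_f1: float) -> float:
    """Share of the baseline F1 that the PEFT run keeps."""
    return peft_f1 / baseline_f1


retained = retained_fraction(LORA_F1_LARGE, BASELINE_F1_LARGE)
absolute_drop = BASELINE_F1_LARGE - LORA_F1_LARGE
loss_ratio = QUANT_LOSS_BASE / QUANT_LOSS_LARGE

print(f"LoRA retains {retained:.1%} of baseline F1")
print(f"Absolute F1 drop: {absolute_drop:.2f} points")
print(f"Base loses {loss_ratio:.2f}x as much F1 as Large under quantization")
```

The first ratio matches the reported 95.8%, and the last one (just under 2) is what the abstract rounds to "twice the quantization resilience".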

Abstract

Although large language models have transformed natural language processing, their computational costs create accessibility barriers for low-resource languages such as Brazilian Portuguese. This work presents a systematic evaluation of Parameter-Efficient Fine-Tuning (PEFT) and quantization techniques applied to BERTimbau for Question Answering on SQuAD-BR, the Brazilian Portuguese translation of SQuAD v1. We evaluate 40 configurations combining four PEFT methods (LoRA, DoRA, QLoRA, QDoRA) across two model sizes (Base: 110M, Large: 335M parameters). Our findings reveal three critical insights: (1) LoRA achieves 95.8% of baseline performance on BERTimbau-Large while reducing training time by 73.5% (F1=81.32 vs 84.86); (2) higher learning rates (2e-4) substantially improve PEFT performance, with F1 gains of up to +19.71 points over standard rates; and (3) larger models show twice the quantization resilience (loss of 4.83 vs 9.56 F1 points). These results demonstrate that encoder-based models can be efficiently fine-tuned for extractive Brazilian Portuguese QA with substantially lower computational cost than large generative LLMs, promoting more sustainable approaches aligned with Green AI principles. An exploratory evaluation of Tucano and Sabiá on the same extractive QA benchmark shows that while generative models can reach competitive F1 scores with LoRA fine-tuning, they require up to 4.2× more GPU memory and 3× more training time than BERTimbau-Base, reinforcing the efficiency advantage of smaller encoder-based architectures for this task.
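
To give a feel for why LoRA is cheap to train, the sketch below counts the adapter parameters it would add to a model of BERTimbau-Large's size. Only the 335M total comes from the abstract. The rank of 8 and the choice to adapt just the query and value projections are assumptions made for illustration, not settings reported here, and the hidden size and layer count assume the standard BERT-large shape. The count also ignores the small QA output head, which is trained as well.

```python
def lora_adapter_params(hidden_size: int, num_layers: int,
                        rank: int, matrices_per_layer: int) -> int:
    """Count LoRA adapter weights for square hidden_size x hidden_size matrices.

    Each adapted matrix gets two low-rank factors:
    A (rank x hidden_size) and B (hidden_size x rank).
    """
    per_matrix = 2 * rank * hidden_size
    return num_layers * matrices_per_layer * per_matrix


TOTAL_PARAMS = 335_000_000  # BERTimbau-Large size quoted in the abstract
HIDDEN_SIZE = 1024          # assumption: standard BERT-large hidden size
NUM_LAYERS = 24             # assumption: standard BERT-large depth
RANK = 8                    # assumption: illustrative LoRA rank
MATRICES_PER_LAYER = 2      # assumption: adapt query and value only

trainable = lora_adapter_params(HIDDEN_SIZE, NUM_LAYERS, RANK, MATRICES_PER_LAYER)
share = trainable / TOTAL_PARAMS

print(f"LoRA adapter parameters: {trainable:,}")
print(f"Share of the full model: {share:.3%}")
```

Under these assumed settings the adapters come to well under 1% of the model's weights, which is the general mechanism behind the training-time savings reported above. The actual figure depends on the rank and target modules the authors used.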