Generating and Evaluating Sustainable Procurement Criteria for the Swiss Public Sector using In-Context Prompting with Large Language Models

arXiv cs.CL / 3/25/2026

Key Points

  • The paper tackles the labor-intensive challenge of converting Swiss and EU sustainability regulations into concrete, verifiable public procurement criteria used in tenders.
  • It proposes a configurable, LLM-assisted pipeline that generates and evaluates catalogs of sustainability-oriented procurement criteria (selection criteria, award criteria, and technical specifications) using in-context prompting and interchangeable LLM backends.
  • Automated output validation makes criteria generation auditable across procurement sectors, and an LLM-based evaluation component is part of the automated quality checks.
  • A proof of concept instantiates the pipeline with official sustainability guidelines from the Swiss government and the European Commission, ingested as structured reference documents; evaluation combines automated quality checks with expert comparison against a manually curated “gold standard.”
  • Results indicate substantial reductions in manual drafting effort while maintaining consistency with official guidelines, alongside documented limitations and failure modes for real deployments.

Abstract

Public procurement is the process by which public sector institutions, such as governments, municipalities, and publicly funded bodies, acquire goods and services. Swiss law requires that ecological, social, and economic sustainability requirements be integrated into tender evaluations in the form of criteria that bidders must fulfill. However, translating high-level sustainability regulations into concrete, verifiable, and sector-specific procurement criteria (such as selection criteria, award criteria, and technical specifications) remains a labor-intensive and error-prone manual task that requires substantial domain expertise across several groups of goods and services. This paper presents a configurable, LLM-assisted software pipeline that supports the systematic generation and evaluation of sustainability-oriented procurement criteria catalogs for Switzerland. The system integrates in-context prompting, interchangeable LLM backends, and automated output validation to enable auditable criteria generation across different procurement sectors. As a proof of concept, we instantiate the pipeline using official sustainability guidelines published by the Swiss government and the European Commission, which are ingested as structured reference documents. We evaluate the system through a combination of automated quality checks, including an LLM-based evaluation component, and expert comparison against a manually curated gold standard. Our results demonstrate that the proposed pipeline can substantially reduce manual drafting effort while producing criteria catalogs that are consistent with official guidelines. We further discuss system limitations, failure modes, and design trade-offs observed during deployment, highlighting key considerations for integrating generative AI into public sector software workflows.
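The abstract names three building blocks: in-context prompting, interchangeable LLM backends, and automated output validation. The sketch below shows one way these could fit together. It is not the authors' implementation. Every name in it (`build_prompt`, `validate`, `generate_catalog`, `stub_backend`, the JSON output format, and the passage identifiers) is a hypothetical choice made for illustration, and the stub backend returns canned text in place of a real model call.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: the three criterion kinds named in the abstract, as short labels.
ALLOWED_TYPES = {"selection", "award", "technical"}

# A backend is any callable from prompt text to completion text, so a
# different LLM provider can be swapped in without changing the pipeline.
Backend = Callable[[str], str]


@dataclass
class Criterion:
    type: str    # one of ALLOWED_TYPES
    text: str    # the criterion wording
    source: str  # id of the guideline passage it is based on


def build_prompt(sector: str, passages: Dict[str, str]) -> str:
    """In-context prompt: reference passages first, then the task."""
    context = "\n".join(f"[{key}] {text}" for key, text in passages.items())
    return (
        "Reference guidelines:\n" + context + "\n\n"
        f"Task: draft sustainability procurement criteria for '{sector}'. "
        "Answer with a JSON list of objects with keys "
        "'type', 'text', and 'source'."
    )


def validate(raw_output: str, passages: Dict[str, str]) -> List[Criterion]:
    """Reject malformed JSON, unknown types, empty text, unknown sources."""
    items = json.loads(raw_output)  # raises ValueError on malformed JSON
    if not isinstance(items, list):
        raise ValueError("expected a JSON list of criteria")
    criteria = []
    for item in items:
        if not isinstance(item, dict):
            raise ValueError("each criterion must be a JSON object")
        if item.get("type") not in ALLOWED_TYPES:
            raise ValueError(f"unknown criterion type: {item.get('type')!r}")
        text = item.get("text")
        if not isinstance(text, str) or not text.strip():
            raise ValueError("criterion text is missing or empty")
        if item.get("source") not in passages:
            raise ValueError(f"unknown source: {item.get('source')!r}")
        criteria.append(Criterion(item["type"], text, item["source"]))
    return criteria


def generate_catalog(sector: str, passages: Dict[str, str],
                     backend: Backend) -> List[Criterion]:
    """Prompt the chosen backend and keep only output that validates."""
    return validate(backend(build_prompt(sector, passages)), passages)


def stub_backend(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a fixed, made-up answer."""
    return json.dumps([{
        "type": "award",
        "text": "Placeholder criterion text.",
        "source": "G1",
    }])
```

Because the backend is a plain callable, replacing `stub_backend` with a function that wraps a real model changes nothing else in the pipeline. Requiring each criterion to cite a known passage identifier is one simple way to keep generated output traceable to the reference documents; the checks a production system needs would likely go further.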