BDI-Kit Demo: A Toolkit for Programmable and Conversational Data Harmonization

arXiv cs.AI / 4/10/2026

Key Points

  • The article introduces BDI-Kit, an extensible toolkit aimed at reducing bottlenecks in data harmonization caused by heterogeneous schemas and value conventions across domains.
  • BDI-Kit offers two interfaces: a Python API for developers to build and reuse harmonization pipelines, and an AI-assisted chat interface for domain experts to work via natural-language dialogue.
  • The demonstration emphasizes an iterative workflow that combines automated schema/value matching, AI-assisted reasoning, and user-driven refinement to validate and improve match quality.
  • Two scenarios are showcased: programmatic pipeline composition with the Python API (composing primitives, examining intermediate outputs, and reusing transformations), and interactive refinement through conversation with the AI assistant, guided by its suggestions.

Abstract

Data harmonization remains a major bottleneck for integrative analysis due to heterogeneity in schemas, value representations, and domain-specific conventions. BDI-Kit provides an extensible toolkit for schema and value matching. It exposes two complementary interfaces tailored to different user needs: a Python API enabling developers to construct harmonization pipelines programmatically, and an AI-assisted chat interface allowing domain experts to harmonize data through natural language dialogue. This demonstration showcases how users interact with BDI-Kit to iteratively explore, validate, and refine schema and value matches through a combination of automated matching, AI-assisted reasoning, and user-driven refinement. We present two scenarios: (i) using the Python API to programmatically compose primitives, examine intermediate outputs, and reuse transformations; and (ii) conversing with the AI assistant in natural language to access BDI-Kit's capabilities and iteratively refine outputs based on the assistant's suggestions.
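
The iterative workflow described in the abstract (automated matching, inspection of intermediate outputs, user-driven refinement, then reuse of the resulting mapping) can be illustrated with a small, self-contained sketch. This is not BDI-Kit's API: the helper names `toy_match_columns`, `toy_match_values`, `refine`, and `harmonize`, the token-overlap matcher, the 0.3 threshold, and the sample records are all assumptions made up for illustration, using only the Python standard library.

```python
import re


def tokens(name):
    """Split a name into lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def jaccard(a, b):
    """Token-set overlap between two names, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def toy_match_columns(source_cols, target_cols, threshold=0.3):
    """Hypothetical schema matcher: best target per source column, else None."""
    matches = {}
    for col in source_cols:
        best = max(target_cols, key=lambda t: jaccard(col, t))
        matches[col] = best if jaccard(col, best) >= threshold else None
    return matches


def toy_match_values(source_vals, target_vals):
    """Hypothetical value matcher: case-insensitive exact match, else None."""
    lookup = {t.lower(): t for t in target_vals}
    return {v: lookup.get(v.strip().lower()) for v in source_vals}


def refine(matches, overrides):
    """User-driven refinement: manual corrections win over automated matches."""
    return {**matches, **overrides}


def harmonize(rows, col_map, val_maps):
    """Apply column and value mappings; unmapped columns are dropped."""
    out = []
    for row in rows:
        new_row = {}
        for col, val in row.items():
            target = col_map.get(col)
            if target is None:
                continue
            mapped = val_maps.get(target, {}).get(val)
            new_row[target] = mapped if mapped is not None else val
        out.append(new_row)
    return out


# Made-up source records and target schema (assumptions for this sketch).
rows = [
    {"Patient Age": 61, "sex": "M", "Tumor_Stage": "stage ii"},
    {"Patient Age": 47, "sex": "female", "Tumor_Stage": "Stage III"},
]
target_cols = ["age", "gender", "tumor_stage"]

# Step 1: automated schema matching, then inspect the intermediate output.
auto_cols = toy_match_columns(rows[0].keys(), target_cols)
# "sex" shares no token with any target, so the user supplies that match.
col_map = refine(auto_cols, {"sex": "gender"})

# Step 2: automated value matching, refined where the matcher found nothing.
auto_gender = toy_match_values(["M", "female"], ["Male", "Female"])
gender_map = refine(auto_gender, {"M": "Male"})
stage_map = toy_match_values(
    ["stage ii", "Stage III"], ["Stage I", "Stage II", "Stage III"]
)

# Step 3: reuse the validated mappings as a transformation.
harmonized = harmonize(rows, col_map, {"gender": gender_map, "tumor_stage": stage_map})
print(harmonized)
```

Keeping each step as a separate function that returns a plain dictionary mirrors the idea of composing primitives and examining intermediate outputs: `auto_cols` and `auto_gender` can be inspected and corrected before anything is applied to the data, and the refined mappings can be reused on further records with the same schema.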