OpenSimula — open implementation of Simula-style mechanism design for synthetic data (in AfterImage) [P]

Reddit r/MachineLearning / 4/23/2026


Key Points

  • AfterImage’s open-source dataset tool added OpenSimula, an experimental Python implementation of a Simula-style mechanism-design recipe based on Davidson et al. for generating synthetic data with controlled diversity.
  • The pipeline uses LLM-generated factor taxonomies, weighted factor sampling, meta-prompt diversification (optionally complexified), and iterative “requirement critic” refinement to produce accepted JSONL dataset points.
  • It includes optional double-critic gating specifically aimed at verifiable multiple-choice (MCQ) generation, with versioned artifacts (checkpoint, taxonomy bundle, sampling strategy) and append-only logging for accepted examples.
  • The project provides hooks for observability (e.g., GenerationMonitor) and for scenario-to-conversation integration via callbacks, plus example implementations and API documentation.
  • The authors emphasize experimental status and warn that large taxonomy widths/depths can sharply increase cost and latency, while also clarifying that this “mechanism design” structures data generation but won’t inherently fix model failures or poor teacher data.

Hi r/MachineLearning,

We added OpenSimula to our open-source dataset tool AfterImage: an experimental Python implementation of the Simula mechanism-design recipe from Davidson et al. (TMLR, PDF; framing also in this research blog).

Problem it targets:

For some SFT/eval setups you care less about “one prompt → one answer” and more about controlled diversity over a reasoning space: which axes of variation exist, how you jointly sample them, and how you stress-test generations before they land in a JSONL file.
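To make “jointly sample the axes of variation” concrete, here is a minimal sketch. The taxonomy and weights are hard-coded stand-ins (the real tool builds taxonomies with an LLM, and its actual data format may differ):

```python
import random

# Hypothetical factor taxonomy -- OpenSimula builds these with an LLM;
# hard-coded here purely for illustration.
TAXONOMY = {
    "domain": ["finance", "medicine", "law"],
    "difficulty": ["easy", "medium", "hard"],
    "reasoning": ["deductive", "analogical", "causal"],
}

# Per-axis weights steering the sampling mix (not OpenSimula's actual format).
WEIGHTS = {
    "domain": [0.5, 0.3, 0.2],
    "difficulty": [0.2, 0.5, 0.3],
    "reasoning": [0.4, 0.3, 0.3],
}

def sample_factors(rng: random.Random) -> dict:
    """Jointly sample one value per factor axis, using the per-axis weights.

    Each returned dict is one point in the "reasoning space" that a
    meta-prompt would then be diversified from.
    """
    return {
        axis: rng.choices(values, weights=WEIGHTS[axis], k=1)[0]
        for axis, values in TAXONOMY.items()
    }

print(sample_factors(random.Random(0)))
```

Skewing the weights is what gives you a controlled mix rather than uniform coverage of the factor grid.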

What the code actually does (high level):

LLM-built factor taxonomies → weighted mix sampling over factors → meta-prompt diversification (+ optional complexification) → requirement-critic loop with refinement → optional double-critic gate for verifiable MCQ. Artifacts are a versioned opensimula/ checkpoint (manifest, taxonomy bundle, sampling strategy) plus an append-only JSONL of accepted points. You can plug in the same GenerationMonitor we use elsewhere to get observability into generation metrics, or bridge scenarios into ConversationGenerator via a small callback.
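The “requirement critic loop with refinement” step can be sketched as below. The `critic` and `refine` functions are toy stand-ins for LLM calls (simple substring checks and string appends), and the function names are invented for illustration, not OpenSimula's API:

```python
import json

def critic(text: str, requirements: list[str]) -> list[str]:
    """Toy stand-in for an LLM requirement critic: return the requirements
    the draft fails. Here a requirement 'fails' if its keyword is absent."""
    return [r for r in requirements if r not in text]

def refine(text: str, failures: list[str]) -> str:
    """Toy stand-in for LLM refinement: patch the draft to address failures."""
    return text + " " + " ".join(failures)

def generate_with_critic(draft: str, requirements: list[str],
                         max_rounds: int = 3) -> dict:
    """Iterate critic -> refine until every requirement passes or the round
    budget runs out. Accepted points would be appended to the JSONL log."""
    for _ in range(max_rounds):
        failures = critic(draft, requirements)
        if not failures:
            return {"text": draft, "accepted": True}
        draft = refine(draft, failures)
    return {"text": draft, "accepted": False}

point = generate_with_critic("question about markets", ["markets", "cite units"])
print(json.dumps(point))
```

The double-critic MCQ gate is the same shape with a second, independent check (e.g. that exactly one answer option is verifiably correct) before a point is accepted.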

Hard disclaimers (please read):

  • This is not a Google product, not a reference port of anything internal—just our read of the published recipe in the paper.
  • API is explicitly experimental and may change.
  • Cost and latency explode if you remove the caps on taxonomy width/depth; wide trees are many structured calls unless you tune bounds.
  • “Mechanism design” here helps structure the data-generating process; it does not magically fix model collapse or bad teacher models.
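On the cost point: if each taxonomy node costs one structured LLM call (an assumption for illustration, not OpenSimula's exact accounting), call count grows geometrically with width and depth:

```python
def taxonomy_calls(width: int, depth: int) -> int:
    """Nodes in a full tree with the given branching width and depth,
    assuming one structured LLM call per node. Shows why uncapped
    width/depth blows up cost: growth is geometric in depth."""
    return sum(width ** d for d in range(depth + 1))

print(taxonomy_calls(5, 2))   # 1 + 5 + 25 = 31 calls
print(taxonomy_calls(10, 3))  # 1 + 10 + 100 + 1000 = 1111 calls
```

Hence the default bounds: going from width 5/depth 2 to width 10/depth 3 is roughly a 35x increase in structured calls.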

Code & docs:

I'd genuinely love to hear any feedback you have.

submitted by /u/Individual-Road-5784