Coding agents for data analysis

Simon Willison's Blog / 3/17/2026

Key Points

  • The post is a NICAR 2026 handout showing how Claude Code and OpenAI Codex can be used to explore, analyze, and clean data for data journalism.
  • The table of contents outlines a workflow from a warmup with ChatGPT and Claude through setup, querying a database, exploring data, cleaning neighborhood codes, visualizations, and scraping data with agents.
  • The workshop ran in GitHub Codespaces with OpenAI Codex, using a budget-restricted API key shared with attendees; participants used $23 of Codex tokens in total.
  • All exercises use Python and SQLite, and some of them also use Datasette.
  • A highlight: Datasette served static content from a viz/ folder while Claude Code generated new interactive visualizations in that folder, including a heat map of a trees database built with Leaflet and Leaflet.heat.

16th March 2026 - Link Blog

Coding agents for data analysis. Here's the handout I prepared for my NICAR 2026 workshop "Coding agents for data analysis" - a three-hour session aimed at data journalists, demonstrating ways that tools like Claude Code and OpenAI Codex can be used to explore, analyze and clean data.

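Here's the table of contents:

  • Warmup with ChatGPT and Claude
  • Setup
  • Querying a database
  • Exploring data
  • Cleaning neighborhood codes
  • Visualizations
  • Scraping data with agents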

I ran the workshop using GitHub Codespaces and OpenAI Codex, since it was easy (and inexpensive) to distribute a budget-restricted API key for Codex that attendees could use during the class. Participants ended up burning $23 of Codex tokens.

The exercises all used Python and SQLite and some of them used Datasette.
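
To give a feel for the Python and SQLite combination, here is a minimal sketch using only the standard library. The table name `trees_sample`, its columns, and the three rows are invented for illustration and are not taken from the handout.

```python
import sqlite3


def count_by_species(rows):
    """Load (species, neighborhood) rows into an in-memory SQLite
    database and return per-species counts, most common first."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table and columns, used only for this sketch.
    conn.execute("CREATE TABLE trees_sample (species TEXT, neighborhood TEXT)")
    conn.executemany("INSERT INTO trees_sample VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT species, COUNT(*) AS n FROM trees_sample "
        "GROUP BY species ORDER BY n DESC, species"
    ).fetchall()
    conn.close()
    return result


# Made-up sample rows, not real data.
sample = [("Sycamore", "Mission"), ("Sycamore", "Sunset"), ("Cherry", "Mission")]
print(count_by_species(sample))
```

A coding agent would typically write and run queries like this against a real database file rather than an in-memory one.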

One highlight of the workshop was when we started running Datasette such that it served static content from a viz/ folder, then had Claude Code start vibe coding new interactive visualizations directly in that folder. Here's a heat map it created for my trees database using Leaflet and Leaflet.heat, source code here.
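
The static-serving setup described above can be sketched as follows. Datasette's `--static` option takes a `mount:directory` pair; the database filename `trees.db` and the mount name `viz` are assumptions for illustration, and the exact command used in the workshop may have differed.

```python
import shlex


def datasette_command(db_path, mount="viz", directory="viz/"):
    """Build a Datasette command line that serves `directory` as static
    files under /<mount>/ alongside the database."""
    # db_path, mount and directory are illustrative values, not the
    # workshop's actual configuration.
    return ["datasette", db_path, "--static", f"{mount}:{directory}"]


cmd = datasette_command("trees.db")
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run` would start the server, provided Datasette is installed; any HTML file an agent writes into the folder is then reachable under the mount path.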

Screenshot of a "Trees SQL Map" web application with the heading "Trees SQL Map" and subheading "Run a query and render all returned points as a heat map. The default query targets roughly 200,000 trees." Below is an input field containing "/trees/-/query.json", a "Run Query" button, and a SQL query editor with the text "SELECT cast(Latitude AS float) AS latitude, cast(Longitude AS float) AS longitude, CASE WHEN DBH IS NULL OR DBH = '' THEN 0.3 WHEN cast(DBH AS float) <= 0 THEN 0.3 WHEN cast(DBH AS float) >= 80 THEN 1.0" (query is truncated). A status message reads "Loaded 1,000 rows and plotted 1,000 points as heat map." Below is a Leaflet/OpenStreetMap interactive map of San Francisco showing a heat map overlay of tree locations, with blue/green clusters concentrated in areas like the Richmond District, Sunset District, and other neighborhoods. Map includes zoom controls and a "Leaflet | © OpenStreetMap contributors" attribution.
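
The query visible in the screenshot turns each tree's DBH value into a heat map weight. Below is a rough sketch of that idea run against an in-memory SQLite table. The three visible `WHEN` branches are copied from the screenshot; the `ELSE` branch is an assumption, because the original query is truncated at that point. The table name `trees_sample` and the rows are also invented.

```python
import sqlite3

WEIGHT_QUERY = """
SELECT
  cast(Latitude AS float) AS latitude,
  cast(Longitude AS float) AS longitude,
  CASE
    WHEN DBH IS NULL OR DBH = '' THEN 0.3
    WHEN cast(DBH AS float) <= 0 THEN 0.3
    WHEN cast(DBH AS float) >= 80 THEN 1.0
    -- ASSUMPTION: the original query is truncated here; this linear
    -- ramp from 0.3 to 1.0 is a stand-in, not the real branch.
    ELSE 0.3 + 0.7 * cast(DBH AS float) / 80
  END AS weight
FROM trees_sample
ORDER BY rowid
"""


def heat_points(rows):
    """Return [latitude, longitude, weight] triples for the given
    (Latitude, Longitude, DBH) text rows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; columns are stored as text, as the casts imply.
    conn.execute(
        "CREATE TABLE trees_sample (Latitude TEXT, Longitude TEXT, DBH TEXT)"
    )
    conn.executemany("INSERT INTO trees_sample VALUES (?, ?, ?)", rows)
    points = [list(row) for row in conn.execute(WEIGHT_QUERY)]
    conn.close()
    return points


# Made-up sample rows covering each branch of the CASE expression.
sample = [
    ("37.77", "-122.45", ""),
    ("37.76", "-122.48", "-1"),
    ("37.78", "-122.42", "100"),
    ("37.75", "-122.44", "40"),
]
for point in heat_points(sample):
    print(point)
```

Leaflet.heat accepts points as `[lat, lng, intensity]` arrays, so triples in this shape can be handed to `L.heatLayer` on the JavaScript side.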

I designed the handout to also be useful for people who weren't able to attend the session in person. As is usually the case, material aimed at data journalists is equally applicable to anyone else with data to explore.

Posted 16th March 2026 at 8:12 pm

This is a link post by Simon Willison, posted on 16th March 2026.

