GitHub hits CTRL-Z, decides it will train its AI with user data after all

The Register / 3/26/2026


Key Points

  • GitHub is changing its AI training policy so that, starting April 24, it plans to use user data to train its AI models unless users opt out.
  • The article frames the decision as a reversal (“CTRL-Z”), implying GitHub previously moved toward a different stance on AI training data.
  • Users will need to take action to prevent their data from being included in AI training, making the change primarily relevant to privacy and governance workflows.
  • The move affects how organizations manage consent, data retention, and compliance for code hosted on GitHub.
  • The change is likely to influence expectations for developer-tool vendors around transparency and user control of training data.

As of April 24 you'll be feeding the Octocat unless you opt out

Thu 26 Mar 2026 // 00:13 UTC

Microsoft's GitHub next month plans to begin using customer interaction data – "specifically inputs, outputs, code snippets, and associated context" – to train its AI models.

The code locker’s revised policy applies to Copilot Free, Pro, and Pro+ customers, as of April 24. Copilot Business and Copilot Enterprise users are exempt thanks to the terms of their contracts. Students and teachers who access Copilot will also be spared.

Those affected have the option to opt out in accordance with "established industry practices" – meaning according to US norms as opposed to European norms where opt-in is commonly required. To opt out, GitHub users should visit /settings/copilot/features and disable "Allow GitHub to use my data for AI model training" under the Privacy heading.

Mario Rodriguez, GitHub's chief product officer, would rather you didn't.

"By participating, you'll help our models better understand development workflows, deliver more accurate and secure code pattern suggestions, and improve their ability to help you catch potential bugs before they reach production," he wrote in a blog post.

To excuse its covetous behavior, GitHub in its FAQs notes that Anthropic, JetBrains, and corporate parent Microsoft operate similar opt-out data use policies.

The rationale for the change, according to Rodriguez, is that interaction data makes company AI models perform better. Adding interaction data from Microsoft employees has led to meaningful improvements, he claims, such as an increased acceptance rate for AI model suggestions.

The data GitHub wants includes:

  • Model outputs that have been accepted or modified;
  • Model inputs including code snippets shown;
  • Code context surrounding your cursor position;
  • Comments and documentation you've written;
  • File names and repo structure;
  • Interactions with Copilot features (e.g. chats); and
  • Feedback (e.g. thumbs up/down ratings).

The policy shift does somewhat change the meaning of GitHub private repositories, which are notionally "only accessible to you, people you explicitly share access with, and, for organization repositories, certain organization members." These might be more accurately described as "GitHub private* repositories," with the asterisk to denote the limits of GitHub’s definition of the word "private."

As the FAQs explain: "If a Copilot user has their settings set to enable model training on their interaction data, code snippets from private repositories can be collected and used for model training while the user is actively engaged with Copilot while working in that repository."

Recent banter in the GitHub community doesn’t include much enthusiasm for the plan. To judge by emoji votes alone, users have offered 59 thumbs-down votes and just three rocket ships, which we understand signal some measure of excitement.

But among the 39 posts commenting on the change at the time this article was filed, no one other than Martin Woodward, GitHub VP of developer relations, has really endorsed the idea.

User indignation might be somewhat mitigated if GitHub users recognized that OpenAI's Codex – used in GitHub Copilot – is "a GPT language model fine-tuned on publicly available code from GitHub." That verbiage shows the data-gorged AI horse is already out of the barn, so to speak.

Shutting the doors at this point won't change the fact that the AI industry is built on data gathered without asking for a strong indicator of enthusiastic consent. ®
