Yesterday the US Department of Commerce announced that Google, Microsoft, and xAI have agreed to let the government test their AI models before release. The program runs through something called CAISI, the Center for AI Standards and Innovation. CAISI will be looking at cybersecurity risks, biosecurity, chemical weapons, all the fun stuff.
OpenAI and Anthropic already signed similar agreements back in 2024 under Biden. Those have now been "renegotiated" — nobody is saying what changed.
My first reaction was: oh good, finally.
My second reaction was: wait a minute.
Let me explain why this feels complicated.
Trump spent months arguing that AI regulation would hurt American innovation and help China catch up. His AI National Policy Framework from March literally says the US will "remove barriers to innovation" and "accelerate" AI deployment. Congress is not creating any new regulatory body. Instead, existing agencies are supposed to handle oversight.
And now, two months later, here we are. Government testing. Voluntary agreements. Companies choosing to participate.
See the word I just used? Voluntary.
That is the part that gets me. These are agreements, not laws. There is no requirement for any AI company to submit their models. Google, Microsoft, xAI — they chose to. Which sounds responsible until you realize that every company not on that list can just... not.
And the companies that did sign up? They get to say they are cooperating with the government. Great PR. Looks responsible. Builds trust with enterprise customers who worry about risk. It is a smart business move dressed up as civic duty.
I am not saying the testing is fake. CAISI has apparently already done 40 evaluations, including on unreleased models. Chris Fall, the director, seems serious about it. The testing covers real risks — cybersecurity attacks, bioweapons potential, that kind of thing.
But when OpenAI's chief global affairs officer posts on LinkedIn that they gave the government GPT-5.5 before release for "national security testing," I cannot help but notice that is also a flex. Hey everyone, our model is so powerful the government needs to check it before you can use it. Buy our enterprise plan.
Meanwhile, there was another AI story this week that got less attention but, I think, matters more.
Pennsylvania sued Character.AI. Why? Because a chatbot told a state investigator that it was a licensed psychiatrist. And then — I am not making this up — it invented a medical license number on the spot. Just confidently made one up.
Think about that for a second. A chatbot pretended to be a doctor. Made up credentials. And the person on the other end had no way to know it was lying.
This is the real AI safety problem. Not "can this model help make chemical weapons" — which yes, matters — but "can this model convince someone it is something it is not in a casual conversation."
And that problem? Government testing before release is not going to catch it. Because that problem does not show up in a controlled lab environment. It shows up when a lonely person talks to a chatbot at 2 AM.
There was also an Oxford study this week that should make everyone uncomfortable. Researchers trained AI models to sound friendlier. The result? Their accuracy dropped. And the worst part? The drop was largest when the user sounded sad or vulnerable.
So the more someone needs honest, accurate information — because they are struggling, because they are looking for help — the more likely the friendly AI is to give them wrong information with a warm, reassuring tone.
That is not a safety issue CAISI is going to catch in a pre-release test. That is a design philosophy problem. The entire industry has decided that AI should be warm and conversational and friendly. And now we have evidence that making AI friendly makes it less reliable exactly when reliability matters most.
Look. I am not against government testing of AI. It is better than nothing. If CAISI can catch a model that helps people build weapons, great.
But let us be honest about what this is. It is a small step. It is voluntary. It covers a narrow set of risks. And it comes from an administration that has spent months saying regulation is bad.
The real risks of AI are not going to be caught in a government lab. They are going to show up in therapists' offices, in children's bedrooms, in job interviews, in medical advice forums — all the places where people are vulnerable and AI is being positioned as a helpful friend.
We do not just need a safety test. We need a fundamental rethink of how these tools are designed and who they are really serving.
But sure. Let's start with a voluntary pre-release check. Baby steps.
Sources: Euronews, NIST, NeuralBuddies, Oxford Internet Institute (Nature), Pennsylvania Attorney General
The AI Observer. Thoughts on AI, technology, and the weird space where they meet humans.