2026 · 05 · 02 · Sat

Updates for 5/2

This round of updates adds more “do-it-for-me” features across major AI tools, including agents that can carry out multi-step work on your computer and inside everyday apps. We also captured notable shifts in business readiness: stronger security offerings, clearer pricing tiers, and new signals about which vendors are trusted for sensitive government work.

What you see here is not a roundup of AI news; it is only the set of changes actually applied to our chaos map / AI Encyclopedia.

A · Theme of the day

AI assistants that can do tasks, not just answer

Several products moved from chat to hands-on help, completing multi-step work across apps and devices.

Perplexity adds “Computer” agent on Max tier

Perplexity
What changed

Perplexity Computer autonomous agent (19-model orchestration, Max tier)

Compared to before

Perplexity was mainly a search-and-answer tool with cited sources. Now it also offers a “Computer” assistant that can carry out tasks on your behalf, positioned as a premium feature on the Max plan. The focus shifts from finding information to acting on it.

Why it matters

Teams can shorten the gap between research and execution (for example: gather info, then apply it). It raises the bar for vendor comparisons: not just answer quality, but task completion. Budgeting may change because the most useful workflow features sit behind a higher tier. It can reduce manual effort for repetitive “do the steps for me” work.

Perplexity “Computer” expands to 400+ app connections

Perplexity
What changed

Perplexity Computer: autonomous agent orchestrating 19 models with 400+ app integrations (Max tier)

Compared to before

Previously, Perplexity’s strengths were centered on fast, cited answers. Now “Computer” can coordinate many AI engines and connect to hundreds of apps. That means it can move information between tools instead of leaving you to copy/paste. This is presented as a Max-tier capability.

Why it matters

App connectivity is often the difference between a demo and daily use. If it can reliably act across your tools, it may replace smaller single-purpose automations. It also increases vendor lock-in risk, since workflows may become tied to one platform. Decision-makers should test real tasks end-to-end, not just prompt quality.

Claude Cowork rolls out broadly; adds Zoom connection

Claude
What changed

Claude Cowork (PC-automation AI agent) now generally available for paid plan users; Zoom connector ingests meeting data into Cowork

Compared to before

Claude already offered strong assistance for complex work and long documents. Cowork is now available for paid users, making PC task automation easier to access. A new Zoom connection can pull meeting information into Cowork. This strengthens the “from meeting to action” workflow inside Claude.

Why it matters

Meeting notes can turn into tasks faster: follow-ups, summaries, and action lists. It reduces the friction of getting context into the assistant in a safe, repeatable way. For businesses, it supports more consistent execution after meetings. It also signals a push toward integrated work assistants rather than standalone chat.

Mistral’s Le Chat adds Agent Mode for multi-step tasks

Mistral
What changed

Le Chat gains Agent Mode for autonomous multi-step task execution

Compared to before

Le Chat was primarily a conversational assistant. Agent Mode adds the ability to run multi-step tasks more autonomously. This is a shift from “tell me” to “do this sequence of steps.” It aligns Le Chat with a broader industry move toward action-taking assistants.

Why it matters

Organizations evaluating Mistral now need to consider workflow automation, not only model cost. Multi-step task support can improve productivity in support, operations, and analysis. It increases the need for clear controls: what the assistant is allowed to do. Pilots should focus on measurable time savings and error rates.

Grok adds “Computer” for desktop task automation

Grok
What changed

Grok Computer: autonomous desktop automation agent with parallel planning and execution

Compared to before

Grok was best known for fast answers and strong integration with X. “Grok Computer” adds the ability to operate a desktop and complete tasks. It can plan and run steps in parallel, rather than one at a time. This shifts Grok toward being an execution tool, not just a chat tool.

Why it matters

Desktop automation can cut time spent on routine internal processes. Parallel task handling may improve turnaround for multi-part requests. It also raises governance questions: who can run actions, and how they’re reviewed. Buyers should evaluate reliability and audit trails before using it for critical work.

Grok 4.3 adds “Imagine” mode for creative tasks

Grok
What changed

Grok 4.3 “Imagine” agent mode: autonomous agent for creative and image-generation tasks

Compared to before

Previously, creative work in Grok was more prompt-driven and manual. “Imagine” adds a more autonomous mode focused on creative and image tasks. It suggests a workflow where the assistant iterates and produces outputs with less back-and-forth. This is introduced alongside broader “agent” capabilities in Grok.

Why it matters

Marketing and design teams may get faster drafts and more variations per brief. It can reduce time from concept to first usable asset. Businesses should still validate brand consistency and rights/usage policies. It widens the gap between basic chat tools and production-oriented creative tools.

B · Theme of the day

Workflows inside the tools people already use

More AI features are landing directly in familiar products like Word, team chat, and developer tooling.

Microsoft adds AI Legal Agent inside Word

Microsoft Copilot
What changed

AI Legal Agent integrated into Microsoft Word for contract review, clause checking, and redline suggestions

Compared to before

Copilot already supported work across Office apps. Now Word includes a Legal Agent for reviewing contracts. It can flag clauses, check for issues, and suggest redlines. This brings specialized help into the document where legal work happens.

Why it matters

Contract cycles can speed up by reducing first-pass review time. It can help non-lawyers catch common issues before sending documents to legal. Organizations should define who can rely on suggestions and what requires human sign-off. It may change software spend by reducing the need for separate contract review tools.

Poe adds large group chat across 200+ AI models

Poe
What changed

Group chat for up to 200 users collaborating across 200+ AI models

Compared to before

Poe was already a hub for trying many AI models. Now it supports group chats for up to 200 participants. Teams can collaborate in one thread while switching between models. This moves Poe closer to a shared workspace, not just a personal tool.

Why it matters

Cross-functional teams can evaluate models together with shared context. It can speed up consensus on outputs for content, analysis, or support responses. Centralized collaboration helps with consistency compared to scattered individual chats. Leaders should plan access controls and guidelines for what gets shared in-room.

Uber reports broad Claude Code adoption in engineering

Claude Code
What changed

Uber: ~95% of engineers use Claude Code monthly, ~70% of commits AI-generated (as of April 2026)

Compared to before

Claude Code has been positioned as a terminal-based coding assistant. This update adds a real-world adoption data point from Uber. It reports most engineers use it monthly and many code changes are AI-assisted. That’s a shift from “possible” to “proven at scale” in a large org.

Why it matters

Strong adoption suggests developer tools can deliver measurable productivity gains. It also implies process changes: code review, testing, and standards need to keep pace. Buyers can use this as a benchmark when setting rollout goals. It strengthens the case for investing in training and guardrails, not just licenses.

Claude Code guide emphasizes repo-wide, end-to-end help

Getting Started with Claude Code: An AI Coding Assistant from Your Terminal
What changed

Getting-started guide updated: Claude Code framed as a terminal assistant with repo-wide understanding, multi-file edits, and a built-in test-and-review workflow

Compared to before

Earlier descriptions often framed coding assistants as autocomplete tools. The updated content highlights repo-wide understanding and multi-file changes. It also stresses running tests, reviewing diffs, and fixing issues in a workflow. That reframes it as a practical assistant for shipping changes, not typing faster.

Why it matters

Teams can plan for broader use cases like refactors, debugging, and adding tests. It shifts evaluation criteria toward reliability with existing codebases and build systems. This can influence onboarding: engineers need habits that pair AI output with verification. It supports business cases tied to delivery speed, not just developer satisfaction.

C · Theme of the day

Enterprise trust, security, and policy signals

Updates this week highlight how vendors are approaching security, sensitive customers, and user data practices.

Anthropic launches Claude Security for enterprises

Claude (Anthropic)
What changed

Claude Security launched: enterprise AI security tooling built on Mythos model capabilities, helping defenders gain AI advantage

Compared to before

Claude previously focused on general assistant capabilities and developer features. Now Anthropic has introduced a dedicated security offering. It is positioned as tooling to help defenders work faster and smarter. This adds a clearer “security buyer” story to the Claude lineup.

Why it matters

Security teams may gain faster investigation and response workflows. A dedicated offering can simplify procurement compared to general-purpose tools. It may reduce risk by focusing features on defensive use cases. Organizations should assess how it fits with existing security processes and reporting.

OpenAI chosen for a U.S. classified AI program

GPT (OpenAI)
What changed

Selected for U.S. DoD classified AI program alongside Google, Nvidia, and xAI

Compared to before

OpenAI already had broad model and API availability. This update adds a major government selection signal. It places OpenAI alongside other large vendors in a sensitive program. That can be read as increased confidence in operational readiness for high-stakes use.

Why it matters

For enterprises, it may de-risk vendor selection when sensitive workloads are involved. It can influence long-term roadmaps if the vendor invests more in security and reliability. It may also shift competitive dynamics: customers may expect similar controls from others. Procurement teams can treat it as one data point, not a substitute for due diligence.

Anthropic not selected for U.S. classified AI program

Claude (Anthropic)
What changed

Excluded from U.S. DoD classified AI program (cited as supply-chain risk; OpenAI, Google, Nvidia, xAI were selected)

Compared to before

Claude is widely used and known for strong reasoning and coding support. This update notes it was not selected for a U.S. classified AI program. The cited reason was supply-chain risk. That adds a public signal about perceived readiness for certain high-security contexts.

Why it matters

Organizations in regulated or sensitive sectors may ask tougher questions about sourcing. It could affect competitive shortlists where government-aligned requirements matter. It highlights that performance alone isn’t the deciding factor; operational assurances matter. Buyers should map requirements to vendor capabilities, not assume parity across providers.

ChatGPT free tier: marketing tracking on by default in some regions

ChatGPT
What changed

Marketing cookies enabled by default for free users in ad-serving countries (paid plans exempt; opt-out available in settings)

Compared to before

ChatGPT’s free tier has been moving toward ads in some countries. This update notes marketing tracking is enabled by default for free users there. Paid plans are exempt, and users can change the setting. It’s a meaningful shift in default data handling for the free experience.

Why it matters

Companies should be cautious about employees using free accounts for work topics. Default settings matter because most users never change them. This may push teams toward paid plans or approved internal tools for sensitive work. It also affects customer-facing recommendations: “free” may come with trade-offs.

D · Theme of the day

Voice and healthcare performance milestones

New updates show rapid progress in natural-sounding speech and medical decision support benchmarks.

Gemini adds more expressive, low-delay speech preview

Gemini
What changed

Gemini 3.1 Flash TTS Preview: expressive speech synthesis with dynamic emotion/tempo/style control via Audio tags, low-latency streaming

Compared to before

Gemini already supported a wide range of text-based tasks. This update adds a preview of more expressive speech output. It can adjust emotion, pace, and style, and it can stream with low delay. That moves it closer to real-time voice experiences that feel natural.

Why it matters

Better voice quality can improve call assistants, training, and accessibility tools. Low delay matters for live conversations, not just reading text aloud. Businesses can prototype voice-first experiences with clearer control over tone. It may reduce the need for separate voice vendors for some use cases.

DeepMind reports strong results for AI medical co-clinician

Gemini (Google)
What changed

DeepMind AI Co-Clinician outperforms GPT-5.4 in blinded physician simulation test for medical diagnosis

Compared to before

Medical AI claims are often hard to compare across vendors. This update adds a blinded test result where DeepMind’s system performed strongly. It is framed as outperforming a leading alternative in a physician simulation. That’s a notable milestone for clinical decision support research.

Why it matters

Healthcare leaders get a clearer signal of progress toward practical assistance. It may accelerate pilots in triage support, documentation, or second-opinion workflows. However, real-world deployment still requires validation, oversight, and accountability. Non-healthcare businesses should note the broader trend: benchmarks are becoming more rigorous.

E · Theme of the day

Pricing clarity and reliability fixes

A few updates are about the basics: predictable pricing and making sure key capabilities work as expected.

Grok introduces SuperGrok Lite at $10/month

Grok
What changed

SuperGrok Lite: $10/mo (from 2026-03-25)

Compared to before

Grok’s paid options ranged from bundled access to higher-priced tiers. This update adds a lower-cost SuperGrok Lite plan. It sits between free access and more expensive subscriptions. That changes how individuals and teams can start using Grok.

Why it matters

Lower entry pricing can expand adoption for pilots and small teams. It gives procurement more flexibility to match cost to usage. It may also signal a push to compete on accessibility, not just premium features. Decision-makers should compare what is included at each tier before standardizing.

Mistral Medium 3.5 long-document issue fixed

Mistral
What changed

Mistral Medium 3.5 GGUF initial bug (YaRN parsing) fixed by Unsloth — long-context performance restored

Compared to before

Some early users saw degraded performance on long documents with the GGUF build, traced to a bug in how YaRN (context-extension) settings were parsed. This update notes the bug was fixed by a third-party contributor, Unsloth. The fix restores long-context performance. It improves confidence that the model behaves as expected in extended tasks.

Why it matters

Long documents are common in business: policies, contracts, research, and logs. Restored performance reduces the risk of failed evaluations or inconsistent results. It can make self-hosted or locally run deployments more dependable. Reliability fixes often matter more than new features when rolling into production.
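Since the fix concerns how YaRN context-scaling metadata in a GGUF file is parsed, one practical check for self-hosted deployments is to pin those settings explicitly at load time and compare the result against the defaults. The configuration sketch below is a hypothetical llama.cpp invocation, not taken from the update: the model filename and numeric values are illustrative assumptions, while the flags are llama.cpp's standard context-scaling options.

```shell
# Hypothetical sanity check after a GGUF metadata fix: set the context-scaling
# parameters explicitly instead of trusting whatever the runtime parses from
# the file. Model filename and numeric values are illustrative only.
llama-cli -m mistral-medium-3.5.gguf \
  --rope-scaling yarn \
  --yarn-orig-ctx 32768 \
  -c 131072 \
  -p "Summarize the attached policy document."
```

If long-input quality improves only when the scaling is pinned by hand, the problem lives in the file's metadata rather than the model weights, which is the shape of the bug described above.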

Archive

Past updates

A daily archive of changes actually applied to the site.