DeepSeek adds AI vision in major move: ‘the whale can now see’
SCMP Tech / 4/29/2026
Key Points
- DeepSeek has introduced an “image recognition mode” in its chat interface, expanding the product beyond text-only or chat-style interactions.
- The new vision capability is positioned as a third chat mode alongside existing “expert” and “flash” modes.
- The update signals an ongoing shift toward multimodal assistants, where users can input images and get recognition/understanding responses.
- This change may affect how developers and teams integrate DeepSeek into workflows that require visual interpretation (e.g., analysis, search, or moderation).
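For teams exploring that kind of integration, the request shape would likely follow the OpenAI-compatible convention DeepSeek already uses for text. This is a hypothetical sketch only: the article does not document an API, so the content-part schema, the `image_url` field, and the data-URL encoding are all assumptions borrowed from the common multimodal chat format, not confirmed DeepSeek behavior.

```python
import base64


def build_vision_message(prompt: str, image_bytes: bytes,
                         mime: str = "image/png") -> dict:
    """Build an OpenAI-style multimodal chat message.

    ASSUMPTION: DeepSeek's new vision mode is not publicly documented in
    the article; this mirrors the widely used "content parts" schema
    (text part + image_url part with a base64 data URL).
    """
    data_url = f"data:{mime};base64," + base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }


# Example: pair a question with raw image bytes read from disk.
msg = build_vision_message("What is in this image?", b"\x89PNG...")
print(msg["content"][1]["type"])  # image_url
```

If DeepSeek publishes an official schema for the vision mode, the message builder above is the only piece that would need to change; the surrounding chat-completion plumbing stays the same.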
Chinese artificial intelligence start-up DeepSeek has added multimodal capabilities to its flagship chatbot for the first time – meaning that it can process images and video in addition to text – bringing it in line with rivals that already offer the function.
The limited release to select users comes just days after the Hangzhou-based company released its new flagship model V4, which was followed by extensive price cuts.
According to DeepSeek multimodal team leader Chen Xiaokang, who made the...
Continue reading this article on the original site.