Introduction: What Claude's Multimodal Capabilities Are
The latest AI as of 2025, Claude, equips multimodal capabilities that can analyze not only text but also images. With this feature, you can convert and comprehend various visual information such as screenshots, charts, and handwritten notes into text. This article explains practical usage from image uploading to concrete analysis examples in an easy-to-understand way.
1. How to Upload Images in Claude
To use Claude's image analysis feature, first upload an image to the chat screen. Supported formats are primarily JPEG, PNG, GIF, and BMP.
The upload methods are as follows:
- Web version: Click the attachment button (the clip icon) in the chat input field, then select an image.
- Mobile app version: Tap the + button next to the input field to launch the camera, or choose from the gallery.
- Drag & Drop: If your PC supports it, you can drag and drop the image directly onto the chat screen.
After uploading, once the image is loaded, sending instructions in text will start the analysis.
2. Text Extraction from Screenshots (OCR)
For example, when you want to extract text from a screenshot of a website, instruct as follows:
Please extract the text portions from this screenshot.
Claude uses cutting-edge OCR technology to read text with high accuracy. It supports not only Japanese and English but many languages. It is highly useful for transcribing texts from invoices, email screens, chat histories, and other images.
Examples
Extract the meeting agenda titles and key points shown in the screenshot