Accelerating LLM fine-tuning with unstructured data using SageMaker Unified Studio and S3

Amazon AWS AI Blog / 3/27/2026

Key Points

  • AWS previously announced an integration that connects Amazon SageMaker Unified Studio with Amazon S3 general purpose buckets to make it easier to use unstructured S3 data for ML and analytics workloads.
  • The post provides a concrete workflow for integrating S3 buckets with Amazon SageMaker Catalog to fine-tune Llama 3.2 11B Vision Instruct for visual question answering (VQA).
  • It specifically demonstrates using the SageMaker Unified Studio environment to orchestrate fine-tuning steps with unstructured data pipelines.
  • The approach targets teams that need to use unstructured assets stored in S3 for LLM fine-tuning rather than relying on fully curated, structured datasets.
  • Overall, it positions the Unified Studio + S3 + SageMaker Catalog combination as a practical path to accelerate LLM fine-tuning and experimentation for multimodal tasks.

Last year, AWS announced an integration between Amazon SageMaker Unified Studio and Amazon S3 general purpose buckets. This integration makes it straightforward for teams to use unstructured data stored in Amazon Simple Storage Service (Amazon S3) for machine learning (ML) and data analytics use cases. In this post, we show how to integrate S3 general purpose buckets with Amazon SageMaker Catalog to fine-tune Llama 3.2 11B Vision Instruct for visual question answering (VQA) using Amazon SageMaker Unified Studio.
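
To make the idea of fine-tuning on unstructured S3 data more concrete, here is a minimal sketch of how image objects in a bucket might be paired with question/answer text to form VQA training records. The record schema (`image`, `question`, `answer`), the `VqaExample` type, and the bucket and key names are all assumptions for illustration; they are not the format or the code used in the post, and the sketch makes no AWS calls.

```python
import json
from dataclasses import dataclass


@dataclass
class VqaExample:
    """One hypothetical VQA training example."""
    image_key: str  # object key of the image inside the bucket
    question: str
    answer: str


def to_manifest_line(bucket: str, example: VqaExample) -> str:
    """Serialize one example as a JSON line that points at the S3 image."""
    record = {
        "image": f"s3://{bucket}/{example.image_key}",
        "question": example.question,
        "answer": example.answer,
    }
    return json.dumps(record, sort_keys=True)


def build_manifest(bucket: str, examples: list[VqaExample]) -> str:
    """Join all examples into JSON Lines text, one record per line."""
    return "".join(to_manifest_line(bucket, ex) + "\n" for ex in examples)


if __name__ == "__main__":
    # Hypothetical bucket and object key, used only for illustration.
    demo = [VqaExample("images/0001.png", "What color is the car?", "red")]
    print(build_manifest("example-bucket", demo), end="")
```

A manifest like this keeps the images themselves in S3 and stores only references alongside the text, which is one common way to feed unstructured assets into a fine-tuning job without copying them into a structured dataset first.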