Capacity-aware inference: Automatic instance fallback for SageMaker AI endpoints

Amazon AWS AI Blog / 5/5/2026

Key Points

  • Amazon SageMaker AI has introduced a capacity-aware instance pool that automatically falls back across a prioritized list of instance types when capacity is limited.
  • The fallback is applied automatically during endpoint creation, during scale-out, and even during scale-in, reducing the need for manual capacity management.
  • This provisioning behavior is designed to place inference endpoints on available AI infrastructure without requiring user intervention.
  • The feature is available for Single Model Endpoints, Inference Component-based endpoints, and Asynchronous Inference endpoints.

Today, Amazon SageMaker AI introduces a capacity-aware instance pool for new and existing inference endpoints. You define a prioritized list of instance types, and SageMaker AI automatically works through that list whenever capacity is constrained: at endpoint creation, during scale-out, and during scale-in. Your endpoint provisions on available AI infrastructure without manual intervention. This capability is available for Single Model Endpoints, Inference Component-based endpoints, and Asynchronous Inference endpoints.
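
To make the idea of "working through a prioritized list" concrete, here is a minimal conceptual sketch in plain Python. It is not the SageMaker AI API and does not call AWS. The function name `pick_instance_type`, the capacity map, and the specific instance types and counts are all hypothetical, chosen only to illustrate first-available selection in priority order.

```python
from typing import Mapping, Optional, Sequence


def pick_instance_type(
    priority: Sequence[str],
    available: Mapping[str, int],
    needed: int = 1,
) -> Optional[str]:
    """Return the first instance type, in priority order, with enough capacity.

    Hypothetical helper for illustration only; the real service performs
    this selection internally.
    """
    for instance_type in priority:
        # Types missing from the map are treated as having no capacity.
        if available.get(instance_type, 0) >= needed:
            return instance_type
    return None  # No type in the list can satisfy the request.


# Illustrative priority list: most preferred first.
priority = ["ml.g6.12xlarge", "ml.g5.12xlarge", "ml.g5.4xlarge"]

# Made-up capacity snapshot: the preferred type is unavailable.
capacity = {"ml.g6.12xlarge": 0, "ml.g5.12xlarge": 3, "ml.g5.4xlarge": 8}

chosen = pick_instance_type(priority, capacity, needed=2)
print(chosen)  # ml.g5.12xlarge
```

In this sketch the first choice has no capacity, so the selection falls through to the second entry. The same ordered walk would apply whether the request comes from initial creation or from a later scaling event.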