Capacity-aware inference: Automatic instance fallback for SageMaker AI endpoints
Amazon AWS AI Blog / 5/5/2026
Key Points
- Amazon SageMaker AI has introduced a capacity-aware instance pool that automatically falls back across a prioritized list of instance types when capacity is limited.
- The fallback is applied automatically during endpoint creation, during scale-out, and even during scale-in, reducing the need for manual capacity management.
- This provisioning behavior is designed to place inference endpoints on available AI infrastructure without requiring user intervention.
- The feature is available for Single Model Endpoints, Inference Component-based endpoints, and Asynchronous Inference endpoints.
Today, Amazon SageMaker AI introduces a capacity-aware instance pool for new and existing inference endpoints. You define a prioritized list of instance types, and SageMaker AI automatically works through your list whenever capacity is constrained, whether at endpoint creation, during scale-out, or during scale-in. Your endpoint provisions on available AI infrastructure without manual intervention. This capability is available for Single Model Endpoints, Inference Component-based endpoints, and Asynchronous Inference endpoints.
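To illustrate the idea of a prioritized fallback list, the sketch below builds a `create_endpoint_config`-style request with the AWS SDK for Python (boto3) in mind. Note that the `CapacityAwareInstancePool` field name is a hypothetical placeholder, not the documented API; consult the SageMaker AI documentation for the released request shape.

```python
# Sketch: a create_endpoint_config request with a prioritized instance list.
# "CapacityAwareInstancePool" is a HYPOTHETICAL field name for illustration;
# check the SageMaker AI docs for the actual parameter.

def build_endpoint_config(config_name: str, model_name: str,
                          prioritized_instances: list[str]) -> dict:
    """Build an endpoint-config request with an ordered fallback pool.

    The first instance type is the preferred one; SageMaker AI would work
    down the list whenever capacity for earlier types is constrained.
    """
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "primary",
            "ModelName": model_name,
            "InitialInstanceCount": 1,
            # Hypothetical field: fallback pool, most-preferred first.
            "CapacityAwareInstancePool": prioritized_instances,
        }],
    }

request = build_endpoint_config(
    "my-endpoint-config",
    "my-model",
    ["ml.p5.48xlarge", "ml.p4d.24xlarge", "ml.g5.48xlarge"],
)
# The request would then be sent with:
#   boto3.client("sagemaker").create_endpoint_config(**request)
```

Because the fallback is evaluated by the service at creation, scale-out, and scale-in, no client-side retry logic is needed: the ordered list is declared once in the endpoint configuration.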


