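Key points

Stable Diffusion 3.5 models optimized with NVIDIA TensorRT are reported to run inference roughly 2x faster on RTX GPUs (up to 2.3x for SD3.5 Large and 1.7x for SD3.5 Medium).
Memory usage is reduced by about 40%, which allows more efficient deployment and larger workloads on the same hardware.
These optimizations are presented as a practical deployment improvement for AI image generation services, with the potential to lower latency and operating costs.
Realizing these gains requires a TensorRT-enabled build and compatible NVIDIA RTX hardware.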

Stable Diffusion 3.5 Models Optimized with TensorRT Deliver 2x Faster Performance and 40% Less Memory on NVIDIA RTX GPUs

June 12

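Key Takeaways:

We've collaborated with NVIDIA to deliver NVIDIA TensorRT-optimized versions of Stable Diffusion 3.5 (SD3.5), making enterprise-grade image generation available on a wider range of NVIDIA RTX GPUs.
The SD3.5 TensorRT-optimized models deliver up to 2.3x faster generation on SD3.5 Large and up to 1.7x faster on SD3.5 Medium, with a 40% reduction in VRAM requirements.
The optimized models are available for commercial and non-commercial use under the permissive Stability AI Community License. You can download the weights on Hugging Face and the code on NVIDIA's GitHub.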

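In collaboration with NVIDIA, we've optimized the SD3.5 family of models using TensorRT and FP8, improving generation speed and reducing VRAM requirements on supported RTX GPUs.
SD3.5 was designed to run out of the box on consumer hardware. NVIDIA's optimizations extend that accessibility further for creative professionals and developers working across a variety of hardware configurations.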

Where the models excel

These performance improvements make SD3.5's core strengths more accessible. SD3.5 excels in the following areas, making it one of the most customizable image models on the market, while maintaining top-tier performance in prompt adherence and image quality:

Versatile Styles: Capable of generating a wide range of styles and aesthetics like 3D, photography, painting, line art, and virtually any visual style imaginable.
Diverse Outputs: Creates images representative of the world, not just one type of person, with different skin tones and features, without the need for extensive prompting.
Prompt Adherence: Our analysis shows that SD3.5 Large leads the market in prompt adherence, allowing the model to closely follow a given text prompt, making it a top choice for efficient, high-quality performance.

Now available across more NVIDIA RTX GPUs

TensorRT optimization reduces model size while maintaining quality by streamlining how models run on NVIDIA hardware. Model size reduction is achieved through FP8 quantization, a technique that makes models more efficient while maintaining high output quality. These improvements mean that five RTX 50 Series systems can now run SD3.5 Large from memory, compared to just one system before optimization.
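As a rough illustration of that claim, the sketch below counts which cards could hold the model at a given VRAM requirement. The 19 GB and 11 GB requirements are the SD3.5 Large figures reported later in this article. The per-card VRAM capacities are assumptions based on commonly listed desktop specifications, not data from this article, and the simple "capacity >= requirement" test ignores headroom for the operating system and other processes. It may not be how the article's count of five was derived.

```python
# Assumed desktop VRAM capacities in GB (not from this article; the
# RTX 5060 Ti is also sold in an 8 GB variant, which is not listed here).
ASSUMED_VRAM_GB = {
    "RTX 5090": 32,
    "RTX 5080": 16,
    "RTX 5070 Ti": 16,
    "RTX 5060 Ti": 16,
    "RTX 5070": 12,
    "RTX 5060": 8,
}


def cards_that_fit(required_gb, vram_by_card=ASSUMED_VRAM_GB):
    """Return the cards whose VRAM is at least required_gb, sorted by name."""
    return sorted(name for name, gb in vram_by_card.items() if gb >= required_gb)


before = cards_that_fit(19)  # base PyTorch SD3.5 Large, per the article
after = cards_that_fit(11)   # TensorRT FP8 SD3.5 Large, per the article
print(len(before), before)
print(len(after), after)
```

Under these assumed capacities the count goes from one card to five, which is consistent with the figure quoted above.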

Enhanced performance across NVIDIA RTX GPUs

SD3.5 TensorRT-optimized models run more efficiently across NVIDIA GeForce RTX 50 and 40 Series GPUs, as well as NVIDIA Blackwell and Ada Lovelace generation NVIDIA RTX PRO GPUs. They deliver up to 2.3x faster generation on SD3.5 Large and 1.7x faster on SD3.5 Medium, while reducing VRAM requirements by 40%.
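To make the speedup factors more concrete, here is a minimal sketch that converts an "Nx faster" figure into time per image and the fraction of time saved. The 10-second baseline is a hypothetical placeholder, not a measured value, and the quoted factors are "up to" figures, so real results may be lower.

```python
def time_per_image(baseline_seconds, speedup):
    """Time per image after applying a speedup factor to a baseline time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_seconds / speedup


def time_saved_fraction(speedup):
    """Fraction of the baseline time removed by a given speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


HYPOTHETICAL_BASELINE_S = 10.0  # illustrative only, not a measured value

for model, speedup in [("SD3.5 Large", 2.3), ("SD3.5 Medium", 1.7)]:
    t = time_per_image(HYPOTHETICAL_BASELINE_S, speedup)
    saved = time_saved_fraction(speedup)
    print(f"{model}: {t:.2f} s per image, {saved:.0%} less time")
```

A 2.3x speedup removes a little over half of the generation time, and a 1.7x speedup removes roughly two fifths.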

SD3.5 Large

2.3x faster image generation compared to the base PyTorch models.
Memory use reduced by 40%, from 19GB to 11GB, all while maintaining professional quality.
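The percentage can be checked directly from the two quoted endpoints. The short sketch below uses only the 19 GB and 11 GB figures given above; the helper name is illustrative.

```python
def percent_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Figures for SD3.5 Large as quoted in this article.
reduction = percent_reduction(19, 11)
print(f"{reduction:.1f}% less VRAM")
```

The endpoints work out to slightly more than 40%, so the headline figure is best read as a rounded value.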

SD3.5 Medium

1.7x faster image generation for users prioritizing speed and efficiency.
Lower memory footprint, ideal for creators working on mid-range RTX hardware.

Getting started

The optimized models are now available for commercial and non-commercial use under the permissive Stability AI Community License. You can download the weights on Hugging Face and the code on NVIDIA's GitHub.

To stay updated on our progress, follow us on X, LinkedIn, and Instagram, and join our Discord community.
