Stable Diffusion XL Benchmark in ComfyUI on RTX 5090




Test Overview

Server Configs:

GPU: NVIDIA RTX 5090 (32 GB VRAM)
CPU: Dual Intel Xeon E5-2697 v4 (36 cores, 72 threads)
RAM: 256 GB
Storage: 240 GB SSD + 2 TB NVMe + 8 TB SATA
OS: Windows

Software:

ComfyUI Windows Portable version (latest release)

Models:

sd_xl_base_1.0.safetensors (~5.1 GB)
sd_xl_refiner_1.0.safetensors (~6.1 GB)
Downloaded directly via the ComfyUI model manager (sourced from Hugging Face)

Workflow & Settings

We selected the SDXL Base + Refiner workflow template, which runs the base model for the initial steps and the refiner model for the final detail-enhancement steps.

Generation settings:

Resolution: 1024 × 1024
Steps: 25 total
Base End Step (end_at_step): 20 (the refiner runs the last 5 steps)
Sampler: template default (Euler or DPM++, depending on the template)
Batch Size: 1 and 4 tested (the number of images generated at the same time)
Precision: fp16 (selected automatically by ComfyUI)
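
With 25 total steps and the hand-off at step 20, the split between the two models can be sketched as the step settings for two sampler nodes. This is a minimal sketch, assuming the template uses ComfyUI's KSamplerAdvanced node for both stages. The helper name `split_steps` is hypothetical, and the seed, CFG, sampler, and scheduler inputs are omitted because the article does not state them.

```python
def split_steps(total_steps: int, handoff_step: int) -> tuple[dict, dict]:
    """Return step settings for the base stage and the refiner stage.

    Keys mirror the step-related inputs of ComfyUI's KSamplerAdvanced node
    (assumed here to be the node the template uses).
    """
    if not 0 < handoff_step < total_steps:
        raise ValueError("handoff_step must fall strictly between 0 and total_steps")

    base = {
        "steps": total_steps,
        "start_at_step": 0,
        "end_at_step": handoff_step,
        "add_noise": "enable",                   # base starts from fresh noise
        "return_with_leftover_noise": "enable",  # hand a still-noisy latent to the refiner
    }
    refiner = {
        "steps": total_steps,
        "start_at_step": handoff_step,
        "end_at_step": total_steps,
        "add_noise": "disable",                  # continue from the base latent as-is
        "return_with_leftover_noise": "disable", # finish denoising completely
    }
    return base, refiner


base, refiner = split_steps(total_steps=25, handoff_step=20)
print("base steps:", base["end_at_step"] - base["start_at_step"])
print("refiner steps:", refiner["end_at_step"] - refiner["start_at_step"])
```

Both stages share the same total step count so that they follow one noise schedule; only the start and end indices differ.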

Performance Results

Batch Size	VRAM Peak Usage	Time per Job	Output Count
1	~75% (~24 GB)	6.21s	1 image
4	100% (~32 GB)	15.11s	4 images
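
The job times above can be converted into per-image cost and throughput, which makes the benefit of batching easier to compare. The sketch below uses only the two measured job times from the table; the function and variable names are illustrative.

```python
# Measured job times from the table: batch size -> seconds per job.
results = {1: 6.21, 4: 15.11}


def seconds_per_image(batch_size: int, job_seconds: float) -> float:
    """Average wall-clock time spent on each image in a job."""
    return job_seconds / batch_size


per_image = {b: seconds_per_image(b, t) for b, t in results.items()}
speedup = per_image[1] / per_image[4]

for batch, secs in per_image.items():
    print(f"batch {batch}: {secs:.2f} s/image, {60 / secs:.1f} images/min")
print(f"batch 4 vs batch 1 throughput: {speedup:.2f}x")
```

By this arithmetic, batch size 4 costs about 3.78 s per image against 6.21 s at batch size 1, or roughly 1.64 times the throughput.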

Observations:

Increasing batch size significantly raises VRAM usage because multiple image latents and attention maps are processed simultaneously.
The RTX 5090 handles four 1024×1024 images in parallel, but doing so fills its 32 GB of VRAM completely.
ComfyUI efficiently switches between base and refiner models within the same workflow without manual intervention.
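
One way to capture a peak-VRAM figure like those above is to poll `nvidia-smi` while a job runs. This is a minimal sketch and not necessarily how the numbers in this test were collected. The query flags are standard `nvidia-smi` options; the function names are hypothetical, and the sample string in the docstring is illustrative, not a measurement.

```python
import subprocess
import time

# Standard nvidia-smi query; with "nounits" the values are plain MiB integers.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_line(line: str) -> tuple[int, int]:
    """Parse one CSV line such as '1000, 4000' into (used_mib, total_mib)."""
    used, total = (int(field.strip()) for field in line.split(","))
    return used, total


def read_gpu_memory(gpu_index: int = 0) -> tuple[int, int]:
    """Query nvidia-smi once and return (used_mib, total_mib) for one GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_memory_line(out.strip().splitlines()[gpu_index])


def poll_peak(duration_s: float, interval_s: float = 0.5) -> tuple[int, float]:
    """Poll for duration_s seconds; return (peak_used_mib, peak_percent)."""
    peak, total = 0, 1
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        used, total = read_gpu_memory()
        peak = max(peak, used)
        time.sleep(interval_s)
    return peak, 100.0 * peak / total
```

Calling `poll_peak(20)` in a separate process just before queueing a job would report the highest sampled usage. Because this samples at intervals, very short spikes between samples can be missed.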

Recommended GPUs for AI Image Generation (1024×1024, steps=25, end_at_step=20)

Batch Size	VRAM Requirements	Suitable GPUs
1	≈ 10-12 GB	Runs on a 16 GB GPU (A4000, V100)
2	≈ 18-20 GB	Requires 20 GB or more of VRAM (e.g. A5000, RTX 4090)
4	≈ 32-36 GB	Requires a 32-48 GB GPU (RTX 5090, A6000, etc.)

⚠ If the resolution is increased (e.g. to 2048×2048), VRAM usage grows quadratically with the side length (2048×2048 has four times the pixels of 1024×1024), and the GPU's VRAM is exhausted quickly.
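
The scaling behind this warning can be worked out from the latent dimensions. The sketch below assumes the usual SDXL latent layout (4 channels, 8× spatial downscale by the VAE). It reports scaling ratios only, not predicted VRAM figures, since actual usage depends on the attention implementation and on how ComfyUI manages model memory.

```python
VAE_DOWNSCALE = 8    # each latent position covers an 8x8 pixel block
LATENT_CHANNELS = 4  # channels in the SDXL latent


def latent_shape(width: int, height: int, batch_size: int = 1) -> tuple[int, int, int, int]:
    """Shape of the latent tensor: (batch, channels, height/8, width/8)."""
    return (batch_size, LATENT_CHANNELS, height // VAE_DOWNSCALE, width // VAE_DOWNSCALE)


def latent_positions(width: int, height: int) -> int:
    """Number of spatial positions in one image's latent."""
    _, _, h, w = latent_shape(width, height)
    return h * w


base = latent_positions(1024, 1024)
large = latent_positions(2048, 2048)

print(latent_shape(1024, 1024, batch_size=4))  # latent shape for the batch-of-4 test
print(large / base)         # growth in latent positions when the side length doubles
print((large / base) ** 2)  # growth of a full self-attention score matrix
```

Doubling the side length quadruples the number of latent positions, and a naively materialized self-attention matrix grows with the square of that count, which is why high resolutions exhaust memory much faster than larger batches do.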

Quality & Model Behavior

Different models excel at different styles and prompts. SDXL Base + Refiner generally produces more coherent, detailed, and realistic images than base-only workflows.
Achieving optimal results still requires prompt tuning and experimentation.
No quantization was applied: the SDXL checkpoints from Hugging Face were run unquantized at fp16, so VRAM requirements are relatively high compared to quantized LLMs.

User Experience

ComfyUI’s node-based workflow makes it easy to visualize and modify image generation pipelines (models, samplers, prompt inputs, saving nodes).
The RTX 5090 handled the workload smoothly, but remote desktop responsiveness suffered. We attribute this partly to the older CPU and partly to network latency between China and the U.S.; upgrading to a newer CPU or optimizing RDP encoding could make remote control smoother.

Features tested:

Model Library — Manage and load different checkpoints
Node Library — Large collection of processing and utility nodes
Workflow System — Templates for common setups (e.g., SDXL Base + Refiner)
Queue — Schedule multiple generations sequentially
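
Besides the UI, generations can also be queued programmatically through the HTTP endpoint that the ComfyUI server exposes. The following is a minimal sketch, assuming a local server on ComfyUI's default address (`127.0.0.1:8188`) and a workflow that has been exported in API format. The function names and the file name `sdxl_base_refiner_api.json` are hypothetical.

```python
import json
import urllib.request

DEFAULT_SERVER = "http://127.0.0.1:8188"  # ComfyUI's default listen address (assumed unchanged)


def build_queue_request(workflow: dict, server: str = DEFAULT_SERVER) -> urllib.request.Request:
    """Build a POST request that submits an API-format workflow to /prompt."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_workflow(workflow: dict, server: str = DEFAULT_SERVER) -> dict:
    """Send the workflow to a running ComfyUI server and return its JSON reply."""
    with urllib.request.urlopen(build_queue_request(workflow, server)) as response:
        return json.loads(response.read())


# Usage (requires a running server and an exported workflow file):
#   with open("sdxl_base_refiner_api.json") as f:  # hypothetical file name
#       workflow = json.load(f)
#   for _ in range(3):
#       print(queue_workflow(workflow))  # each call adds one job to the queue
```

Jobs submitted this way run sequentially, in the same queue as those started from the UI.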

Conclusion

The RTX 5090 delivers outstanding Stable Diffusion XL performance in ComfyUI:

Capable of generating four 1024×1024 images in ~15 seconds with SDXL Base + Refiner.
Fully utilizes VRAM capacity for large batch sizes.
Offers a flexible, free, and extensible workflow environment for AI image generation.

For professional use, ComfyUI’s model (a free tool, with monetization through its API) makes it attractive for both hobbyists and production pipelines, provided the hardware meets the VRAM requirements (12 GB+ recommended for SDXL Base + Refiner, 24 GB or more for larger batch sizes).
