Stable Diffusion XL (SDXL Base + Refiner) Benchmark in ComfyUI on RTX 5090

In this benchmark, we tested Stable Diffusion XL (SDXL) Base + Refiner running on ComfyUI with an RTX 5090 GPU server. The goal was to measure image generation speed, VRAM usage, and overall experience when using ComfyUI’s free workflow-based interface for high-resolution AI image creation.

Test Overview

Server Configs:

  • GPU: NVIDIA RTX 5090 (32 GB VRAM)
  • CPU: Dual Intel Xeon E5-2697 v4 (36 cores, 72 threads)
  • RAM: 256 GB
  • Storage: 240 GB SSD + 2 TB NVMe + 8 TB SATA
  • OS: Windows

Models:

Workflow & Settings

We selected the SDXL Base + Refiner workflow template, which runs the base model for the initial steps and then hands the partially denoised latent to the refiner model for the final detail enhancement.

Generation settings:

  • Resolution: 1024 × 1024
  • Steps: 25 total
  • Base end step (end_at_step): 20 (the base model runs the first 20 steps, the refiner runs the last 5)
  • Sampler: Default (Euler or DPM++ depending on template)
  • Batch Size: 1 and 4 tested (the number of images generated in a single job)
  • Precision: fp16 (automatic in ComfyUI)
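
The step split in these settings can be written out as two sampler passes that share one 25-step schedule. The following is a minimal sketch, not the template itself: `split_steps` is a hypothetical helper, and the field names mirror the inputs of ComfyUI's `KSamplerAdvanced` node as we understand them.

```python
def split_steps(total_steps: int, base_end_step: int) -> tuple[dict, dict]:
    """Return sampler settings for the base pass and the refiner pass.

    Hypothetical helper for illustration; field names follow the
    KSamplerAdvanced inputs (an assumption, not read from the template).
    """
    if not 0 < base_end_step < total_steps:
        raise ValueError("base_end_step must fall inside the schedule")
    base = {
        "steps": total_steps,
        "start_at_step": 0,
        "end_at_step": base_end_step,
        "add_noise": "enable",                    # base starts from fresh noise
        "return_with_leftover_noise": "enable",   # hand over a still-noisy latent
    }
    refiner = {
        "steps": total_steps,
        "start_at_step": base_end_step,
        "end_at_step": total_steps,
        "add_noise": "disable",                   # continue from the base latent
        "return_with_leftover_noise": "disable",  # finish denoising completely
    }
    return base, refiner


base, refiner = split_steps(total_steps=25, base_end_step=20)
print("base steps:", base["end_at_step"] - base["start_at_step"])
print("refiner steps:", refiner["end_at_step"] - refiner["start_at_step"])
```

With 25 total steps and a base end step of 20, the base pass covers 20 steps and the refiner pass covers the remaining 5.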

Performance Results

Batch Size    VRAM Peak Usage    Time per Job    Output Count
1             ~75% (~24 GB)      6.21 s          1 image
4             100% (~32 GB)      15.11 s         4 images
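
To compare the two runs on equal terms, the job times can be converted to time per image. The short sketch below uses only the two measurements from this benchmark; the function and variable names are our own.

```python
def per_image_seconds(job_seconds: float, batch_size: int) -> float:
    """Average wall-clock time per image for one batched job."""
    return job_seconds / batch_size


# Measured job times from this benchmark: batch size -> seconds per job.
results = {1: 6.21, 4: 15.11}

per_image = {b: per_image_seconds(t, b) for b, t in results.items()}
speedup = per_image[1] / per_image[4]

for batch, seconds in per_image.items():
    print(f"batch {batch}: {seconds:.2f} s/image, {60 / seconds:.1f} images/min")
print(f"throughput gain from batching: {speedup:.2f}x")
```

Batch size 4 works out to roughly 3.78 s per image, about 1.64 times the throughput of batch size 1, at the cost of filling the card's VRAM.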

Observations:

  • Increasing batch size significantly raises VRAM usage because multiple image latents and attention maps are processed simultaneously.
  • The RTX 5090 handles four 1024×1024 images in a single batch, but doing so uses all 32 GB of VRAM, leaving no headroom for larger batches.
  • ComfyUI efficiently switches between base and refiner models within the same workflow without manual intervention.

The Best GPUs for AI Image Generation (1024×1024, steps=25, end_at_step=20)

Batch Size    VRAM Requirements    Suitable GPUs
1             ≈ 10-12 GB           Runs on a 16 GB GPU (A4000, V100)
2             ≈ 18-20 GB           Requires 20 GB or more of VRAM (e.g. A5000, RTX 4090)
4             ≈ 32-36 GB           Requires a 32-48 GB GPU (RTX 5090, A6000, etc.)
⚠ If the resolution is increased (e.g. to 2048×2048), VRAM usage grows with the pixel count, which is quadratic in the side length: doubling each side quadruples the latent size, so GPU VRAM is exhausted quickly.
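
The scaling behind this warning can be checked with a little arithmetic on latent sizes. This is a rough sketch that assumes the standard SDXL latent layout (8× VAE downscaling, 4 latent channels); it counts latent elements only and does not model attention or model-weight memory, so real VRAM use will differ.

```python
VAE_DOWNSCALE = 8     # assumption: SDXL VAE reduces each side by 8x
LATENT_CHANNELS = 4   # assumption: SDXL latents have 4 channels


def latent_shape(width: int, height: int, batch_size: int = 1) -> tuple[int, int, int, int]:
    """Latent tensor shape as (batch, channels, height, width)."""
    return (batch_size, LATENT_CHANNELS, height // VAE_DOWNSCALE, width // VAE_DOWNSCALE)


def latent_elements(width: int, height: int, batch_size: int = 1) -> int:
    """Number of values in the latent tensor."""
    b, c, h, w = latent_shape(width, height, batch_size)
    return b * c * h * w


small = latent_elements(1024, 1024)
large = latent_elements(2048, 2048)
print(latent_shape(1024, 1024), small)
print(latent_shape(2048, 2048), large)
print("growth factor:", large / small)
```

By this count, one 2048×2048 image carries as many latent elements as a batch of four 1024×1024 images, the batch that already filled the RTX 5090 in our test.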

Quality & Model Behavior

  • Different models excel in different styles and prompts — SDXL Base + Refiner generally produces more coherent, detailed, and realistic images than base-only workflows.
  • Achieving optimal results still requires prompt tuning and experimentation.
  • No quantization was applied. The SDXL checkpoints from Hugging Face were run unquantized (at fp16 in this test), so VRAM requirements are relatively high compared with quantized models such as quantized LLMs.

User Experience

  • ComfyUI’s node-based workflow makes it easy to visualize and modify image generation pipelines (models, samplers, prompt inputs, saving nodes).
  • The RTX 5090 handled the workload smoothly, but remote desktop sessions felt sluggish, which we attribute to network latency between China and the U.S. combined with the older CPU. Upgrading to a newer CPU or optimizing RDP encoding could improve remote control smoothness.

Features tested:
  • Model Library — Manage and load different checkpoints
  • Node Library — Large collection of processing and utility nodes
  • Workflow System — Templates for common setups (e.g., SDXL Base + Refiner)
  • Queue — Schedule multiple generations sequentially
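
Jobs can also be added to the queue programmatically through ComfyUI's HTTP endpoint. The sketch below only builds the request and does not send it; it assumes a local server on the default address 127.0.0.1:8188 and a workflow exported from ComfyUI in API format. `build_queue_request` is a hypothetical helper name, and the empty `workflow` dict is a placeholder that a real server would reject.

```python
import json
import urllib.request


def build_queue_request(workflow: dict, server: str = "127.0.0.1:8188") -> urllib.request.Request:
    """Build a POST request that submits a workflow to the ComfyUI queue.

    Hypothetical helper; the server address is an assumed default.
    """
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"http://{server}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder: replace with a workflow exported from ComfyUI in API format.
workflow = {}
request = build_queue_request(workflow)
print(request.get_method(), request.full_url)

# To actually queue the job against a running server:
# urllib.request.urlopen(request)
```

Submitting the same workflow several times in a row is one way to reproduce the sequential runs that the Queue panel schedules.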

Conclusion

The RTX 5090 delivers outstanding Stable Diffusion XL performance in ComfyUI:
  • Capable of generating four 1024×1024 images in ~15 seconds with SDXL Base + Refiner.
  • Fully utilizes VRAM capacity for large batch sizes.
  • Offers a flexible, free, and extensible workflow environment for AI image generation.

For professional use, ComfyUI's combination of a free tool and monetized API access makes it attractive for both hobbyists and production pipelines, provided the hardware meets the VRAM requirements (12 GB+ recommended for SDXL Base + Refiner at batch size 1, 24 GB for batch size 2, and 32 GB or more for batch size 4).