Test Overview
Server Configs:
- GPU: NVIDIA RTX 5090 (32 GB VRAM)
- CPU: Dual Intel Xeon E5-2697 v4 (36 cores, 72 threads)
- RAM: 256 GB
- Storage: 240 GB SSD + 2 TB NVMe + 8 TB SATA
- OS: Windows
Models:
- sd_xl_base_1.0.safetensors (~5.1 GB)
- sd_xl_refiner_1.0.safetensors (~6.1 GB)
- Downloaded directly via ComfyUI model manager (sources from Hugging Face)
Workflow & Settings
We selected the SDXL Base + Refiner workflow template, which runs the base model for the initial steps and then hands off to the refiner model for the final detail-enhancement steps.
Generation settings:
- Resolution: 1024 × 1024
- Steps: 25 total
- Base End Step (end_at_step): 20 (the refiner runs the last 5 steps)
- Sampler: Template default (Euler or DPM++, depending on the template)
- Batch Size: 1 and 4 tested (the number of images generated simultaneously in one job)
- Precision: fp16 (automatic in ComfyUI)
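The split between base and refiner follows directly from the two step settings above. A minimal sketch of that arithmetic, assuming half-open step ranges; the function and key names are illustrative and are not ComfyUI identifiers:

```python
def split_steps(total_steps: int, base_end_step: int) -> dict:
    """Return the half-open step ranges [start, end) handled by each model."""
    if not 0 < base_end_step <= total_steps:
        raise ValueError("base_end_step must be between 1 and total_steps")
    return {
        "base": (0, base_end_step),               # base model denoises first
        "refiner": (base_end_step, total_steps),  # refiner finishes the job
    }


# Settings from this test: 25 steps in total, base model stops at step 20.
plan = split_steps(total_steps=25, base_end_step=20)
base_steps = plan["base"][1] - plan["base"][0]
refiner_steps = plan["refiner"][1] - plan["refiner"][0]
print(f"base: {base_steps} steps, refiner: {refiner_steps} steps")
```

With the values used here, the base model covers 20 of the 25 steps and the refiner covers the remaining 5.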
Performance Results
| Batch Size | VRAM Peak Usage | Time per Job | Output Count |
|---|---|---|---|
| 1 | ~75% (~24 GB) | 6.21s | 1 image |
| 4 | 100% (~32 GB) | 15.11s | 4 images |
Observations:
- Increasing batch size significantly raises VRAM usage because multiple image latents and attention maps are processed simultaneously.
- The RTX 5090 fits a batch of four 1024×1024 images in parallel, but only at full VRAM load, which leaves no headroom for larger batches.
- ComfyUI efficiently switches between base and refiner models within the same workflow without manual intervention.
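The two timings in the table can also be compared per image rather than per job. The sketch below does only that arithmetic on the measured values; the variable and function names are illustrative:

```python
# Measured job times from the results table: batch size -> seconds per job.
job_seconds = {1: 6.21, 4: 15.11}


def seconds_per_image(batch_size: int, seconds: float) -> float:
    """Average wall-clock time spent on each image in a batch."""
    return seconds / batch_size


per_image = {b: seconds_per_image(b, t) for b, t in job_seconds.items()}
# How much faster each image is produced at batch 4 compared with batch 1.
throughput_gain = per_image[1] / per_image[4]

for batch in sorted(per_image):
    print(f"batch {batch}: {per_image[batch]:.2f} s per image")
print(f"throughput gain at batch 4: {throughput_gain:.2f}x")
```

Batching four images costs more time per job but less time per image, so it pays off whenever VRAM allows it.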
The Best GPUs for AI Image Generation (1024×1024, steps=25, end_at_step=20)
| Batch Size | VRAM Requirement | Suitable GPUs |
|---|---|---|
| 1 | ≈ 10–12 GB | Runs on a 16 GB GPU (A4000, V100) |
| 2 | ≈ 18–20 GB | Requires 20 GB or more of VRAM (e.g., A5000, RTX 4090) |
| 4 | ≈ 32–36 GB | Requires a 32–48 GB GPU (RTX 5090, A6000, etc.) |
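⚠ If the resolution is increased (e.g., to 2048×2048), the pixel count quadruples. Activation memory grows at least in proportion to the pixel count, that is, quadratically in the image side length, so the GPU's VRAM is exhausted quickly.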
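To make the resolution warning concrete, the sketch below compares latent sizes at 1024×1024 and 2048×2048. It assumes the SDXL VAE's 8× spatial downscale and 4 latent channels. The "naive attention" figure assumes a full attention score matrix is materialized, which memory-efficient attention implementations avoid, so treat it as an upper-bound illustration rather than a measurement:

```python
VAE_DOWNSCALE = 8    # assumption: each 8x8 pixel block maps to one latent position
LATENT_CHANNELS = 4  # assumption: SDXL latents have 4 channels


def latent_shape(width: int, height: int, batch_size: int = 1) -> tuple:
    """Shape of the latent tensor as (batch, channels, height, width)."""
    return (batch_size, LATENT_CHANNELS,
            height // VAE_DOWNSCALE, width // VAE_DOWNSCALE)


def latent_positions(width: int, height: int) -> int:
    """Number of spatial positions in one image's latent."""
    _, _, h, w = latent_shape(width, height)
    return h * w


base = latent_positions(1024, 1024)
large = latent_positions(2048, 2048)

linear_growth = large / base                 # memory that scales with positions
naive_attention_growth = linear_growth ** 2  # score matrix is positions x positions

print(latent_shape(1024, 1024, batch_size=4))
print(f"positions: {base} -> {large} ({linear_growth:.0f}x)")
print(f"naive attention matrix growth: {naive_attention_growth:.0f}x")
```

Model weights stay the same size at any resolution, so total VRAM does not grow by these exact factors; only the per-image working memory does.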
Quality & Model Behavior
- Different models excel at different styles and prompts. SDXL Base + Refiner generally produces more coherent, detailed, and realistic images than base-only workflows.
- Achieving optimal results still requires prompt tuning and experimentation.
- No quantization was applied: the SDXL checkpoints from Hugging Face were run in fp16 without quantization, so VRAM requirements are relatively high compared to quantized models such as quantized LLMs.
User Experience
- ComfyUI’s node-based workflow makes it easy to visualize and modify image generation pipelines (models, samplers, prompt inputs, saving nodes).
- The RTX 5090 handled the workload smoothly, but remote desktop responsiveness suffered. We attribute this to the older CPU and to network latency between China and the U.S. Upgrading to a newer CPU or optimizing RDP encoding could make remote control smoother.
Features tested:
- Model Library — Manage and load different checkpoints
- Node Library — Large collection of processing and utility nodes
- Workflow System — Templates for common setups (e.g., SDXL Base + Refiner)
- Queue — Schedule multiple generations sequentially
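The queue can also be fed programmatically. The sketch below builds a request for ComfyUI's HTTP `/prompt` endpoint using only the standard library. It assumes a local server on the default address `127.0.0.1:8188` and a workflow exported in ComfyUI's API format; `build_queue_request` and `queue_jobs` are hypothetical helper names, and this path was not part of the benchmark itself:

```python
import json
import urllib.request

COMFYUI_URL = "http://127.0.0.1:8188"  # assumption: default local server address


def build_queue_request(workflow: dict,
                        base_url: str = COMFYUI_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request that queues one workflow run."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_jobs(workflow: dict, count: int, base_url: str = COMFYUI_URL) -> list:
    """Submit the same workflow `count` times; the server runs them in order."""
    responses = []
    for _ in range(count):
        request = build_queue_request(workflow, base_url)
        with urllib.request.urlopen(request) as response:
            responses.append(json.loads(response.read()))
    return responses
```

Calling `queue_jobs(workflow, 3)` with a workflow dictionary loaded from an API-format export would submit three runs, provided a ComfyUI server is listening at the assumed address.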
Conclusion
The RTX 5090 delivers outstanding Stable Diffusion XL performance in ComfyUI:
- Capable of generating four 1024×1024 images in ~15 seconds with SDXL Base + Refiner.
- Fully utilizes VRAM capacity for large batch sizes.
- Paired with ComfyUI, it offers a flexible, free, and extensible workflow environment for AI image generation.
For professional use, ComfyUI's combination of a free tool and API-based monetization makes it attractive for both hobbyists and production pipelines, provided the hardware meets the VRAM requirements for SDXL Base + Refiner (12 GB+ recommended at batch size 1, 24 GB at batch size 2, and 32 GB or more at batch size 4).