Which GPU is the Cheapest for Qwen3-32B Inference with vLLM?

If you're looking to run inference on large 32B models such as Qwen3-32B from Hugging Face, the most cost-effective GPU setup is 2×NVIDIA A100 40GB running vLLM with tensor-parallel-size=2.
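As a minimal sketch of that setup, here is how you might load the model with vLLM's offline Python API (assuming the Qwen/Qwen3-32B checkpoint on Hugging Face; the memory knobs like max_model_len and gpu_memory_utilization are illustrative and may need tuning for your workload):

```python
from vllm import LLM, SamplingParams

# Shard the 32B model across both A100 40GB GPUs.
# In bf16 the weights alone take roughly 64 GB (~32 GB per GPU),
# so keep max_model_len modest to leave headroom for the KV cache.
llm = LLM(
    model="Qwen/Qwen3-32B",
    tensor_parallel_size=2,      # split layers across the 2 GPUs
    max_model_len=8192,          # illustrative; raise/lower to fit memory
    gpu_memory_utilization=0.95, # illustrative; fraction of VRAM vLLM may use
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain tensor parallelism in one paragraph."], params)
print(outputs[0].outputs[0].text)
```

If you'd rather expose an OpenAI-compatible endpoint, the equivalent server launch is `vllm serve Qwen/Qwen3-32B --tensor-parallel-size 2`, with the same memory considerations.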

Screenshot: 2×A100 40GB running Qwen3-32B with vLLM
The same setup also fits other 32B variants such as deepseek-ai/DeepSeek-R1-Distill-Qwen-32B and Qwen/QwQ-32B.