This configuration makes it an ideal RTX 5090 hosting solution for deep learning, LLM inference, and AI model training.
Model | gemma3:12b | gemma3:27b | llama3.1:8b | deepseek-r1:14b | deepseek-r1:32b | qwen2.5:14b | qwen2.5:32b | qwq:32b |
---|---|---|---|---|---|---|---|---|
Parameters | 12B | 27B | 8B | 14B | 32B | 14B | 32B | 32B |
Size (GB) | 8.1 | 17 | 4.9 | 9.0 | 20 | 9.0 | 20 | 20 |
Quantization | 4-bit | 4-bit | 4-bit | 4-bit | 4-bit | 4-bit | 4-bit | 4-bit |
Running on | Ollama 0.6.5 | Ollama 0.6.5 | Ollama 0.6.5 | Ollama 0.6.5 | Ollama 0.6.5 | Ollama 0.6.5 | Ollama 0.6.5 | Ollama 0.6.5 |
Download Speed (MB/s) | 113 | 113 | 113 | 113 | 113 | 113 | 113 | 113 |
CPU Utilization | 6.9% | 7.0% | 0.2% | 1.0% | 1.7% | 1.5% | 1.4% | 1.4% |
RAM Utilization | 2.8% | 3.4% | 3.5% | 3.7% | 3.6% | 3.6% | 3.6% | 3.1% |
GPU Memory Utilization | 32.8% | 82% | 82% | 66.3% | 95% | 66.5% | 95% | 94% |
GPU Utilization | 53% | 66% | 15% | 65% | 75% | 68% | 80% | 88% |
Eval Rate (tokens/s) | 70.37 | 47.33 | 149.95 | 89.13 | 45.51 | 89.93 | 45.07 | 57.17 |
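You can reproduce the eval rate yourself from Ollama's own timing counters rather than timing by hand. Below is a minimal sketch against Ollama's local REST API; the endpoint is the default one, and the model tag and prompt are placeholders, so substitute any model from the table above.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint

def eval_rate(model: str, prompt: str) -> float:
    """Run one non-streaming generation and return tokens/s from Ollama's timings."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,
    )
    resp.raise_for_status()
    data = resp.json()
    # Ollama reports eval_count (generated tokens) and eval_duration (nanoseconds).
    return data["eval_count"] / data["eval_duration"] * 1e9

if __name__ == "__main__":
    # Model tag and prompt are illustrative, not part of the benchmark setup.
    print(f"{eval_rate('deepseek-r1:32b', 'Explain KV caching in one paragraph.'):.2f} tokens/s")
```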
GPU | Nvidia RTX 5090 | Nvidia H100 | Nvidia A100 40GB | Nvidia RTX 4090 | Nvidia RTX A6000 |
---|---|---|---|---|---|
Model | deepseek-r1:32b | deepseek-r1:32b | deepseek-r1:32b | deepseek-r1:32b | deepseek-r1:32b |
Eval Rate (tokens/s) | 45.51 | 45.36 | 35.01 | 34.22 | 26.23 |
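Read as ratios, the table shows the RTX 5090 essentially tying the H100 on this 32B workload while clearly outrunning the A100 40GB, RTX 4090, and RTX A6000. A few lines of Python re-derive those ratios from the table's numbers; the values are copied verbatim from above:

```python
# Eval rates (tokens/s) for deepseek-r1:32b, taken from the comparison table.
rates = {
    "RTX 5090": 45.51,
    "H100": 45.36,
    "A100 40GB": 35.01,
    "RTX 4090": 34.22,
    "RTX A6000": 26.23,
}

baseline = rates["RTX 5090"]
for gpu, rate in rates.items():
    print(f"{gpu:>10}: {rate:6.2f} tok/s ({rate / baseline:.0%} of RTX 5090)")
```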
The RTX 5090 is best suited for LLMs up to 32B parameters, such as deepseek-r1, qwen2.5, gemma3, and llama3. Models above 70B can be served with dual cards. Choose the RTX 5090 for the highest Ollama performance at a low price.
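As a rough sanity check on those size limits: a 4-bit model needs about 0.5 GB of VRAM per billion parameters, plus headroom for the KV cache and runtime. The 20% overhead factor below is an assumption for illustration, not a measured value, but on a 32 GB RTX 5090 it puts 32B comfortably on one card and pushes 70B onto two:

```python
def fits(params_b: float, vram_gb: float, overhead: float = 1.2) -> bool:
    """Rough 4-bit sizing: ~0.5 GB per billion parameters, padded for KV cache/runtime."""
    needed_gb = params_b * 0.5 * overhead
    return needed_gb <= vram_gb

for params in (32, 70):
    for cards in (1, 2):
        vram = 32 * cards  # the RTX 5090 has 32 GB GDDR7 per card
        print(f"{params}B on {cards}x RTX 5090 ({vram} GB): "
              f"{'fits' if fits(params, vram) else 'does not fit'}")
```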
Tags: Nvidia RTX 5090 Hosting, RTX 5090 Ollama benchmark, RTX 5090 for 32B LLMs, best GPU for 32B inference, ollama RTX 5090, single-GPU LLM hosting, cheap GPU for LLMs, H100 vs RTX 5090, A100 vs RTX 5090, RTX 5090 LLM inference