The dual Nvidia A100 GPUs (40GB each) provide a combined 80GB of GPU memory, well suited to serving large language models. This configuration can run 4-bit quantized models with up to 110B parameters at reasonable speed and efficiency.
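As a rough sanity check on that 110B ceiling, here is a minimal back-of-the-envelope sketch in Python: a 4-bit quantized model needs about half a byte per parameter for weights, plus runtime overhead for the KV cache and buffers (the ~15% overhead factor here is an assumption, not a measured value).

```python
def quantized_size_gb(params_billion: float, bits: int = 4, overhead: float = 1.15) -> float:
    """Rough VRAM estimate for a quantized model: weight bytes plus ~15%
    overhead for KV cache and runtime buffers (assumed figure)."""
    weight_gb = params_billion * bits / 8  # e.g. 110B at 4-bit -> 55 GB of weights
    return weight_gb * overhead

for p in (32, 70, 110):
    print(f"{p}B @ 4-bit \u2248 {quantized_size_gb(p):.0f} GB")
# 110B @ 4-bit comes out near 63 GB -- inside the 80GB of combined VRAM.
```

These estimates line up closely with the measured model sizes in the benchmark table below.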
| Model | Parameters | Size | Quantization | Runtime | Download Speed (MB/s) | CPU Util | RAM Util | GPU Util (card 1, card 2) | Eval Rate (tokens/s) |
|---|---|---|---|---|---|---|---|---|---|
| deepseek-r1 | 14b | 9GB | 4-bit | Ollama 0.5.7 | 117 | 0% | 4% | 0%, 80% | 66.79 |
| deepseek-r1 | 32b | 20GB | 4-bit | Ollama 0.5.7 | 117 | 2% | 4% | 0%, 88% | 36.36 |
| deepseek-r1 | 70b | 43GB | 4-bit | Ollama 0.5.7 | 117 | 3% | 4% | 44%, 44% | 19.34 |
| qwen | 32b | 18GB | 4-bit | Ollama 0.5.7 | 117 | 2% | 4% | 36%, 39% | 32.07 |
| qwen | 72b | 41GB | 4-bit | Ollama 0.5.7 | 117 | 1% | 4% | 42%, 45% | 20.13 |
| qwen | 110b | 63GB | 4-bit | Ollama 0.5.7 | 117 | 1% | 4% | 50%, 50% | 16.06 |
| qwen2 | 72b | 41GB | 4-bit | Ollama 0.5.7 | 117 | 1% | 4% | 38%, 37% | 19.88 |
| llama3 | 70b | 40GB | 4-bit | Ollama 0.5.7 | 117 | 2% | 3% | 92%, 0% | 24.41 |
| llama3.1 | 70b | 43GB | 4-bit | Ollama 0.5.7 | 117 | 1% | 4% | 44%, 43% | 19.01 |
| llama3.3 | 70b | 43GB | 4-bit | Ollama 0.5.7 | 117 | 1% | 3% | 44%, 43% | 18.91 |
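The eval rates above come directly from Ollama's own timing output. As a minimal sketch of how such numbers can be reproduced, the script below calls Ollama's REST API on its default port (11434) and derives tokens/s from the `eval_count` and `eval_duration` fields returned in a non-streaming response; the model name and prompt are placeholders.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def eval_rate(model: str, prompt: str) -> float:
    """Run one non-streaming generation and return decode speed in tokens/s."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # eval_count = generated tokens; eval_duration = decode time in nanoseconds
    return body["eval_count"] / (body["eval_duration"] / 1e9)

if __name__ == "__main__":
    # Placeholder invocation; any model pulled onto the server works.
    print(f"{eval_rate('deepseek-r1:70b', 'Explain KV caching briefly.'):.2f} tokens/s")
```

Single-prompt runs like this fluctuate, so averaging over several prompts gives figures closer to the table above.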
The dual Nvidia A100 GPU server is a powerful and cost-effective solution for running LLMs with up to 110B parameters at 4-bit quantization. It delivers excellent performance on mid-range to large models such as Qwen:32B, DeepSeek-R1:70B, and Qwen:72B, with a significant price advantage over higher-end GPUs like the H100.
For users who need to serve large models without the steep cost of premium GPUs, dual-A100 hosting offers a compelling balance of performance and affordability.