This configuration provides ample RAM and storage for smooth model loading and execution, while the P100's 16GB of VRAM allows it to run larger models than the RTX 2060 we tested in our previous Ollama benchmark.
| Models | deepseek-r1 | deepseek-r1 | deepseek-r1 | deepseek-coder-v2 | llama2 | llama2 | llama3.1 | gemma2 | qwen2.5 | qwen2.5 |
|---|---|---|---|---|---|---|---|---|---|---|
| Parameters | 7b | 8b | 14b | 16b | 7b | 13b | 8b | 9b | 7b | 14b |
| Size (GB) | 4.7 | 4.9 | 9.0 | 8.9 | 3.8 | 7.4 | 4.9 | 5.4 | 4.7 | 9.0 |
| Quantization | 4-bit | 4-bit | 4-bit | 4-bit | 4-bit | 4-bit | 4-bit | 4-bit | 4-bit | 4-bit |
| Running on | Ollama 0.5.11 | Ollama 0.5.11 | Ollama 0.5.11 | Ollama 0.5.11 | Ollama 0.5.11 | Ollama 0.5.11 | Ollama 0.5.11 | Ollama 0.5.11 | Ollama 0.5.11 | Ollama 0.5.11 |
| Download Speed (MB/s) | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 |
| CPU Utilization | 3% | 3% | 3% | 3% | 4% | 3% | 3% | 3% | 3% | 3% |
| RAM Utilization | 5% | 5% | 5% | 4% | 4% | 4% | 5% | 5% | 5% | 5% |
| GPU Utilization | 85% | 89% | 90% | 65% | 91% | 95% | 88% | 81% | 87% | 91% |
| Eval Rate (tokens/s) | 34.31 | 33.06 | 19.43 | 40.25 | 49.66 | 28.86 | 32.99 | 29.54 | 34.70 | 19.45 |
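The Eval Rate row can be reproduced directly against Ollama's HTTP API: a non-streaming call to /api/generate returns eval_count (tokens generated) and eval_duration (in nanoseconds), from which tokens/s follows. Below is a minimal sketch, assuming a local Ollama instance on the default port 11434 with the models already pulled; measure_eval_rate is a helper name chosen here, not part of Ollama.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def measure_eval_rate(model: str, prompt: str) -> float:
    """Run one non-streaming generation and return decode speed in tokens/s."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,  # larger models can take a while on a single P100
    )
    resp.raise_for_status()
    data = resp.json()
    # eval_count = tokens generated; eval_duration is reported in nanoseconds.
    return data["eval_count"] / data["eval_duration"] * 1e9

if __name__ == "__main__":
    for model in ("llama2:7b", "deepseek-r1:7b", "qwen2.5:14b"):
        rate = measure_eval_rate(model, "Explain 4-bit quantization in one paragraph.")
        print(f"{model}: {rate:.2f} tokens/s")
```

Single runs can drift a few tokens/s from the table figures; averaging over several prompts gives steadier numbers.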
Related GPU server plans for Ollama hosting:

- Professional GPU Dedicated Server - P100
- Professional GPU VPS - A4000
- Advanced GPU Dedicated Server - V100
- Enterprise GPU Dedicated Server - RTX 4090
For Nvidia P100 rental users, the best models for small-scale AI inference on Ollama are the ones that sustained the highest throughput in the table above: llama2:7b (49.66 tokens/s), deepseek-coder-v2:16b (40.25 tokens/s), qwen2.5:7b (34.70 tokens/s), and deepseek-r1:7b (34.31 tokens/s). The 13B-14B models also fit comfortably in the P100's 16GB of VRAM, but at roughly 19-29 tokens/s they are better suited to batch workloads than interactive chat.
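Before settling on a model, it is worth confirming that it actually fits in the P100's 16GB of VRAM and keeps the GPU busy during generation, as in the GPU Utilization row above. Here is a sketch of how one might poll nvidia-smi while a generation is running; sample_gpu and the sampling window are illustrative choices, and a single-GPU server is assumed.

```python
import subprocess
import time

def sample_gpu(seconds: int = 30, interval: float = 1.0) -> None:
    """Poll nvidia-smi and report peak VRAM usage and average GPU utilization."""
    mem_peak = 0
    util_samples = []
    for _ in range(int(seconds / interval)):
        out = subprocess.check_output(
            ["nvidia-smi",
             "--query-gpu=memory.used,utilization.gpu",
             "--format=csv,noheader,nounits"],
            text=True,
        )
        # nvidia-smi prints one line per GPU; this assumes a single-P100 server.
        mem, util = (int(v) for v in out.strip().split(", "))
        mem_peak = max(mem_peak, mem)
        util_samples.append(util)
        time.sleep(interval)
    avg_util = sum(util_samples) / len(util_samples)
    print(f"peak VRAM: {mem_peak} MiB, average GPU utilization: {avg_util:.0f}%")

if __name__ == "__main__":
    sample_gpu()  # run this while a prompt is being generated in another shell
```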