Models | llama3.2 | llama3.2 | gemma2 | codegemma | qwen2.5 | qwen2.5 | qwen2.5 | tinyllama | phi3.5 |
---|---|---|---|---|---|---|---|---|---|
Parameters | 1b | 3b | 2b | 2b | 0.5b | 1.5b | 3b | 1.1b | 3.8b |
Size | 1.3GB | 2GB | 1.6GB | 1.6GB | 395MB | 1.1GB | 1.9GB | 638MB | 2.2GB |
Quantization (bits) | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
Running on | Ollama 0.5.4 | Ollama 0.5.4 | Ollama 0.5.4 | Ollama 0.5.4 | Ollama 0.5.4 | Ollama 0.5.4 | Ollama 0.5.4 | Ollama 0.5.4 | Ollama 0.5.4 |
Download Speed (MB/s) | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 |
CPU Utilization | 6.7% | 6.3% | 6.3% | 6.3% | 6.5% | 6.3% | 6.3% | 6.4% | 6.4% |
RAM Utilization | 4.5% | 4.8% | 4.9% | 5.0% | 5.4% | 5.4% | 4.0% | 4.0% | 4.2% |
GPU vRAM Utilization | 51.9% | 80.2% | 72.4% | 53.4% | 20% | 37.2% | 60.8% | 33.2% | 74% |
GPU Utilization | 92% | 95% | 89% | 96% | 80% | 89% | 95% | 93% | 97% |
Eval Rate (tokens/s) | 28.90 | 19.97 | 19.46 | 30.59 | 54.78 | 34.43 | 17.92 | 62.33 | 18.87 |
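
The eval rate in the last row is a tokens-per-second figure. The article does not say how it was collected; as a minimal sketch, assuming an Ollama server on its default local address, the same kind of number can be derived from the `eval_count` and `eval_duration` (nanoseconds) fields that Ollama returns with a non-streaming `/api/generate` response. The helper names and the sample numbers below are illustrative, not measurements from this benchmark.

```python
import json
import urllib.request

NANOSECONDS_PER_SECOND = 1_000_000_000


def eval_rate(stats: dict) -> float:
    """Generated tokens per second from Ollama's eval_count and eval_duration (ns)."""
    return stats["eval_count"] * NANOSECONDS_PER_SECOND / stats["eval_duration"]


def generate_stats(model: str, prompt: str, host: str = "http://localhost:11434") -> dict:
    """Run one non-streaming generation and return the JSON response.

    Assumes an Ollama server is listening on `host` (its default port is 11434)
    and that `model` has already been pulled.
    """
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical response fields, made up for illustration only:
# 290 tokens generated in 10 seconds.
sample = {"eval_count": 290, "eval_duration": 10_000_000_000}
sample_rate = eval_rate(sample)
```

With a running server, `eval_rate(generate_stats("tinyllama", "Why is the sky blue?"))` would give a live figure; the model tag and prompt here are placeholders.
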

Metric | Value for Various Models |
---|---|
Downloading Speed | 11 MB/s for all models |
CPU Utilization Rate | Ranged from 6.3% to 6.7% across all models |
RAM Utilization Rate | Consistently between 4% and 5.4% |
GPU vRAM Utilization | 20% (Qwen2.5-0.5B) to 80.2% (Llama3.2-3B) |
GPU Utilization | Ranged from 80% (Qwen2.5-0.5B) to 97% (Phi3.5), with eight of the nine models at 89% or higher |
Evaluation Speed | Spanned from 17.92 tokens/s (Qwen2.5-3B) to 62.33 tokens/s (TinyLlama) |
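
The ranges in this summary can be recomputed directly from the per-model table. The sketch below copies three rows from that table and reports which model sits at each extreme; the `name:size` labels are just the table's model and parameter rows joined together, and `extremes` is a hypothetical helper name.

```python
# Labels built from the "Models" and "Parameters" rows of the benchmark table.
models = [
    "llama3.2:1b", "llama3.2:3b", "gemma2:2b", "codegemma:2b", "qwen2.5:0.5b",
    "qwen2.5:1.5b", "qwen2.5:3b", "tinyllama:1.1b", "phi3.5:3.8b",
]

# Values copied from the table, in the same column order as `models`.
gpu_vram_pct = [51.9, 80.2, 72.4, 53.4, 20.0, 37.2, 60.8, 33.2, 74.0]
gpu_util_pct = [92, 95, 89, 96, 80, 89, 95, 93, 97]
eval_rate_tps = [28.90, 19.97, 19.46, 30.59, 54.78, 34.43, 17.92, 62.33, 18.87]


def extremes(values):
    """Return ((model, min_value), (model, max_value)) for one table row."""
    pairs = list(zip(models, values))
    lowest = min(pairs, key=lambda pair: pair[1])
    highest = max(pairs, key=lambda pair: pair[1])
    return lowest, highest


for label, row in [
    ("GPU vRAM %", gpu_vram_pct),
    ("GPU utilization %", gpu_util_pct),
    ("Eval rate (tokens/s)", eval_rate_tps),
]:
    (low_model, low), (high_model, high) = extremes(row)
    print(f"{label}: {low} ({low_model}) to {high} ({high_model})")
```

Naming the model at each end of a range matters here because several columns share a model family: the three Qwen2.5 sizes land at very different points.
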

- Express GPU Dedicated Server - P1000
- Basic GPU Dedicated Server - T1000
- Basic GPU Dedicated Server - GTX 1650
- Basic GPU Dedicated Server - GTX 1660

This benchmark shows that Ollama can keep a Pascal-based Nvidia Quadro P1000 GPU busy despite its limited video memory: GPU utilization stayed between 80% and 97%, and 4-bit models of 0.5B to 3.8B parameters generated roughly 18 to 62 tokens/s. While not designed for high-end data center applications, servers like this provide a practical solution for testing, development, and smaller-scale LLM deployments.

If you're considering deploying Ollama on similar hardware, choose quantized models that fit within the card's vRAM (all runs above used 4-bit quantization) and monitor GPU utilization and vRAM usage to maximize throughput. For larger models or production use, a GPU with more memory (e.g., 8GB or 16GB) should provide better performance.
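
One way to do the monitoring suggested above is to poll `nvidia-smi` while a model is generating. This is a minimal sketch, assuming the Nvidia driver and `nvidia-smi` are installed on the server; the function names are hypothetical, and the sample line is invented to show the output format rather than taken from this benchmark.

```python
import subprocess

# Ask nvidia-smi for GPU utilization (%) and memory use (MiB) as bare CSV values.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_line(line: str) -> dict:
    """Parse one CSV line such as '92, 2048, 4096' into named readings."""
    util, used, total = (float(field) for field in line.split(","))
    return {
        "gpu_util_pct": util,
        "vram_used_mib": used,
        "vram_total_mib": total,
        "vram_pct": round(100 * used / total, 1),
    }


def sample_gpus() -> list:
    """Return one reading per GPU; requires nvidia-smi on the PATH."""
    output = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_gpu_line(line) for line in output.splitlines() if line.strip()]


# Hypothetical line in the format the query above produces.
example = parse_gpu_line("92, 2048, 4096")
```

Calling `sample_gpus()` every second or so during generation gives readings comparable to the GPU utilization and vRAM rows in the table above.
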