This GPU strikes a balance between cost and performance, making it a solid choice for AI workloads and gaming alike. For LLM hosting, its 8GB of VRAM is sufficient for running 4-bit quantized models, which sharply reduce memory requirements with little loss in output quality.
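As a rough back-of-the-envelope check (not part of the benchmark itself), a 4-bit model needs about half a byte of VRAM per parameter for the weights, plus some allowance for the KV cache and runtime buffers. The sketch below illustrates that estimate; the 1.5GB overhead figure is an assumption, not a measured value.

```python
# Rough VRAM estimate for a 4-bit quantized model (illustrative sketch only,
# not the methodology behind the benchmark table below).
def estimate_vram_gb(params_billion: float,
                     bits_per_weight: float = 4.0,
                     overhead_gb: float = 1.5) -> float:
    """Weights take ~bits_per_weight/8 bytes per parameter; overhead_gb is an
    assumed allowance for the KV cache, activations, and CUDA buffers."""
    weight_gb = params_billion * 1e9 * (bits_per_weight / 8) / 1024**3
    return weight_gb + overhead_gb

for name, size_b in [("llama2 7b", 7), ("llama3.1 8b", 8), ("llama2 13b", 13)]:
    print(f"{name}: ~{estimate_vram_gb(size_b):.1f} GB")
```

By this estimate, 7B-9B models fit comfortably in 8GB, while a 13B model already brushes the limit, which is consistent with the weaker 13B numbers in the table below.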
Models | llama2 | llama2 | llama3.1 | mistral | gemma | gemma2 | llava | wizardlm2 | qwen2 | qwen2.5 | stablelm2 | falcon2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Parameters | 7b | 13b | 8b | 7b | 7b | 9b | 7b | 7b | 7b | 7b | 12b | 11b |
Size (GB) | 3.8 | 7.4 | 4.9 | 4.1 | 5.0 | 5.4 | 4.7 | 4.1 | 4.4 | 4.7 | 7.0 | 6.4 |
Quantization | 4-bit | 4-bit | 4-bit | 4-bit | 4-bit | 4-bit | 4-bit | 4-bit | 4-bit | 4-bit | 4-bit | 4-bit |
Running on | Ollama 0.5.4 | Ollama 0.5.4 | Ollama 0.5.4 | Ollama 0.5.4 | Ollama 0.5.4 | Ollama 0.5.4 | Ollama 0.5.4 | Ollama 0.5.4 | Ollama 0.5.4 | Ollama 0.5.4 | Ollama 0.5.4 | Ollama 0.5.4 |
Download Speed (MB/s) | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 |
CPU Utilization | 2% | 27-42% | 3% | 3% | 20% | 21% | 3% | 3% | 3% | 3% | 15% | 8% |
RAM Utilization | 3% | 7% | 5% | 5% | 9% | 6% | 5% | 5% | 5% | 5% | 5% | 5% |
GPU VRAM Utilization | 63% | 84% | 80% | 70% | 81% | 83% | 80% | 70% | 65% | 68% | 90% | 85% |
GPU Utilization | 98% | 30-40% | 98% | 88% | 93% | 68% | 98% | 100% | 98% | 96% | 90% | 80% |
Eval Rate (tokens/s) | 73.07 | 9.25 | 57.34 | 71.16 | 31.95 | 23.80 | 72.00 | 70.79 | 63.73 | 58.13 | 18.73 | 31.20 |
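Eval rate figures like these are typically read from Ollama's own timing statistics. For readers who want to reproduce this kind of measurement, the sketch below queries the Ollama HTTP API (default port 11434) and derives tokens per second from the `eval_count` and `eval_duration` fields Ollama returns; the model name and prompt are placeholders, not the exact ones used for this table.

```python
# Minimal sketch of measuring eval rate (tokens/s) against a local Ollama
# instance via its HTTP API. Model name and prompt are illustrative only.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def eval_rate(model: str, prompt: str) -> float:
    resp = requests.post(OLLAMA_URL, json={
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object that includes timing stats
    }, timeout=600)
    resp.raise_for_status()
    stats = resp.json()
    # eval_count = generated tokens, eval_duration = generation time in nanoseconds
    return stats["eval_count"] / (stats["eval_duration"] / 1e9)

if __name__ == "__main__":
    rate = eval_rate("llama3.1:8b", "Explain GPU quantization in one paragraph.")
    print(f"{rate:.2f} tokens/s")
```

Averaging several runs with longer prompts gives a more stable number, since short generations are dominated by prompt-processing time.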
The RTX 3060 Ti proves to be a cost-effective choice for LLM inference, especially when paired with Ollama's 4-bit quantized models. For models under 13 billion parameters, this setup offers competitive performance, high efficiency, and low resource consumption. If you're searching for an affordable RTX 3060 Ti hosting solution to run LLMs on Ollama, this GPU delivers solid results without breaking the bank.
Tags: RTX 3060 benchmark, Ollama benchmark, LLM benchmark, Ollama test, Nvidia RTX 3060 benchmark, Ollama 3060, RTX 3060 Hosting, Ollama RTX Server