This setup makes the Nvidia GTX 1660 a viable hosting option for running small LLM inference workloads efficiently while keeping costs in check.
Models | deepseek-r1 | deepseek-r1 | deepseek-r1 | deepseek-coder | llama3.2 | llama3.1 | codellama | mistral | gemma | codegemma | qwen2.5 | qwen2.5 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Parameters | 1.5b | 7b | 8b | 6.7b | 3b | 8b | 7b | 7b | 7b | 7b | 3b | 7b |
Size (GB) | 1.1 | 4.7 | 4.9 | 3.8 | 2.0 | 4.9 | 3.8 | 4.1 | 5.0 | 5.0 | 1.9 | 4.7 |
Quantization (bits) | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
Running on | Ollama 0.5.11 | Ollama 0.5.11 | Ollama 0.5.11 | Ollama 0.5.11 | Ollama 0.5.11 | Ollama 0.5.11 | Ollama 0.5.11 | Ollama 0.5.11 | Ollama 0.5.11 | Ollama 0.5.11 | Ollama 0.5.11 | Ollama 0.5.11 |
Download Speed (mb/s) | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 |
CPU Utilization | 6% | 20% | 28% | 18% | 6% | 30% | 17% | 4% | 45% | 30% | 6% | 18% |
RAM Utilization | 8% | 9% | 10% | 9% | 8% | 10% | 8% | 8% | 12% | 12% | 8% | 9% |
GPU Utilization | 38% | 37% | 37% | 42% | 50% | 35% | 42% | 20% | 30% | 36% | 36% | 37% |
Eval Rate (tokens/s) | 30.16 | 18.29 | 16.26 | 21.31 | 38.36 | 16.67 | 21.42 | 40.2 | 9.69 | 9.39 | 26.28 | 18.24 |
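To reproduce an eval rate like the ones in the table, one option is to query a local Ollama server and divide the generated token count by the generation time. The sketch below assumes Ollama is listening on its default address `http://localhost:11434` and that the non-streaming `/api/generate` response carries `eval_count` (tokens generated) and `eval_duration` (nanoseconds). The function names `eval_rate` and `benchmark` are illustrative, not part of Ollama.

```python
import json
import urllib.request


def eval_rate(response: dict) -> float:
    """Return tokens per second from an Ollama generate response.

    Assumes `eval_count` is the number of generated tokens and
    `eval_duration` is the generation time in nanoseconds.
    """
    seconds = response["eval_duration"] / 1e9
    return response["eval_count"] / seconds


def benchmark(model: str, prompt: str, host: str = "http://localhost:11434") -> float:
    """Send one non-streaming request and return the measured eval rate.

    `host` is an assumption (Ollama's default local address).
    """
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return eval_rate(json.load(reply))
```

With a model already pulled, `benchmark("llama3.2:3b", "Explain VRAM in one paragraph.")` returns a single tokens/s measurement. Averaging several runs gives a steadier figure than one request.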
For models above 7B, such as DeepSeek-R1 (8B) and Llama 3.1 (8B), CPU utilization rises, which suggests that the GPU's 6GB of memory becomes a bottleneck and limits performance.
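The memory pressure described above can be made concrete by subtracting each model's size from the card's 6GB of VRAM. The sketch below uses the sizes from the benchmark table. The labels are just model name plus parameter count, and the helper name `vram_headroom_gb` is illustrative. Actual memory use is higher than the file size because of the KV cache and runtime overhead, which this rough estimate ignores.

```python
# Model sizes in GB, taken from the benchmark table.
MODEL_SIZES_GB = {
    "deepseek-r1 1.5b": 1.1,
    "deepseek-r1 7b": 4.7,
    "deepseek-r1 8b": 4.9,
    "deepseek-coder 6.7b": 3.8,
    "llama3.2 3b": 2.0,
    "llama3.1 8b": 4.9,
    "codellama 7b": 3.8,
    "mistral 7b": 4.1,
    "gemma 7b": 5.0,
    "codegemma 7b": 5.0,
    "qwen2.5 3b": 1.9,
    "qwen2.5 7b": 4.7,
}

GTX_1660_VRAM_GB = 6.0


def vram_headroom_gb(size_gb: float, vram_gb: float = GTX_1660_VRAM_GB) -> float:
    """VRAM left after loading the weights, ignoring KV cache and overhead."""
    return round(vram_gb - size_gb, 1)


def tightest_fits(sizes: dict, vram_gb: float = GTX_1660_VRAM_GB) -> list:
    """Return (model, headroom) pairs sorted from least to most headroom."""
    pairs = [(name, vram_headroom_gb(size, vram_gb)) for name, size in sizes.items()]
    return sorted(pairs, key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, headroom in tightest_fits(MODEL_SIZES_GB):
        print(f"{name:<22} {headroom:.1f} GB free")
```

The models with the least headroom (about 1.0 to 1.1GB) are the two 8B models plus Gemma and CodeGemma 7B. These are also the four models with the highest CPU utilization in the table.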
Basic GPU Dedicated Server - GTX 1660
Professional GPU Dedicated Server - RTX 2060
Basic GPU Dedicated Server - RTX 4060
Professional GPU VPS - A4000
The Nvidia GTX 1660 is a cost-effective GPU for running small LLMs (1.5B to 7B), with hosting available at around $159/month. In the benchmark above, the 1.5B to 3B models and Mistral 7B reach roughly 26 to 40 tokens/s, while the other 7B models range from about 9 to 21 tokens/s. For larger models, 8B and above, consider scaling up to a GPU with more VRAM. This GTX 1660 VPS is a good fit for developers working with smaller language models, LLM inference, and budget-conscious projects.
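As a rough way to put the monthly price and the measured eval rates side by side, the sketch below converts them into a cost per million generated tokens. It assumes a 30-day month and a server that generates tokens continuously at the benchmarked rate. Both assumptions are optimistic, so real costs per token will be higher. The function name is illustrative.

```python
SECONDS_PER_DAY = 86_400


def cost_per_million_tokens(monthly_price_usd: float,
                            tokens_per_second: float,
                            days: int = 30) -> float:
    """USD per million generated tokens, assuming 100% utilization."""
    tokens_per_month = tokens_per_second * SECONDS_PER_DAY * days
    return monthly_price_usd / tokens_per_month * 1_000_000


if __name__ == "__main__":
    # $159/month and the eval rates are taken from the article.
    for label, rate in [("deepseek-r1 1.5b", 30.16),
                        ("llama3.2 3b", 38.36),
                        ("llama3.1 8b", 16.67)]:
        cost = cost_per_million_tokens(159, rate)
        print(f"{label:<18} ${cost:.2f} per million tokens")
```

At about 30 tokens/s, this works out to roughly $2 per million tokens under the stated assumptions. Halving the eval rate doubles the figure.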