Benchmarking LLMs on Ollama with Nvidia Tesla P100 | Performance & Hosting

With the growing demand for local AI inference, running large language models (LLMs) on Ollama with affordable GPU servers has become a hot topic. In this benchmark, we test the Nvidia Tesla P100—a 16GB HBM2 Pascal-based GPU—to evaluate its performance for 7B to 16B LLMs on Ollama 0.5.11. If you're considering a P100 for LLM inference, this guide will help you decide.

Server Setup: Nvidia Tesla P100 Hosting Environment

Before diving into the Ollama P100 benchmark, let's review the test setup:

Server Configuration:

  • Price: $159–$199/month
  • CPU: Dual 10-Core Xeon E5-2660v2
  • RAM: 128GB DDR3
  • Storage: 120GB NVMe + 960GB SSD
  • Network: 100Mbps Unmetered
  • OS: Windows 11 Pro

GPU Details:

  • GPU: Nvidia Tesla P100
  • Compute Capability: 6.0
  • Microarchitecture: Pascal
  • CUDA Cores: 3584
  • Memory: 16GB HBM2
  • FP32 Performance: 9.5 TFLOPS

This configuration provides ample RAM and storage for smooth model loading and execution, while the P100's 16GB VRAM enables running larger models compared to the RTX2060 Ollama benchmark we previously conducted.
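
Before pulling any models, it is worth confirming that the driver actually sees the P100 and its full 16GB of VRAM. Below is a minimal sketch, assuming the NVIDIA driver (and therefore the nvidia-smi utility) is installed on the server:

```python
# Minimal check: is the Tesla P100 visible with its full 16 GB before benchmarking?
# Assumes the NVIDIA driver is installed, so nvidia-smi is on the PATH
# (it ships with the driver on both Windows and Linux).
import subprocess

result = subprocess.run(
    ["nvidia-smi", "--query-gpu=name,memory.total,driver_version",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())
# On this server the output should look roughly like:
# Tesla P100-PCIE-16GB, 16384 MiB, <driver version>
```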

Benchmark Results: Running LLMs on Tesla P100 with Ollama

We tested various 7B to 16B models on the Tesla P100 Ollama setup. Here's how they performed:
| Model | Parameters | Size (GB) | Quantization | CPU Util | RAM Util | GPU Util | Eval Rate (tokens/s) |
|---|---|---|---|---|---|---|---|
| deepseek-r1 | 7b | 4.7 | 4-bit | 3% | 5% | 85% | 34.31 |
| deepseek-r1 | 8b | 4.9 | 4-bit | 3% | 5% | 89% | 33.06 |
| deepseek-r1 | 14b | 9.0 | 4-bit | 3% | 5% | 90% | 19.43 |
| deepseek-coder-v2 | 16b | 8.9 | 4-bit | 3% | 4% | 65% | 40.25 |
| llama2 | 7b | 3.8 | 4-bit | 4% | 4% | 91% | 49.66 |
| llama2 | 13b | 7.4 | 4-bit | 3% | 4% | 95% | 28.86 |
| llama3.1 | 8b | 4.9 | 4-bit | 3% | 5% | 88% | 32.99 |
| gemma2 | 9b | 5.4 | 4-bit | 3% | 5% | 81% | 29.54 |
| qwen2.5 | 7b | 4.7 | 4-bit | 3% | 5% | 87% | 34.70 |
| qwen2.5 | 14b | 9.0 | 4-bit | 3% | 5% | 91% | 19.45 |

All models ran on Ollama 0.5.11; the download speed was ~12 MB/s in every case.
Video: real-time P100 GPU resource consumption during the benchmark runs.

Screenshots: benchmarking LLMs on Ollama with the Nvidia P100 GPU server. The commands used for each run were:

  • ollama run deepseek-r1:7b
  • ollama run deepseek-r1:8b
  • ollama run deepseek-r1:14b
  • ollama run deepseek-coder-v2:16b
  • ollama run llama2:7b
  • ollama run llama2:13b
  • ollama run llama3.1:8b
  • ollama run gemma2:9b
  • ollama run qwen2.5:7b
  • ollama run qwen2.5:14b
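
The eval-rate column comes from Ollama's own generation statistics (the same numbers `ollama run --verbose` prints after each response). To collect them programmatically rather than copying them off the console, a minimal sketch against Ollama's local REST API is shown below; it assumes Ollama is listening on its default port 11434, the `requests` package is installed, and the models have already been pulled. The prompt and model list are placeholders.

```python
# Minimal sketch: measure Ollama's decode speed (tokens/s) through its REST API.
# eval_count is the number of generated tokens and eval_duration is the decode
# time in nanoseconds, both reported by the /api/generate endpoint.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def eval_rate(model: str, prompt: str) -> float:
    """Run one non-streaming generation and return tokens per second."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,
    )
    resp.raise_for_status()
    data = resp.json()
    return data["eval_count"] / (data["eval_duration"] / 1e9)

if __name__ == "__main__":
    for model in ["llama2:7b", "deepseek-r1:7b", "qwen2.5:14b"]:  # placeholder list
        rate = eval_rate(model, "Explain what HBM2 memory is in two sentences.")
        print(f"{model}: {rate:.2f} tokens/s")
```

Note that eval_duration covers only the decode phase, so model load time does not skew the tokens/s figure.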

Key Takeaways from the Benchmark

  • 7B models run best on the Tesla P100 Ollama setup, with Llama2-7B achieving 49.66 tokens/s.
  • 14B+ models push the limits—performance drops, with DeepSeek-r1-14B running at 19.43 tokens/s, still acceptable.
  • Qwen2.5 and DeepSeek models offer balanced performance, staying between 33–35 tokens/s at 7B.
  • Llama2-13B achieves 28.86 tokens/s, making it usable but slower than its 7B counterpart.
  • DeepSeek-Coder-v2-16B surprisingly outperformed 14B models (40.25 tokens/s), but its lower GPU utilization (65%) suggests inefficiencies (see the monitoring sketch below).
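
The GPU utilization figures are easy to cross-check live: while a model generates in one terminal, a small sampler in a second terminal can log utilization and VRAM once per second. Here is a minimal sketch, assuming the nvidia-ml-py package (imported as pynvml) is installed and the P100 is GPU index 0:

```python
# Minimal sketch: sample GPU utilization and VRAM usage once per second while
# `ollama run <model>` is generating in another terminal.
# Assumes nvidia-ml-py (import name: pynvml) is installed: pip install nvidia-ml-py
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # the P100 is the only GPU here

try:
    for _ in range(60):  # roughly one minute of samples
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"GPU {util.gpu:3d}%  |  "
              f"VRAM {mem.used / 1024**3:4.1f} / {mem.total / 1024**3:4.1f} GiB")
        time.sleep(1)
finally:
    pynvml.nvmlShutdown()
```

Averaging the samples over a whole generation is a fairer comparison than a single reading, which is worth doing before reading too much into the 65% figure for DeepSeek-Coder-v2.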

Is Tesla P100 Good for LLMs on Ollama?

If you're looking for affordable Nvidia P100 hosting for LLM inference, the Tesla P100 is a solid choice for models up to 13B. Compared to the RTX2060 Ollama benchmark, it handles larger models better due to its 16GB VRAM, but 16B models may struggle.
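
A quick way to see why the 16GB of VRAM matters is a back-of-envelope footprint estimate: if the 4-bit weights plus a modest KV cache fit in VRAM, Ollama can keep every layer on the GPU, and decode speed is then bounded by the Pascal card's compute and memory bandwidth rather than by CPU offloading. The sketch below is a rough estimate only, with assumed constants (about 4.8 effective bits per weight for a typical Q4 quant and a flat 1.5GB allowance for KV cache and runtime overhead):

```python
# Rough sketch: estimated VRAM footprint of a Q4-quantized model on the 16 GB P100.
# The constants are assumptions, not measurements: ~4.8 effective bits per weight
# for a typical Q4_K_M quant, plus a flat allowance for KV cache and CUDA overhead.
Q4_BITS_PER_WEIGHT = 4.8   # assumed effective bits/weight
OVERHEAD_GB = 1.5          # assumed KV cache + runtime overhead at modest context
P100_VRAM_GB = 16.0

def q4_footprint_gb(params_billion: float) -> float:
    """Approximate VRAM needed for Q4 weights plus a small KV cache."""
    weights_gb = params_billion * Q4_BITS_PER_WEIGHT / 8  # billions × bits ÷ 8 = GB
    return weights_gb + OVERHEAD_GB

for size_b in (7, 13, 14, 16):
    need = q4_footprint_gb(size_b)
    verdict = "fits" if need < P100_VRAM_GB else "does NOT fit"
    print(f"{size_b:2d}B @ Q4 ≈ {need:4.1f} GB -> {verdict} in {P100_VRAM_GB:.0f} GB")
```

These ballpark figures agree with the on-disk sizes in the benchmark table (4.7–9.0GB), so even the 14B and 16B quants load fully onto the P100; the drop to ~19 tokens/s on the dense 14B models looks like a throughput limit rather than a VRAM limit.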

✅ Pros of Using NVIDIA P100 for Ollama

  • Handles 7B-13B models efficiently
  • Lower cost than A4000 or V100
  • Better than RTX2060 for 7B+ LLMs

❌ Limitations of NVIDIA P100 for Ollama

  • Struggles with 16B+ models
  • Older Pascal architecture lacks Tensor Cores

If you need affordable LLM hosting, renting a Tesla P100 for Ollama inference at $159–$199/month offers great value for small to mid-sized AI models.

Get Started with Tesla P100 Hosting for 3-16B LLMs

For those deploying LLMs on Ollama, choosing the right NVIDIA Tesla P100 hosting solution can significantly impact performance and costs. If you're working with 3B-16B models, the P100 is a solid choice for AI inference at an affordable price.
Flash Sale until Mar. 16

Professional GPU Dedicated Server - P100

$129.00/mo
35% Off Recurring (Was $199.00)
Order Now
  • 128GB RAM
  • Dual 10-Core E5-2660v2
  • 120GB + 960GB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Tesla P100
  • Microarchitecture: Pascal
  • CUDA Cores: 3584
  • GPU Memory: 16 GB HBM2
  • FP32 Performance: 9.5 TFLOPS
  • Suitable for AI, Data Modeling, High Performance Computing, etc.
Flash Sale until Mar. 16

Professional GPU VPS - A4000

$102.00/mo
43% OFF Recurring (Was $179.00)
Order Now
  • 32GB RAM
  • 24 CPU Cores
  • 320GB SSD
  • 300Mbps Unmetered Bandwidth
  • Backup once every 2 weeks
  • OS: Linux / Windows 10
  • Dedicated GPU: Quadro RTX A4000
  • CUDA Cores: 6,144
  • Tensor Cores: 192
  • GPU Memory: 16GB GDDR6
  • FP32 Performance: 19.2 TFLOPS
  • Available for Rendering, AI/Deep Learning, Data Science, CAD/CGI/DCC.

Advanced GPU Dedicated Server - V100

$229.00/mo
Order Now
  • 128GB RAM
  • Dual 12-Core E5-2690v3
  • 240GB SSD + 2TB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia V100
  • Microarchitecture: Volta
  • CUDA Cores: 5,120
  • Tensor Cores: 640
  • GPU Memory: 16GB HBM2
  • FP32 Performance: 14 TFLOPS
  • Cost-effective for AI, deep learning, data visualization, HPC, etc.

Enterprise GPU Dedicated Server - RTX 4090

$409.00/mo
Order Now
  • 256GB RAM
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: GeForce RTX 4090
  • Microarchitecture: Ada Lovelace
  • CUDA Cores: 16,384
  • Tensor Cores: 512
  • GPU Memory: 24 GB GDDR6X
  • FP32 Performance: 82.6 TFLOPS
  • Perfect for 3D rendering/modeling, CAD/professional design, video editing, gaming, HPC, AI/deep learning.

Final Thoughts: Best Models for Tesla P100 on Ollama

For Nvidia P100 rental users, the best models for small-scale AI inference on Ollama are:

  • Llama2-7B (Fastest: 49.66 tokens/s)
  • DeepSeek-r1-7B (Balanced: 34.31 tokens/s)
  • Qwen2.5-7B (Strong alternative: 34.70 tokens/s)
  • DeepSeek-Coder-v2-16B (Best large model: 40.25 tokens/s)

For best performance, stick to 7B–13B models. If you need larger LLM inference, consider upgrading to A4000 or V100 GPUs. Would you like to see further benchmarks comparing the Tesla P100 vs. V100 for Ollama? Let us know in the comments!

Tags:

Ollama P100, Tesla P100 LLMs, Ollama Tesla P100, Nvidia P100 hosting, benchmark P100, Ollama benchmark, P100 for 7B-14B LLMs inference, Nvidia P100 rental, Tesla P100 performance, Ollama LLMs, deepseek-r1 P100, Llama2 P100 benchmark, Qwen2.5 Tesla P100, Nvidia P100 AI inference