Introduction
If your Ubuntu server has an NVIDIA GPU (GeForce, RTX, Tesla, or Quadro, for example), you can monitor it with the command-line tool nvidia-smi. This guide explains everything a beginner needs to know, from installation to real-time monitoring.
What is NVIDIA-SMI?
nvidia-smi (NVIDIA System Management Interface) is a command-line utility included with NVIDIA GPU drivers. It allows you to:
- Monitor GPU usage and memory
- Check temperature and power consumption
- Track which processes are using the GPU
- Manage GPU settings such as persistence mode
It ships as part of the NVIDIA driver package, so no separate installation is needed once you install the official NVIDIA driver.
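Because the tool arrives with the driver, a script can test for it before attempting any monitoring. The following is a minimal sketch in POSIX shell, not an official procedure: the helper name tool_available is invented for this example, and the driver_version query field is used only to print something short.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if the named command is on PATH.
tool_available() {
    command -v "$1" >/dev/null 2>&1
}

if tool_available nvidia-smi; then
    # Print the driver version once per GPU; report a failure instead of aborting.
    nvidia-smi --query-gpu=driver_version --format=csv,noheader \
        || echo "nvidia-smi is installed but could not talk to the driver" >&2
else
    echo "nvidia-smi not found: install the NVIDIA driver first" >&2
fi
```

On a machine without the driver this prints only the "not found" message, which makes it a reasonable first line in a monitoring script.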
Installing NVIDIA-SMI
Make sure your GPU driver is installed. Example for Ubuntu 24.04 with an RTX 4090:
    sudo apt install -y nvidia-driver-535
Verify that nvidia-smi is available:
    nvidia-smi
Basic NVIDIA-SMI Commands
1. Check GPU status
    nvidia-smi
Shows a summary of GPU usage, driver version, memory, and running processes.
2. Detailed information
    nvidia-smi -a
Displays complete information about each GPU, including temperature, power usage, and performance state.
3. Real-time monitoring
    watch -n 1 -d nvidia-smi
Refreshes GPU stats every second for continuous monitoring; the -d flag highlights values that changed since the previous refresh.
4. List all NVIDIA devices
    nvidia-smi -L
Example output:
    GPU 0: NVIDIA GeForce RTX 3060 Ti (UUID: GPU-fa3da260-9c42-828f-981a-f6d7b48d77b3)
5. Query specific GPU details
    nvidia-smi --query-gpu=index,name,uuid,serial --format=csv
6. Monitor GPU utilization per second
    nvidia-smi dmon
Columns include: GPU index, power, temperature, SM (compute) utilization, memory utilization, encoder/decoder utilization, and clock speeds.
7. Monitor per-process GPU usage
    nvidia-smi pmon
Shows GPU usage per running process, including PID, process type, SM%, memory%, encoder/decoder%, and command.
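The --query-gpu form from step 5 is the easiest of these commands to use in scripts, because its CSV output can be parsed line by line. The sketch below is one possible approach rather than a prescribed method: the function name gpu_summary is hypothetical, and it assumes the fields are requested in exactly the order shown, with the noheader and nounits format options so that each row contains bare values.

```shell
#!/bin/sh
# Hypothetical helper: reads CSV rows of
#   index, name, utilization %, memory used (MiB), memory total (MiB)
# and prints one readable line per GPU. Rows without a usable total are skipped.
gpu_summary() {
    awk -F', *' '
        NF >= 5 && $5 + 0 > 0 {
            printf "GPU %s (%s): %d%% busy, %d/%d MiB (%.0f%% of memory)\n",
                   $1, $2, $3, $4, $5, 100 * $4 / $5
        }
    '
}

# Only query the hardware when nvidia-smi is actually present.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total \
               --format=csv,noheader,nounits | gpu_summary
fi
```

Keeping the parsing in a function that reads standard input means it can be tried on saved output, for example `printf '0, NVIDIA GeForce RTX 3060 Ti, 37, 2048, 8192\n' | gpu_summary`.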
Understanding NVIDIA-SMI Metrics
| Metric | Description |
|---|---|
| Temp | Core GPU temperature in °C. As a rule of thumb, it should stay below 90°C; exact limits vary by GPU model. |
| Perf | Performance state (P0 maximum – P12 minimum). |
| Persistence-M | “On” keeps the NVIDIA driver loaded even when no application is using the GPU, which speeds up application startup. |
| Pwr: Usage/Cap | Current power draw and the configured power limit, in watts. |
| Bus-Id | PCI bus ID of GPU. |
| Disp.A | Display active flag; Off if no display is attached. |
| Memory-Usage | GPU memory in use out of the total available; important for AI/ML workloads. |
| Volatile Uncorr. ECC | Count of uncorrectable ECC memory errors since the last driver load; N/A on GPUs without ECC support. |
| GPU-Util | Percentage of the sampling period during which at least one kernel was running on the GPU. |
| Compute M. | GPU compute mode: Default (shared between processes), Exclusive Process, or Prohibited. |
| GPU | Index for multi-GPU systems. |
| PID / Process Name | Process ID and name using GPU. |
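The Temp row can be turned into a simple automated check. The following sketch assumes the 90°C rule of thumb from the table as its default limit; the function name hot_gpus and its exit-status convention are choices made for this example, not part of nvidia-smi.

```shell
#!/bin/sh
# Hypothetical helper: reads "index, temperature" CSV rows and reports every GPU
# at or above the limit (first argument, default 90). Exits 1 if any GPU is hot.
hot_gpus() {
    limit="${1:-90}"
    awk -F', *' -v limit="$limit" '
        $2 ~ /^[0-9]+$/ && $2 + 0 >= limit {
            printf "GPU %s is at %d C (limit %d C)\n", $1, $2, limit
            hot = 1
        }
        END { exit hot ? 1 : 0 }
    '
}

# Only query the hardware when nvidia-smi is actually present.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits \
        | hot_gpus 90 || echo "At least one GPU is running hot" >&2
fi
```

Because the function signals a hot GPU through its exit status, it could be dropped into a cron job or a training wrapper script; pass a lower limit if your card's documentation suggests one.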
Tips for Beginners
- Run nvidia-smi frequently when training models to monitor memory and utilization.
- Use watch -n 1 nvidia-smi for real-time observation.
- If GPU temperature exceeds 90°C, check cooling or reduce load.
- Check per-process usage with nvidia-smi pmon to find resource-heavy applications.
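To make the last tip concrete, the sketch below picks the process with the highest SM% out of a single pmon sample. It rests on an assumption about the output layout: a header line beginning with "# gpu" that names the columns, followed by a units line and one row per process. Column sets differ between driver versions, so the script looks columns up by name instead of by fixed position. The function name busiest_process is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads pmon-style text and prints the process with the
# highest SM utilization. Idle rows (shown as "-") are ignored.
busiest_process() {
    awk '
        # Header row: remember the position of each named column.
        $1 == "#" && $2 == "gpu" {
            for (i = 2; i <= NF; i++) col[$i] = i - 1
            next
        }
        /^#/ { next }   # units row and any other comment lines
        ("sm" in col) && $(col["sm"]) ~ /^[0-9]+$/ {
            sm = $(col["sm"]) + 0
            if (!found || sm > best) {
                best = sm; pid = $(col["pid"]); cmd = $(col["command"])
            }
            found = 1
        }
        END { if (found) printf "pid=%s sm=%d%% command=%s\n", pid, best, cmd }
    '
}

# Take one sample (-c 1) when nvidia-smi is actually present.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi pmon -c 1 | busiest_process
fi
```

If nothing is using the GPU, the function prints nothing, so an empty result simply means there was no active process in that sample.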
Summary
nvidia-smi is the standard command-line tool for monitoring NVIDIA GPUs on Ubuntu. It:
- Comes pre-installed with drivers
- Provides real-time GPU usage, memory, and temperature
- Supports per-process monitoring and multi-GPU setups
Beginners can use it to track GPU health, optimize AI workloads on a Linux GPU server, or debug performance issues.
