[https://github.com/Syllo/nvtop NVTOP] stands for Neat Videocard TOP, a (h)top like task monitor for GPUs and accelerators. It can handle multiple GPUs and print information about them in a htop-familiar way.
Because a picture is worth a thousand words:
[[File:NVTOP.png|1121x433px]]
__FORCETOC__
= Monitor GPUs usage =
NVTOP can monitor single or multiple GPUs. It can show the GPU usage and its memory.
One can also select a specific device from the menu (F2 -> GPU Select).
NVTOP is useful to monitor and verify that your job is using the GPU as efficiently as possible.
== Monitor batch job ==
If you have submitted a non-interactive job and would like to see its current GPU usage.
1. From a login node, find the job id and select the one to monitor:
{{Command|sq}}
2. Attach to the running job:
{{Command|srun --pty --jobid JOBID nvtop}}
== Monitor interactive job ==
1. Start your interactive job with minimal resources.
2. In a second terminal, connect to the login node, find the job id:
{{Command|sq}}
3. Attach to the running job:
{{Command|srun --pty --jobid JOBID nvtop}}
You'll be able to see the usage in real time as you run your commands in the first terminal.
== Monitor a GPU on a specific node ==
When running multi-nodes jobs, it can be useful to verify that one or all GPUs are effectively used.
1. From a login node, find the job id and identify the node names:
{{Commands
|sq
|srun --jobid JOBID -n1 -c1 scontrol show hostname
}}
2. Attach to the running job on the specific node:
{{Command|srun --pty --jobid JOBiD --nodelist NODENAME nvtop}}