How to Calculate the CPU Usage in Linux

As we know, we can use top or other tools to get the CPU usage.

Now I will explain how to calculate the CPU usage in Linux.

In a word, the CPU usage is the percentage of the time that the CPU is being used to complete tasks.

  • If there is no task to run, the CPU is idle;
  • If the CPU is doing some work, it is busy;
  • Sometimes the CPU is waiting for an IO operation to complete; that is the so-called iowait.

There are two kinds of usage.

  1. usage of the host, like the third line of top's output (the %Cpu(s) line), indicating how busy the machine is.
  2. usage of a task, like the output of pidstat -p <pid> or top -p <pid>, indicating how busy the task is.

These two kinds are calculated with two different methods.

CPU usage of host

/proc/stat shows the CPU time since boot; you can read man 5 proc to learn the meaning of each field.

We can use this data to calculate the cpu usage.

$ cat /proc/stat
cpu 213860 18873 343944 189330737 62970 0 31144 0 0 0
cpu0 49673 4080 80138 47314582 14867 0 8846 0 0 0
cpu1 56404 5252 91137 47337714 17009 0 7137 0 0 0
cpu2 53982 4898 86368 47347553 16491 0 6917 0 0 0
cpu3 53799 4642 86300 47330886 14601 0 8242 0 0 0
intr 3954208668 0 0 1591691 74 0 10698876 447683 0 0 0 10104090 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3907847122 0 0 0 0 0 0 0 0 218 0 274327 0 0 0 0 0 0 0 0 1261 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 244707 4212 0 0 0 0 3852 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 22990556 0 0 0
ctxt 66538962
btime 1653442436
processes 115120
procs_running 1
procs_blocked 0
softirq 121681140 2 9751934 1197630 19644916 285353 0 76231181 11528062 4160 3037902
$ man 5 proc

/proc/stat
kernel/system statistics. Varies with architecture.
Common entries include:

cpu 10132153 290696 3084719 46828483 16683 0 25195 0
175628 0
cpu0 1393280 32966 572056 13343292 6130 0 17875 0 23933 0
The amount of time, measured in units of USER_HZ
(1/100ths of a second on most architectures, use
sysconf(_SC_CLK_TCK) to obtain the right value),
that the system ("cpu" line) or the specific CPU
("cpuN" line) spent in various states:

user (1) Time spent in user mode.

nice (2) Time spent in user mode with low
priority (nice).

system (3) Time spent in system mode.

idle (4) Time spent in the idle task. This value
should be USER_HZ times the second entry in
the /proc/uptime pseudo-file.

iowait (since Linux 2.5.41)
(5) Time waiting for I/O to complete. This
value is not reliable, for the following
reasons:

1. The CPU will not wait for I/O to
complete; iowait is the time that a task
is waiting for I/O to complete. When a
CPU goes into idle state for outstanding
task I/O, another task will be scheduled
on this CPU.

2. On a multi-core CPU, the task waiting for
I/O to complete is not running on any
CPU, so the iowait of each CPU is
difficult to calculate.

3. The value in this field may decrease in
certain conditions.

irq (since Linux 2.6.0)
(6) Time servicing interrupts.

softirq (since Linux 2.6.0)
(7) Time servicing softirqs.

steal (since Linux 2.6.11)
(8) Stolen time, which is the time spent in
other operating systems when running in a
virtualized environment

guest (since Linux 2.6.24)
(9) Time spent running a virtual CPU for
guest operating systems under the control of
the Linux kernel.

guest_nice (since Linux 2.6.33)
(10) Time spent running a niced guest
(virtual CPU for guest operating systems
under the control of the Linux kernel).

We cat /proc/stat and store the content as stat1, sleep a while, then cat it again and store the content as stat2.

total = (stat2.cpu.user + stat2.cpu.nice + stat2.cpu.system + stat2.cpu.idle + stat2.cpu.iowait + stat2.cpu.irq + stat2.cpu.softirq + stat2.cpu.steal + stat2.cpu.guest + stat2.cpu.guest_nice) -
        (stat1.cpu.user + stat1.cpu.nice + stat1.cpu.system + stat1.cpu.idle + stat1.cpu.iowait + stat1.cpu.irq + stat1.cpu.softirq + stat1.cpu.steal + stat1.cpu.guest + stat1.cpu.guest_nice)
idle = stat2.cpu.idle - stat1.cpu.idle
idle.percent = idle / total * 100%
usage = 100% - idle.percent

Usage is between 0 and 100%.

On a multi-core machine, we can also see the cpu0, cpu1… lines, so we can get the usage of a single core with the same method.
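
The calculation above can be sketched in Python. This is a minimal sketch that assumes the two samples of the aggregate cpu line have already been captured as strings; on a real system you would read /proc/stat twice with a sleep in between.

```python
def host_cpu_usage(cpu_line1, cpu_line2):
    """Host CPU usage (%) from two samples of the aggregate 'cpu' line
    of /proc/stat, taken some interval apart."""
    t1 = [int(x) for x in cpu_line1.split()[1:]]  # user nice system idle ...
    t2 = [int(x) for x in cpu_line2.split()[1:]]
    total = sum(t2) - sum(t1)   # total ticks elapsed in the interval
    idle = t2[3] - t1[3]        # the 4th field is idle
    return (1 - idle / total) * 100

# Hypothetical samples taken some interval apart:
s1 = "cpu 100 0 100 800 0 0 0 0 0 0"
s2 = "cpu 150 0 150 900 0 0 0 0 0 0"
print(host_cpu_usage(s1, s2))  # 50.0: half of the interval was idle
```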

CPU usage of task

/proc/<pid>/stat shows the CPU time used by the task since the task was started.
For example: for task pid = 1024, the path is /proc/1024/stat.

You can also read man 5 proc to learn the meaning of each field.

$ cat /proc/1024/stat
1024 (java) S 1 730 730 0 -1 1077936384 1225 0 152 0 3192 11929 0 0 20 0 9 0 7098 5029285888 3554 18446744073709551615 1 1 0 0 0 0 0 0 2143420159 0 0 0 17 1 0 0 0 0 0 0 0 0 0 0 0 0 0
$ man 5 proc
/proc/[pid]/stat
Status information about the process. This is used by
ps(1). It is defined in the kernel source file
fs/proc/array.c.

The fields, in order, with their proper scanf(3) format
specifiers, are listed below. Whether or not certain of
these fields display valid information is governed by a
ptrace access mode PTRACE_MODE_READ_FSCREDS |
PTRACE_MODE_NOAUDIT check (refer to ptrace(2)). If the
check denies access, then the field value is displayed as
0. The affected fields are indicated with the marking
[PT].

(1) pid %d
The process ID.

(2) comm %s
The filename of the executable, in parentheses.
Strings longer than TASK_COMM_LEN (16) characters
(including the terminating null byte) are silently
truncated. This is visible whether or not the
executable is swapped out.

(3) state %c
One of the following characters, indicating process
state:

R Running

S Sleeping in an interruptible wait

D Waiting in uninterruptible disk sleep

Z Zombie

T Stopped (on a signal) or (before Linux 2.6.33)
trace stopped

t Tracing stop (Linux 2.6.33 onward)

W Paging (only before Linux 2.6.0)

X Dead (from Linux 2.6.0 onward)

x Dead (Linux 2.6.33 to 3.13 only)

K Wakekill (Linux 2.6.33 to 3.13 only)

W Waking (Linux 2.6.33 to 3.13 only)

P Parked (Linux 3.9 to 3.13 only)

(4) ppid %d
The PID of the parent of this process.

(5) pgrp %d
The process group ID of the process.

(6) session %d
The session ID of the process.

(7) tty_nr %d
The controlling terminal of the process. (The
minor device number is contained in the combination
of bits 31 to 20 and 7 to 0; the major device
number is in bits 15 to 8.)

(8) tpgid %d
The ID of the foreground process group of the
controlling terminal of the process.

(9) flags %u
The kernel flags word of the process. For bit
meanings, see the PF_* defines in the Linux kernel
source file include/linux/sched.h. Details depend
on the kernel version.

The format for this field was %lu before Linux 2.6.

(10) minflt %lu
The number of minor faults the process has made
which have not required loading a memory page from
disk.

(11) cminflt %lu
The number of minor faults that the process's
waited-for children have made.

(12) majflt %lu
The number of major faults the process has made
which have required loading a memory page from
disk.

(13) cmajflt %lu
The number of major faults that the process's
waited-for children have made.

(14) utime %lu
Amount of time that this process has been scheduled
in user mode, measured in clock ticks (divide by
sysconf(_SC_CLK_TCK)). This includes guest time,
guest_time (time spent running a virtual CPU, see
below), so that applications that are not aware of
the guest time field do not lose that time from
their calculations.

(15) stime %lu
Amount of time that this process has been scheduled
in kernel mode, measured in clock ticks (divide by
sysconf(_SC_CLK_TCK)).

(16) cutime %ld
Amount of time that this process's waited-for
children have been scheduled in user mode, measured
in clock ticks (divide by sysconf(_SC_CLK_TCK)).
(See also times(2).) This includes guest time,
cguest_time (time spent running a virtual CPU, see
below).

(17) cstime %ld
Amount of time that this process's waited-for
children have been scheduled in kernel mode,
measured in clock ticks (divide by
sysconf(_SC_CLK_TCK)).

(18) priority %ld
(Explanation for Linux 2.6) For processes running a
real-time scheduling policy (policy below; see
sched_setscheduler(2)), this is the negated
scheduling priority, minus one; that is, a number
in the range -2 to -100, corresponding to real-time
priorities 1 to 99. For processes running under a
non-real-time scheduling policy, this is the raw
nice value (setpriority(2)) as represented in the
kernel. The kernel stores nice values as numbers
in the range 0 (high) to 39 (low), corresponding to
the user-visible nice range of -20 to 19.

Before Linux 2.6, this was a scaled value based on
the scheduler weighting given to this process.

(19) nice %ld
The nice value (see setpriority(2)), a value in the
range 19 (low priority) to -20 (high priority).

(20) num_threads %ld
Number of threads in this process (since Linux
2.6). Before kernel 2.6, this field was hard coded
to 0 as a placeholder for an earlier removed field.

(21) itrealvalue %ld
The time in jiffies before the next SIGALRM is sent
to the process due to an interval timer. Since
kernel 2.6.17, this field is no longer maintained,
and is hard coded as 0.

(22) starttime %llu
The time the process started after system boot. In
kernels before Linux 2.6, this value was expressed
in jiffies. Since Linux 2.6, the value is
expressed in clock ticks (divide by
sysconf(_SC_CLK_TCK)).

The format for this field was %lu before Linux 2.6.

(23) vsize %lu
Virtual memory size in bytes.

(24) rss %ld
Resident Set Size: number of pages the process has
in real memory. This is just the pages which count
toward text, data, or stack space. This does not
include pages which have not been demand-loaded in,
or which are swapped out. This value is
inaccurate; see /proc/[pid]/statm below.

(25) rsslim %lu
Current soft limit in bytes on the rss of the
process; see the description of RLIMIT_RSS in
getrlimit(2).

(26) startcode %lu [PT]
The address above which program text can run.

(27) endcode %lu [PT]
The address below which program text can run.

(28) startstack %lu [PT]
The address of the start (i.e., bottom) of the
stack.

(29) kstkesp %lu [PT]
The current value of ESP (stack pointer), as found
in the kernel stack page for the process.

(30) kstkeip %lu [PT]
The current EIP (instruction pointer).

(31) signal %lu
The bitmap of pending signals, displayed as a
decimal number. Obsolete, because it does not
provide information on real-time signals; use
/proc/[pid]/status instead.

(32) blocked %lu
The bitmap of blocked signals, displayed as a
decimal number. Obsolete, because it does not
provide information on real-time signals; use
/proc/[pid]/status instead.

(33) sigignore %lu
The bitmap of ignored signals, displayed as a
decimal number. Obsolete, because it does not
provide information on real-time signals; use
/proc/[pid]/status instead.

(34) sigcatch %lu
The bitmap of caught signals, displayed as a
decimal number. Obsolete, because it does not
provide information on real-time signals; use
/proc/[pid]/status instead.

(35) wchan %lu [PT]
This is the "channel" in which the process is
waiting. It is the address of a location in the
kernel where the process is sleeping. The
corresponding symbolic name can be found in
/proc/[pid]/wchan.

(36) nswap %lu
Number of pages swapped (not maintained).

(37) cnswap %lu
Cumulative nswap for child processes (not
maintained).

(38) exit_signal %d (since Linux 2.1.22)
Signal to be sent to parent when we die.

(39) processor %d (since Linux 2.2.8)
CPU number last executed on.

(40) rt_priority %u (since Linux 2.5.19)
Real-time scheduling priority, a number in the
range 1 to 99 for processes scheduled under a real-
time policy, or 0, for non-real-time processes (see
sched_setscheduler(2)).

(41) policy %u (since Linux 2.5.19)
Scheduling policy (see sched_setscheduler(2)).
Decode using the SCHED_* constants in
linux/sched.h.

The format for this field was %lu before Linux
2.6.22.

(42) delayacct_blkio_ticks %llu (since Linux 2.6.18)
Aggregated block I/O delays, measured in clock
ticks (centiseconds).

(43) guest_time %lu (since Linux 2.6.24)
Guest time of the process (time spent running a
virtual CPU for a guest operating system), measured
in clock ticks (divide by sysconf(_SC_CLK_TCK)).

(44) cguest_time %ld (since Linux 2.6.24)
Guest time of the process's children, measured in
clock ticks (divide by sysconf(_SC_CLK_TCK)).

(45) start_data %lu (since Linux 3.3) [PT]
Address above which program initialized and
uninitialized (BSS) data are placed.

(46) end_data %lu (since Linux 3.3) [PT]
Address below which program initialized and
uninitialized (BSS) data are placed.

(47) start_brk %lu (since Linux 3.3) [PT]
Address above which program heap can be expanded
with brk(2).

(48) arg_start %lu (since Linux 3.5) [PT]
Address above which program command-line arguments
(argv) are placed.

(49) arg_end %lu (since Linux 3.5) [PT]
Address below program command-line arguments (argv)
are placed.

(50) env_start %lu (since Linux 3.5) [PT]
Address above which program environment is placed.

(51) env_end %lu (since Linux 3.5) [PT]
Address below which program environment is placed.

(52) exit_code %d (since Linux 3.5) [PT]
The thread's exit status in the form reported by
waitpid(2).

So the CPU usage of one task is the CPU time used by the task (field 14 utime + field 15 stime, in clock ticks, converted to seconds) divided by the real time elapsed (from /proc/uptime, in seconds).

We cat /proc/<pid>/stat and /proc/uptime and store the contents as stat1 and uptime1, sleep a while, then cat them again and store the contents as stat2 and uptime2.

cpu usage of task = 
((stat2.utime+stat2.stime-stat1.utime-stat1.stime)/sysconf(_SC_CLK_TCK)) /
(uptime2 - uptime1)
* 100%

Usage is between 0 and (count of CPUs) * 100%.
If the task has multiple threads running on different cores at the same time, the CPU usage of the task can be larger than 100%.
That's different from the CPU usage of the host.
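
The task calculation can also be sketched in Python. This is a minimal sketch assuming the two /proc/<pid>/stat samples and the two uptimes have already been captured; clk_tck is 100 on most systems, and the portable way to get it is os.sysconf('SC_CLK_TCK').

```python
def task_cpu_usage(stat1, stat2, uptime1, uptime2, clk_tck=100):
    """Task CPU usage (%) from two samples of /proc/<pid>/stat and the
    corresponding uptimes (in seconds)."""
    def cpu_ticks(stat_line):
        # comm (field 2) may contain spaces, so split after the last ')';
        # the fields after it start at field 3 (state).
        rest = stat_line.rsplit(')', 1)[1].split()
        utime, stime = int(rest[11]), int(rest[12])  # fields 14 and 15
        return utime + stime
    ticks = cpu_ticks(stat2) - cpu_ticks(stat1)
    return (ticks / clk_tck) / (uptime2 - uptime1) * 100

# Hypothetical samples taken 5 seconds apart (fields after stime trimmed):
s1 = "1024 (java) S 1 730 730 0 -1 1077936384 1225 0 152 0 100 50 0 0"
s2 = "1024 (java) S 1 730 730 0 -1 1077936384 1225 0 152 0 200 100 0 0"
print(task_cpu_usage(s1, s2, 1000.0, 1005.0))  # 30.0
```

Splitting after the last ')' matters: a comm like `(tmux: server)` would break a naive split() on whitespace.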