Throttling IO with Linux

Why?

I guess the first question that comes to mind when reading this title is: why? For a database, and really for any IO-dependent application, we want IOs to be faster, not throttled, i.e. slower. Well, the ‘why’ is: if you want to investigate IOs, you sometimes want to slow them down, so they are easier to see. Also, (not so) recent improvements in the Oracle database made great progress in using the available bandwidth by doing IO in parallel, which can strip away much of the ability to see them in Oracle’s popular SQL trace.

Virtualisation

I use VMware Fusion on my MacBook and run Linux in the VMs to run the Oracle database. Desktop virtualisation products like VMware Fusion (and VirtualBox and VMware Workstation; I think all desktop virtualisation products) use the operating system’s IO subsystem. This introduces a funny effect: if you stress the IO subsystem in the VM and measure throughput, it looks like the disk or disks are getting faster and faster with every run. The reason for this effect is that the blocks in the file, which is the disk from the perspective of the VM, are touched (read and/or written) more and more, and thus become increasingly better candidates for caching from the perspective of the underlying operating system.

I think that if you combine the ‘disk getting faster’ effect with the need to investigate IO’s, you understand that it can be beneficial to throttle IO’s in certain cases.

Cgroups

The mechanism that can be used to control and throttle resources is ‘cgroups’. Cgroups is a Linux kernel feature; the name is an abbreviation of ‘control groups’, and its function is to limit, account for and isolate resource usage (see this wikipedia article). Cgroups have been part of the Linux kernel since version 2.6.24. This means there is no cgroups support in the stock Red Hat and Oracle Linux version 5 kernels, but there is in version 6.

The idea behind cgroups is to have control over the resources in a system, which becomes more and more important as today’s systems get bigger. Cgroups were created to work at any scale, from single processes to complete (virtualised) systems.

Simple setup and usage

(please mind all commands are either executed as root (indicated by ‘#’), or as a regular user (oracle in my case, indicated by ‘$’))

First, we need to make sure the ‘blkio’ controller is available:

# grep blkio /proc/mounts || { mkdir -p /cgroup/blkio && mount -t cgroup -o blkio none /cgroup/blkio; }

Next, we create a cgroup called ‘iothrottle’:

# cgcreate -g blkio:/iothrottle
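If the libcgroup tools are not installed, the same cgroup can be created directly through the filesystem, because a cgroup is simply a subdirectory of the controller mount point. A minimal sketch, guarded so it only acts when the blkio controller from the previous step is actually mounted:

```shell
# Create the 'iothrottle' cgroup by hand (run as root); creating a
# directory inside a mounted cgroup hierarchy creates a cgroup:
if grep -q ' /cgroup/blkio ' /proc/mounts; then
    mkdir -p /cgroup/blkio/iothrottle
fi
```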

In order to throttle IO on a device, we need to find the major and minor number of the block device. If you use ASM, you can look up the device in the PATH field of the V$ASM_DISK view, and generate a long listing of it on Linux:

$ ls -ls /dev/oracleasm/disk1 
0 brw-rw----. 1 oracle dba 8, 16 Dec 15 13:22 /dev/oracleasm/disk1

This shows that the major and minor numbers of the block device are 8 and 16.
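If you prefer not to parse ls output, stat can print the major and minor numbers directly. A small sketch, assuming the same device as above; note that stat prints the values in hex, so a conversion is needed:

```shell
# On the device node itself, stat prints major/minor in hex:
#   stat -c '%t %T' /dev/oracleasm/disk1   # -> "8 10" (hex)
# Convert hex to the decimal major:minor form the blkio files expect:
printf '%d:%d\n' 0x8 0x10
# prints 8:16
```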

The next step is to use the ‘read_iops_device’ configuration option of the blkio controller to apply throttling to the ‘iothrottle’ cgroup. The ‘read_iops_device’ configuration option uses the following format: major_number:minor_number nr_IO_per_second (major:minor, a space, then the maximum number of read IOs per second):

# cgset -r blkio.throttle.read_iops_device="8:16 10" iothrottle

Okay, we now have a cgroup called ‘iothrottle’ set up, and have used the ‘read_iops_device’ option of the ‘blkio’ controller. Please mind there are no processes assigned to the cgroup yet. The next steps are to use an IO generation and measurement tool to first measure uncapped IO performance, then assign the process to the ‘iothrottle’ cgroup, and rerun the performance measurement.
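You can also read the limit back from the cgroup filesystem to confirm the cgset took effect. A small sketch, assuming the mount point used earlier; it falls back to a message on systems where the cgroup is not set up:

```shell
# Show the configured read IOPS limit for the iothrottle cgroup:
limit_file=/cgroup/blkio/iothrottle/blkio.throttle.read_iops_device
if [ -r "$limit_file" ]; then
    cat "$limit_file"    # on the system configured above: 8:16 10
else
    echo "iothrottle cgroup not present on this system"
fi
```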

For the IO tests I use ‘fio’. This tool gives you the opportunity to investigate the performance of your system’s IO subsystem and IO devices. This is my fio.run file:

$ cat fio.run 
[global]
# use the Linux-native asynchronous IO engine
ioengine=libaio
# 8 kB blocks (the default Oracle database block size)
bs=8k
# direct IO: bypass the page cache inside the VM
direct=1
# run four concurrent jobs
numjobs=4
# sequential reads
rw=read

[simple]
# read 1 GB from the ASM disk used above
size=1g
filename=/dev/oracleasm/disk1

Now run it! Please mind I’ve snipped a large part of the output, because fio produces a great deal of it, which is extremely interesting, but not really relevant to this blog:

$ fio --section=simple fio.run 
simple: (g=0): rw=read, bs=8K-8K/8K-8K, ioengine=libaio, iodepth=1
...
simple: (g=0): rw=read, bs=8K-8K/8K-8K, ioengine=libaio, iodepth=1
fio 1.57
Starting 4 processes
fio: only root may flush block devices. Cache flush bypassed!
fio: only root may flush block devices. Cache flush bypassed!
fio: only root may flush block devices. Cache flush bypassed!
fio: only root may flush block devices. Cache flush bypassed!
Jobs: 2 (f=2): [_R_R] [100.0% done] [123.3M/0K /s] [15.4K/0  iops] [eta 00m:00s]
...

So, we did an average of 15.4K read IOPS. Now let’s put the process which runs fio in the ‘iothrottle’ cgroup!
Get the PID of the shell we ran ‘fio’ from (processes started from this shell, like fio, inherit its cgroup):

$ echo $$
5994

And assign the ‘iothrottle’ cgroup to it:

# echo 5994 > /cgroup/blkio/iothrottle/tasks

You can see which cgroup a process is assigned to by reading the ‘cgroup’ file in ‘proc’:

$ cat /proc/self/cgroup 
1:blkio:/iothrottle

Okay, we are assigned the ‘iothrottle’ cgroup! Now rerun the ‘simple’ fio benchmark:

$ fio --section=simple fio.run 
simple: (g=0): rw=read, bs=8K-8K/8K-8K, ioengine=libaio, iodepth=1
...
simple: (g=0): rw=read, bs=8K-8K/8K-8K, ioengine=libaio, iodepth=1
fio 1.57
Starting 4 processes
fio: only root may flush block devices. Cache flush bypassed!
fio: only root may flush block devices. Cache flush bypassed!
fio: only root may flush block devices. Cache flush bypassed!
fio: only root may flush block devices. Cache flush bypassed!
Jobs: 4 (f=4): [RRRR] [0.3% done] [81K/0K /s] [9 /0  iops] [eta 14h:37m:42s]

To be honest, I cancelled this fio run after a little while, because it would take very long to complete (roughly 14 and a half hours, as can be seen in the eta above).
I think this example shows the cgroup ‘iothrottle’ in action very clearly!

Cgroup usage in reality

I can’t imagine anybody wants to echo all the process IDs into a cgroup’s ‘tasks’ file by hand in order to get those processes into a certain cgroup. With my DBA background, I would love to have control over an entire database (all processes belonging to it). Also, setting up cgroups manually as done above means you have to redo it every time the server reboots. Luckily, there is a way to automate cgroup creation and assignment!
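For completeness, the manual equivalent of what such automation does can be sketched as a loop over a user’s processes. A hedged sketch (run as root; the tasks file accepts one PID per write):

```shell
# Move every process owned by the 'oracle' user into the
# iothrottle cgroup, one PID at a time:
tasks_file=/cgroup/blkio/iothrottle/tasks
if [ -w "$tasks_file" ]; then
    for pid in $(pgrep -u oracle); do
        echo "$pid" > "$tasks_file"
    done
fi
```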

cgconfig service
In order to create cgroups at boot, there is a service called ‘cgconfig’, which reads the file /etc/cgconfig.conf. To have the ‘iothrottle’ cgroup and its disk throttling configuration created automatically, use this configuration:

mount {
	blkio = /cgroup/blkio;
}

group iothrottle {
	blkio {
		blkio.throttle.read_iops_device="8:16 10";
	}
}

In order to use this configuration, restart the cgconfig service using ‘service cgconfig restart’. Optionally, you can enable automatic starting of this service at boot using ‘chkconfig --level 2345 cgconfig on’ (and check when this service is started with ‘chkconfig --list cgconfig’). Now the cgroup is created. But how do we assign processes to it?

cgred service
This is what the cgred service is for. This daemon uses a simple configuration file: /etc/cgrules.conf. Once configured and active, it assigns cgroups to users, groups or processes. For the purpose of limiting IO from an Oracle database, I created this simple line:

oracle			blkio			/iothrottle

Now the cgred service can be started using ‘service cgred restart’. Optionally, you can enable automatic starting of this service using ‘chkconfig --level 2345 cgred on’.
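For finer-grained control, cgrules.conf also accepts a ‘user:process’ pair or an ‘@group’ name instead of a bare user name. A hedged sketch (the process name ora_dbw0 and the group dba are illustrative, not taken from my setup):

```
# user[:process]	controller	destination
oracle:ora_dbw0		blkio		/iothrottle
@dba			blkio		/iothrottle
```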

Summary

The purpose of this blogpost was to introduce cgroups, and to explain why I chose the IO throttling functionality. Next it showed how to set up cgroups manually, with a simple test to prove it works and enough information to let you repeat the test yourself. The last part showed how to automate cgroup creation and assignment.

A word of caution is in its place: cgroups were a fairly new feature at the time of writing, which means things could break or not work as expected. So use at your own risk! In my limited tests it worked like a charm.

13 comments
  1. Nice article. 1 comment though : Oracle Linux 5 has Cgroups with the UEK kernel. There are just no binaries or scripts to manage it. So you have to manually edit the necessary files to enable it.

    • Hi Bjorn! Thank you for stopping by, and commenting!

      You are right, cgroups is a kernel function which entered the kernel at version 2.6.24; if the kernel version is the same or higher than that version, you have the cgroups functionality. The UEK kernel version is 2.6.32, and you can install that on OL5. Still I think it only makes sense to use it with OL6 in most cases, which is why I put it that way.

  2. Awesome for performance bottleneck examples.
    The Linux distro I had (kernel 2.6.32) needed to have the cgroup tools installed:
    yum install libcgroup

    Thanks for posting this!

    – Kyle Hailey

