Exadata V2 and CPU scaling

A while ago I was doing some bandwidth and latency testing of InfiniBand interfaces on different Exadata configurations and ran into something that might be interesting for Exadata V2 owners out there. You can run performance tests on your IB stack using a set of tools whose names start with ib_read_*, ib_send_* and ib_write_*. These tools work in a client/server setting: you start a server on one side of the path, start a client on the other side, and then start sending and reading data. When running these tests on an Exadata X2 or higher, your output will look somewhat like this:

[root@dm0102 ~]# ib_read_bw dm01db01-priv -a
------------------------------------------------------------------
                    RDMA_Read BW Test
Connection type : RC
  local address:  LID 0x0e, QPN 0x40005f, PSN 0x1f908d RKey 0x30003153 VAddr 0x007f368c713000
  remote address: LID 0x0c, QPN 0x28005d, PSN 0x6bbc6e, RKey 0x400428d1 VAddr 0x007f691a135000
Mtu : 2048
------------------------------------------------------------------
 #bytes #iterations    BW peak[MB/sec]    BW average[MB/sec]
      2        1000               7.22                  2.25
      4        1000              14.94                 13.86
      8        1000              28.81                 26.53
     16        1000              59.03                 54.09
     32        1000             113.56                104.37
     64        1000             223.18                197.08
    128        1000             113.06                112.79
    256        1000             226.70                226.16
    512        1000             452.23                451.57
   1024        1000             902.15                900.82
   2048        1000            1805.63               1801.26
   4096        1000            2435.64               2434.48
   8192        1000            2916.02               2915.82
  16384        1000            2937.44               2937.33
  32768        1000            2934.50               2931.79
  65536        1000            2941.48               2940.98
 131072        1000            2941.67               2941.60
 262144        1000            2943.20               2941.26
 524288        1000            2940.64               2938.26
1048576        1000            2939.29               2939.29
2097152        1000            2942.21               2941.77
4194304        1000            2941.54               2941.00
8388608        1000            2939.31               2938.80
------------------------------------------------------------------
[root@dm0102 ~]#
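
For reference, the client/server pairing described above can be sketched as a small dry-run helper. It only prints the two command lines and executes nothing. The function name perftest_pair is made up for this illustration, and passing -a on the server side as well as the client side is an assumption based on the usual perftest convention of giving both ends the same options.

```shell
# Print the two commands that make up one perftest run; nothing is executed.
# $1 = tool name (ib_read_bw, ib_send_bw or ib_write_bw)
# $2 = hostname of the server's InfiniBand interface
perftest_pair() {
    tool=$1
    server=$2
    # Server side: started without a peer argument, it waits for a client.
    # Assumption: the server gets the same -a option as the client.
    echo "server: $tool -a"
    # Client side: same tool, with the server's hostname as the peer.
    echo "client: $tool $server -a"
}

perftest_pair ib_read_bw dm01db01-priv
```

Run the printed server command on one node first, then the client command on the other node.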

Try running this same test on an Exadata V2 and you will see this:

Connection type : RC
  local address:  LID 0x0e, QPN 0x34004f, PSN 0xc3db36 RKey 0xf8015f71 VAddr 0x007f2a75f2d000
  remote address: LID 0x0c, QPN 0x2c0065, PSN 0xdd9285, RKey 0x60003b19 VAddr 0x007f2100646000
Mtu : 2048
------------------------------------------------------------------
 #bytes #iterations    BW peak[MB/sec]    BW average[MB/sec]
Conflicting CPU frequency values detected: 1600.000000 != 2534.000000
      2        1000               0.00                  0.00
Conflicting CPU frequency values detected: 1600.000000 != 2534.000000
      4        1000               0.00                  0.00
Conflicting CPU frequency values detected: 1600.000000 != 2534.000000
      8        1000               0.00                  0.00
^C
[root@dm0102 ~]#

It looks like this tool is telling me that my CPUs are running at 1.6 GHz while it expects them to run at 2.53 GHz. The tool is right: on this Exadata my CPUs are running at that speed:

[root@dm0102 ~]# cat /proc/cpuinfo | grep "cpu MHz" | uniq
cpu MHz		: 1600.000

The tool was also right about the expected speed: my V2 system uses Intel Xeon E5540 CPUs rated at 2.53 GHz:

[root@dm0102 ~]# cat /proc/cpuinfo | grep "model name" | uniq
model name	: Intel(R) Xeon(R) CPU           E5540  @ 2.53GHz
[root@dm0102 ~]#

So your CPUs are not running at full speed on an Exadata V2. On my test V2 there is nothing running that generates CPU load; if you run some load, some of the CPUs kick into full throttle:

[root@dm0102 ~]# cat /proc/cpuinfo | grep "cpu MHz" | uniq
cpu MHz		: 1600.000
cpu MHz		: 2534.000
cpu MHz		: 1600.000
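
Note that uniq only collapses adjacent duplicate lines, which is why 1600.000 shows up twice in the output above. A minimal sketch that counts logical CPUs per reported clock speed instead is shown here. The function name freq_summary is made up for this illustration, and the optional file argument exists only so the function can be tried on a saved copy of /proc/cpuinfo.

```shell
# Count how many logical CPUs report each clock speed.
# $1 = cpuinfo-style file (defaults to /proc/cpuinfo)
freq_summary() {
    # Split each "cpu MHz : <value>" line on the colon, tally per value,
    # then sort numerically because awk's for-in order is unspecified.
    awk -F':[ \t]*' '/^cpu MHz/ { n[$2]++ } END { for (f in n) print f, n[f] }' \
        "${1:-/proc/cpuinfo}" | sort -n
}

# Only query the live system when /proc/cpuinfo is actually there.
[ -r /proc/cpuinfo ] && freq_summary
```

Each output line is a clock speed in MHz followed by the number of logical CPUs currently reporting it.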

What is causing this behavior? It is the Linux kernel's implementation of CPU frequency scaling. The scaling of the CPU speed is done by a kernel component called the CPU scaling governor, and you can see its status by looking in the /sys filesystem:

[root@dm0102 ~]# for i in {0..12}; do cat /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor; done
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand

The governor for all my CPU cores is ‘ondemand’, which means that Linux will dynamically scale the CPU speed when needed. We can control this behavior with a daemon simply called cpuspeed:

[root@dm0102 ~]# /etc/init.d/cpuspeed status
Frequency scaling enabled using ondemand governor

We can stop this daemon to disable on-demand frequency scaling:

[root@dm0102 ~]# /etc/init.d/cpuspeed stop
Disabling ondemand cpu frequency scaling:                  [  OK  ]
[root@dm0102 ~]# for i in {0..12}; do cat /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor; done
userspace
userspace
userspace
userspace
userspace
userspace
userspace
userspace
userspace
userspace
userspace
userspace
userspace
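
The loops above hardcode the CPU range {0..12}. A sketch that globs the sysfs entries instead, and prints one line per governor rather than one line per CPU, could look like this. The function name governor_summary and its optional directory argument are made up for this illustration; the argument is there so the function can be pointed at a copy of the sysfs tree.

```shell
# Summarise the scaling governor across all logical CPUs.
# $1 = sysfs CPU directory (defaults to /sys/devices/system/cpu)
governor_summary() {
    cpu_root=${1:-/sys/devices/system/cpu}
    for f in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip the unexpanded pattern when cpufreq is not available.
        [ -r "$f" ] && cat "$f"
    done | sort | uniq -c | awk '{ print $2 ": " $1 " cpus" }'
}

governor_summary
```

On a system without cpufreq support the function simply prints nothing.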

If we now look at our CPU speed we can see that the cores are running at full speed:

[root@dm0102 ~]# cat /proc/cpuinfo | grep "cpu MHz" | uniq
cpu MHz		: 2534.000
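
The same check can be made per core from sysfs by comparing the current clock with the hardware maximum. This is a sketch under the assumption that the kernel exposes scaling_cur_freq and cpuinfo_max_freq (both in kHz) under each CPU's cpufreq directory. The function name below_max and its optional directory argument are made up for this illustration.

```shell
# List logical CPUs whose current clock is below their hardware maximum.
# $1 = sysfs CPU directory (defaults to /sys/devices/system/cpu)
below_max() {
    cpu_root=${1:-/sys/devices/system/cpu}
    for d in "$cpu_root"/cpu[0-9]*/cpufreq; do
        # Skip CPUs (or an unexpanded pattern) without readable cpufreq files.
        [ -r "$d/scaling_cur_freq" ] && [ -r "$d/cpuinfo_max_freq" ] || continue
        cur=$(cat "$d/scaling_cur_freq")    # current frequency in kHz
        max=$(cat "$d/cpuinfo_max_freq")    # hardware maximum in kHz
        if [ "$cur" -lt "$max" ]; then
            cpu=${d%/cpufreq}               # strip the trailing /cpufreq
            echo "${cpu##*/}: $cur kHz of $max kHz"
        fi
    done
}

below_max
```

No output means every core that exposes cpufreq information is running at its maximum frequency.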

Interestingly enough Oracle apparently decided not to use CPU Scaling on their X2 and X3 models.

7 thoughts on “Exadata V2 and CPU scaling”

  1. ~2.9 GB/s on 40Gb QDR IB with RDS? Yep. Looks about right. Ever wonder why there are folks out there prone to exaggeration saying 40Gb QDR is 4GB/s? :-)

    It might also be interesting to monitor the *real* clock frequencies on modern Xeons since Turbo Boost 2.0 can sometimes have a mind of its own. I discussed this topic a bit at my session during last year’s Oaktableworld: http://kevinclosson.files.wordpress.com/2013/03/openworld2012-kc.pdf

    • Yeah those QDR IB speeds are raw speeds, in real life you have to deal with stuff like 8b/10b encoding when putting symbols on the wire. So your real usable bandwidth is way less.

      Thanks for the link to your presentation, I did not see that before, it is an interesting read.

      • You’ll likely never see more than 3.2 GB/s even with rds-stress(OFED).

        For what it’s worth you can view that Oaktableworld session here:

  2. Hi Klaas-Jan,

    Nice article.

    What struck me right away is a couple of typos: you talk about a CPU speed of 1.6 MHz, but of course you mean 1.6 GHz. That typo appears in several places. Not a big problem, just a bit sloppy.

    Regards, Olivier

  3. Hi Klaas-Jan,

    You might want to check with Oracle whether they would prefer changing the CPU governor to “performance” instead of “ondemand”; that way you do not have to disable it entirely. The “performance” governor also sets the CPU to run at its maximum frequency.

    If you want to try your hand (and lab-environment) at some tuning see here for more info: https://wiki.archlinux.org/index.php/CPU_Frequency_Scaling

    Also, you might want to check whether Turbo Boost Technology is enabled/available on those Exadata nodes, as the Intel E5540s can run at 2.8 GHz if within power/temperature tolerances.
