With cellcli you can run the calibrate command, this will measure the performance of your harddisks and flashcards. It will eventually report to you the throughput and amount of IOPS that was measured during the calibrate, it will also report to you which luns are not performing up to par.
Calibrate has a few simple options:
CellCLI> help calibrate Usage: CALIBRATE [HARDDISK | FLASHDISK | lun1 [, lun2]] [FORCE] Purpose: Runs raw performance tests on disks. CELLSRV must be shutdown. User id root is required. Options: [FLASHDISK]: option to specify that only flash LUNs be calibrated. [HARDDISK] : option to specify that only hard disk LUNs be calibrated. lun1 [, lun2] : option to specify list of LUNs by name [FORCE] : runs test even while CELLSRV is running. Examples: CALIBRATE CALIBRATE '0_8', '0_10' force CellCLI>
So let’s do a test run of calibrate on one of the harddisks in this cell in this V2 half rack:
CellCLI> calibrate 0_2 force Calibration will take a few minutes... LUN 0_2 on drive [20:2] random read throughput: 168.21 MBPS, and 419 IOPSCALIBRATE results are within an acceptable range. CellCLI>
Now this does tell us something about the throughput and the IOPS that was measured but it does not tells you HOW cellcli did manage to measure this. Let’s have a peek into how cellcli works, in order to set we have to increase the logmode and start cellcli in debugging mode:
[root@dm01cel01 ~]# cellcli -vvv -d CellCLI: Release 11.2.3.2.1 - Production on Sun Oct 27 15:19:55 CET 2013 Copyright (c) 2007, 2012, Oracle. All rights reserved. Cell Efficiency Ratio: 236
What this does, is dumping a lot of extra information on your screen regarding calls that cellcli does to ossmgmt.jar. Now cellcli is nothing but a perl script that calls to that jar file. With cellcli now running in debug mode and loglevel set to the highest we can now run our calibrate command again:
CellCLI> calibrate 0_2 force loop yyn:0 state:0 yychar:-1 tok: -3 sval: calibrate yy:calibrate next yychar:289 reduce state 0, reducing 0 by rule 2 ($$1 :) ... .. . state 37, reducing 4 by rule 4 (line : $$1 cli $$2 EOL) CLI command: CALIBRATE(289) SPOOL arg : NONE(0) CLI verb: CALIBRATE(289) CLI object: NONE(0) 0_2 CLI addverb: attr#1: FORCE(334) ++++Over Value is: 0 ForAll is false CLI addverb: attr#1: FORCE(334) reduce After reduction, shifting from state 0 to state 2 tok: -1 sval: null state 2, reading 0 (end-of-file) Getting the state of CELLSRV services... running Calibration will take a few minutes... LUN 0_2 on drive [20:2] random read throughput: 168.33 MBPS, and 420 IOPSCALIBRATE results are within an acceptable range. CellCLI>
I have truncated the output above a little bit to make readable. Now this does not tell us a whole lot more but we can now see some interesting info in the cellcli logfile in /var/log/oracle/deploy/cellcli.lst.0* where the last parts can vary on your system depending the times you have started cellcli. When scrolling through that logfile you will see the output generated by cellcli you will find our cellcli calibrate command:
Oct 27, 2013 3:21:33 PM oracle.ossmgmt.ms.cli.CellCLI readAndProcessLine INFO: Executing cellcli command: "calibrate 0_2 force" Oct 27, 2013 3:21:33 PM oracle.ossmgmt.ms.cli.CellCLI invokeSingleRSCmd FINE: Executing RS command: 0.0STATE CELLSRV!!!!!!!!!!!!!!. Oct 27, 2013 3:21:33 PM oracle.ossmgmt.ms.cli.CellCLI invokeSingleRSCmd FINE: RS comamnd return status: 2.
You see that the oracle.ossmgmt.ms.cli.CellCLI class checks wetter or not cellsrv is running or not (i have added the force parameter so i can run calibrate without stopping cellsrv).
Oct 27, 2013 3:21:33 PM oracle.ossmgmt.ms.cli.Calibrate fetchLuns INFO: disknames.size: 1 listluncmd - list lun "0_2" attributes name, devicename, physicaldrives, disktype, lunsize Oct 27, 2013 3:21:33 PM oracle.ossmgmt.ms.cli.Calibrate fetchCliResults FINER: fetchCli cmd: MOD CALIBRATE :list lun "0_2" attributes name, devicename, physicaldrives, disktype, lunsize Oct 27, 2013 3:22:14 PM oracle.ossmgmt.ms.cli.Calibrate fetchCliResults FINER: fetchCli result: . 0_2 /dev/sdc 20:2 HardDisk 557.861328125G
Next you see that cellcli uses oracle.ossmgmt.ms.cli.Calibrate to translate my lun name 0_2 to a blockdevice name, in my case /dev/sdc. it the goes on the gather names of all the pun:
Oct 27, 2013 3:22:14 PM oracle.ossmgmt.ms.cli.Calibrate fetchLuns INFO: listluns: 0_2 /dev/sdc 20:2 HardDisk 557.861328125G Oct 27, 2013 3:22:14 PM oracle.ossmgmt.ms.cli.Calibrate fetchCliResults FINER: fetchCli cmd: MOD CALIBRATE :list physicaldisk attributes name, luns, diskMediaGroup Oct 27, 2013 3:22:20 PM oracle.ossmgmt.ms.cli.Calibrate fetchCliResults FINER: fetchCli result: . 20:0 0_0 DMG_SYSTEM_HDD1 20:1 0_1 DMG_SYSTEM_HDD1 20:2 0_2 DMG_DATA_HDD1 20:3 0_3 DMG_DATA_HDD1 ... .. . Oct 27, 2013 3:22:20 PM oracle.ossmgmt.ms.cli.Calibrate fetchLuns INFO: listpds: 20:0 0_0 DMG_SYSTEM_HDD1 20:1 0_1 DMG_SYSTEM_HDD1 20:2 0_2 DMG_DATA_HDD1 20:3 0_3 DMG_DATA_HDD1 ... .. .
The output above is shortened for readability. Next oracle.ossmgmt.ms.cli.Calibrate runs scripts_common.sh, which is also run during cell deployment. scripts_common.sh tune_block_io /dev/sdc sets nr_request in /sys/block/sdc/queue/nr_requests to 4096:
Oct 27, 2013 3:22:20 PM oracle.ossmgmt.ms.cli.Calibrate calibrate FINER: Execute: /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/cellsrv/deploy/scripts/unix/hwadapter/diskadp/scripts_common.sh tune_block_io /dev/sdc
The nr_requests sets the amount of IO slots 4096 meaning that the deadline IO scheduler has more chance to merge IO requests. This should already been set to 4096 on all of your cells, the calibrate command sets it again anyway.
Oct 27, 2013 3:22:21 PM oracle.ossmgmt.ms.cli.CalibrateTest executeOrion INFO: execute: /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/cellsrv/bin/orion -run advanced -type rand -matrix point -disk_start_offset 0 -duration 30 -num_disks 1 -testname rand_0 -disk_end_offset 285184m -size_large 8192 -num_small 0 -num_large 10 Oct 27, 2013 3:22:51 PM oracle.ossmgmt.ms.cli.CalibrateTest executeOrion INFO: execute: /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/cellsrv/bin/orion -run advanced -type rand -matrix point -disk_start_offset 0 -duration 30 -num_disks 1 -testname rand_IOPS_0 -disk_end_offset 285184m -size_small 8 -num_large 0 -num_small 20
Finally calibrate starts the actual calibrate, it does this by calling Orion which is located in $OSS_BIN. It runs two separate runs, one doing small 8Kb requests and one doing 8192Kb large IO’s.