# Inserts on HCC tables

There are already plenty of blog posts and presentations about Hybrid Columnar Compression, and I am adding one more to that list. Recently I was doing some small tests on HCC and noticed that a row inserted into an HCC table did not get compressed, even though I was using direct path loads:

```
DBA@TEST1> create table hcc_me (text1 varchar2(4000)) compress for archive high;

Table created.

KJJ@TEST1> insert /*+ append */ into hcc_me select dbms_random.string('x',100) from dual;

1 row created.

KJJ@TEST1> commit;

Commit complete.

KJJ@TEST1> select rowid from hcc_me;

ROWID
------------------
AAAWw/AAAAAACXzAAA

DBA@TEST1> @compress
Enter value for schemaname: kjj
Enter value for tablename: hcc_me
Enter value for rowid: AAAWw/AAAAAACXzAAA
old   2:    dbms_compression.get_compression_type(upper('&SchemaName'),upper('&TableName'),'&RowID'),
new   2:    dbms_compression.get_compression_type(upper('kjj'),upper('hcc_me'),'AAAWw/AAAAAACXzAAA'),

COMPRESSION_TYPE
---------------------
COMP_NOCOMPRESS
```

So my row did not get compressed. Let's insert a bit more data into our test table:

```
declare
sql_stmt varchar(200);
begin
for i in 1..1000 loop
sql_stmt := 'insert /*+ append_values */ into hcc_me select dbms_random.string(''x'',100) from dual';
execute immediate sql_stmt;
commit;
end loop;
end;
/
```

And let's see what we end up with:

```
select count(*), compression_type from (
select decode(dbms_compression.get_compression_type('KJJ','HCC_ME',rowid),
1, 'COMP_NOCOMPRESS',
2, 'COMP_FOR_OLTP',
4, 'COMP_FOR_QUERY_HIGH',
8, 'COMP_FOR_QUERY_LOW',
16, 'COMP_FOR_ARCHIVE_HIGH',
32, 'COMP_FOR_ARCHIVE_LOW',
64, 'COMP_BLOCK',
1000000, 'COMP_RATIO_MINROWS',
-1, 'COMP_RATIO_ALLROWS') "COMPRESSION_TYPE"
from hcc_me)
group by compression_type;
```

So none of my records got compressed:

```
  COUNT(*) COMPRESSION_TYPE
---------- ---------------------
1000 COMP_NOCOMPRESS
```
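When eyeballing raw get_compression_type output it helps to keep the numeric constants in a small lookup; a sketch in Python mirroring the DECODE in the query above (codes as used there, from DBMS_COMPRESSION on 11.2 — treat them as version-specific):

```python
# Return codes of DBMS_COMPRESSION.GET_COMPRESSION_TYPE, as decoded
# in the query above (11.2 constants).
COMPRESSION_TYPES = {
    1: "COMP_NOCOMPRESS",
    2: "COMP_FOR_OLTP",
    4: "COMP_FOR_QUERY_HIGH",
    8: "COMP_FOR_QUERY_LOW",
    16: "COMP_FOR_ARCHIVE_HIGH",
    32: "COMP_FOR_ARCHIVE_LOW",
    64: "COMP_BLOCK",
}

def compression_name(code: int) -> str:
    """Translate a get_compression_type return code to its constant name."""
    return COMPRESSION_TYPES.get(code, f"UNKNOWN ({code})")
```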

Maybe it is size dependent; the row I am inserting into this HCC table is extremely small. Let's re-create the table and make every row one byte bigger than the previous row:

```
declare

sql_stmt    varchar(200);
v_random1   varchar2(4000);

begin

execute immediate 'drop table hcc_me';
execute immediate 'create table hcc_me (text1 varchar2(4000)) compress for archive high';

for i in 1..1000 loop
v_random1 := dbms_random.string('x', i);
sql_stmt := 'insert /*+ append_values */ into hcc_me values (:1)';
execute immediate sql_stmt using v_random1;
commit;
end loop;
end;
/
```

This will give me a table where row 1 is 1 byte big and the last row is 1000 bytes big. Let's re-run our select statement and see if we have HCC compressed rows now:

```
  COUNT(*) COMPRESSION_TYPE
---------- ---------------------
697 COMP_FOR_ARCHIVE_HIGH
303 COMP_NOCOMPRESS

2 rows selected.
```

Victory! But now let's see where our records start to compress. Let's adapt the query a bit:

```
select vsize(text1) row_bytes,
decode(dbms_compression.get_compression_type('KJJ','HCC_ME',rowid),
1, 'COMP_NOCOMPRESS',
2, 'COMP_FOR_OLTP',
4, 'COMP_FOR_QUERY_HIGH',
8, 'COMP_FOR_QUERY_LOW',
16, 'COMP_FOR_ARCHIVE_HIGH',
32, 'COMP_FOR_ARCHIVE_LOW',
64, 'COMP_BLOCK',
1000000, 'COMP_RATIO_MINROWS',
-1, 'COMP_RATIO_ALLROWS') COMPRESSION_TYPE
from hcc_me;

ROW_BYTES COMPRESSION_TYPE
---------- ---------------------
1 COMP_NOCOMPRESS
2 COMP_NOCOMPRESS
3 COMP_NOCOMPRESS
4 COMP_NOCOMPRESS
5 COMP_NOCOMPRESS
<cut>
292 COMP_NOCOMPRESS
293 COMP_NOCOMPRESS
294 COMP_NOCOMPRESS
295 COMP_FOR_ARCHIVE_HIGH
296 COMP_NOCOMPRESS
297 COMP_NOCOMPRESS
298 COMP_NOCOMPRESS
299 COMP_NOCOMPRESS
300 COMP_FOR_ARCHIVE_HIGH
301 COMP_NOCOMPRESS
302 COMP_NOCOMPRESS
303 COMP_NOCOMPRESS
304 COMP_FOR_ARCHIVE_HIGH
305 COMP_FOR_ARCHIVE_HIGH
306 COMP_FOR_ARCHIVE_HIGH
307 COMP_FOR_ARCHIVE_HIGH
308 COMP_FOR_ARCHIVE_HIGH
309 COMP_FOR_ARCHIVE_HIGH
310 COMP_FOR_ARCHIVE_HIGH
311 COMP_FOR_ARCHIVE_HIGH
312 COMP_FOR_ARCHIVE_HIGH
313 COMP_FOR_ARCHIVE_HIGH
314 COMP_NOCOMPRESS
315 COMP_NOCOMPRESS
316 COMP_FOR_ARCHIVE_HIGH
317 COMP_FOR_ARCHIVE_HIGH
318 COMP_FOR_ARCHIVE_HIGH
319 COMP_FOR_ARCHIVE_HIGH
<cut>
996 COMP_FOR_ARCHIVE_HIGH
997 COMP_FOR_ARCHIVE_HIGH
998 COMP_FOR_ARCHIVE_HIGH
999 COMP_FOR_ARCHIVE_HIGH
1000 COMP_FOR_ARCHIVE_HIGH

1000 rows selected.
```

Alright, this is unexpected. I ran this test multiple times and ended up with different results: some rows around the 300 byte mark are getting compressed and some are not. So somewhere around 300 bytes Oracle decides seemingly randomly whether or not to compress. Oh wait, randomly… I am using DBMS_RANDOM to fill my one column with data, so let's take the random factor out of the equation and fill our rows with a fixed character:

```
declare

sql_stmt    varchar(200);

begin

execute immediate 'create table hcc_me (text1 varchar2(4000)) compress for archive high';

for i in 1..1000 loop
sql_stmt := 'insert /*+ append_values */ into hcc_me select lpad( ''x'','||i||',''x'') from dual';
execute immediate sql_stmt;
commit;
end loop;
end;
/
```

Now we end up with much more effective compression:

```
 ROW_BYTES COMPRESSION_TYPE
---------- ---------------------
1 COMP_NOCOMPRESS
2 COMP_NOCOMPRESS
3 COMP_NOCOMPRESS
4 COMP_NOCOMPRESS
5 COMP_NOCOMPRESS
<cut>
65 COMP_NOCOMPRESS
66 COMP_NOCOMPRESS
67 COMP_NOCOMPRESS
68 COMP_FOR_ARCHIVE_HIGH
69 COMP_FOR_ARCHIVE_HIGH
70 COMP_FOR_ARCHIVE_HIGH
71 COMP_FOR_ARCHIVE_HIGH
<cut>
998 COMP_FOR_ARCHIVE_HIGH
999 COMP_FOR_ARCHIVE_HIGH
1000 COMP_FOR_ARCHIVE_HIGH

1000 rows selected.
```

The moral of this story is to be careful with HCC and small inserts: there is some logic built into the HCC engine that decides whether or not it should compress, based on size and estimated compression ratio. If you end up in a situation where not all rows are compressed (the insert was too small, you forgot to do a direct load, or whatever other reason), a simple alter table move will compress all rows again.
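The size and ratio sensitivity is easy to reproduce outside the database: random strings like the ones DBMS_RANDOM generates barely compress, while a repeated character collapses to almost nothing. A quick illustration with Python's zlib (zlib is of course not the HCC codec; it just shows the compressibility difference):

```python
import random
import string
import zlib

# A 300-character random string, similar to dbms_random.string('x', 300),
# versus 300 repeats of a single character, as in the lpad test above.
random_row = "".join(random.choices(string.ascii_uppercase + string.digits, k=300)).encode()
fixed_row = b"x" * 300

random_compressed = len(zlib.compress(random_row, 9))
fixed_compressed = len(zlib.compress(fixed_row, 9))

# The random row stays close to its original size; the fixed row shrinks drastically.
print(f"random: 300 -> {random_compressed}, fixed: 300 -> {fixed_compressed}")
```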

# Cellcli can lie to you…

Yes, that is true, I said it: cellcli can lie to you. There are some special cases where the output of cellcli does not reflect reality, and you should double-check its output with your standard OS tools. This is the output of dcli calling cellcli on an Exadata rack from a client:

```
[root@dm01db01 ~]# dcli -g cell_group -l root cellcli -e list cell attributes cellsrvStatus,msStatus,rsStatus
dm01cel01: running       running         running
dm01cel02: running       running         running
dm01cel03: stopped       running         running
dm01cel04: running       running         running
dm01cel05: running       running         running
dm01cel06: running       running         running
dm01cel07: running       running         running
dm01cel08: running       running         running
dm01cel09: running       running         running
dm01cel10: running       running         running
dm01cel11: running       running         running
dm01cel12: running       running         running
dm01cel13: running       running         running
dm01cel14: running       running         running
[root@dm01db01 ~]#
```

It seems that cellsrv on cel03 has stopped. Let's zoom in, log on to that cell and verify:

```
CellCLI> list cell attributes cellsrvStatus,msStatus,rsStatus
stopped         running         running
```

Well, that is as expected: same command, same output. But now let's double-check this against what is actually happening on the OS. Let's see which processes are actually running at OS level:

```
[root@dm01cel03 trace]# ps -ef | grep cellsrv/bin/cell[srv]
root      3143  3087  0 12:00 ?        00:00:00 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130912/cellsrv/bin/cellsrvstat -interval=5 -count=720
root     21040     1 60 Mar26 ?        20:33:37 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130912/cellsrv/bin/cellsrv 100 5000 9 5042
root     25662     1  0 Mar26 ?        00:02:08 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130912/cellsrv/bin/cellrssrm -ms 1 -cellsrv 1
root     25673 25662  0 Mar26 ?        00:00:07 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130912/cellsrv/bin/cellrsbmt -ms 1 -cellsrv 1
root     25674 25662  0 Mar26 ?        00:00:07 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130912/cellsrv/bin/cellrsmmt -ms 1 -cellsrv 1
root     25676 25673  0 Mar26 ?        00:00:01 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130912/cellsrv/bin/cellrsbkm -rs_conf /opt/oracle/cell11.2.3.2.1_LINUX.X64_130912/cellsrv/deploy/config/cellinit.ora -ms_conf /opt/oracle/cell11.2.3.2.1_LINUX.X64_130912/cellsrv/deploy/config/cellrsms.state -cellsrv_conf /opt/oracle/cell11.2.3.2.1_LINUX.X64_130912/cellsrv/deploy/config/cellrsos.state -debug 0
root     25710 25676  0 Mar26 ?        00:00:07 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130912/cellsrv/bin/cellrssmt -rs_conf /opt/oracle/cell11.2.3.2.1_LINUX.X64_130912/cellsrv/deploy/config/cellinit.ora -ms_conf /opt/oracle/cell11.2.3.2.1_LINUX.X64_130912/cellsrv/deploy/config/cellrsms.state -cellsrv_conf /opt/oracle/cell11.2.3.2.1_LINUX.X64_130912/cellsrv/deploy/config/cellrsos.state -debug 0
[root@dm01cel03 trace]#
```

That is confusing: cellcli tells me that cellsrv is down, but “ye good ol'” ps is telling me that cellsrv is up-and-running as it should be. It looks like my cell storage is available. Let's verify that all 12 disks from that cell are available at ASM level and that we don't have any repair timers counting down:

```
SYS@+ASM1> select count(*), repair_timer from v$asm_disk where path like '%DATA%dm05cel03' group by repair_timer;

COUNT(*) REPAIR_TIMER
---------- ------------
12            0

1 row selected.
```

All disks have a repair timer of 0, meaning that no disks have failed; if there really were a problem with the disks we would see the repair_timer counting down.

Now we can confirm that cellsrv is available and the output of cellcli is just plain wrong here. So what is going on? Let's start by checking the cell alert log in $CELLTRACE:

```
[RS] monitoring process /opt/oracle/cell11.2.3.2.1_LINUX.X64_130912/cellsrv/bin/cellrsomt (pid: 0) returned with error: 126
[RS] Monitoring process for service CELLSRV detected a flood of restarts. Disable monitoring process.
Errors in file /opt/oracle/cell11.2.3.2.1_LINUX.X64_130912/log/diag/asm/cell/dm05cel03/trace/rstrc_25662_4.trc  (incident=179):
RS-7445 [CELLSRV monitor disabled] [Detected a flood of restarts] [] [] [] [] [] [] [] [] [] []
Incident details in: /opt/oracle/cell11.2.3.2.1_LINUX.X64_130912/log/diag/asm/cell/dm05cel03/incident/incdir_179/rstrc_25662_4_i179.trc
Sweep [inc][179]: completed
```

The cellrs processes monitor the cellms and cellsrv processes; however, flood control is built in to prevent a loop of restarts. Such a restart loop could bring down a cell, so to prevent that from happening this flood control was added. When it kicks in, RS stops monitoring the problematic service, cellsrv in this case. This also means that it will report back to cellcli that the process is stopped. Personally I think this built-in flood control is a good thing; however, I would like to see cellcli report it properly. For instance, it would be nice if Oracle would let cellcli report the cellsrv status as something like intermediate when RS has stopped monitoring it; it now says ‘stopped’, which is not true at all. This also means that when you see cellcli reporting that cellsrv is down, you always need to double-check whether this is actually true before you try restarting cellsrv.
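The behaviour RS shows here can be sketched as a small restart monitor with flood control. Everything below (class name, threshold, window) is hypothetical and only illustrates the idea of giving up on monitoring after a flood of restarts:

```python
import time

class RestartMonitor:
    """Toy model of RS-style flood control: restart a service when it dies,
    but disable monitoring after too many restarts in a short window.
    Threshold and window values are made up for illustration."""

    def __init__(self, max_restarts=5, window_seconds=60.0):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self.restart_times = []
        self.monitoring_enabled = True

    def service_died(self, now=None):
        """Called when the monitored service exits; returns True if we restart it."""
        if not self.monitoring_enabled:
            return False
        now = time.monotonic() if now is None else now
        # Keep only restarts inside the sliding window.
        self.restart_times = [t for t in self.restart_times if now - t < self.window_seconds]
        if len(self.restart_times) >= self.max_restarts:
            # Flood of restarts detected: stop monitoring, as RS does --
            # which is why cellcli then reports the service as 'stopped'.
            self.monitoring_enabled = False
            return False
        self.restart_times.append(now)
        return True
```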

# Info about Exadata’s imageinfo and rpm’s

Recently we noticed that imageinfo was displaying the following information at a customer's site. Imageinfo was not displaying the data I expected, so here is a quick little blog post about what was going on there.

```
[root@dm01db01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l root "imageinfo | grep 'Image version'"
dm01db01: Image version: 11.2.3.2.1.130109
dm01db02: Image version: 11.2.3.2.1.130109
dm01db03: Image version: 11.2.3.2.1.130109
dm01db04: Image version: 11.2.3.2.1.130109
dm01db05: Image version: 11.2.3.2.0.120713
dm01db06: Image version: 11.2.3.2.0.120713
dm01db07: Image version: 11.2.3.2.0.120713
dm01db08: Image version: 11.2.3.2.0.120713
```

So what happened here, did those last four nodes not get updated when installing the latest CellOS QDPFE? Well, after a quick investigation it turns out that all the correct packages are installed:

```
[root@dm0105 ~]# for i in `yum info | grep Name | grep exadata |cut -d: -f2`; do yum info $i ; done | grep 'Version\|Name'
Name       : exadata-applyconfig
Version    : 11.2.3.2.1.130109
Name       : exadata-asr
Version    : 11.2.3.2.1.130109
Name       : exadata-base
Version    : 11.2.3.2.1.130109
Name       : exadata-commonnode
Version    : 11.2.3.2.1.130109
Name       : exadata-exachk
Version    : 11.2.3.2.1.130109
Name       : exadata-firmware-compute
Version    : 11.2.3.2.1.130109
Name       : exadata-ibdiagtools
Version    : 11.2.3.2.1.130109
Name       : exadata-ipconf
Version    : 11.2.3.2.1.130109
Name       : exadata-onecommand
Version    : 11.2.3.2.1.130109
Name       : exadata-oswatcher
Version    : 11.2.3.2.1.130109
Name       : exadata-sun-computenode
Version    : 11.2.3.2.1.130109
Name       : exadata-validations-compute
Version    : 11.2.3.2.1.130109
[root@dm0101 rpm-extraxt]#
```

So according to yum everything is installed here, and rpm -qa also confirms this, so no mismatch there:

```
[root@dm0105 ~]# rpm -qa | grep exadata
exadata-asr-11.2.3.2.1.130109-1
exadata-validations-compute-11.2.3.2.1.130109-1
exadata-base-11.2.3.2.1.130109-1
exadata-exachk-11.2.3.2.1.130109-1
exadata-oswatcher-11.2.3.2.1.130109-1
exadata-ipconf-11.2.3.2.1.130109-1
exadata-firmware-compute-11.2.3.2.1.130109-1
exadata-applyconfig-11.2.3.2.1.130109-1
exadata-onecommand-11.2.3.2.1.130109-1
exadata-commonnode-11.2.3.2.1.130109-1
exadata-sun-computenode-11.2.3.2.1.130109-1
exadata-ibdiagtools-11.2.3.2.1.130109-1
[root@dm0101 ~]#
```

It looks like imageinfo is displaying incorrect information here. How come, and where does it get its information from? Well, imageinfo gets its information from a file called image.id:

```
[root@dm01db05 ~]# cat /opt/oracle.cellos/image.id
id:1357731445
kernel:2.6.32-400.11.1.el5uek
kernelstock:2.6.32-300.19.1.el5uek
version:11.2.3.2.1.130109
type:production
label:OSS_11.2.3.2.1_LINUX.X64_130109
cellrpmver:OSS_11.2.3.2.1_LINUX.X64_130109
cellswver:OSS_11.2.3.2.1_LINUX.X64_130109
mode:install
created:2013-01-09 03:37:26 -0800
node:COMPUTE
f331a2bfa27cc7371d936009289c23b6 *commonos.tbz
22212b4436b57fba6d53be4d7a6fd975 *dbboot.tbz
c7cf016f2180ab820010a787b77c59fb *dbfw.tbz
fd9d8b4f4a30588ca39f66015d5edc4e *dbrpms.tbz
eaa1a7973efcdcd2fe40b09828ba0e01 *debugos.tbz
290909dec5b3995aed2298f20d345aea *exaos.tbz
cfda22b7599e47bdc686acf41fd5d67c *kernel.tbz
18db43d6e94797710ffa771702be33e8 *ofed.tbz
4831ba742464caaa11c1c0153b823886 *sunutils.tbz
activated:2013-11-24 04:41:03 +0100
status:success
```

You can see that this file contains all the information about the current version of your Exadata software stack, as well as a list of md5 checksums of the .tbz files in /opt/oracle.cellos/iso/cellbits. We checked the md5 checksums of these files as well, so now we know 100% that we are on the correct level. Now how does this file get updated? It seems to be out-of-date, or at least not in line with our system. Well, it is generated when you patch your compute node to another level, and it is updated by the exadata-sun-computenode rpm. Let's peek a bit into that rpm and what it deploys. First I am going to mount an ISO file as my yum repository:

```
[root@dm0101 ~]# mount -o loop /u01/app/oracle/stage.2288/112_latest_repo_130109.iso /var/www/html/yum/unknown/EXADATA/dbserver/11.2/latest.2288/
[root@dm0101 ~]# cp /var/www/html/yum/unknown/EXADATA/dbserver/11.2/latest.2288/x86_64/exadata-sun-computenode-11.2.3.2.1.130302-1.x86_64.rpm
```

Next step is to extract the files from the RPM and see what we get:

```
[root@dm0101 ~]# mkdir /tmp/exadata-sun-computenode-
[root@dm0101 ~]# cd /tmp/exadata-sun-computenode-
[root@dm0101 exadata-sun-computenode-]# rpm2cpio ../exadata-sun-computenode-11.2.3.2.1.130302-1.x86_64.rpm | cpio -idv
./etc/init.d/pciibmap
./etc/init.d/workaround_13083530
./etc/yum.repos.d/Exadata-computenode.repo.sample
./etc/yum/pluginconf.d/installonlyn.conf
./opt/oracle.SupportTools/corecontrol
./opt/oracle.SupportTools/dbserver_backup.sh
./opt/oracle.SupportTools/sundiag.sh
./opt/oracle.cellos/exadata.computenode.post.sh
./opt/oracle.cellos/tmpl/image.id.11.2.3.2.1.130302.in
194 blocks
[root@dm0101 exadata-sun-computenode-]#
```

Aha! There is a file called ./opt/oracle.cellos/tmpl/image.id.11.2.3.2.1.130302.in in that RPM, but it is not showing everything; it still doesn't have the md5 checksum information in it:

```
[root@dm0101 exadata-sun-computenode-]# cat ./opt/oracle.cellos/tmpl/image.id.11.2.3.2.1.130302.in
id:1362281283
kernel:2.6.32-400.21.1.el5uek
kernelstock:2.6.32-300.19.1.el5uek
version:11.2.3.2.1.130302
type:production
label:OSS_11.2.3.2.1_LINUX.X64_RELEASE
cellrpmver:OSS_11.2.3.2.1_LINUX.X64_130302
cellswver:OSS_11.2.3.2.1_LINUX.X64_130302
mode:install
created:2013-03-02 19:28:04 -0800
node:COMPUTE
[root@dm0101 exadata-sun-computenode-]#
```

To find out how that gets filled we need to take a look at the script that gets executed when we install this RPM. The output below is a bit truncated for readability:

```
[root@dm0101 rpm-extraxt]# rpm --scripts -qp exadata-sun-computenode-11.2.3.2.1.130302-1.x86_64.rpm
.
..
fi
# Don't start post script for the fresh imaging (on zero boot)
# Don't install any files (remove them)
if [ -f /.post.imaging ]; then
rm -f /opt/oracle.cellos/tmpl/image.id.11.2.3.2.1.130302.in
rm -f /opt/oracle.cellos/exadata.computenode.post.sh
else
/sbin/service exachkcfg postrpm 11.2.3.2.1.130302 exadata-sun-computenode
fi
..
.
```

So when we install this RPM it does (among other things not shown here) remove my image.id file and a shell script called exadata.computenode.post.sh, and it then executes a new version of exadata.computenode.post.sh. Now exadata.computenode.post.sh is quite a big script: it sets up your compute node properly and then finally reboots it. In this huge exadata.computenode.post.sh script there is a function being called that actually calculates the md5 checksums of my cellbits directory and appends that info to the image.id file:

```
md5sum_iso_cellbits ()
{
local imageid_out=$1

[ "$imageid_out" == '' ] && imageid_out=$IMAGE_ID_DEF
print_log "Please wait. Calculating checksums for $imageid_out ..." INFO
# Remove any existing md5 sums from the output file
perl -ne '/[a-f\d]\s+\*.+/ or print $_' -i $imageid_out
local tmp_lst=/tmp/msd5sum.cellbits.lst
pushd /opt/oracle.cellos/iso/cellbits
md5sum -b * > $tmp_lst
cat $tmp_lst >> $imageid_out
rm -f $tmp_lst
popd
}
```

Finally, the function db_update_imageinfo () in exadata.computenode.post.sh creates a new and up-to-date version of my image.id file, which makes imageinfo happy again. To make a long blog post short, the solution to our misbehaving imageinfo (and also imagehistory, by the way) was to run:

```
yum --enablerepo=exadata_dbserver_11.2.3.2.1_x86_64_base reinstall exadata-sun-computenode-11.2.3.2.1.130109-1
```

and let the nodes reboot. This solved our problem with the imageinfo command; unfortunately we could not determine the root cause of the image.id problem.
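Since image.id is just "key:value" lines followed by "md5 *file" lines, it is easy to parse and cross-check yourself. A sketch in Python (the file layout is as shown above; the function names and the idea of checking it by hand instead of trusting imageinfo are mine):

```python
import hashlib
import os

def parse_image_id(text):
    """Split image.id content into key/value attributes and md5 checksum lines."""
    attrs, checksums = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if " *" in line:  # md5 lines look like '<md5sum> *<file>.tbz'
            md5, name = line.split(" *", 1)
            checksums[name] = md5.strip()
        elif ":" in line:
            key, value = line.split(":", 1)
            attrs[key] = value.strip()
    return attrs, checksums

def verify_cellbits(checksums, cellbits_dir="/opt/oracle.cellos/iso/cellbits"):
    """Compare recorded md5 sums against the actual files; returns mismatching names."""
    bad = []
    for name, expected in checksums.items():
        path = os.path.join(cellbits_dir, name)
        with open(path, "rb") as f:
            if hashlib.md5(f.read()).hexdigest() != expected:
                bad.append(name)
    return bad
```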

# About cellcli calibrate

With cellcli you can run the calibrate command, which measures the performance of your hard disks and flash cards. It eventually reports the throughput and the number of IOPS measured during the calibrate run, and it also reports which LUNs are not performing up to par.

Calibrate has a few simple options:

```
CellCLI> help calibrate

Usage: CALIBRATE [HARDDISK | FLASHDISK | lun1 [, lun2]] [FORCE]
Purpose: Runs raw performance tests on disks.
CELLSRV must be shutdown.  User id root is required.

Options:
[FLASHDISK]: option to specify that only flash LUNs be calibrated.
[HARDDISK] : option to specify that only hard disk LUNs be calibrated.
lun1 [, lun2] : option to specify list of LUNs by name
[FORCE]    : runs test even while CELLSRV is running.

Examples:
CALIBRATE
CALIBRATE '0_8', '0_10' force

CellCLI>
```

So let's do a test run of calibrate on one of the hard disks in a cell of this V2 half rack:

```
CellCLI> calibrate 0_2 force
Calibration will take a few minutes...
LUN 0_2 on drive [20:2] random read throughput: 168.21 MBPS, and 419 IOPS
CALIBRATE results are within an acceptable range.

CellCLI>
```

Now this does tell us something about the throughput and IOPS that were measured, but it does not tell you HOW cellcli managed to measure this. Let's have a peek into how cellcli works; in order to do so we have to increase the log level and start cellcli in debugging mode:

```
[root@dm01cel01 ~]# cellcli -vvv -d
CellCLI: Release 11.2.3.2.1 - Production on Sun Oct 27 15:19:55 CET 2013

Copyright (c) 2007, 2012, Oracle.  All rights reserved.
Cell Efficiency Ratio: 236
```

What this does is dump a lot of extra information on your screen regarding the calls that cellcli makes to ossmgmt.jar. cellcli itself is nothing but a perl script that calls into that jar file. With cellcli now running in debug mode and the log level set to the highest, we can run our calibrate command again:

```
CellCLI> calibrate 0_2 force
loop
yyn:0  state:0  yychar:-1
tok: -3 sval: calibrate
yy:calibrate
next yychar:289
reduce
state 0, reducing 0 by rule 2 ($$1 :)
...
..
.
state 37, reducing 4 by rule 4 (line : $$1 cli $$2 EOL)
CLI command: CALIBRATE(289)
SPOOL arg : NONE(0)
CLI verb: CALIBRATE(289)
CLI object: NONE(0) 0_2
CLI addverb:
attr#1: FORCE(334)
++++Over Value is: 0
ForAll is false
CLI addverb:
attr#1: FORCE(334)
reduce
After reduction, shifting from state 0 to state 2
tok: -1 sval: null
state 2, reading 0 (end-of-file)
Getting the state of CELLSRV services...  running
Calibration will take a few minutes...
LUN 0_2 on drive [20:2] random read throughput: 168.33 MBPS, and 420 IOPS
CALIBRATE results are within an acceptable range.

CellCLI>
```

I have truncated the output above a little bit to make it readable. This does not tell us a whole lot more, but we can now see some interesting info in the cellcli logfile in /var/log/oracle/deploy/cellcli.lst.0*, where the last part can vary on your system depending on the number of times you have started cellcli. When scrolling through that logfile you will find our cellcli calibrate command:

```
Oct 27, 2013 3:21:33 PM oracle.ossmgmt.ms.cli.CellCLI readAndProcessLine
INFO: Executing cellcli command: "calibrate 0_2 force"
Oct 27, 2013 3:21:33 PM oracle.ossmgmt.ms.cli.CellCLI invokeSingleRSCmd
FINE: Executing RS command: 0.0STATE CELLSRV!!!!!!!!!!!!!!.
Oct 27, 2013 3:21:33 PM oracle.ossmgmt.ms.cli.CellCLI invokeSingleRSCmd
FINE: RS comamnd return status: 2.
```

You see that the oracle.ossmgmt.ms.cli.CellCLI class checks whether or not cellsrv is running (I have added the force parameter so I can run calibrate without stopping cellsrv).

```
Oct 27, 2013 3:21:33 PM oracle.ossmgmt.ms.cli.Calibrate fetchLuns
INFO: disknames.size: 1 listluncmd - list lun "0_2" attributes name, devicename, physicaldrives, disktype, lunsize
Oct 27, 2013 3:21:33 PM oracle.ossmgmt.ms.cli.Calibrate fetchCliResults
FINER: fetchCli cmd: MOD CALIBRATE :list lun "0_2" attributes name, devicename, physicaldrives, disktype, lunsize
Oct 27, 2013 3:22:14 PM oracle.ossmgmt.ms.cli.Calibrate fetchCliResults
FINER: fetchCli result: .	 0_2	 /dev/sdc	 20:2	 HardDisk	 557.861328125G
```

Next you see that cellcli uses oracle.ossmgmt.ms.cli.Calibrate to translate my LUN name 0_2 to a block device name, /dev/sdc in my case. It then goes on to gather the names of all the LUNs:

```
Oct 27, 2013 3:22:14 PM oracle.ossmgmt.ms.cli.Calibrate fetchLuns
INFO: listluns: 	 0_2	 /dev/sdc	 20:2	 HardDisk	 557.861328125G

Oct 27, 2013 3:22:14 PM oracle.ossmgmt.ms.cli.Calibrate fetchCliResults
FINER: fetchCli cmd: MOD CALIBRATE :list physicaldisk attributes name, luns, diskMediaGroup
Oct 27, 2013 3:22:20 PM oracle.ossmgmt.ms.cli.Calibrate fetchCliResults
FINER: fetchCli result: .	 20:0	 0_0	 DMG_SYSTEM_HDD1
20:1	 0_1	 DMG_SYSTEM_HDD1
20:2	 0_2	 DMG_DATA_HDD1
20:3	 0_3	 DMG_DATA_HDD1
...
..
.
Oct 27, 2013 3:22:20 PM oracle.ossmgmt.ms.cli.Calibrate fetchLuns
INFO: listpds: 	 20:0	 0_0	 DMG_SYSTEM_HDD1
20:1	 0_1	 DMG_SYSTEM_HDD1
20:2	 0_2	 DMG_DATA_HDD1
20:3	 0_3	 DMG_DATA_HDD1
...
..
.
```

The output above is shortened for readability. Next, oracle.ossmgmt.ms.cli.Calibrate runs scripts_common.sh, which is also run during cell deployment. scripts_common.sh tune_block_io /dev/sdc sets nr_requests in /sys/block/sdc/queue/nr_requests to 4096:

```
Oct 27, 2013 3:22:20 PM oracle.ossmgmt.ms.cli.Calibrate calibrate
FINER: Execute: /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/cellsrv/deploy/scripts/unix/hwadapter/diskadp/scripts_common.sh tune_block_io /dev/sdc
```

nr_requests sets the number of IO slots; with 4096 slots the deadline IO scheduler has more opportunity to merge IO requests. This should already be set to 4096 on all of your cells, but the calibrate command sets it again anyway.

```
Oct 27, 2013 3:22:21 PM oracle.ossmgmt.ms.cli.CalibrateTest executeOrion
INFO: execute: /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/cellsrv/bin/orion -run advanced -type rand -matrix point  -disk_start_offset 0 -duration 30 -num_disks 1 -testname rand_0 -disk_end_offset 285184m -size_large 8192 -num_small 0 -num_large 10
Oct 27, 2013 3:22:51 PM oracle.ossmgmt.ms.cli.CalibrateTest executeOrion
INFO: execute: /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/cellsrv/bin/orion -run advanced -type rand -matrix point  -disk_start_offset 0 -duration 30 -num_disks 1 -testname rand_IOPS_0 -disk_end_offset 285184m -size_small 8 -num_large 0 -num_small 20
```

Finally calibrate starts the actual calibration. It does this by calling Orion, which is located in $OSS_BIN. It performs two separate runs: one doing small 8KB requests and one doing large 8192KB IO's.

# Exadata system stats and MBRC

Last year Oracle added new functionality to dbms_stats.gather_system_stats, making it more Exadata aware and thus letting the optimizer make choices that may result in offloading more often. See this blog post from Kerry Osborne about it; I made a small comment about the MBRC in it and I would like to zoom in on that a bit more in this post.

So what happens during the gathering of Exadata system statistics? The easiest way to find out is to run a SQL trace along with the system stats:

```
SYS@kjj AS SYSDBA> alter session set events 'sql_trace wait=true, bind=true, level=12';

Session altered.

SYS@kjj AS SYSDBA> exec dbms_stats.gather_system_stats('EXADATA');

PL/SQL procedure successfully completed.
```

If we look in the trace file we see that Oracle starts with some recursive SQL statements to find out which datafiles there are, and then starts doing a lot of IO; these are all recorded with a unique wait event:

```
WAIT #139869656707320: nam='Disk file I/O Calibration' ela= 3255 count=1 p2=182669 p3=2 obj#=-1 tim=1379436371918507
```

It looks like Oracle is doing some type of IO to find out what this storage can do, so let's take a deeper look into these IO's. You could use the GDB route described by Frits Hoogland here, but for this purpose there is a somewhat easier way. The new trace interface in 11g has an interesting option where we can trace the OSS calls to the storage:

```
SYS@kjj AS SYSDBA> oradebug setmypid
Statement processed.
SYS@kjj AS SYSDBA> oradebug unlimit
Statement processed.
SYS@kjj AS SYSDBA> alter session set events 'trace[libcell.client_library.*]';

Session altered.

SYS@kjj AS SYSDBA> alter session set events 'sql_trace wait=true, bind=true, level=12';

Session altered.

SYS@kjj AS SYSDBA> exec dbms_stats.gather_system_stats('EXADATA');

PL/SQL procedure successfully completed.

SYS@kjj AS SYSDBA> oradebug tracefile_name
/u01/app/oracle/diag/rdbms/kjj/kjj/trace/kjj_ora_13915.trc
SYS@kjj AS SYSDBA> oradebug close_trace
Statement processed.
SYS@kjj AS SYSDBA>
```

So what am I doing there? First I want to remove the 5MB tracefile limit for my session, because the steps above will generate a huge trace file. Next is how we can enable tracing of all the communication of my shadow process with the storage layer, so this:

```
alter session set events 'trace[libcell.client_library.*]';
```

will trace all of the following:

```
Components in library LIBCELL:
--------------------------
Client_Library	       Client Library
Disk_Layer		       Disk Layer
Network_Layer	       Network Layer
IPC_Layer		       IPC Layer
```

libcell.client_library.* means that I want to trace all calls to libcell.client_library and all of its 3 child layers. If we now take a look at the trace file and zoom in on the Disk file I/O Calibration waits, we see that each wait is now preceded by trace data containing information about the communication between my shadow process and the storage layer:

```
oss_open called for o/192.168.100.11/DATA_CD_09_dm01cel07 with guid 56d7b5d4-12a-0-0
connect: box name -192.168.100.11, port - 5042
TCP-connected to: box name -192.168.100.11, ospid - 14950, port - 5042, my box refid: 5, source id: 2575757666
SKGXP-connecting to the box 192.168.100.11 OSS OS Pid 14950
SKGXP-connected to the appliance number -728063998
Sending message 0xdabf2f0 with frag # 0, msgid 27 to box my_box_refid: 5 (my sourceid in box: 2575757666  box inc: 17)
Command with band 1, payload length 138, payload 0xdabf4f0
Command code OPEN, msgid 27, refid 0
Open flags: readwrite - 1, sync - 1, char - 0.
Device name - DATA_CD_09_dm01cel07
Vrpc completion - cmd 0xdabf330, req 0xdabee50, msgid 27, frag # 1
Vrpc reply 0xdabf3e8 with refid 27 for existing message: 0xdabf2f0
Reply with band 0, payload length 0, payload (nil)
Command code REPLY OPEN, msgid 0, refid 27
Returned open file descriptor  0
opening file number 9
oss_read open file: 9, offset: 38498574336, blocks: 128
Sending message 0xdabf2f0 with frag # 0, msgid 28 to box my_box_refid: 5 (my sourceid in box: 2575757666  box inc: 17)
Command with band 0, payload length 0, payload (nil)
Command code BREAD, msgid 28, refid 0
Block arguments fd 0 offset 38498574336 cnt 128 blksize 512 prefetch 0
oss_read done
oss_wait called for request: 0xdabee50
Vrpc completion - cmd 0xdabf330, req 0xdabee50, msgid 28, frag # 1
Vrpc reply 0xdabf3e8 with refid 28 for existing message: 0xdabf2f0
Reply with band 7, payload length 65536, payload 0x7f35f0e37e00
Command code REPLY READ, msgid 0, refid 28
Number of bytes read 65536
oss_wait done, request 0xdabee50, result 0x7f35f116bfc8
WAIT #139869656707320: nam='Disk file I/O Calibration' ela= 3255 count=1 p2=182669 p3=2 obj#=-1 tim=1379436371918507
```

So the above is an excerpt of the generated tracefile, where we can see how Exadata handles its IO requests. I will zoom in on how Oracle does this; it is not specific to Exadata system stats, though. In this case it is an IO call where Oracle needs to open up a new connection to a celldisk; this disk was apparently not needed in earlier IO requests. In line 1 you can see that the oss_open call says it wants to open disk 9 on cell07. In the next line it sets up the connection: it first contacts the cell using a TCP call and then switches to RDS using SKGXP, sending a message down the wire with msgid 27. Line 12 tells us that the cell responds to our msgid 27 with refid 27, and we now have a new file descriptor to this celldisk, number 9. At this level files are not datafiles; files are chunks of data on a disk, so on this half rack we have 7 cells * 12 disks, a maximum of 84 of them. Now that we have our file descriptor we can issue our read request with the oss_read call in line 17; it requests to read 128 blocks. Keep in mind that these are blocks at the celldisk level and thus not database blocks: as you can see in line 21, they have a 512 byte blksize. Also note that this read request has a msgid of 28; it looks like these are sequential for your shadow process. Directly after the oss_read call Exadata issues an oss_wait and tries to reap the IO request. In line 28 we can see that the 128 blocks resulted in a read of 64K, then the oss_wait call is done and Oracle records our wait event.
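The numbers in the excerpt line up: 128 celldisk blocks of 512 bytes each are exactly the 65536 bytes the reply reports. Trivial to verify:

```python
# 'Block arguments ... cnt 128 ... blksize 512' vs 'Number of bytes read 65536'
blocks = 128
blksize = 512        # celldisk block size in bytes, not the database block size
bytes_read = blocks * blksize
print(bytes_read)    # 65536, i.e. 64K
```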

So now we know how Exadata handles IO requests, let's zoom in on the IO's for our 'Disk file I/O Calibration' wait:

```[oracle@dm0101 [kjj] trace]\$ grep 'oss_read open file' /u01/app/oracle/diag/rdbms/kjj/kjj/trace/kjj_ora_13915.trc
...
..
.
oss_read open file: 19, offset: 37603999744, blocks: 128
oss_read open file: 12, offset: 38469820416, blocks: 16
oss_read open file: 16, offset: 38503866368, blocks: 128
oss_read open file: 43, offset: 38536380416, blocks: 16
oss_read open file: 51, offset: 37821882368, blocks: 128
oss_read open file: 78, offset: 37753462784, blocks: 16
oss_read open file: 44, offset: 38539337728, blocks: 128
oss_read open file: 33, offset: 38455091200, blocks: 16
oss_read open file: 49, offset: 37558116352, blocks: 128
oss_read open file: 30, offset: 38454730752, blocks: 16
oss_read open file: 16, offset: 38439862272, blocks: 128
oss_read open file: 67, offset: 37605408768, blocks: 16
...
..
.
[oracle@dm0101 [kjj] trace]\$
```
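
The alternating pattern in a listing like the one above is easy to tally with a quick parse. A minimal sketch (the regex assumes the exact `oss_read open file: …` line format shown):

```
import re

def tally_oss_reads(lines):
    # Count oss_read requests per block size, e.g. {128: n, 16: m}
    counts = {}
    for line in lines:
        m = re.search(r'oss_read open file: (\d+), offset: (\d+), blocks: (\d+)', line)
        if m:
            blocks = int(m.group(3))
            counts[blocks] = counts.get(blocks, 0) + 1
    return counts

sample = [
    "oss_read open file: 19, offset: 37603999744, blocks: 128",
    "oss_read open file: 12, offset: 38469820416, blocks: 16",
    "oss_read open file: 16, offset: 38503866368, blocks: 128",
]
print(tally_oss_reads(sample))  # {128: 2, 16: 1}
```

Running this over the full tracefile gives a quick feel for how often the calibration uses large versus small reads.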

So when Oracle fires dbms_stats.gather_system_stats('EXADATA') you see that it alternates between requesting 128 blocks and 16 blocks from different celldisks. Apparently Oracle uses these IO's to calculate whether or not it can set the MBRC to 128 (it also takes db_cache_size into account). Then it deletes the old results in aux_stats\$:

```DELETE FROM SYS.AUX_STATS\$ WHERE SNAME = 'SYSSTATS_MAIN'
```

and eventually it updates the values of, amongst others, MBRC to 128 in this case:

```PARSING IN CURSOR #139869655553736 len=71 dep=1 uid=0 oct=2 lid=0 tim=1379436419280111 hv=4016519028 ad='db24c118' sqlid='32kbjpmrqfdvn'
INSERT INTO SYS.AUX_STATS\$ VALUES ('SYSSTATS_MAIN', 'MBRC', :B1 , NULL)
END OF STMT
BINDS #139869655553736:
Bind#0
oacdty=02 mxl=22(21) mxlc=00 mal=00 scl=00 pre=00
oacflg=13 fl2=206001 frm=00 csi=00 siz=24 off=0
kxsbbbfp=7f35f12e4088  bln=22  avl=03  flg=09
value=128
EXEC #139869655553736:c=0,e=267,p=0,cr=0,cu=3,mis=0,r=1,dep=1,og=1,plh=0,tim=1379436419280370
CLOSE #139869655553736:c=0,e=1,dep=1,type=3,tim=1379436419280396
```

# Analyzing page allocation failures on Exadata

I have seen several clients struggling to decode page allocation failures on Exadata; in this post I will try to explain how to read the backtrace. The following is an anonymized client case where page allocation failures lead up to a node reboot.

```Jan 1 11:58:02 dm01db01 kernel: oracle: page allocation failure. order:1, mode:0x20
Jan 1 11:58:02 dm01db01 kernel: Pid: 80047, comm: oracle Tainted: P           2.6.32-400.11.1.el5uek #1
Jan 1 11:58:02 dm01db01 kernel: Call Trace:
Jan 1 11:58:02 dm01db01 kernel:  <IRQ>  [<ffffffff810ddf74>] __alloc_pages_nodemask+0x524/0x595
Jan 1 11:58:02 dm01db01 kernel:  [<ffffffff8110da3f>] kmem_getpages+0x4f/0xf4
Jan 1 11:58:02 dm01db01 kernel:  [<ffffffff8110dc3c>] fallback_alloc+0x158/0x1ce
Jan 1 11:58:02 dm01db01 kernel:  [<ffffffff8110ddd3>] ____cache_alloc_node+0x121/0x134
Jan 1 11:58:02 dm01db01 kernel:  [<ffffffff8110e3f3>] kmem_cache_alloc_node_notrace+0x84/0xb9
Jan 1 11:58:02 dm01db01 kernel:  [<ffffffff8110e46e>] __kmalloc_node+0x46/0x73
Jan 1 11:58:02 dm01db01 kernel:  [<ffffffff813b9aa8>] ? __alloc_skb+0x72/0x13d
Jan 1 11:58:02 dm01db01 kernel:  [<ffffffff813b9aa8>] __alloc_skb+0x72/0x13d
Jan 1 11:58:02 dm01db01 kernel:  [<ffffffff813b9bdb>] alloc_skb+0x13/0x15
Jan 1 11:58:02 dm01db01 kernel:  [<ffffffff813b9f11>] dev_alloc_skb+0x1b/0x38
Jan 1 11:58:02 dm01db01 kernel:  [<ffffffffa02a3722>] ipoib_cm_alloc_rx_skb+0x31/0x1de [ib_ipoib]
Jan 1 11:58:02 dm01db01 kernel:  [<ffffffffa02a4d04>] ipoib_cm_handle_rx_wc+0x3a1/0x5b8 [ib_ipoib]
Jan 1 11:58:02 dm01db01 kernel:  [<ffffffffa0198bdc>] ? mlx4_ib_free_srq_wqe+0x27/0x54 [mlx4_ib]
Jan 1 11:58:02 dm01db01 kernel:  [<ffffffffa01904d4>] ? mlx4_ib_poll_cq+0x620/0x65e [mlx4_ib]
Jan 1 11:58:02 dm01db01 kernel:  [<ffffffffa029fe97>] ipoib_poll+0x87/0x128 [ib_ipoib]
Jan 1 11:58:02 dm01db01 kernel:  [<ffffffff813c4b69>] net_rx_action+0xc6/0x1cd
Jan 1 11:58:02 dm01db01 kernel:  [<ffffffff8105e8cd>] __do_softirq+0xd7/0x19e
Jan 1 11:58:02 dm01db01 kernel:  [<ffffffff810aefdc>] ? handle_IRQ_event+0x66/0x120
Jan 1 11:58:02 dm01db01 kernel:  [<ffffffff81012eec>] call_softirq+0x1c/0x30
Jan 1 11:58:02 dm01db01 kernel:  [<ffffffff81014695>] do_softirq+0x46/0x89
Jan 1 11:58:02 dm01db01 kernel:  [<ffffffff8105e752>] irq_exit+0x3b/0x7a
Jan 1 11:58:02 dm01db01 kernel:  [<ffffffff8145bea1>] do_IRQ+0x99/0xb0
Jan 1 11:58:02 dm01db01 kernel:  [<ffffffff81012713>] ret_from_intr+0x0/0x11
Jan 1 11:58:02 dm01db01 kernel:  [<ffffffffa02bc2df>] ? kcalloc+0x35/0x3d [rds]
Jan 1 11:58:02 dm01db01 kernel:  [<ffffffffa02bc2df>] ? kcalloc+0x35/0x3d [rds]
Jan 1 11:58:02 dm01db01 kernel:  [<ffffffffa02bc724>] ? __rds_rdma_map+0x16c/0x32c [rds]
Jan 1 11:58:02 dm01db01 kernel:  [<ffffffffa02bca09>] ? rds_get_mr+0x42/0x4f [rds]
Jan 1 11:58:02 dm01db01 kernel:  [<ffffffffa02b67b2>] ? rds_setsockopt+0xae/0x14f [rds]
Jan 1 11:58:02 dm01db01 kernel:  <EOI>  [<ffffffff81458045>] ? _spin_lock+0x21/0x25
```

Let's take a look at that call trace. In line 1 we see that it is the oracle binary that is having the page allocation failure. The Linux kernel allocates pages in powers of 2, so 'order:1' means that Linux failed to allocate $2^1$ pages, so 2 pages in total. We can see how much memory that is if we look up the pagesize on Exadata:

```[root@dm0101 ~]# getconf PAGESIZE
4096
```

so this means the failed request wanted to allocate $2^1*4096=8192$ bytes.
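
The same arithmetic works for any failure order — a small helper to illustrate:

```
def alloc_bytes(order, page_size=4096):
    """Size in bytes of an order-n allocation: 2**order contiguous pages."""
    return (2 ** order) * page_size

print(alloc_bytes(1))  # 8192  -> the failing order-1 request in this trace
print(alloc_bytes(4))  # 65536 -> an order-4 request, 16 contiguous pages
```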

So in this case the system could not allocate 8k of memory. If we walk through the trace we can see where it went wrong: in lines 10, 11 and 12 Linux tries to allocate socket kernel buffers (alloc_skb) and fails. Walking further down the stack we can see it is IP over InfiniBand that tries to allocate these buffers (ipoib_cm_alloc_rx_skb).

The mode:0x20 part is the GFP (Get Free Pages) flag that is sent with the request; you can look up which flag it is in gfp.h:

```[root@dm0101 ~]# grep 0x20 /usr/src/kernels/`uname -r`/include/linux/gfp.h
#define __GFP_HIGH	((__force gfp_t)0x20u)	/* Should access emergency pools? */
```
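
Decoding the mode bits can be scripted as well. A sketch with a handful of low-order GFP flags — only `__GFP_HIGH` (0x20) comes from the header output above, the other values are the usual ones for a 2.6.32-era gfp.h, so verify them against your own source tree:

```
# Low-order GFP flag bits (assumed from a 2.6.32-era include/linux/gfp.h)
GFP_FLAGS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",   # the one from the grep above
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}

def decode_gfp(mode):
    # Return the names of all flag bits set in the allocation mode
    return [name for bit, name in sorted(GFP_FLAGS.items()) if mode & bit]

print(decode_gfp(0x20))  # ['__GFP_HIGH'] -- an atomic allocation that may tap emergency pools
```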

So if we look further down the messages file we can see the following part:

```Jun 14 11:58:02 dm04dbadm02 kernel: Node 0 DMA free:15800kB min:4kB low:4kB high:4kB active_anon:0kB inactive_anon:0kB
active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15252kB mlocked:0kB dirty:0kB
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB
bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
```

As we can see, we have only 4kB of low memory left and absolutely no room in the kernel slabs at all. The rds and ip caches, and thus also ipoib, use slab-allocated pages (you can monitor slabs using /proc/slabinfo):

```[root@dm0101 ~]# cat /proc/slabinfo | grep "^ip\|rds"
rds_ib_frag        23600  23600   8192    1    4 : tunables    8    4    0 : slabdata  23600  23600      0
rds_ib_incoming    23570  23570   8192    1    4 : tunables    8    4    0 : slabdata  23570  23570      0
rds_iw_frag            0      0     40   92    1 : tunables  120   60    8 : slabdata      0      0      0
rds_iw_incoming        0      0    120   32    1 : tunables  120   60    8 : slabdata      0      0      0
rds_connection        18     77    688   11    2 : tunables   54   27    8 : slabdata      7      7      0
ip6_dst_cache         10     24    320   12    1 : tunables   54   27    8 : slabdata      2      2      0
ip6_mrt_cache          0      0    128   30    1 : tunables  120   60    8 : slabdata      0      0      0
ip_mrt_cache           0      0    128   30    1 : tunables  120   60    8 : slabdata      0      0      0
ip_fib_alias           0      0     32  112    1 : tunables  120   60    8 : slabdata      0      0      0
ip_fib_hash           28    106     72   53    1 : tunables  120   60    8 : slabdata      2      2      0
ip_dst_cache         486    530    384   10    1 : tunables   54   27    8 : slabdata     53     53      0
[root@dm0101 ~]#
```
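
The `<num_objs>` and `<objsize>` columns of /proc/slabinfo give a rough lower bound on how much memory a cache pins down (it ignores per-slab overhead). For the rds_ib_frag line above:

```
def slab_bytes(num_objs, objsize):
    # Approximate memory held by a slab cache: object count times object size
    return num_objs * objsize

# rds_ib_frag: 23600 objects of 8192 bytes each
print(slab_bytes(23600, 8192) / 2**20)  # roughly 184 MiB tied up in RDS fragments
```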

Furthermore, ipoib uses an MTU size of 64k, so each receive buffer needs at least an order-4 allocation of slab managed pages:

```[root@dm0101 ~]# ifconfig bond0 | grep MTU
UP BROADCAST RUNNING MASTER MULTICAST  MTU:65520  Metric:1
```
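
You can verify that order-4 claim from the MTU: round the 65520-byte MTU up to whole pages, then take the smallest power of two that covers it — a quick check:

```
import math

def alloc_order(size_bytes, page_size=4096):
    """Smallest order n such that 2**n contiguous pages cover size_bytes."""
    pages = math.ceil(size_bytes / page_size)
    return max(0, math.ceil(math.log2(pages)))

print(alloc_order(65520))  # 4 -> each 64k ipoib buffer is an order-4 allocation
print(alloc_order(8192))   # 1 -> the order-1 failure we saw earlier
```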

With RDS running on our interconnect this will eventually lead to a node eviction: if RDS messages can't be written into pages, interconnect traffic won't arrive at this host (remember that InfiniBand is a lossless fabric).

# Exadata versus IPv6

Recently one of my customers got a complaint from their DNS administrators: our Exadatas were doing 40,000 DNS requests per minute. We like our DNS admins, so we had a look into these requests and what was causing them. I started by firing up a tcpdump on one of the bonded client interfaces on a random compute node:

```[root@dm01db01 ~]# tcpdump -i bondeth0 -s 0 port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bondeth0, link-type EN10MB (Ethernet), capture size 65535 bytes
15:41:04.937009 IP dm0101.domain.local.59868 > dnsserver01.domain.local:  53563+ AAAA? dm0101-vip.domain.local. (41)
15:41:04.937287 IP dm0101.domain.local.46672 > dnsserver01.domain.local:  44056+ PTR? 8.18.68.10.in-addr.arpa. (41)
15:41:04.938409 IP dnsserver01.domain.local > dm0101.domain.local.59868:  53563* 0/1/0 (116)
15:41:04.938457 IP dm0101.domain.local.56576 > dnsserver01.domain.local:  45733+ AAAA? dm0101-vip.domain.local.domain.local. (54)
15:41:04.939547 IP dnsserver01.domain.local > dm0101.domain.local.46672:  44056* 1/1/1 PTR dnsserver01.domain.local. (120)
15:41:04.940204 IP dnsserver01.domain.local > dm0101.domain.local.56576:  45733 NXDomain* 0/1/0 (129)
15:41:04.940237 IP dm0101.domain.local.9618 > dnsserver01.domain.local:  64639+ A? dm0101-vip.domain.local. (41)
15:41:04.941912 IP dnsserver01.domain.local > dm0101.domain.local.9618:  64639* 1/1/1 A dm0101-vip.domain.local (114)
```

So what are we seeing here? There are a bunch of AAAA requests to the DNS server and only one A record request. But the weirdest thing is of course the requests with the double domain name extensions. If we zoom in on those AAAA record requests we see the following; here is the request:

```15:41:04.937009 IP dm0101.domain.local.59868 > dnsserver01.domain.local:  53563+ AAAA? dm0101-vip.domain.local. (41)
```

And here is our answer:

```15:41:04.938409 IP dnsserver01.domain.local > dm0101.domain.local.59868:  53563* 0/1/0 (116)
```

The interesting part is in the answer of the DNS server: with 0/1/0 the DNS server tells me that for this lookup it found 0 answer resource records, 1 authority resource record, and 0 additional resource records. So it could not resolve my VIP name in DNS. Now if we look at the A record request:

```15:41:04.945697 IP dm0101.domain.local.10401 > dnsserver01.domain.local:  37808+ A? dm0101-vip.domain.local. (41)
```

and the answer:

```15:41:04.947249 IP dnsserver01.domain.local > dm0101.domain.local.10401:  37808* 1/1/1 A dm0101-vip.domain.local (114)
```

Looking at the answer, 1/1/1, we can see that I got 1 answer record in return (the first 1), so the DNS server does know the IP for dm0101-vip.domain.local when an A record is requested. What is going on here? Well, the answer is simple: AAAA records are IPv6 DNS requests, and our DNS servers are not configured for IPv6 name requests, so they reject them. So what about those weird double domain names like dm0101-vip.domain.local.domain.local? When Linux requests a DNS record the following happens:

1. Linux issues a DNS request for dm0101-vip.domain.local; because IPv6 is enabled, it issues an AAAA request.
2. The DNS server is not configured for IPv6 requests and discards the request.
3. Linux retries the request, looks into resolv.conf and appends the domain name; we now have dm0101-vip.domain.local.domain.local
4. Once again, the DNS server discards this request.
5. Linux once again retries the AAAA request and appends the domain name: dm0101-vip.domain.local.domain.local.domain.local
6. The DNS server discards this AAAA request as well.
7. Linux now falls back to IPv4 and does an A request: dm0101-vip.domain.local
8. The DNS server understands this and replies.
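
The retry pattern in those steps can be modeled in a few lines. This is a simplified simulation of the behaviour observed in the tcpdump, not of full glibc resolver logic (which also depends on `ndots` and the `search` list in resolv.conf):

```
def aaaa_attempts(name, search_domain, max_retries=2):
    """Names tried for AAAA lookups as seen in the tcpdump:
    the original name, then retries with the search domain appended."""
    attempts = [name]
    for _ in range(max_retries):
        attempts.append(attempts[-1] + "." + search_domain)
    return attempts

for q in aaaa_attempts("dm0101-vip.domain.local", "domain.local"):
    print(q)
# dm0101-vip.domain.local
# dm0101-vip.domain.local.domain.local
# dm0101-vip.domain.local.domain.local.domain.local
```

Only after all these AAAA attempts fail does the resolver fall back to the single A request that actually gets answered.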

This happens because Exadata comes with IPv6 Enabled on both infiniband and ethernet interfaces:

```[root@dm0101 ~]# ifconfig bond0;ifconfig bond1
bond0     Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
inet addr:192.168.100.1  Bcast:192.168.100.255  Mask:255.255.255.0
inet6 addr: fe80::221:2800:13f:2673/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST  MTU:65520  Metric:1
RX packets:226096104 errors:0 dropped:0 overruns:0 frame:0
TX packets:217747947 errors:0 dropped:55409 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:320173078389 (298.1 GiB)  TX bytes:176752381042 (164.6 GiB)

bond1     Link encap:Ethernet  HWaddr 00:21:28:84:16:49
inet addr:10.18.1.10  Bcast:10.18.1.255  Mask:255.255.255.0
inet6 addr: fe80::221:28ff:fe84:1649/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
RX packets:14132063 errors:2 dropped:0 overruns:0 frame:2
TX packets:7334898 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:2420637835 (2.2 GiB)  TX bytes:3838537234 (3.5 GiB)

[root@dm0101 ~]#
```

Let’s disable IPv6; my client is not using IPv6 on its internal network anyway (like most companies, I assume). You can edit /etc/modprobe.conf to prevent the module from being loaded at boot time by adding the following 2 lines to modprobe.conf:

```alias ipv6 off
install ipv6 /bin/true
```

Then add the below entry to /etc/sysconfig/network:

```IPV6INIT=no
```

Reboot the host and let's look at what we see after the host is up again:

```[root@dm0103 ~]# cat /proc/net/if_inet6
00000000000000000000000000000001 01 80 10 80       lo
fe8000000000000002212800013f111f 08 40 20 80    bond0
fe80000000000000022128fffe8e5f6a 02 40 20 80     eth0
fe80000000000000022128fffe8e5f6b 09 40 20 80    bond1
[root@dm0103 ~]# ifconfig bond0;ifconfig bond1
bond0     Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
inet addr:192.168.100.3  Bcast:192.168.100.255  Mask:255.255.255.0
inet6 addr: fe80::221:2800:13f:111f/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST  MTU:65520  Metric:1
RX packets:318265 errors:0 dropped:0 overruns:0 frame:0
TX packets:268072 errors:0 dropped:16 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:433056862 (412.9 MiB)  TX bytes:190905039 (182.0 MiB)

bond1     Link encap:Ethernet  HWaddr 00:21:28:8E:5F:6B
inet addr:10.18.1.12  Bcast:10.18.1.255  Mask:255.255.255.0
inet6 addr: fe80::221:28ff:fe8e:5f6b/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
RX packets:10256 errors:0 dropped:0 overruns:0 frame:0
TX packets:5215 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1559169 (1.4 MiB)  TX bytes:1350653 (1.2 MiB)

[root@dm0103 ~]#
```

So disabling the ipv6 module through modprobe.conf did not do the trick. Let's see what brought up the ipv6 stack:

```[root@dm0103 ~]# lsmod | grep ipv6
ipv6 291277 449 bonding,ib_ipoib,ib_addr,cnic
```

The InfiniBand stack brought up ipv6, so instead we can disable ipv6 at the kernel level:

```[root@dm0103 ~]# sysctl -a | grep net.ipv6.conf.all.disable_ipv6
net.ipv6.conf.all.disable_ipv6 = 0
[root@dm0103 ~]# echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6
[root@dm0103 ~]# sysctl -a | grep net.ipv6.conf.all.disable_ipv6
net.ipv6.conf.all.disable_ipv6 = 1
[root@dm0103 ~]# cat /proc/net/if_inet6
[root@dm0103 ~]#
```

Now that we are running this Exadata compute node without ipv6, we can check if we still have InfiniBand connectivity. On a cell, start an ibping server and use ibstat to get the port GUIDs:

```[root@dm01cel01 ~]# ibstat -p
0x00212800013ea3bf
0x00212800013ea3c0
[root@dm01cel01 ~]# ibping -S
```

On our ipv6-disabled host, start an ibping to one of the port GUIDs:

```[root@dm0103 ~]# ibping -c 4 -v -G 0x00212800013ea3bf
ibwarn: [14476] ibping: Ping..
Pong from dm01cel01.oracle.vxcompany.local.(none) (Lid 6): time 0.148 ms
ibwarn: [14476] ibping: Ping..
Pong from dm01cel01.oracle.vxcompany.local.(none) (Lid 6): time 0.205 ms
ibwarn: [14476] ibping: Ping..
Pong from dm01cel01.oracle.vxcompany.local.(none) (Lid 6): time 0.247 ms
ibwarn: [14476] ibping: Ping..
Pong from dm01cel01.oracle.vxcompany.local.(none) (Lid 6): time 0.139 ms
ibwarn: [14476] report: out due signal 0

--- dm01cel01.oracle.vxcompany.local.(none) (Lid 6) ibping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 4001 ms
rtt min/avg/max = 0.139/0.184/0.247 ms
[root@dm0103 ~]#
```

So we have InfiniBand connectivity; let's see how Oracle reacts:

```[root@dm0103 ~]# crsctl stat res -t
```

And now we play the waiting game… well, basically it never comes back. Looking with strace, it tries to read from 2 network sockets and hangs at:

```[pid 15917] poll([{fd=3, events=POLLIN|POLLRDNORM}, {fd=4, events=POLLIN|POLLRDNORM}], 2, -1
```

Which points to 2 file descriptors it can't read from:

```[root@dm0103 ~]# ls -altr /proc/15917/fd
total 0
dr-xr-xr-x 7 root root  0 Feb  3 18:37 ..
lrwx------ 1 root root 64 Feb  3 18:37 4 -> socket:[3447070]
lrwx------ 1 root root 64 Feb  3 18:37 3 -> socket:[3447069]
lrwx------ 1 root root 64 Feb  3 18:37 2 -> /dev/pts/0
lrwx------ 1 root root 64 Feb  3 18:37 1 -> /dev/pts/0
lrwx------ 1 root root 64 Feb  3 18:37 0 -> /dev/pts/0
dr-x------ 2 root root  0 Feb  3 18:37 .
[root@dm0103 ~]#
```

There is a dependency between ipv6 and CRS on an Exadata: disabling ipv6 will cripple your clusterware. There is no real solution for this problem; because we need ipv6 on an Exadata, we can't disable it. However, we can easily reduce the amount of IPv6 DNS lookups by extending /etc/hosts: add all hostnames, VIP names etc. of all the hosts in the cluster to every single hostfile on the compute nodes. Unfortunately we can't do this on the cell servers, because Oracle does not want us to go ‘messing’ with them, so you have to live with that for now.
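
As a sketch of that workaround, an /etc/hosts fragment could look like the following. The hostnames follow the naming used earlier in this post, but the VIP addresses here are made up for illustration; use the addresses from your own cluster:

```
# client network - compute nodes and VIPs (example addresses)
10.18.1.10   dm0101.domain.local      dm0101
10.18.1.12   dm0103.domain.local      dm0103
10.18.1.20   dm0101-vip.domain.local  dm0101-vip
10.18.1.22   dm0103-vip.domain.local  dm0103-vip
```

With the names resolvable locally, the resolver never has to hit the DNS server for the repeated AAAA lookups.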

# Using HCC on ZFS Storage Appliances

Hybrid Columnar Compression (HCC) is one of the Exadata features, but lately Oracle has been pushing this feature to other Oracle hardware like the ZFS Storage Appliance and the Pillar Axiom storage series. We recently got a ZFS Storage Appliance (ZFSSA) at VX Company, so we are now able to use HCC on the Oracle Database Appliance (ODA). To use HCC we need to create a tablespace with datafiles on a ZFS Storage Appliance; in order to do so we are going to hook up our ODA using direct NFS (dNFS). I am not going into the details of dNFS in this blogpost; there are enough blogs around with excellent background information on dNFS.

So let's set up dNFS for our HCC table on my ODA. First we create a mountpoint (on all the RAC nodes):

```[root@vxoda12 ~]# mkdir /mnt/vxzfs
[root@vxoda12 ~]# chown oracle:oinstall /mnt/vxzfs
```

Add the NFS export to fstab (on all the RAC nodes); this should of course be an NFS share on our ZFSSA:

```[root@vxoda11 ~]# cat /etc/fstab|grep nfs
vxzfs.oracle.vxcompany.local:/export/odatestdrive  /mnt/vxzfs  nfs  rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,vers=3,timeo=600
[root@vxoda11 ~]# mount /mnt/vxzfs/
[root@vxoda11 ~]# mount | grep nfs
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
vxzfs.oracle.vxcompany.local:/export/odatestdrive on /mnt/vxzfs type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,nfsvers=3,timeo=600,addr=10.12.0.211)
```

Next create a file called oranfstab; if you are on a RAC instance, don't forget to change the local IP address on the other nodes:

```[oracle@vxoda11 [odadb011] trace]\$ vi \$ORACLE_HOME/dbs/oranfstab
server: vxzfs.oracle.vxcompany.local
path:  10.12.0.211
local:  10.12.0.202
export: /export/odatestdrive mount: /mnt/vxzfs
```
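
Since only the `local:` line differs per node, the per-node files are easy to generate. A hypothetical helper (Oracle ships no such tool; the server, export and IPs below are simply the ones from this setup):

```
def oranfstab(server, path, local_ip, export, mount):
    # Render an oranfstab entry in the layout shown above
    return (f"server: {server}\n"
            f"path:  {path}\n"
            f"local:  {local_ip}\n"
            f"export: {export} mount: {mount}\n")

print(oranfstab("vxzfs.oracle.vxcompany.local", "10.12.0.211",
                "10.12.0.202", "/export/odatestdrive", "/mnt/vxzfs"))
```

Feed each node its own client IP for the `local:` line and write the result to \$ORACLE_HOME/dbs/oranfstab on that node.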

Change the Oracle ODM library to the Oracle ODM NFS library (on all the RAC nodes):

```[oracle@vxoda11 [] ~]\$ cd \$ORACLE_HOME/lib
[oracle@vxoda11 [] lib]\$ ls -al libodm11.so
lrwxrwxrwx 1 oracle oinstall 12 Nov  8 13:15 libodm11.so -> libodmd11.so
[oracle@vxoda11 [] lib]\$ ln -sf libnfsodm11.so libodm11.so
[oracle@vxoda11 [] lib]\$ ls -al libodm11.so
lrwxrwxrwx 1 oracle oinstall 14 Nov 13 12:54 libodm11.so -> libnfsodm11.so
```

Restart the database:

```[oracle@vxoda11 [odadb011] lib]\$ srvctl stop database -d odadb01 -o immediate
[oracle@vxoda11 [odadb011] lib]\$ srvctl start database -d odadb01
[oracle@vxoda11 [odadb011] lib]\$ srvctl status database -d odadb01
Instance odadb011 is running on node vxoda11
Instance odadb012 is running on node vxoda12
```

Check the alertlog, you should now see:

```Oracle instance running with ODM: Oracle Direct NFS ODM Library Version 3.0
```

Create a tablespace on the NFS share:

```SYS@odadb011 AS SYSDBA> create bigfile tablespace dnfs datafile '/mnt/vxzfs/dnfs.dbf' size 500G extent management local autoallocate;
```

Now you should see some entries in v\$dnfs_servers:

```SYS@odadb012 AS SYSDBA> col dirname form a50
SYS@odadb012 AS SYSDBA> col svrname form a50
SYS@odadb012 AS SYSDBA> select * from v\$dnfs_servers;

INST_ID        ID SVRNAME                               DIRNAME                                MNTPORT       NFSPORT      WTMAX       RTMAX
---------- ---------- -------------------------------------------------- -------------------------------------------------- ---------- ---------- ---------- ----------
1         1 vxzfs.oracle.vxcompany.local                /export/odatestdrive                          59286          2049    1048576     1048576

SYS@odadb012 AS SYSDBA>
```

If we then try to create an HCC table in our dNFS tablespace we see this:

```SYS@kjj2 AS SYSDBA> create table t3 compress for archive low tablespace dnfs_kjj as select * from dba_objects;
create table t3 compress for archive low tablespace dnfs_kjj as select * from dba_objects
*
ERROR at line 1:
ORA-64307: hybrid columnar compression is not supported for tablespaces on this storage type
```

Apparently Oracle does not know that we are storing our data on a ZFS Appliance. To see what is going on, we can take a look at the traffic between my ODA and the ZFSSA using tcpdump. When looking at the dump in Wireshark we see packets going between the ODA and the ZFSSA.

So Oracle tries to do SNMP calls to the ZFSSA. Let's see what happens if we do a get on the MIB displayed in the frame above:

```[root@vxoda11 ~]# snmpget -v1 -c public 10.12.0.211 1.3.6.1.4.1.42.2.225.1.4.2.0
Timeout: No Response from 10.12.0.211.
```

We need to enable SNMP on the ZFSSA (needless to say, you need to replace the e-mail address with a proper one):

```user@localhost:~\$ ssh admin@vxzfs.oracle.vxcompany.local
Password:
Last login: Sun Nov 18 18:20:32 2012 from 10.12.0.252
Waiting for the appliance shell to start ...
vxzfs:> configuration services snmp
vxzfs:configuration services snmp> set network=10.12.0.0/24
network = 10.12.0.0/24 (uncommitted)
vxzfs:configuration services snmp> set syscontact=<someusername>@vxcompany.com
syscontact = <someusername>@vxcompany.com (uncommitted)
vxzfs:configuration services snmp> enable
vxzfs:configuration services snmp> commit
vxzfs:configuration services snmp> show
Properties:
<status> = online
community = public
network = 10.12.0.0/24
syscontact = <someusername>@vxcompany.com
trapsinks = 127.0.0.1

vxzfs:configuration services snmp>
```

Our snmpget should now work:

```[root@vxoda11 ~]# snmpget -v1 -c public 10.12.0.211 1.3.6.1.4.1.42.2.225.1.4.2.0
SNMPv2-SMI::enterprises.42.2.225.1.4.2.0 = STRING: "Sun ZFS Storage 7120"
```

Now, due to unpublished bug 12979161, we need to make a symlink so that dNFS finds the correct snmp library libnetsnmp.so; apparently this is fixed in 11.2.0.4:

```[oracle@vxoda11 [odadb011] ~]\$ locate libnetsnmp.so
/usr/lib64/libnetsnmp.so.10
/usr/lib64/libnetsnmp.so.10.0.3
[root@vxoda11 ~]# cd /usr/lib64/
[root@vxoda11 lib64]# ln -s libnetsnmp.so.10.0.3 libnetsnmp.so
```

The last step is to restart our database so it picks up the symlink for the snmp library:

```[oracle@vxoda11 [kjj1] ~]\$ srvctl stop database -d kjj -o immediate
[oracle@vxoda11 [kjj1] ~]\$ srvctl start database -d kjj
```

Now we are able to create our HCC table in our ZFSSA tablespace:

```SYS@odadb011 AS SYSDBA> create table t1 compress for archive low tablespace dnfs as select * from dba_objects;

Table created.

SYS@kjj2 AS SYSDBA> select compression, compress_for from dba_tables where table_name='T1';

COMPRESS COMPRESS_FOR
-------- ------------
ENABLED  ARCHIVE LOW
```

Happy compressing!

# Mixing UDP and RDS on an Exadata

Recently I got a question from an Exadata customer: what would happen if you forget to relink the oracle binary for RDS IPC use for just one home? Would Oracle be smart enough to fall back to UDP for the complete RAC database? Time to do a little test and find out what happens here.

First let's see how IPC communication between the nodes is done right now. Since version 11.2.0.3.0 we can use skgxpinfo to see how the binary is linked:

```[oracle@dm01db01 [dbfs1] ~]\$ dcli -g dbs_group -l oracle /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/skgxpinfo
dm01db01: rds
dm01db02: rds
dm01db03: rds
dm01db04: rds
[oracle@dm01db01 [dbfs1] ~]\$
```

Alternatively we can double-check it if we want:

```[oracle@dm01db01 [t11203] ~]\$ dcli -g dbs_group -l oracle 'nm /u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libskgxp11.so | grep rds_enabled'
dm01db01: 00000000001de7f0 b skgxp_rds_enabled
dm01db02: 00000000001de7f0 b skgxp_rds_enabled
dm01db03: 00000000001de7f0 b skgxp_rds_enabled
dm01db04: 00000000001de7f0 b skgxp_rds_enabled
[oracle@dm01db01 [t11203] ~]\$
```

So Oracle started with RDS support enabled, this can be seen in the alert log during the startup of the instance:

```[oracle@dm01db01 [dbfs1] trace]\$ grep -A4 "Cluster communication" alert_dbfs1.log |tail -5
Cluster communication is configured to use the following interface(s) for this instance
192.168.100.1
cluster interconnect IPC version:Oracle RDS/IP (generic)
IPC Vendor 1 proto 3
Version 4.1
[oracle@dm01db01 [dbfs1] trace]\$
```

Let's change the IPC library on the 4th node of the RAC cluster (this test system is a half rack Exadata) and relink it for UDP usage. Before relinking the library, make sure that all instances running out of that home are stopped; when ready, relink it for UDP (the output is trimmed a little for readability):

```[oracle@dm01db04 [dbfs4] trace]\$ make -C /u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib -f ins_rdbms.mk ipc_g ioracle
make: Entering directory `/u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib'
rm -f /u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libskgxp11.so
cp /u01/app/oracle/product/11.2.0.3/dbhome_1/lib//libskgxpg.so /u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libskgxp11.so
chmod 755 /u01/app/oracle/product/11.2.0.3/dbhome_1/bin

- Linking Oracle
rm -f /u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib/oracle
gcc  -o /u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib/oracle -m64 -L/u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib/ -L/u01/app/oracle/product/11.2.0.3/dbhome_1/lib/ -L/u01/app/oracle/product/11.2.0.3/dbhome_1/lib/stubs/
test ! -f /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/oracle ||\
mv -f /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/oracle /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/oracleO
mv /u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib/oracle /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/oracle
chmod 6751 /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/oracle
make: Leaving directory `/u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib'
[oracle@dm01db04 [dbfs4] trace]\$
```

So now on this 4th node my Oracle 11.2.0.3.0 home is linked for UDP IPC communication; we can check it with skgxpinfo:

```[oracle@dm01db01 [dbfs1] trace]\$ dcli -g ~/dbs_group -l oracle /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/skgxpinfo
dm01db01: rds
dm01db02: rds
dm01db03: rds
dm01db04: udp
[oracle@dm01db01 [dbfs1] trace]\$
```

So indeed, the 4th node is now configured for UDP. Now what happens if I try to start a RAC database from this 11.2.0.3.0 home? Will Oracle see that one node can't use RDS and fall back to UDP for the whole database?

```[oracle@dm01db04 [dbfs4] trace]\$ srvctl start database -d dbfs
PRCR-1079 : Failed to start resource ora.dbfs.db
CRS-5017: The resource action "ora.dbfs.db start" encountered the following error:
ORA-03113: end-of-file on communication channel
Process ID: 0
Session ID: 0 Serial number: 0
. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0.3/grid/log/dm01db04/agent/crsd/oraagent_oracle/oraagent_oracle.log".

CRS-2674: Start of 'ora.dbfs.db' on 'dm01db04' failed
CRS-2632: There are no more servers to try to place resource 'ora.dbfs.db' on that would satisfy its placement policy
[oracle@dm01db04 [dbfs4] trace]\$ srvctl status database -d dbfs
Instance dbfs1 is running on node dm01db01
Instance dbfs2 is running on node dm01db02
Instance dbfs3 is running on node dm01db03
Instance dbfs4 is not running on node dm01db04
[oracle@dm01db04 [dbfs4] trace]\$
```

So not really a surprise there: the RAC instances can't communicate with each other. The alert logs of the instances confirm that the SKGXP libraries can't talk to each other; this is from the alert log on the RDS instances:

```ORA-27506: IPC error connecting to a port
ORA-27300: OS system dependent operation:proto mismatch failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpcon
ORA-27303: additional information: Protocol of this IPC does not match remote (192.168.100.4). SKGXP IPC libraries must be the same version. [local: RDS,remote: UDP]
```

Node 4, with UDP, is showing a different error stack:

```ORA-27550: Target ID protocol check failed. tid vers=1, type=1, remote instance number=1, local instance number=4
System state dump requested by (instance=4, osid=12700 (LMON)), summary=[abnormal instance termination].
LMON (ospid: 12700): terminating the instance due to error 481
System State dumped to trace file /u01/app/oracle/diag/rdbms/dbfs/dbfs4/trace/dbfs4_diag_12688.trc
Wed Aug 08 21:50:24 2012
ORA-1092 : opitsk aborting process
Dumping diagnostic data in directory=[cdmp_20120808215024], requested by (instance=4, osid=12700 (LMON)), summary=[abnormal instance termination].

Cluster communication is configured to use the following interface(s) for this instance
192.168.100.4
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
```

So basically, the message is don’t mix UDP and RDS boys and girls!

# Adding an Exadata V2 as a target in Enterprise Manager 12c

Although Oracle says that Enterprise Manager 12c “provides the tools to effectively and efficiently manage your Oracle Exadata Database Machine”, it is a bit of a challenge to get it all working correctly on an Exadata V2. It looks like the Exadata plugin for Enterprise Manager 12c was clearly developed on an X2 only; getting a V2 as a target into Enterprise Manager does not work out of the box. In order to get Enterprise Manager 12c to discover your Exadata V2 you need to do some extra steps.

Exadata discovery is done using the first compute node in your Exadata rack (e.g. dm01db01). The agent uses a file called databasemachine.xml which is located in your One Command directory:

```[oracle@dm01db01 [+ASM1] ~]\$ ls -la /opt/oracle.SupportTools/onecommand/database*
-rw-r--r-- 1 root root 15790 May 10 22:07 /opt/oracle.SupportTools/onecommand/databasemachine.xml
[oracle@dm01db01 [+ASM1] ~]\$
```

This file is generated with dbm_configurator.xls in the One Command directory. Unfortunately for V2 owners, early One Command versions did not generate this file, so you have to generate it yourself. Obviously you need Excel and a Windows PC to use dbm_configurator.xls, as it uses VBA (Visual Basic for Applications) to generate the One Command files.

• On the first node in the rack, scp the following 2 files from /opt/oracle.SupportTools/onecommand to your Windows machine:
1. config.dat
2. onecommand.params
• Download OneCommand: Patch 13612149
• Unzip the file p13612149_112242_Generic.zip on your Windows host
• Extract the tarball onecmd.tar
• Open dbm_configurator.xls in Excel
• Enable macros within Excel
• Click on the import button in the top left and locate the onecommand.params file (make sure that config.dat is in the same directory)
• Check if the imported data is still correct
• Click the generate button
• Click the create config files button

After this, upload at least databasemachine.xml to /opt/oracle.SupportTools/onecommand on the first node in your rack.
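Since this file is what the agent keys on, it is worth verifying that the uploaded copy is identical to what Excel generated. A small sketch (the scp line uses the example hostname from this post; `same_md5` is a hypothetical helper):

```shell
# Upload the generated file to compute node 1 (example hostname from this post):
#   scp databasemachine.xml root@dm01db01:/opt/oracle.SupportTools/onecommand/

# Hypothetical helper: compare two copies of a file by md5 hash, e.g. the
# local copy against the output of "ssh root@dm01db01 md5sum ..." saved locally.
same_md5() {
  local a b
  a=$(md5sum "$1" | cut -d' ' -f1)
  b=$(md5sum "$2" | cut -d' ' -f1)
  if [ "$a" = "$b" ]; then echo "match"; else echo "differ"; fi
}
```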

The next step is to correct the InfiniBand naming of the compute node HCAs; right now on a V2 these are as follows:

```[root@dm01db01 mlx4_0]# ibnodes | grep dm01db
Ca     : 0x00212800013f1242 ports 2 "dm01db04 HCA-1"
Ca     : 0x00212800013f12da ports 2 "dm01db02 HCA-1"
Ca     : 0x00212800013f111e ports 2 "dm01db03 HCA-1"
Ca     : 0x00212800013f2672 ports 2 "dm01db01 HCA-1"
```

Unfortunately the agent discovery process is looking for a naming convention that goes ‘hostname S ip-address HCA-1’. Fortunately Oracle provides a script to correct this: /opt/oracle.cellos/ib_set_node_desc.sh. When you run this script on a V2, however, not much will happen: it is broken on V2 systems. The problem is in the InfiniBand bond naming:

```[root@dm01db01 ~]# grep IFCFG_BONDIB /opt/oracle.cellos/ib_set_node_desc.sh
local IFCFG_BONDIB=/etc/sysconfig/network-scripts/ifcfg-bondib
local addr=`awk -F= 'BEGIN {IGNORECASE=1} /^IPADDR=[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\$/ {print \$2}' \$IFCFG_BONDIB\$id 2>/dev/null`
[root@dm01db01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-bondib
cat: /etc/sysconfig/network-scripts/ifcfg-bondib: No such file or directory
[root@dm01db01 ~]#
```

So the Exadata V2 IB bond has a different name: it is actually called bond0 instead of bondib:

```[root@dm01db01 ~]# ifconfig bond0
bond0     Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
inet addr:192.168.100.1  Bcast:192.168.100.255  Mask:255.255.255.0
inet6 addr: fe80::221:2800:13f:2673/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST  MTU:65520  Metric:1
RX packets:55048256 errors:0 dropped:0 overruns:0 frame:0
TX packets:56638365 errors:0 dropped:21 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:12207158878 (11.3 GiB)  TX bytes:18646886557 (17.3 GiB)

[root@dm01db01 ~]#
```
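A more robust variant of the script's lookup would simply try both interface names. A sketch of the idea (the function name `get_ib_addr` is mine; the ifcfg layout is the standard KEY=value format the original awk expects):

```shell
# Try the X2-style bondib first, then the V2-style bond0, and print the
# IPADDR value from whichever ifcfg file exists - the lookup that
# ib_set_node_desc.sh only performs for ifcfg-bondib.
get_ib_addr() {
  local dir="${1:-/etc/sysconfig/network-scripts}"
  local f
  for f in "$dir/ifcfg-bondib" "$dir/ifcfg-bond0"; do
    if [ -r "$f" ]; then
      awk -F= 'toupper($1) == "IPADDR" {print $2}' "$f"
      return 0
    fi
  done
  return 1
}
```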

So instead of using the broken ib_set_node_desc.sh script, set the node descriptions manually. Note that the backticks have to be escaped, so that each node evaluates its own hostname and bond0 address rather than getting the first node's values:

```[root@dm01db01 ~]# dcli -g dbs_group -l root "echo -n \`hostname -s\` S \`ifconfig bond0 | grep 'inet addr' | cut -f2 -d: | cut -f1 -d' '\` HCA-1 > /sys/class/infiniband/mlx4_0/node_desc"
```
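To sanity-check the string before writing it into /sys, the expected format can be built in isolation. A trivial sketch (`node_desc` is a hypothetical helper; the format is the one the discovery process looks for):

```shell
# Build the 'hostname S ip-address HCA-1' description that the agent's
# discovery process expects to find in node_desc.
node_desc() {
  printf '%s S %s HCA-1\n' "$1" "$2"
}
```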

If all went well you should end up with the following:

```[root@dm01db01 ~]# ibnodes | grep dm01db
Ca     : 0x00212800013f1242 ports 2 "dm01db04 S 192.168.100.4 HCA-1"
Ca     : 0x00212800013f12da ports 2 "dm01db02 S 192.168.100.2 HCA-1"
Ca     : 0x00212800013f111e ports 2 "dm01db03 S 192.168.100.3 HCA-1"
Ca     : 0x00212800013f2672 ports 2 "dm01db01 S 192.168.100.1 HCA-1"
```

After these changes, the guided discovery of your Exadata should run as described in the Cloud Control manual.