Using HCC on ZFS Storage Appliances

Hybrid Columnar Compression (HCC) started out as an Exadata-only feature, but lately Oracle has been extending this feature to other Oracle hardware such as the ZFS Storage Appliance and the Pillar Axiom storage series. We recently got a ZFS Storage Appliance (ZFSSA) at VX Company, so we are now able to use HCC on the Oracle Database Appliance (ODA). To use HCC we need to create a tablespace with datafiles on a ZFS Storage Appliance, and to do so we are going to hook up our ODA using Direct NFS (dNFS). I am not going into the details of dNFS in this blogpost; there are enough blogs around with excellent background information on dNFS:

http://www.pythian.com/news/34425/oracle-direct-nfs-how-to-start/
https://blogs.oracle.com/XPSONHA/entry/using_dnfs_for_test_purposes

So let's set up dNFS for our HCC table on my ODA. First we create a mountpoint (on all the RAC nodes):

[root@vxoda12 ~]# mkdir /mnt/vxzfs
[root@vxoda12 ~]# chown oracle:oinstall /mnt/vxzfs

Add the NFS export to /etc/fstab (on all the RAC nodes); this should of course be an NFS share on our ZFSSA:

[root@vxoda11 ~]# cat /etc/fstab|grep nfs
vxzfs.oracle.vxcompany.local:/export/odatestdrive  /mnt/vxzfs  nfs  rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,vers=3,timeo=600
[root@vxoda11 ~]# mount /mnt/vxzfs/
[root@vxoda11 ~]# mount | grep nfs
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
vxzfs.oracle.vxcompany.local:/export/odatestdrive on /mnt/vxzfs type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,nfsvers=3,timeo=600,addr=10.12.0.211)
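The mount options matter for dNFS, so a quick sanity check can save some head-scratching. A minimal sketch that verifies an option string contains the options used above (the list simply mirrors this post's fstab line, it is not an authoritative requirements list; verify against Oracle's dNFS documentation):

```shell
#!/bin/bash
# Sketch: check that an NFS mount-option string contains the options used in
# the fstab line above (list copied from this post, not an official
# requirement).
check_nfs_opts() {
  local opts=$1 missing=""
  for o in rw bg hard nointr tcp actimeo=0 vers=3; do
    case ",$opts," in
      *",$o,"*) ;;
      *) missing="$missing $o" ;;
    esac
  done
  if [ -z "$missing" ]; then echo "OK"; else echo "missing:$missing"; fi
}

# Example against the fstab line from this post:
check_nfs_opts "rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,vers=3,timeo=600"
```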

Next, create a file called oranfstab; if you are on a RAC database, don't forget to change the local IP address on the other nodes:

[oracle@vxoda11 [odadb011] trace]$ vi $ORACLE_HOME/dbs/oranfstab
server: vxzfs.oracle.vxcompany.local
path:  10.12.0.211
local:  10.12.0.202
export: /export/odatestdrive mount: /mnt/vxzfs
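Since only the local: line differs per node, a small helper can emit the file. A minimal sketch using this post's example addresses (adjust the server, path, export and mount values for your own environment):

```shell
#!/bin/bash
# Sketch: emit an oranfstab for one node; server/path/export/mount are the
# example values from this post, the argument is that node's own IP.
make_oranfstab() {
  local local_ip=$1
  cat <<EOF
server: vxzfs.oracle.vxcompany.local
path: 10.12.0.211
local: ${local_ip}
export: /export/odatestdrive mount: /mnt/vxzfs
EOF
}

# Node 1 in this post uses 10.12.0.202; on node 2 pass that node's address:
make_oranfstab 10.12.0.202
```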

Change the Oracle ODM library to the Oracle ODM NFS library (on all the RAC nodes):

[oracle@vxoda11 [] ~]$ cd $ORACLE_HOME/lib
[oracle@vxoda11 [] lib]$ ls -al libodm11.so
lrwxrwxrwx 1 oracle oinstall 12 Nov  8 13:15 libodm11.so -> libodmd11.so
[oracle@vxoda11 [] lib]$ ln -sf libnfsodm11.so libodm11.so
[oracle@vxoda11 [] lib]$ ls -al libodm11.so
lrwxrwxrwx 1 oracle oinstall 14 Nov 13 12:54 libodm11.so -> libnfsodm11.so
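Switching back and forth (for instance to undo the change later) is just re-pointing the symlink. A hedged sketch using the 11.2 library names from the listing above; run it in $ORACLE_HOME/lib on every node, with the instances down:

```shell
#!/bin/bash
# Sketch: point libodm11.so at either the default ODM stub or the dNFS ODM
# library (names as seen in the 11.2 home above).
set_odm() {
  local libdir=$1 mode=$2
  case $mode in
    dnfs)    ln -sf libnfsodm11.so "$libdir/libodm11.so" ;;
    default) ln -sf libodmd11.so   "$libdir/libodm11.so" ;;
    *) echo "usage: set_odm <libdir> dnfs|default" >&2; return 1 ;;
  esac
  readlink "$libdir/libodm11.so"
}
```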

Restart the database:

[oracle@vxoda11 [odadb011] lib]$ srvctl stop database -d odadb01 -o immediate
[oracle@vxoda11 [odadb011] lib]$ srvctl start database -d odadb01
[oracle@vxoda11 [odadb011] lib]$ srvctl status database -d odadb01
Instance odadb011 is running on node vxoda11
Instance odadb012 is running on node vxoda12

Check the alert log; you should now see:

Oracle instance running with ODM: Oracle Direct NFS ODM Library Version 3.0

Create a tablespace on the NFS share:

SYS@odadb011 AS SYSDBA> create bigfile tablespace dnfs datafile '/mnt/vxzfs/dnfs.dbf' size 500G extent management local autoallocate;

Now you should see some entries in v$dnfs_servers:

SYS@odadb012 AS SYSDBA> col dirname form a50
SYS@odadb012 AS SYSDBA> col svrname form a50
SYS@odadb012 AS SYSDBA> select * from v$dnfs_servers;

   INST_ID         ID SVRNAME                                  DIRNAME                             MNTPORT    NFSPORT      WTMAX      RTMAX
---------- ---------- ---------------------------------------- -------------------------------- ---------- ---------- ---------- ----------
         1          1 vxzfs.oracle.vxcompany.local             /export/odatestdrive                  59286       2049    1048576    1048576


SYS@odadb012 AS SYSDBA> 

If we then try to create an HCC table in our dNFS tablespace, we see this:

SYS@kjj2 AS SYSDBA> create table t3 compress for archive low tablespace dnfs_kjj as select * from dba_objects;
create table t3 compress for archive low tablespace dnfs_kjj as select * from dba_objects
                                                                              *
ERROR at line 1:
ORA-64307: hybrid columnar compression is not supported for tablespaces on this storage type

Apparently Oracle does not know that we are storing our data on a ZFS Storage Appliance. To see what is going on, we can take a look at the traffic between the ODA and the ZFSSA using tcpdump. When looking at the dump in Wireshark we see these packets going between the ODA and the ZFSSA:

[Wireshark screenshot: SNMP requests from the ODA to the ZFSSA]

So Oracle tries to do SNMP calls to the ZFSSA. Let's see what happens if we do an snmpget on the OID displayed in the frame above:

[root@vxoda11 ~]# snmpget -v1 -c public 10.12.0.211 1.3.6.1.4.1.42.2.225.1.4.2.0
Timeout: No Response from 10.12.0.211.

We need to enable SNMP on the ZFSSA (needless to say, you need to replace the e-mail address with a proper one):

user@localhost:~$ ssh admin@vxzfs.oracle.vxcompany.local
Password:
Last login: Sun Nov 18 18:20:32 2012 from 10.12.0.252
Waiting for the appliance shell to start ... 
vxzfs:> configuration services snmp
vxzfs:configuration services snmp> set network=10.12.0.0/24
                       network = 10.12.0.0/24 (uncommitted)
vxzfs:configuration services snmp> set syscontact=<someusername>@vxcompany.com
                    syscontact = <someusername>@vxcompany.com (uncommitted)
vxzfs:configuration services snmp> enable
vxzfs:configuration services snmp> commit
vxzfs:configuration services snmp> show
Properties:
                      <status> = online
                     community = public
                       network = 10.12.0.0/24
                    syscontact = <someusername>@vxcompany.com
                     trapsinks = 127.0.0.1

vxzfs:configuration services snmp>

Our snmpget should now work:

[root@vxoda11 ~]# snmpget -v1 -c public 10.12.0.211 1.3.6.1.4.1.42.2.225.1.4.2.0
SNMPv2-SMI::enterprises.42.2.225.1.4.2.0 = STRING: "Sun ZFS Storage 7120"
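What dNFS appears to care about is that the reply identifies a Sun ZFS box. A trivial sketch of that check; this is nothing more than a string match on the snmpget reply above, and the OID came from the tcpdump, not from any documented interface:

```shell
#!/bin/bash
# Sketch: classify storage from the snmpget reply shown above (string match
# only; assumes the reply format seen in this post).
storage_from_snmp() {
  case $1 in
    *"ZFS Storage"*) echo zfssa ;;
    *)               echo unknown ;;
  esac
}

storage_from_snmp 'SNMPv2-SMI::enterprises.42.2.225.1.4.2.0 = STRING: "Sun ZFS Storage 7120"'
```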

Now, due to unpublished bug 12979161, we need to create some symlinks so that dNFS finds the correct SNMP library, libnetsnmp.so. Apparently this is fixed in 11.2.0.4:

[oracle@vxoda11 [odadb011] ~]$ locate libnetsnmp.so
/usr/lib64/libnetsnmp.so.10
/usr/lib64/libnetsnmp.so.10.0.3
[root@vxoda11 ~]# cd /usr/lib64/
[root@vxoda11 lib64]# ln -s libnetsnmp.so.10.0.3 libnetsnmp.so
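On a RAC system this symlink must exist on every node, so a small helper is handy. A hedged sketch that creates the unversioned link only when a versioned library is present and the link is still missing (run as root on each node; the directory argument makes it easy to try out elsewhere):

```shell
#!/bin/bash
# Sketch: ensure an unversioned libnetsnmp.so symlink exists in a lib dir,
# pointing at the newest versioned copy found there (the workaround for the
# bug mentioned above).
ensure_netsnmp_link() {
  local libdir=$1 target
  [ -e "$libdir/libnetsnmp.so" ] && { echo "already present"; return 0; }
  target=$(ls "$libdir"/libnetsnmp.so.*.* 2>/dev/null | sort | tail -1)
  [ -z "$target" ] && { echo "no versioned libnetsnmp found" >&2; return 1; }
  ln -s "$(basename "$target")" "$libdir/libnetsnmp.so"
  echo "linked to $(basename "$target")"
}
```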

The last step is to restart our database so it can pick up the symlinked SNMP library:

[oracle@vxoda11 [kjj1] ~]$ srvctl stop database -d kjj -o immediate
[oracle@vxoda11 [kjj1] ~]$ srvctl start database -d kjj

Now we are able to create our HCC table in our ZFSSA-backed tablespace:

SYS@odadb011 AS SYSDBA> create table t1 compress for archive low tablespace dnfs as select * from dba_objects;

Table created.

SYS@odadb011 AS SYSDBA> select compression, compress_for from dba_tables where table_name='T1';

COMPRESS COMPRESS_FOR
-------- ------------
ENABLED  ARCHIVE LOW

Happy compressing!

Using SQL profiles to ignore hints

Recently I was asked to assist with an ongoing issue at a customer's site where one query was causing a lot of problems. After a short investigation it turned out the application was using a USE_NL hint in this select statement, which worked fine on a small table but crippled performance as the table grew. For several reasons the application code could not be changed within a reasonable timeframe, so I created a SQL profile for this statement which ignores all optimizer hints used in the statement.

Let's execute a query with an index hint as an example (using pipo_de_clown as a way to easily find it in the cursor cache; if you're not Dutch: Google the phrase) and let's have a look at its plan:

SQL>  select /*+ index (test_data t1) pipo_de_clown */* from test_data where 0=1;


no rows selected


SQL> select sql_id,sql_text from v$sql where sql_text like '%pipo_de_clown%';


SQL_ID        SQL_TEXT
------------- ----------------------------------------------------------------------------------------------------
3pusfgjmytwzm select sql_id,sql_text from v$sql where sql_text like '%pipo_de_clown%'
33gn3bmt3kkw2  select /*+ index (test_data t1) pipo_de_clown */* from test_data where 0=1


2 rows selected.


SQL> select * from table(dbms_xplan.display_cursor('33gn3bmt3kkw2'));


PLAN_TABLE_OUTPUT
------------- ---------------------------------------------------------------------------------------------------
SQL_ID  33gn3bmt3kkw2, child number 0
-------------------------------------
select /*+ index (test_data t1) pipo_de_clown */* from test_data where
0=1


Plan hash value: 3034433771


------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |       |       |     1 (100)|          |
|*  1 |  FILTER                      |           |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| TEST_DATA | 10000 |   224K|    65   (0)| 00:00:01 |
|   3 |    INDEX FULL SCAN           | T1        | 10000 |       |    26   (0)| 00:00:01 |
------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------


   1 - filter(NULL IS NOT NULL)


Note
-----
   - automatic DOP: skipped because of IO calibrate statistics are missing




25 rows selected.

The CBO did exactly what I asked it to do and used my index. Now, let's say you need this hint to be removed from this statement so the CBO can do all the smart thinking (which is in general a good idea anyway). In order to do this, we create a SQL profile for this statement and add the hint IGNORE_OPTIM_EMBEDDED_HINTS as an attribute to this profile. This hint makes the CBO ignore all hints used in this particular statement:

SQL> begin
  2    dbms_sqltune.import_sql_profile(
  3    name => 'test_profile',
  4    category => 'DEFAULT',
  5    sql_text => 'select /*+ index (test_data t1) pipo_de_clown */* from test_data where 0=1',
  6    profile => sqlprof_attr('IGNORE_OPTIM_EMBEDDED_HINTS'));
  7  end;
  8  /


PL/SQL procedure successfully completed.

Now that we have our profile in place, we can invalidate this cursor and let a new plan, using our newly created SQL profile, take its place:

SQL> select address, hash_value from v$sqlarea where sql_id='33gn3bmt3kkw2';


ADDRESS          HASH_VALUE
---------------- ----------
000000008E67D860 3373364990


SQL> exec dbms_shared_pool.purge('000000008E67D860,3373364990', 'C');


PL/SQL procedure successfully completed.

Let's execute the exact same query as earlier, including the index hint, and have a look at the execution plan:

SQL> select /*+ index (test_data t1) pipo_de_clown */* from test_data where 0=1;


no rows selected


SQL> select sql_id,sql_text from v$sql where sql_text like '%pipo_de_clown%';


SQL_ID        SQL_TEXT
------------- ----------------------------------------------------------------------------------------------------
3pusfgjmytwzm select sql_id,sql_text from v$sql where sql_text like '%pipo_de_clown%'
6rmm76g4j2xry select /*+ index (test_data t1) pipo_de_clown */* from test_data where 0=1


2 rows selected.


SQL> select * from table(dbms_xplan.display_cursor('33gn3bmt3kkw2'));


PLAN_TABLE_OUTPUT
------------- ---------------------------------------------------------------------------------------------------
SQL_ID 33gn3bmt3kkw2, child number 0
-------------------------------------
select /*+ index (test_data t1) pipo_de_clown */* from test_data where
0=1


Plan hash value: 2744875046


--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |       |       |     1 (100)|          |
|*  1 |  FILTER            |           |       |       |            |          |
|   2 |   TABLE ACCESS FULL| TEST_DATA | 10000 |   224K|    12   (0)| 00:00:01 |
--------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------


   1 - filter(NULL IS NOT NULL)


Note
-----
   - automatic DOP: skipped because of IO calibrate statistics are missing
   - SQL profile test_profile used for this statement




25 rows selected.

No more pesky hints that interfere with our beloved CBO.

Mixing UDP and RDS on an Exadata

Recently I got a question from an Exadata customer: what would happen if you forget to relink the oracle binary for RDS IPC for just one home? Would Oracle be smart enough to fall back to UDP for the complete RAC database? Time to do a little test and find out what happens here.

First let's see how IPC communication between the nodes is done right now. Since version 11.2.0.3.0 we can use skgxpinfo to see how the binary is linked:

[oracle@dm01db01 [dbfs1] ~]$ dcli -g dbs_group -l oracle /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/skgxpinfo
dm01db01: rds
dm01db02: rds
dm01db03: rds
dm01db04: rds
[oracle@dm01db01 [dbfs1] ~]$ 

Alternatively, we can double-check it with nm if you want:

[oracle@dm01db01 [t11203] ~]$ dcli -g dbs_group -l oracle 'nm /u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libskgxp11.so | grep rds_enabled'
dm01db01: 00000000001de7f0 b skgxp_rds_enabled
dm01db02: 00000000001de7f0 b skgxp_rds_enabled
dm01db03: 00000000001de7f0 b skgxp_rds_enabled
dm01db04: 00000000001de7f0 b skgxp_rds_enabled
[oracle@dm01db01 [t11203] ~]$ 

So Oracle started with RDS support enabled; this can be seen in the alert log during startup of the instance:

[oracle@dm01db01 [dbfs1] trace]$ grep -A4 "Cluster communication" alert_dbfs1.log |tail -5
Cluster communication is configured to use the following interface(s) for this instance
  192.168.100.1
cluster interconnect IPC version:Oracle RDS/IP (generic)
IPC Vendor 1 proto 3
  Version 4.1
[oracle@dm01db01 [dbfs1] trace]$ 
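That alert-log check can be scripted as well. A small sketch that extracts the protocol from the "cluster interconnect IPC version" line; the line format is simply the one shown in the output above:

```shell
#!/bin/bash
# Sketch: pull the IPC protocol (RDS/UDP) out of alert-log text, based on the
# "cluster interconnect IPC version:Oracle RDS/IP (generic)" line format
# shown above. Reads stdin, prints the last protocol seen.
ipc_proto() {
  sed -n 's/^cluster interconnect IPC version:Oracle \([A-Z]*\).*/\1/p' | tail -1
}

# Example with the snippet from this post:
printf '%s\n' \
  'Cluster communication is configured to use the following interface(s) for this instance' \
  '  192.168.100.1' \
  'cluster interconnect IPC version:Oracle RDS/IP (generic)' | ipc_proto
```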

Let's change the IPC library on the 4th node of the RAC cluster (this test system is a half rack Exadata) and relink it for UDP. Before relinking the library, make sure that all instances running out of that home are stopped; when ready, relink it for UDP (the output is trimmed a little for readability):

[oracle@dm01db04 [dbfs4] trace]$ make -C /u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib -f ins_rdbms.mk ipc_g ioracle
make: Entering directory `/u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib'
rm -f /u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libskgxp11.so
cp /u01/app/oracle/product/11.2.0.3/dbhome_1/lib//libskgxpg.so /u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libskgxp11.so
chmod 755 /u01/app/oracle/product/11.2.0.3/dbhome_1/bin

 - Linking Oracle 
rm -f /u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib/oracle
gcc  -o /u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib/oracle -m64 -L/u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib/ -L/u01/app/oracle/product/11.2.0.3/dbhome_1/lib/ -L/u01/app/oracle/product/11.2.0.3/dbhome_1/lib/stubs/ 
test ! -f /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/oracle ||\
	   mv -f /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/oracle /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/oracleO
mv /u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib/oracle /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/oracle
chmod 6751 /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/oracle
make: Leaving directory `/u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib'
[oracle@dm01db04 [dbfs4] trace]$ 

So on this 4th node my Oracle 11.2.0.3.0 home is now linked for UDP IPC communication; we can check it with skgxpinfo:

[oracle@dm01db01 [dbfs1] trace]$ dcli -g ~/dbs_group -l oracle /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/skgxpinfo
dm01db01: rds
dm01db02: rds
dm01db03: rds
dm01db04: udp
[oracle@dm01db01 [dbfs1] trace]$ 

So indeed, the 4th node is now configured for UDP. Now what happens if I try to start a RAC database from this 11.2.0.3.0 home? Will Oracle see that one node can't use RDS and fall back to UDP for the whole database?

[oracle@dm01db04 [dbfs4] trace]$ srvctl start database -d dbfs
PRCR-1079 : Failed to start resource ora.dbfs.db
CRS-5017: The resource action "ora.dbfs.db start" encountered the following error: 
ORA-03113: end-of-file on communication channel
Process ID: 0
Session ID: 0 Serial number: 0
. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0.3/grid/log/dm01db04/agent/crsd/oraagent_oracle/oraagent_oracle.log".

CRS-2674: Start of 'ora.dbfs.db' on 'dm01db04' failed
CRS-2632: There are no more servers to try to place resource 'ora.dbfs.db' on that would satisfy its placement policy
[oracle@dm01db04 [dbfs4] trace]$ srvctl status database -d dbfs
Instance dbfs1 is running on node dm01db01
Instance dbfs2 is running on node dm01db02
Instance dbfs3 is running on node dm01db03
Instance dbfs4 is not running on node dm01db04
[oracle@dm01db04 [dbfs4] trace]$ 

So no real surprise there: the RAC instances can't communicate with each other. The alert logs of the instances confirm that the SKGXP libraries can't talk to each other; this is from the alert log on the RDS instances:

ORA-27506: IPC error connecting to a port
ORA-27300: OS system dependent operation:proto mismatch failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpcon
ORA-27303: additional information: Protocol of this IPC does not match remote (192.168.100.4). SKGXP IPC libraries must be the same version. [local: RDS,remote: UDP]

Node 4, the UDP node, shows a different error stack:

ORA-27550: Target ID protocol check failed. tid vers=1, type=1, remote instance number=1, local instance number=4
System state dump requested by (instance=4, osid=12700 (LMON)), summary=[abnormal instance termination].
LMON (ospid: 12700): terminating the instance due to error 481
System State dumped to trace file /u01/app/oracle/diag/rdbms/dbfs/dbfs4/trace/dbfs4_diag_12688.trc
Wed Aug 08 21:50:24 2012
ORA-1092 : opitsk aborting process
Dumping diagnostic data in directory=[cdmp_20120808215024], requested by (instance=4, osid=12700 (LMON)), summary=[abnormal instance termination].

Cluster communication is configured to use the following interface(s) for this instance
  192.168.100.4
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
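To put node 4 back on RDS, you relink again, this time with the ipc_rds target instead of ipc_g (with all instances from that home down). A small sketch that prints the relink command per protocol; ipc_g is the target used above, while ipc_rds is assumed to be its RDS counterpart, so verify both against ins_rdbms.mk in your home before running the output:

```shell
#!/bin/bash
# Sketch: print the relink command for a given IPC protocol. ipc_g (generic
# UDP) is the target used earlier in this post; ipc_rds is assumed -- check
# ins_rdbms.mk before running the printed command.
relink_cmd() {
  local home=$1 proto=$2
  case $proto in
    rds) echo "make -C $home/rdbms/lib -f ins_rdbms.mk ipc_rds ioracle" ;;
    udp) echo "make -C $home/rdbms/lib -f ins_rdbms.mk ipc_g ioracle" ;;
    *)   echo "usage: relink_cmd <oracle_home> rds|udp" >&2; return 1 ;;
  esac
}

relink_cmd /u01/app/oracle/product/11.2.0.3/dbhome_1 rds
```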

So basically, the message is: don't mix UDP and RDS, boys and girls!

Adding an Exadata V2 as a target in Enterprise Manager 12c

Although Oracle says that Enterprise Manager 12c “provides the tools to effectively and efficiently manage your Oracle Exadata Database Machine”, it is a bit of a challenge to get it all working correctly on an Exadata V2. It looks like the Exadata plugin for Enterprise Manager 12c was developed on an X2 only; getting a V2 as a target into Enterprise Manager does not work out of the box. In order to get Enterprise Manager 12c to discover your Exadata V2 you need to do some extra steps.

Exadata discovery is done using the first compute node in your Exadata rack (e.g. dm01db01). The agent uses a file called databasemachine.xml, which is located in your One Command directory:

[oracle@dm01db01 [+ASM1] ~]$ ls -la /opt/oracle.SupportTools/onecommand/database*
-rw-r--r-- 1 root root 15790 May 10 22:07 /opt/oracle.SupportTools/onecommand/databasemachine.xml
[oracle@dm01db01 [+ASM1] ~]$

This file is generated by dbm_configurator.xls in the One Command directory. Unfortunately for V2 owners, early One Command versions did not generate this file, so you have to generate it yourself. Obviously you need Excel and a Windows PC to use dbm_configurator.xls, as it uses VBA (Visual Basic for Applications) to generate the One Command files.

  • On the first node in the rack, scp the following 2 files from /opt/oracle.SupportTools/onecommand:
    1. config.dat
    2. onecommand.params
  • Download One Command: Patch 13612149
  • Unzip the file p13612149_112242_Generic.zip on your Windows host
  • Extract the tarball onecmd.tar
  • Open dbm_configurator.xls in Excel
  • Enable macros within Excel
  • Click the import button in the top left and locate the onecommand.params file (make sure that config.dat is in the same directory)
  • Check if the imported data is still correct
  • Click the generate button
  • Click the create config files button

After this, upload at least the databasemachine.xml to /opt/oracle.SupportTools/onecommand on the first node in your rack.

The next step is to correct the InfiniBand naming of the compute node HCAs; right now on a V2 these are as follows:

[root@dm01db01 mlx4_0]# ibnodes | grep dm01db
Ca     : 0x00212800013f1242 ports 2 "dm01db04 HCA-1"
Ca     : 0x00212800013f12da ports 2 "dm01db02 HCA-1"
Ca     : 0x00212800013f111e ports 2 "dm01db03 HCA-1"
Ca     : 0x00212800013f2672 ports 2 "dm01db01 HCA-1"

Unfortunately the agent discovery process is looking for a naming convention of the form 'hostname S ip-address HCA-1'. Fortunately Oracle provides a script to correct this: /opt/oracle.cellos/ib_set_node_desc.sh. When you run this script on a V2, however, not much will happen: it is broken on a V2 system. The problem is in the InfiniBand bond naming:

[root@dm01db01 ~]# grep IFCFG_BONDIB /opt/oracle.cellos/ib_set_node_desc.sh
  local IFCFG_BONDIB=/etc/sysconfig/network-scripts/ifcfg-bondib
        local addr=`awk -F= 'BEGIN {IGNORECASE=1} /^IPADDR=[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {print $2}' $IFCFG_BONDIB$id 2>/dev/null`
[root@dm01db01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-bondib
cat: /etc/sysconfig/network-scripts/ifcfg-bondib: No such file or directory
[root@dm01db01 ~]# 

So the Exadata V2 IB bond has a different name: it is actually called bond0 instead of bondib:

[root@dm01db01 ~]# ifconfig bond0
bond0     Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  
          inet addr:192.168.100.1  Bcast:192.168.100.255  Mask:255.255.255.0
          inet6 addr: fe80::221:2800:13f:2673/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:65520  Metric:1
          RX packets:55048256 errors:0 dropped:0 overruns:0 frame:0
          TX packets:56638365 errors:0 dropped:21 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:12207158878 (11.3 GiB)  TX bytes:18646886557 (17.3 GiB)

[root@dm01db01 ~]# 
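If you want a check that works on both generations, you can detect which ifcfg file is actually present instead of hard-coding a bond name. A hedged sketch (ifcfg-bondib is what the stock script expects, ifcfg-bond0 is what the V2 above actually has; the directory argument is only there to make the function easy to test):

```shell
#!/bin/bash
# Sketch: pick the IB bond interface name by looking at which ifcfg file
# exists (bondib on later builds, bond0 on a V2). $1 overrides the default
# /etc/sysconfig/network-scripts directory for testing.
ib_bond_name() {
  local dir=${1:-/etc/sysconfig/network-scripts}
  if [ -e "$dir/ifcfg-bondib" ]; then
    echo bondib
  elif [ -e "$dir/ifcfg-bond0" ]; then
    echo bond0
  else
    echo "no IB bond ifcfg file found in $dir" >&2
    return 1
  fi
}
```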

So instead of using the broken ib_set_node_desc.sh script, fix it manually. Note the single quotes around the remote command, so that hostname and ifconfig are evaluated on each node rather than locally on the node running dcli:

[root@dm01db01 ~]# dcli -g dbs_group -l root 'echo -n "$(hostname -s) S $(ifconfig bond0 | grep "inet addr" | cut -f2 -d: | cut -f1 -d" ") HCA-1" > /sys/class/infiniband/mlx4_0/node_desc'

If all went well you should end up with the following:

[root@dm01db01 ~]# ibnodes | grep dm01db
Ca     : 0x00212800013f1242 ports 2 "dm01db04 S 192.168.100.4 HCA-1"
Ca     : 0x00212800013f12da ports 2 "dm01db02 S 192.168.100.2 HCA-1"
Ca     : 0x00212800013f111e ports 2 "dm01db03 S 192.168.100.3 HCA-1"
Ca     : 0x00212800013f2672 ports 2 "dm01db01 S 192.168.100.1 HCA-1"

After these changes, the guided discovery of your Exadata should run as described in the Cloud Control manual.

Peeking at your Exadata infiniband traffic

As a DBA you are probably very curious about what is going on in your system. So if you have a shiny Exadata, you have probably had a look at the InfiniBand fabric that connects the compute nodes and storage nodes together. When you want to see what kind of traffic is going from the compute nodes to the storage nodes, or on the RAC interconnect, you can use tcpdump (if it is not installed you can do a 'yum install tcpdump'):

[root@dm01db02 ~]# tcpdump -i bond0 -s 0 -w /tmp/tcpdump.pcap
tcpdump: WARNING: arptype 32 not supported by libpcap - falling back to cooked socket
tcpdump: listening on bond0, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
2073 packets captured
2073 packets received by filter
0 packets dropped by kernel
[root@dm01db02 ~]#

This will give you a dump file (/tmp/tcpdump.pcap) which you can analyze with your favorite network analyzer (probably Wireshark). If you are new to this you can download and install Wireshark here: http://www.wireshark.org/download.html

Using tcpdump you can sniff all the IPoIB traffic (IP over InfiniBand), but can you take a peek at the other traffic on the InfiniBand wire? Yes there is a way: you can use Mellanox's ibdump. This tool is not installed by default on your compute nodes, so you need to download and install it on the node of your choice (as a reminder: don't install anything on your cell servers!):

[root@dm01db02 ~]# wget http://www.mellanox.com/downloads/tools/ibdump-1.0.5-4-rpms.tgz
--2012-02-11 15:13:27--  http://www.mellanox.com/downloads/tools/ibdump-1.0.5-4-rpms.tgz
Resolving www.mellanox.com... 98.129.157.233
Connecting to www.mellanox.com|98.129.157.233|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 486054 (475K) [application/x-gzip]
Saving to: `ibdump-1.0.5-4-rpms.tgz'

100%[==========================================================================================================================================>] 486,054      290K/s   in 1.6s

2012-02-11 15:13:29 (290 KB/s) - `ibdump-1.0.5-4-rpms.tgz' saved [486054/486054]
[root@dm01db02 ~]

Extract the tarball:

[root@dm01db02 ~]# tar -xvf ibdump-1.0.5-4-rpms.tgz
ibdump-1.0.5-4-rpms/
ibdump-1.0.5-4-rpms/ibdump-1.0.5-4.i386-rhel5.4.rpm
ibdump-1.0.5-4-rpms/ibdump-1.0.5-4.ppc64-rhel5.4.rpm
ibdump-1.0.5-4-rpms/ibdump-1.0.5-4.i386-rhel5.5.rpm
ibdump-1.0.5-4-rpms/ibdump-1.0.5-4.ppc64-rhel5.5.rpm
ibdump-1.0.5-4-rpms/ibdump-1.0.5-4.i386-rhel5.6.rpm
ibdump-1.0.5-4-rpms/ibdump_release_notes.txt
ibdump-1.0.5-4-rpms/ibdump-1.0.5-4.x86_64-rhel5.4.rpm
ibdump-1.0.5-4-rpms/ibdump-1.0.5-4.x86_64-rhel5.5.rpm
ibdump-1.0.5-4-rpms/ibdump-1.0.5-4.ppc64-rhel5.6.rpm
ibdump-1.0.5-4-rpms/ibdump-1.0.5-4.x86_64-rhel5.6.rpm
ibdump-1.0.5-4-rpms/ibdump-1.0.5-4.i686-rhel6.rpm
ibdump-1.0.5-4-rpms/ibdump-1.0.5-4.ppc64-rhel6.rpm
ibdump-1.0.5-4-rpms/ibdump-1.0.5-4.x86_64-rhel6.rpm
ibdump-1.0.5-4-rpms/ibdump-1.0.5-4.i586-sles10sp3.rpm
ibdump-1.0.5-4-rpms/ibdump-1.0.5-4.ppc64-sles10sp3.rpm
ibdump-1.0.5-4-rpms/ibdump-1.0.5-4.x86_64-sles10sp3.rpm
ibdump-1.0.5-4-rpms/ibdump-1.0.5-4.i586-sles11.rpm
ibdump-1.0.5-4-rpms/ibdump-1.0.5-4.ppc64-sles11.rpm
ibdump-1.0.5-4-rpms/ibdump-1.0.5-4.i586-sles11sp1.rpm
ibdump-1.0.5-4-rpms/ibdump-1.0.5-4.ppc64-sles11sp1.rpm
ibdump-1.0.5-4-rpms/ibdump-1.0.5-4.x86_64-sles11sp1.rpm
ibdump-1.0.5-4-rpms/ibdump-1.0.5-4.x86_64-sles11.rpm
[root@dm01db02 ~]#

Next step: install it. The binary will be placed in your /usr/bin folder:

[root@dm01db02 ~]# rpm -i ./ibdump-1.0.5-4-rpms/ibdump-1.0.5-4.x86_64-rhel`lsb_release -r|awk '{print $2}'`.rpm
[root@dm01db02 ~]# ls -la /usr/bin/ibdump
-rwxr-xr-x 1 root root 41336 Dec 19  2010 /usr/bin/ibdump
[root@dm01db02 ~]#

Now you are ready to play with ibdump. Running it without parameters will make ibdump sniff interface mlx4_0 (which is ib0) and write the frames to a file called sniffer.pcap in your working directory. Some parameters can be added, such as the dump file location:

[root@dm01db02 ~]# ibdump -o /tmp/ibdump.pcap
 ------------------------------------------------
 IB device                      : "mlx4_0"
 IB port                        : 1
 Dump file                      : /tmp/ibdump.pcap
 Sniffer WQEs (max burst size)  : 4096
 ------------------------------------------------

Initiating resources ...
searching for IB devices in host
Port active_mtu=2048
MR was registered with addr=0x1bc58590, lkey=0x8001c34e, rkey=0x8001c34e, flags=0x1
QP was created, QP number=0x60005b

Ready to capture (Press ^c to stop):
Captured:     11711 packets, 10978982 bytes

Interrupted (signal 2) - exiting ...

[root@dm01db02 ~]#

There are some drawbacks to ibdump though:

  • ibdump may encounter packet drops upon a burst of more than 4096 (or 2^max-burst) packets.
  • Packet loss is not reported by ibdump.
  • Outbound retransmitted and multicast packets may not be collected correctly.
  • ibdump may stop capturing packets when run on the same port as the Subnet Manager (e.g. opensm). It is advised not to run the SM and ibdump on the same port.

Be aware of the issues above; besides that: have fun peeking around at your Exadata InfiniBand fabric!

Golden Gate monitoring in OEM

I have migrated several databases to Exadata using Golden Gate and I really liked the Golden Gate tool. The only thing I never really understood is why there is no simple Enterprise Manager plug-in to monitor the Golden Gate status. So I decided to create a small script that can be used as a 'User Defined Metric' (UDM) in OEM. For monitoring purposes I am only interested in knowing if a process has abended (status STOPPED means that I deliberately stopped it, so I don't need to know that) and what the lag time is. So I created this small script, ogg_status.sh, which I placed in my $GG_HOME directory:

#!/bin/bash
# set -x
########################################
#
# Usage ./ogg_status.sh -n GG_PROCESS_NAME -t status|lagtime
# Klaas-Jan Jongsma 2011
#
# v.01
#
########################################
#Function to get info all from GGSCI
export ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1
export LD_LIBRARY_PATH=/u01/app/oracle/product/10.2.0/db_1/lib
infoall() {
cd /u01/app/oracle/product/ogg
(
./ggsci << eof
info all
exit
eof
)}
########################################
# Commandline options:
while getopts n:t: opt
do
  case "$opt" in
    n) NAME="$OPTARG";;
    t) TYPE="$OPTARG";;
  esac
done
#########################################
# Check status of Golden Gate processes
status()
{
# infoall |grep "$NAME" | awk '{print $2}'
cSTATUS="`infoall |grep "$NAME" | awk '{print $2}'`"
if [ "${cSTATUS}" = "ABENDED" ]
  then
  echo "em_result=ABENDED"
  echo "em_message=Golden Gate process ${NAME} status: ${cSTATUS}"
else
    if [ "${cSTATUS}" = "STOPPED" ]
      then
      echo "em_result=STOPPED"
    else echo "em_result=RUNNING"
    fi
    unset cSTATUS
fi
}
#########################################
# Check lagtime of Golden Gate processes
lagtime()
{
cLAGTIME="`infoall | grep $NAME | awk '{print $4}'`"
# echo $cLAGTIME
# The 10# prefix forces base 10; otherwise values like 08 are read as octal
cLAGTIME_HOURS=$((10#`echo $cLAGTIME | awk -F: '{print $1}'`))
cLAGTIME_MINUTES=$((10#`echo $cLAGTIME | awk -F: '{print $2}'`))
cLAGTIME_SECONDS=$((10#`echo $cLAGTIME | awk -F: '{print $3}'`))
cLAGTIME_SEC_TOTAL=$(($cLAGTIME_HOURS*3600+$cLAGTIME_MINUTES*60+$cLAGTIME_SECONDS))
echo "em_result=${cLAGTIME_SEC_TOTAL}"
echo "em_message=Golden Gate process ${NAME} lagtime is: ${cLAGTIME} (${cLAGTIME_SEC_TOTAL} seconds), check Golden Gate infrastructure."
unset cLAGTIME cLAGTIME_HOURS cLAGTIME_MINUTES cLAGTIME_SECONDS cLAGTIME_SEC_TOTAL
}
#########################################
# MAIN
case "$TYPE" in
  status)
     status
     ;;
  lagtime)
     lagtime
     ;;
esac
# set +x
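One pitfall worth calling out in the lag arithmetic: bash's $((...)) treats zero-padded numbers such as 08 as invalid octal and errors out, so the HH:MM:SS conversion should force base 10. A standalone sketch of just that step:

```shell
#!/bin/bash
# Sketch: convert a GGSCI lag value (HH:MM:SS) to seconds. The 10# prefix
# forces base 10; without it, 08 and 09 are rejected as invalid octal.
lag_to_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  echo $(( 10#$h*3600 + 10#$m*60 + 10#$s ))
}

lag_to_seconds 00:00:08   # prints 8; a plain $((08)) would fail here
lag_to_seconds 01:02:03   # prints 3723
```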

Now that we have a script that tells us something about our Golden Gate status, we can create a UDM in OEM; we do this at the host target level. Go to the top right corner and click Create to make a new UDM:

If you want to create a UDM that monitors the lag of a process, fill it in somewhat like this. It will raise an alert whenever the extract/datapump process gets a lag bigger than 5 seconds:

For monitoring the status of an extract or manager process, create the following. It alerts when a process gets the status ABENDED.

We now have our UDM in place; the next step is to create a notification rule. For this, go to your OEM preferences, located at the top right of your OEM screen. In the preferences screen click on Rules (located under Notifications):

This brings up the notification rules window; add a new rule here and add your host as a target for the rule:

Next go to the Metrics tab and add two metrics: one UDM string metric and one UDM number metric. We need the string UDM to monitor process status; the number UDM is needed for lag monitoring:

Next check the critical severity status:

We are done on the Metrics tab; the final step is to tell OEM to send us an e-mail. Go to the last tab, Actions, and check 'Send me an e-mail':

If you have OEM alerting set up properly, you will now get mail if a process abends or if the lag gets bigger than 5 seconds.