Using HCC on ZFS Storage Appliances

Hybrid Columnar Compression (HCC) is one of the Exadata features, but lately Oracle has been pushing this feature to other Oracle hardware such as the ZFS Storage Appliance and the Pillar Axiom storage series. We recently got a ZFS Storage Appliance (ZFSSA) at VX Company, so we are now able to use HCC on the Oracle Database Appliance (ODA). To use HCC we need to create a tablespace with datafiles on a ZFS Storage Appliance; in order to do so we are going to hook up our ODA using Direct NFS (dNFS). I am not going into the details of dNFS in this blog post; there are enough blogs around that have excellent background information on dNFS.

So let's set up dNFS for our HCC table on my ODA. First we create a mount point (on all the RAC nodes):

[root@vxoda12 ~]# mkdir /mnt/vxzfs
[root@vxoda12 ~]# chown oracle:oinstall /mnt/vxzfs

Add the NFS export to fstab (on all the RAC nodes). This should of course be an NFS share on our ZFSSA:

[root@vxoda11 ~]# cat /etc/fstab|grep nfs
 /mnt/vxzfs  nfs  rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,vers=3,timeo=600
[root@vxoda11 ~]# mount /mnt/vxzfs/
[root@vxoda11 ~]# mount | grep nfs
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
 on /mnt/vxzfs type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,nfsvers=3,timeo=600,addr=
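
To double-check that the share really came up with the options from fstab, the mount line can be tested mechanically. This is a minimal sketch, not part of the original setup: the function name check_nfs_opts is made up for illustration, and it assumes the option list is printed between parentheses as in the mount output above (some kernels print options differently, for example proto=tcp instead of tcp).

```shell
# Sketch: verify that a line printed by `mount` contains the given options.
# check_nfs_opts is a hypothetical helper name, not an existing tool.
# Usage: check_nfs_opts "<mount output line>" opt1 opt2 ...
check_nfs_opts() {
    line=$1
    shift
    # The option list is the text between the parentheses.
    opts=$(printf '%s\n' "$line" | sed -n 's/.*(\(.*\)).*/\1/p')
    missing=0
    for want in "$@"; do
        case ",$opts," in
            *",$want,"*) ;;                               # option present
            *) echo "missing: $want"; missing=1 ;;        # option absent
        esac
    done
    return "$missing"
}
```

On a node this could be called as check_nfs_opts "$(mount | grep ' /mnt/vxzfs ')" hard tcp actimeo=0 nfsvers=3; it prints nothing and returns 0 when all options are present.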

Next, create a file called oranfstab. If you are on a RAC database, don’t forget to change the local IP address on the other nodes:

[oracle@vxoda11 [odadb011] trace]$ vi $ORACLE_HOME/dbs/oranfstab
export: /export/odatestdrive mount: /mnt/vxzfs
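
Since only the local address differs per node, the file can be generated rather than edited by hand. The sketch below assumes the usual oranfstab layout from Oracle's Direct NFS documentation (server, local, path, then export/mount). The function name make_oranfstab, the server name and the addresses in the usage line are placeholders, not the values used on this ODA.

```shell
# Sketch: print an oranfstab entry to stdout.
# make_oranfstab is a hypothetical helper; all argument values are supplied
# by the caller and are not taken from the original post.
# Usage: make_oranfstab <server-name> <local-ip> <server-ip> <export> <mountpoint>
make_oranfstab() {
    printf 'server: %s\n' "$1"                  # label for the NFS server
    printf 'local: %s\n' "$2"                   # this node's IP (differs per RAC node)
    printf 'path: %s\n' "$3"                    # IP of the NFS server
    printf 'export: %s mount: %s\n' "$4" "$5"   # exported share and local mount point
}
```

For example, with documentation-range placeholder addresses: make_oranfstab zfssa 192.0.2.11 192.0.2.50 /export/odatestdrive /mnt/vxzfs > $ORACLE_HOME/dbs/oranfstab, and the same call with a different second argument on the other node.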

Change the Oracle ODM library to the Oracle ODM NFS library (on all the RAC nodes):

[oracle@vxoda11 [] ~]$ cd $ORACLE_HOME/lib
[oracle@vxoda11 [] lib]$ ls -al
lrwxrwxrwx 1 oracle oinstall 12 Nov  8 13:15 ->
[oracle@vxoda11 [] lib]$ ln -sf
[oracle@vxoda11 [] lib]$ ls -al
lrwxrwxrwx 1 oracle oinstall 14 Nov 13 12:54 ->

Restart the database:

[oracle@vxoda11 [odadb011] lib]$ srvctl stop database -d odadb01 -o immediate
[oracle@vxoda11 [odadb011] lib]$ srvctl start database -d odadb01
[oracle@vxoda11 [odadb011] lib]$ srvctl status database -d odadb01
Instance odadb011 is running on node vxoda11
Instance odadb012 is running on node vxoda12

Check the alert log; you should now see:

Oracle instance running with ODM: Oracle Direct NFS ODM Library Version 3.0
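
On a RAC system this check has to be repeated on every node, so it can be handy to script it. A minimal sketch, assuming the alert log contains the message in the form shown above; the function name dnfs_odm_loaded is made up for illustration and the log path is whatever your instance uses.

```shell
# Sketch: return 0 if the given alert log shows the Direct NFS ODM library.
# dnfs_odm_loaded is a hypothetical helper name.
# Usage: dnfs_odm_loaded <path-to-alert-log>
dnfs_odm_loaded() {
    grep -q 'Oracle Direct NFS ODM Library' "$1"
}
```

Example: dnfs_odm_loaded /path/to/alert_odadb011.log && echo "dNFS ODM library loaded" (the path is a placeholder).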

Create a tablespace on the NFS share:

SYS@odadb011 AS SYSDBA> create bigfile tablespace dnfs datafile '/mnt/vxzfs/dnfs.dbf' size 500G extent management local autoallocate;

Now you should see some entries in v$dnfs_servers:

SYS@odadb012 AS SYSDBA> col dirname form a50
SYS@odadb012 AS SYSDBA> col svrname form a50
SYS@odadb012 AS SYSDBA> select * from v$dnfs_servers;

   INST_ID        ID SVRNAME                               DIRNAME                                MNTPORT       NFSPORT      WTMAX       RTMAX
---------- ---------- -------------------------------------------------- -------------------------------------------------- ---------- ---------- ---------- ----------
      1         1                /export/odatestdrive                          59286          2049    1048576     1048576

SYS@odadb012 AS SYSDBA> 

If we then try to create an HCC table in our dNFS tablespace we see this:

SYS@kjj2 AS SYSDBA> create table t3 compress for archive low tablespace dnfs_kjj as select * from dba_objects;
create table t3 compress for archive low tablespace dnfs_kjj as select * from dba_objects
ERROR at line 1:
ORA-64307: hybrid columnar compression is not supported for tablespaces on this storage type

Apparently Oracle does not know that we are storing our data on a ZFS Storage Appliance. To see what is going on we can take a look at the traffic between my ODA and the ZFSSA using tcpdump. When looking at the dump in Wireshark we see these packets going between the ODA and the ZFSSA:


So Oracle tries to make SNMP calls to the ZFSSA. Let's see what happens if we do a get on the OID displayed in the captured frame above:

[root@vxoda11 ~]# snmpget -v1 -c public
Timeout: No Response from

We need to enable SNMP on the ZFSSA (needless to say, you need to replace the e-mail address with a proper one):

user@localhost:~$ ssh
Last login: Sun Nov 18 18:20:32 2012 from
Waiting for the appliance shell to start ... 
vxzfs:> configuration services snmp
vxzfs:configuration services snmp> set network=
                       network = (uncommitted)
vxzfs:configuration services snmp> set syscontact=<someusername>
                    syscontact = <someusername> (uncommitted)
vxzfs:configuration services snmp> enable
vxzfs:configuration services snmp> commit
vxzfs:configuration services snmp> show
                      <status> = online
                     community = public
                       network =
                    syscontact = <someusername>
                     trapsinks =

vxzfs:configuration services snmp>

Our snmpget should now work:

[root@vxoda11 ~]# snmpget -v1 -c public
SNMPv2-SMI::enterprises. = STRING: "Sun ZFS Storage 7120"
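
If you want to script this check, the reply can be parsed for the product string. This is only a sketch: the post does not show which value Oracle actually compares, so matching on "ZFS Storage" is an assumption, and both function names are made up for illustration.

```shell
# Sketch: pull the quoted STRING value out of an snmpget reply on stdin.
# snmp_string_value and is_zfs_storage are hypothetical helper names.
snmp_string_value() {
    sed -n 's/.*= STRING: "\(.*\)"$/\1/p'
}

# Return 0 if the reply on stdin names a ZFS Storage product.
# Assumption: the product string contains "ZFS Storage", as in the reply above.
is_zfs_storage() {
    case "$(snmp_string_value)" in
        *"ZFS Storage"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Pipe the output of the snmpget command above into is_zfs_storage; a timeout message contains no STRING value, so it returns 1.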

Now, due to unpublished bug 12979161, we need to create some symlinks so that dNFS can find the correct SNMP libraries; apparently this is fixed in

[oracle@vxoda11 [odadb011] ~]$ locate
[root@vxoda11 ~]# cd /usr/lib64/
[root@vxoda11 lib64]# ln -s

The last step is to restart our database so it can pick up the symlinks for the SNMP libraries:

[oracle@vxoda11 [kjj1] ~]$ srvctl stop database -d kjj -o immediate
[oracle@vxoda11 [kjj1] ~]$ srvctl start database -d kjj

Now we are able to create our HCC table on our ZFSSA tablespace:

SYS@odadb011 AS SYSDBA> create table t1 compress for archive low tablespace dnfs as select * from dba_objects;

Table created.

SYS@kjj2 AS SYSDBA> select compression, compress_for from dba_tables where table_name='T2';

-------- ------------

Happy compressing!

Peeking at your Exadata infiniband traffic

As a DBA you are probably very curious about what is going on on your system. So when you have a shiny Exadata you have probably had a look at the InfiniBand fabric that connects the compute nodes and storage nodes. When you want to see what kind of traffic is going from the compute nodes to the storage nodes, or over the RAC interconnect, you can use tcpdump to do so (if it is not installed you can do a ‘yum install tcpdump’):

[root@dm01db02 ~]# tcpdump -i bond0 -s 0 -w /tmp/tcpdump.pcap
tcpdump: WARNING: arptype 32 not supported by libpcap - falling back to cooked socket
tcpdump: listening on bond0, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
2073 packets captured
2073 packets received by filter
0 packets dropped by kernel
[root@dm01db02 ~]#
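
Before trusting a capture it is worth checking the last line of that summary, because packets dropped by the kernel never make it into the pcap file. A minimal sketch, assuming the summary text has been saved to a file (for example by adding 2>/tmp/tcpdump.log to the command above); the function name tcpdump_dropped is made up for illustration.

```shell
# Sketch: print the "dropped by kernel" count from a tcpdump summary on stdin.
# tcpdump_dropped is a hypothetical helper name.
tcpdump_dropped() {
    awk '/packets dropped by kernel/ { print $1 }'
}
```

Example: tcpdump_dropped < /tmp/tcpdump.log prints the number of dropped packets; anything other than 0 means the capture is incomplete.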

This will give you a dump file (/tmp/tcpdump.pcap) which you can analyze with your favorite network analyzer (probably Wireshark). If you are new to this you can download and install Wireshark here:

Using tcpdump you can sniff all the IPoIB (IP over InfiniBand) traffic, but can you take a peek at the other traffic that is going over the InfiniBand wire? Yes, there is a way: you can use Mellanox’s ibdump. This tool is not installed by default on your compute nodes, so you need to download it and install it on the node of your choice (as a reminder: don’t install anything on your cell servers!):

[root@dm01db02 ~]# wget
--2012-02-11 15:13:27--
Connecting to||:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 486054 (475K) [application/x-gzip]
Saving to: `ibdump-1.0.5-4-rpms.tgz'

100%[==========================================================================================================================================>] 486,054      290K/s   in 1.6s

2012-02-11 15:13:29 (290 KB/s) - `ibdump-1.0.5-4-rpms.tgz' saved [486054/486054]
[root@dm01db02 ~]#

Extract the tarball:

[root@dm01db02 ~]# tar -xvf ibdump-1.0.5-4-rpms.tgz
[root@dm01db02 ~]#

Next step, install it. It will be placed into your /usr/bin folder:

[root@dm01db02 ~]# rpm -i ./ibdump-1.0.5-4-rpms/ibdump-1.0.5-4.x86_64-rhel`lsb_release -r|awk '{print $2}'`.rpm
[root@dm01db02 ~]# ls -la /usr/bin/ibdump
-rwxr-xr-x 1 root root 41336 Dec 19  2010 /usr/bin/ibdump
[root@dm01db02 ~]#
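
The backtick expression in the rpm command picks the package that matches the OS release of the node. Broken into steps it looks roughly like the sketch below; the function names are made up for illustration and the release value 5.7 in the example is just a sample, not necessarily what your compute node reports.

```shell
# Sketch: the same filename logic as the rpm command above, step by step.
# release_number and ibdump_rpm_name are hypothetical helper names.

# Take the second field of `lsb_release -r` output (e.g. "Release: 5.7").
release_number() {
    awk '{ print $2 }'
}

# Build the rpm filename for a given release number.
ibdump_rpm_name() {
    printf 'ibdump-1.0.5-4.x86_64-rhel%s.rpm\n' "$1"
}
```

Example: ibdump_rpm_name "$(lsb_release -r | release_number)" prints the filename to install; for a sample release of 5.7 that would be ibdump-1.0.5-4.x86_64-rhel5.7.rpm.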

Now you are ready to play with ibdump. Running it without parameters makes ibdump sniff interface mlx4_0 (which is ib0) and write the frames into a file called sniffer.pcap in your working directory. Some parameters can be added, such as the dump file location:

[root@dm01db02 ~]# ibdump -o /tmp/ibdump.pcap
 IB device                      : "mlx4_0"
 IB port                        : 1
 Dump file                      : /tmp/ibdump.pcap
 Sniffer WQEs (max burst size)  : 4096

Initiating resources ...
searching for IB devices in host
Port active_mtu=2048
MR was registered with addr=0x1bc58590, lkey=0x8001c34e, rkey=0x8001c34e, flags=0x1
QP was created, QP number=0x60005b

Ready to capture (Press ^c to stop):
Captured:     11711 packets, 10978982 bytes

Interrupted (signal 2) - exiting ...

[root@dm01db02 ~]#
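
The "Captured" line gives a quick feel for the traffic before you even open the dump. A small sketch that derives the average packet size from it, assuming the line has the form shown above; the function name ibdump_avg_packet_size is made up for illustration.

```shell
# Sketch: average packet size in bytes (rounded down) from ibdump's summary line.
# ibdump_avg_packet_size is a hypothetical helper name.
# Expects a line such as "Captured:  <n> packets, <m> bytes" on stdin.
ibdump_avg_packet_size() {
    awk '/^Captured:/ { if ($2 > 0) print int($4 / $2) }'
}
```

For the capture above, 10978982 bytes over 11711 packets works out to an average of 937 bytes per packet.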

There are some drawbacks to ibdump though:

  • ibdump may encounter packet drops upon a burst of more than 4096 (or 2^max-burst) packets.
  • Packet loss is not reported by ibdump.
  • Outbound retransmitted and multicast packets may not be collected correctly.
  • ibdump may stop capturing packets when run on the same port as the Subnet Manager (e.g. opensm). It is advised not to run the SM and ibdump on the same port.
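
The first limitation is expressed as a power of two: the 4096 "Sniffer WQEs" in the output earlier corresponds to an exponent of 12. A tiny sketch of that arithmetic, with made-up function names, in case you want to reason about whether an expected burst fits:

```shell
# Sketch: burst capacity as a power of two.
# burst_entries and fits_in_burst are hypothetical helper names.

# Number of packets a burst buffer of 2^exponent entries can hold.
burst_entries() {
    echo $((1 << $1))
}

# Return 0 if a burst of <packets> fits into 2^<exponent> entries.
# Usage: fits_in_burst <packets> <exponent>
fits_in_burst() {
    [ "$1" -le "$(burst_entries "$2")" ]
}
```

For example, burst_entries 12 prints 4096, and fits_in_burst 5000 12 returns 1 because a burst of 5000 packets exceeds that capacity.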

Be aware of the issues above. Besides that: have fun peeking around at your Exadata InfiniBand fabric!