Mixing UDP and RDS on an Exadata

Recently I got a question from an Exadata customer: what would happen if you forgot to relink the oracle binary for RDS IPC in just one home? Would Oracle be smart enough to fall back to UDP for the complete RAC database? Time to do a little test and find out:

First, let’s see how IPC between the nodes is configured right now. Since version 11.2.0.3.0 we can use skgxpinfo to see which protocol the home is linked for:

[oracle@dm01db01 [dbfs1] ~]$ dcli -g dbs_group -l oracle /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/skgxpinfo
dm01db01: rds
dm01db02: rds
dm01db03: rds
dm01db04: rds
[oracle@dm01db01 [dbfs1] ~]$ 

Alternatively, if you want to double-check, look for the skgxp_rds_enabled symbol in the IPC library:

[oracle@dm01db01 [t11203] ~]$ dcli -g dbs_group -l oracle 'nm /u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libskgxp11.so | grep rds_enabled'
dm01db01: 00000000001de7f0 b skgxp_rds_enabled
dm01db02: 00000000001de7f0 b skgxp_rds_enabled
dm01db03: 00000000001de7f0 b skgxp_rds_enabled
dm01db04: 00000000001de7f0 b skgxp_rds_enabled
[oracle@dm01db01 [t11203] ~]$ 

So Oracle starts with RDS support enabled, which can be seen in the alert log during instance startup:

[oracle@dm01db01 [dbfs1] trace]$ grep -A4 "Cluster communication" alert_dbfs1.log |tail -5
Cluster communication is configured to use the following interface(s) for this instance
  192.168.100.1
cluster interconnect IPC version:Oracle RDS/IP (generic)
IPC Vendor 1 proto 3
  Version 4.1
[oracle@dm01db01 [dbfs1] trace]$ 
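If you want to check this from a script rather than by eye, a minimal sketch like the one below pulls the protocol name out of the "cluster interconnect IPC version" line. The function name last_ipc_protocol is hypothetical (it is not an Oracle tool), and it assumes the exact 11.2 line format shown above ("Oracle <PROTO>/IP"); other versions may word it differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not an Oracle tool): print the interconnect protocol
# (e.g. RDS or UDP) reported by the most recent instance startup in an alert log.
# Assumes the 11.2 alert log line format:
#   cluster interconnect IPC version:Oracle RDS/IP (generic)
# Usage: last_ipc_protocol alert_dbfs1.log   (or pipe the log in on stdin)
last_ipc_protocol() {
  # Keep only the protocol name from matching lines; the last match is the latest startup.
  sed -n 's#^cluster interconnect IPC version: *Oracle \([A-Za-z]*\)/IP.*#\1#p' "$@" | tail -n 1
}
```

Because an alert log accumulates every startup, only the last match is printed, so the result reflects the protocol the instance used the last time it came up.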

Let’s change the IPC library on the 4th node of the RAC cluster (this test system is a half-rack Exadata) and relink it for UDP. Before relinking, make sure that all instances running out of that home are stopped; then relink it for UDP (the output is trimmed a little for readability):

[oracle@dm01db04 [dbfs4] trace]$ make -C /u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib -f ins_rdbms.mk ipc_g ioracle
make: Entering directory `/u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib'
rm -f /u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libskgxp11.so
cp /u01/app/oracle/product/11.2.0.3/dbhome_1/lib//libskgxpg.so /u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libskgxp11.so
chmod 755 /u01/app/oracle/product/11.2.0.3/dbhome_1/bin

 - Linking Oracle 
rm -f /u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib/oracle
gcc  -o /u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib/oracle -m64 -L/u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib/ -L/u01/app/oracle/product/11.2.0.3/dbhome_1/lib/ -L/u01/app/oracle/product/11.2.0.3/dbhome_1/lib/stubs/ 
test ! -f /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/oracle ||\
	   mv -f /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/oracle /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/oracleO
mv /u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib/oracle /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/oracle
chmod 6751 /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/oracle
make: Leaving directory `/u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib'
[oracle@dm01db04 [dbfs4] trace]$ 

So now the Oracle 11.2.0.3.0 home on the 4th node is linked for UDP IPC, which we can check with skgxpinfo:

[oracle@dm01db01 [dbfs1] trace]$ dcli -g ~/dbs_group -l oracle /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/skgxpinfo
dm01db01: rds
dm01db02: rds
dm01db03: rds
dm01db04: udp
[oracle@dm01db01 [dbfs1] trace]$ 

So indeed, the 4th node is now configured for UDP. What happens if I try to start a RAC database from this 11.2.0.3.0 home? Will Oracle notice that one node can’t use RDS and fall back to UDP for the whole database?

[oracle@dm01db04 [dbfs4] trace]$ srvctl start database -d dbfs
PRCR-1079 : Failed to start resource ora.dbfs.db
CRS-5017: The resource action "ora.dbfs.db start" encountered the following error: 
ORA-03113: end-of-file on communication channel
Process ID: 0
Session ID: 0 Serial number: 0
. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0.3/grid/log/dm01db04/agent/crsd/oraagent_oracle/oraagent_oracle.log".

CRS-2674: Start of 'ora.dbfs.db' on 'dm01db04' failed
CRS-2632: There are no more servers to try to place resource 'ora.dbfs.db' on that would satisfy its placement policy
[oracle@dm01db04 [dbfs4] trace]$ srvctl status database -d dbfs
Instance dbfs1 is running on node dm01db01
Instance dbfs2 is running on node dm01db02
Instance dbfs3 is running on node dm01db03
Instance dbfs4 is not running on node dm01db04
[oracle@dm01db04 [dbfs4] trace]$ 

Not really a surprise: the RAC instances can’t communicate with each other. The alert logs of the instances confirm that the SKGXP libraries on both sides can’t talk to each other; this is from the alert log on the RDS instances:

ORA-27506: IPC error connecting to a port
ORA-27300: OS system dependent operation:proto mismatch failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpcon
ORA-27303: additional information: Protocol of this IPC does not match remote (192.168.100.4). SKGXP IPC libraries must be the same version. [local: RDS,remote: UDP]
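When the alert log fills up with these errors, it can help to reduce them to the distinct local/remote protocol pairs. A minimal sketch, assuming the ORA-27303 message text is exactly as shown above; the function name ipc_mismatches is hypothetical and not an Oracle tool:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not an Oracle tool): summarize ORA-27303 protocol
# mismatches as "<local> <remote> <remote address>", one line per distinct pair.
# Assumes the message format shown in the alert log excerpt, e.g.
#   ... does not match remote (192.168.100.4). ... [local: RDS,remote: UDP]
# Usage: ipc_mismatches alert_dbfs1.log   (or pipe the log in on stdin)
ipc_mismatches() {
  # sed -n ... p prints only lines where the substitution matched;
  # sort -u collapses the many repeated errors into one line per pair.
  sed -n 's/.*does not match remote (\([^)]*\)).*\[local: *\([A-Za-z]*\), *remote: *\([A-Za-z]*\)].*/\2 \3 \1/p' "$@" \
    | sort -u
}
```

For the excerpt above this would report RDS locally, UDP remotely, and 192.168.100.4 as the offending interconnect address.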

Node 4, linked for UDP, shows a different error stack:

ORA-27550: Target ID protocol check failed. tid vers=1, type=1, remote instance number=1, local instance number=4
System state dump requested by (instance=4, osid=12700 (LMON)), summary=[abnormal instance termination].
LMON (ospid: 12700): terminating the instance due to error 481
System State dumped to trace file /u01/app/oracle/diag/rdbms/dbfs/dbfs4/trace/dbfs4_diag_12688.trc
Wed Aug 08 21:50:24 2012
ORA-1092 : opitsk aborting process
Dumping diagnostic data in directory=[cdmp_20120808215024], requested by (instance=4, osid=12700 (LMON)), summary=[abnormal instance termination].

Cluster communication is configured to use the following interface(s) for this instance
  192.168.100.4
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2

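So basically, the message is: don’t mix UDP and RDS, boys and girls! Oracle does not fall back to UDP; the UDP-linked instance simply fails to start while the RDS instances keep running.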
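Since Oracle will not sort this out on its own, it pays to check that every node is linked the same way before starting the database. A minimal sketch, assuming the "host: protocol" output format that dcli prints for skgxpinfo earlier in this post; the function name check_ipc_consistency is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not an Oracle tool): read "host: protocol" lines as
# printed by dcli for skgxpinfo and check that all hosts report the same
# IPC protocol. Exit status 0 = consistent, 1 = mixed protocols or no input.
check_ipc_consistency() {
  awk -F': *' '
    NF >= 2 {
      sub(/[ \t\r]+$/, "", $2)          # drop trailing whitespace/CR
      hosts[$2] = hosts[$2] " " $1      # group host names by protocol
      n++
    }
    END {
      if (n == 0) { print "no skgxpinfo output found"; exit 1 }
      count = 0
      for (p in hosts) count++
      if (count == 1) {
        for (p in hosts) print "OK: all nodes use " p
        exit 0
      }
      print "MISMATCH: nodes are linked for different IPC protocols"
      for (p in hosts) print "  " p ":" hosts[p]
      exit 1
    }'
}
```

You would feed it with the same command used above, e.g. `dcli -g dbs_group -l oracle /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/skgxpinfo | check_ipc_consistency`. With the mixed output from this test it would exit with status 1 and list dm01db04 under udp; the non-zero exit status makes it easy to use as a guard in a start-up script.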
