Recently I got a question from an Exadata customer: what happens if you forget to relink the oracle binary for RDS IPC in just one home? Would Oracle be smart enough to fall back to UDP for the complete RAC database? Time to do a little test and find out.
First, let's see how IPC communication between the nodes is done right now. Since version 11.2.0.3.0 we can use skgxpinfo to see how a home is linked:
[oracle@dm01db01 [dbfs1] ~]$ dcli -g dbs_group -l oracle /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/skgxpinfo
dm01db01: rds
dm01db02: rds
dm01db03: rds
dm01db04: rds
[oracle@dm01db01 [dbfs1] ~]$
Alternatively, we can double-check by looking for the RDS symbol in the IPC library:
[oracle@dm01db01 [t11203] ~]$ dcli -g dbs_group -l oracle 'nm /u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libskgxp11.so | grep rds_enabled'
dm01db01: 00000000001de7f0 b skgxp_rds_enabled
dm01db02: 00000000001de7f0 b skgxp_rds_enabled
dm01db03: 00000000001de7f0 b skgxp_rds_enabled
dm01db04: 00000000001de7f0 b skgxp_rds_enabled
[oracle@dm01db01 [t11203] ~]$
So Oracle started with RDS support enabled, which can also be seen in the alert log during instance startup:
[oracle@dm01db01 [dbfs1] trace]$ grep -A4 "Cluster communication" alert_dbfs1.log |tail -5
Cluster communication is configured to use the following interface(s) for this instance
  192.168.100.1
cluster interconnect IPC version:Oracle RDS/IP (generic)
IPC Vendor 1 proto 3
  Version 4.1
[oracle@dm01db01 [dbfs1] trace]$
Now let's change the IPC library on the 4th node of the RAC cluster (this test system is a half rack Exadata) and relink it for UDP. Before relinking the library, make sure that all instances running out of that home are stopped. When ready, relink it for UDP (the output is trimmed a little for readability):
[oracle@dm01db04 [dbfs4] trace]$ make -C /u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib -f ins_rdbms.mk ipc_g ioracle
make: Entering directory `/u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib'
rm -f /u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libskgxp11.so
cp /u01/app/oracle/product/11.2.0.3/dbhome_1/lib//libskgxpg.so /u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libskgxp11.so
chmod 755 /u01/app/oracle/product/11.2.0.3/dbhome_1/bin
 - Linking Oracle
rm -f /u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib/oracle
gcc -o /u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib/oracle -m64 -L/u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib/ -L/u01/app/oracle/product/11.2.0.3/dbhome_1/lib/ -L/u01/app/oracle/product/11.2.0.3/dbhome_1/lib/stubs/
test ! -f /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/oracle ||\
mv -f /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/oracle /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/oracleO
mv /u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib/oracle /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/oracle
chmod 6751 /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/oracle
make: Leaving directory `/u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib'
[oracle@dm01db04 [dbfs4] trace]$
So on this 4th node my Oracle 11.2.0.3.0 home is now linked for UDP IPC communication. We can check it with skgxpinfo:
[oracle@dm01db01 [dbfs1] trace]$ dcli -g ~/dbs_group -l oracle /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/skgxpinfo
dm01db01: rds
dm01db02: rds
dm01db03: rds
dm01db04: udp
[oracle@dm01db01 [dbfs1] trace]$
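A check like this is easy to script, so that a mismatched home gets caught before anyone tries to start the database. The following is only a minimal sketch: the function name check_ipc_consistency is my own invention (it is not an Oracle tool), and it assumes the dcli output keeps the "host: protocol" format shown above, with no error lines mixed in.

```shell
#!/bin/bash
# Hypothetical helper, not an Oracle tool: reads "host: protocol" lines
# (the dcli + skgxpinfo output format) on stdin and fails when the
# nodes do not all report the same IPC protocol.
check_ipc_consistency() {
    awk '
        NF >= 2 {
            host = $1
            sub(/:$/, "", host)          # strip the trailing colon dcli adds
            seen[$2] = seen[$2] " " host # group hosts by reported protocol
        }
        END {
            n = 0
            for (p in seen) n++
            if (n == 0) { print "ERROR: no skgxpinfo output read"; exit 2 }
            if (n == 1) { for (p in seen) print "OK: all nodes linked with " p; exit 0 }
            for (p in seen) print "MISMATCH: " p " on" seen[p]
            exit 1
        }'
}
```

To use it, pipe the dcli command from above into the function, for example `dcli -g ~/dbs_group -l oracle /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/skgxpinfo | check_ipc_consistency`. The exit status is non-zero when the nodes disagree, so it can gate a startup script.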
So indeed, the 4th node is now configured for UDP. Now what happens if I try to start a RAC database from this 11.2.0.3.0 home? Will Oracle see that one node can’t use RDS and fall back to UDP for the whole database?
[oracle@dm01db04 [dbfs4] trace]$ srvctl start database -d dbfs
PRCR-1079 : Failed to start resource ora.dbfs.db
CRS-5017: The resource action "ora.dbfs.db start" encountered the following error:
ORA-03113: end-of-file on communication channel
Process ID: 0
Session ID: 0 Serial number: 0
. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0.3/grid/log/dm01db04/agent/crsd/oraagent_oracle/oraagent_oracle.log".
CRS-2674: Start of 'ora.dbfs.db' on 'dm01db04' failed
CRS-2632: There are no more servers to try to place resource 'ora.dbfs.db' on that would satisfy its placement policy
[oracle@dm01db04 [dbfs4] trace]$ srvctl status database -d dbfs
Instance dbfs1 is running on node dm01db01
Instance dbfs2 is running on node dm01db02
Instance dbfs3 is running on node dm01db03
Instance dbfs4 is not running on node dm01db04
[oracle@dm01db04 [dbfs4] trace]$
So not really a surprise there: the instance on node 4 can’t communicate with the other RAC instances and fails to start. The alert logs of the instances confirm that the SKGXP libraries can’t talk to each other. This is from the alert log on the RDS instances:
ORA-27506: IPC error connecting to a port
ORA-27300: OS system dependent operation:proto mismatch failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpcon
ORA-27303: additional information: Protocol of this IPC does not match remote (192.168.100.4). SKGXP IPC libraries must be the same version. [local: RDS,remote: UDP]
Node 4, linked for UDP, shows a different error stack:
ORA-27550: Target ID protocol check failed. tid vers=1, type=1, remote instance number=1, local instance number=4
System state dump requested by (instance=4, osid=12700 (LMON)), summary=[abnormal instance termination].
LMON (ospid: 12700): terminating the instance due to error 481
System State dumped to trace file /u01/app/oracle/diag/rdbms/dbfs/dbfs4/trace/dbfs4_diag_12688.trc
Wed Aug 08 21:50:24 2012
ORA-1092 : opitsk aborting process
Dumping diagnostic data in directory=[cdmp_20120808215024], requested by (instance=4, osid=12700 (LMON)), summary=[abnormal instance termination].
Cluster communication is configured to use the following interface(s) for this instance
  192.168.100.4
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
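Both sides leave a recognizable fingerprint in the alert log: the ORA-27303 "Protocol of this IPC does not match remote" text on the RDS instances, and ORA-27550 on the UDP instance. If you suspect this problem on another system, a quick grep for those two strings is a reasonable first check. This is just a sketch: find_ipc_mismatch is a made-up name, and the patterns are taken only from the messages quoted in this post, so other releases may word them differently.

```shell
#!/bin/bash
# Hypothetical helper: print (with line numbers) the alert log lines that
# match either IPC protocol mismatch signature quoted in this post.
# Returns non-zero when nothing matches, as grep does.
find_ipc_mismatch() {
    grep -nE 'ORA-27550|Protocol of this IPC does not match remote' "$1"
}
```

For example, `find_ipc_mismatch alert_dbfs1.log` run from the trace directory would list the matching lines for instance 1.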
So basically the message is: don’t mix UDP and RDS, boys and girls!