Patching the Big Data Appliance

I have been patching engineered systems since the launch of the Exadata V2, and recently I had the opportunity to patch the BDA we have in house. As far as comparisons go, that is where the similarities between patching an Exadata and patching a Big Data Appliance (BDA) end.
Our BDA is a so-called starter rack consisting of 6 nodes running a Hadoop cluster; for more information about it, read my First Impressions blog post. On Exadata, patching consists of a whole set of different patches and tools like patchmgr and dbnodeupdate.sh; on the Big Data Appliance we patch with a tool called Mammoth. In this blog post we will use the upgrade from BDA software release 4.0 to 4.1 as an example to describe the patching process. You can download the BDA Mammoth bundle patch through Doc ID 1485745.1 (Oracle Big Data Appliance Patch Set Master Note). This patch set consists of 2 zipfiles which you should upload to the primary node in the cluster (usually node number 1 or 7); if you are not sure which node that is, bdacli will tell you:

[root@bda1node01 BDAMammoth-ol6-4.1.0]# bdacli getinfo cluster_primary_host
bda1node01

After determining which node to use, you can upload the 2 zipfiles from MOS and unzip them somewhere on that node:

[root@bda1node01 patch]# for i in `ls *.zip`; do unzip $i; done
Archive:  p20369730_410_Linux-x86-64_1of2.zip
  inflating: README.txt
   creating: BDAMammoth-ol6-4.1.0/
  inflating: BDAMammoth-ol6-4.1.0/BDAMammoth-ol6-4.1.0.run
Archive:  p20369730_410_Linux-x86-64_2of2.zip
   creating: BDABaseImage-ol6-4.1.0_RELEASE/
...
..
.
  inflating: BDABaseImage-ol6-4.1.0_RELEASE/json-select
[root@bda1node01 patch]#

After unzipping these files you end up with a huge file: a shell script with a uuencoded binary payload appended to it:

[root@bda1node01 BDAMammoth-ol6-4.1.0]# ll
total 3780132
-rwxrwxrwx 1 root root 3870848281 Jan 16 10:17 BDAMammoth-ol6-4.1.0.run
[root@bda1node01 BDAMammoth-ol6-4.1.0]# 
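
Installing the new Mammoth release is then simply a matter of executing this script as root on the primary node. A minimal sketch of that step (the lengthy extraction and installation output is omitted here):

[root@bda1node01 BDAMammoth-ol6-4.1.0]# ./BDAMammoth-ol6-4.1.0.run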

This .run file does a couple of checks, such as determining which version of Linux we are running (this BDA is on OEL6). It installs a new version of BDAMammoth and moves the previous version to /opt/oracle/BDAMammoth/previous-BDAMammoth/. It also updates the Cloudera parcels and the yum repository in /opt/oracle/BDAMammoth/bdarepo, and places a new base image ISO file on this node, which is needed if you are adding new nodes to the cluster or need to reimage existing ones. Finally, it updates all the Puppet files in /opt/oracle/BDAMammoth/puppet/manifests; Oracle relies heavily on Puppet for deploying software on the BDA, which means that patching the BDA is also largely driven by a collection of Puppet scripts. When this is done we can start the patch process by running mammoth -p, so here we go:

[root@bda1node01 BDAMammoth]# ./mammoth -p
INFO: Logging all actions in /opt/oracle/BDAMammoth/bdaconfig/tmp/bda1node01-20150130103200.log and traces in /opt/oracle/BDAMammoth/bdaconfig/tmp/bda1node01-20150130103200.trc
INFO: all_nodes not generated yet, skipping check for password-less ssh
ERROR: Big Data SQL is not supported on BDA v4.1.0
ERROR: Please uninstall Big Data SQL before upgrade cluster
ERROR: Cannot continue with operation
INFO: Running bdadiagcluster...
INFO: Starting Big Data Appliance diagnose cluster at Fri Jan 30 10:32:12 2015
INFO: Logging results to /tmp/bda_diagcluster_1422610330.log
SUCCESS: Created BDA diagcluster zipfile on node bda1node01
SUCCESS: Created BDA diagcluster zipfile on node bda1node02
SUCCESS: Created BDA diagcluster zipfile on node bda1node03
SUCCESS: Created BDA diagcluster zipfile on node bda1node04
SUCCESS: Created BDA diagcluster zipfile on node bda1node05
SUCCESS: Created BDA diagcluster zipfile on node bda1node06
SUCCESS: bdadiagcluster_1422610330.zip created
INFO: Big Data Appliance diagnose cluster complete at Fri Jan 30 10:32:54 2015
INFO: Please get the Big Data Appliance cluster diagnostic bundle at /tmp/bdadiagcluster_1422610330.zip
Exiting...
[root@bda1node01 BDAMammoth]# 

This error is obvious: we have Big Data SQL installed on our cluster (it was introduced with the BDA 4.0 software), but the version we are running is not supported on BDA 4.1. Unfortunately we are running BDSQL 1.0, which is the only version of BDSQL available at this point; this is also a known issue described in MOS Doc ID 1964471.1. So we have 2 options: wait for a new release of BDSQL and postpone the upgrade, or remove BDSQL and continue with the upgrade. We decided to continue with the upgrade and deinstall BDSQL for now; it was not working on the Exadata side anyway due to conflicts with the latest RDBMS 12.1.0.2 patch. Removing BDSQL can be done with bdacli or mammoth-reconfig; bdacli calls mammoth-reconfig under the hood, so as far as I know it doesn't really matter which one you use. So let's give that a try:

[root@bda1node01 BDAMammoth]# ./mammoth-reconfig remove big_data_sql
INFO: Logging all actions in /opt/oracle/BDAMammoth/bdaconfig/tmp/bda1node01-20150130114222.log and traces in /opt/oracle/BDAMammoth/bdaconfig/tmp/bda1node01-20150130114222.trc
INFO: This is the install of the primary rack
ERROR: Version mismatch between mammoth and params file
ERROR: Mammoth version: 4.1.0, Params file version: 4.0.0
ERROR: Cannot continue with install #Step -1#
INFO: Running bdadiagcluster...
INFO: Starting Big Data Appliance diagnose cluster at Fri Jan 30 11:42:25 2015
INFO: Logging results to /tmp/bda_diagcluster_1422614543.log
SUCCESS: Created BDA diagcluster zipfile on node bda1node01
SUCCESS: Created BDA diagcluster zipfile on node bda1node02
SUCCESS: Created BDA diagcluster zipfile on node bda1node03
SUCCESS: Created BDA diagcluster zipfile on node bda1node04
SUCCESS: Created BDA diagcluster zipfile on node bda1node05
SUCCESS: Created BDA diagcluster zipfile on node bda1node06
SUCCESS: bdadiagcluster_1422614543.zip created
INFO: Big Data Appliance diagnose cluster complete at Fri Jan 30 11:43:03 2015
INFO: Please get the Big Data Appliance cluster diagnostic bundle at /tmp/bdadiagcluster_1422614543.zip
Exiting...
[root@bda1node01 BDAMammoth]#

Because we had already extracted the new version of Mammoth when we executed the .run file, we now have a version mismatch between the Mammoth software and the params file (/opt/oracle/BDAMammoth/bdaconfig/VERSION). We know that when we ran BDAMammoth-ol6-4.1.0.run, the old Mammoth software was backed up to /opt/oracle/BDAMammoth/previous-BDAMammoth/. Our first thought was to replace the new version with the previous one:

[root@bda1node01 oracle]# mv BDAMammoth BDAMammoth.new
[root@bda1node01 oracle]# cp -R ./BDA
BDABaseImage/           BDABaseImage-ol6-4.0.0/ BDABaseImage-ol6-4.1.0/ BDAMammoth.new/
[root@bda1node01 oracle]# cp -R ./BDAMammoth.new/previous-BDAMammoth/ ./BDAMammoth
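
Before re-running anything, a quick way to verify which Mammoth release is now sitting in /opt/oracle/BDAMammoth is to check the VERSION file the error refers to; a sketch, under the assumption that the file lives at the path mentioned above:

[root@bda1node01 oracle]# cat /opt/oracle/BDAMammoth/bdaconfig/VERSION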

Running mammoth-reconfig again after this restore resulted in lots more errors; I am just pasting parts of the output here to give you an idea:

[root@bda1node01 BDAMammoth]# ./mammoth-reconfig remove big_data_sql
INFO: Logging all actions in /opt/oracle/BDAMammoth/bdaconfig/tmp/bda1node01-20150130113539.log and traces in /opt/oracle/BDAMammoth/bdaconfig/tmp/bda1node01-20150130113539.trc
INFO: This is the install of the primary rack
INFO: Checking if password-less ssh is set up
...
..
.
json-select: no valid input found for "@CIDR@"

*** json-select *** Out of branches near "json object <- json value
     OR json array <- json value" called from "json select command line"
*** json-select *** Syntax error on line 213:
}
...
..
.
ERROR: Puppet agent run on node bda1node01 had errors. List of errors follows

************************************
Error [6928]: Report processor failed: Permission denied - /opt/oracle/BDAMammoth/puppet/reports/bda1node01.oracle.vxcompany.local
************************************

INFO: Also check the log file in /opt/oracle/BDAMammoth/bdaconfig/tmp/pagent-bda1node01-20150130113621.log
...
..
.

So after some fiddling around we found out that the solution was actually extremely simple: there was no need to move the old 4.0 Mammoth software back into place at all. All we had to do was run the mammoth-reconfig script directly from the backup directory /opt/oracle/BDAMammoth/previous-BDAMammoth/, after which BDSQL (mainly the cellsrv software) is finally disabled on the cluster, as the status check below confirms.
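
For completeness, the working invocation looked roughly like this (a sketch; the removal output itself is comparable to the runs shown above and is omitted):

[root@bda1node01 ~]# cd /opt/oracle/BDAMammoth/previous-BDAMammoth/
[root@bda1node01 previous-BDAMammoth]# ./mammoth-reconfig remove big_data_sql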

[root@bda1node01 BDAMammoth]# bdacli status big_data_sql_cluster
ERROR: Service big_data_sql_cluster is disabled. Please enable the service first to run this command.
[root@bda1node01 BDAMammoth]#

With BDSQL disabled we can give mammoth -p another shot, upgrade the Hadoop cluster, and wait for all the Puppet scripts to finish. The output of the mammoth script is long, so I won't post it here for readability reasons. All the logging from the mammoth script and the Puppet agents is written to /opt/oracle/BDAMammoth/bdaconfig/tmp/ during the patch process, and when patching is done it is all nicely bundled into a single zipfile at /tmp/cdhctr1-install-summary.zip. The patch process on our starter rack took a bit over 2 hours in total, and about 40 minutes in we get this:

WARNING: The OS kernel was updated on some nodes - so those nodes need to be rebooted
INFO: Nodes to be rebooted: bda1node01,bda1node02,bda1node03,bda1node04,bda1node05,bda1node06
Proceed with reboot? [y/n]: y


Broadcast message from root@bda1node01.oracle.vxcompany.local
	(unknown) at 13:09 ...

The system is going down for reboot NOW!
INFO: Reboot done.
INFO: Please wait until all reboots are complete before continuing
[root@bda1node01 BDAMammoth]# Connection to bda1node01 closed by remote host.
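
Once the nodes come back online it is worth making sure that every node in the cluster has in fact rebooted before you continue. A quick sketch using dcli; this assumes the standard BDA dcli setup, where the -C flag targets all servers in the cluster:

[root@bda1node01 ~]# dcli -C uptime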

Don't be fooled into thinking that the reboot is the end of the patch process; the interesting part is just about to start. After the system is back online you can see that we are still on the old BDA software version:

[root@bda1node01 ~]# bdacli getinfo cluster_version
4.0.0
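
Resuming is nothing more than starting Mammoth again from the same directory. A sketch of the restart; in a second session you can follow progress in the most recent logfile (the log directory is the one Mammoth reported earlier, the filename pattern is an assumption based on that output):

[root@bda1node01 ~]# cd /opt/oracle/BDAMammoth
[root@bda1node01 BDAMammoth]# ./mammoth -p
[root@bda1node01 ~]# tail -f $(ls -t /opt/oracle/BDAMammoth/bdaconfig/tmp/*.log | head -1)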

This is not documented very clearly by Oracle: there is no special command to continue the patch process (like dbnodeupdate.sh -c on Exadata), you simply run mammoth -p again, as sketched above, and Mammoth will pick up the patching process where it left off. In this second part of the process Mammoth actually updates the Cloudera stack and all its components like Hadoop, Hive, Impala, etc. This takes about 90 minutes to finish and includes a whole bunch of test runs: MapReduce jobs, Oozie jobs, teragen/terasort runs, etc. (in my case it reported a Flume agent in concerning health after the upgrade). After all this is done the mammoth script finishes and we can verify whether we have an upgraded cluster or not:

[root@bda1node01 BDAMammoth]# bdacli getinfo cluster_version
4.1.0

[root@bda1node01 BDAMammoth]# bdacli getinfo cluster_cdh_version
5.3.0-ol6
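
If you want a second opinion from the Hadoop side, the stock Hadoop client also reports which CDH build it ships with (output omitted here, as it obviously depends on the release you end up on):

[root@bda1node01 ~]# hadoop version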

Apart from the issues surrounding BDSQL, patching the BDA worked like a charm, although Oracle could have documented a bit better how to continue after the OS upgrade and reboot. Personally I really like the fact that Oracle uses Puppet for orchestration on the BDA. The next step will be waiting for the BDSQL update so we can do some proper testing.