Recently we noticed that imageinfo was displaying the following information at a customer's site. Imageinfo was not displaying the data I expected, so here is a quick blog post about what was going on.
[root@dm01db01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l root "imageinfo | grep 'Image version'"
dm01db01: Image version: 11.2.3.2.1.130109
dm01db02: Image version: 11.2.3.2.1.130109
dm01db03: Image version: 11.2.3.2.1.130109
dm01db04: Image version: 11.2.3.2.1.130109
dm01db05: Image version: 11.2.3.2.0.120713
dm01db06: Image version: 11.2.3.2.0.120713
dm01db07: Image version: 11.2.3.2.0.120713
dm01db08: Image version: 11.2.3.2.0.120713
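With eight nodes the mismatch is easy to spot by eye, but on a bigger rack it can help to group the hosts by the version they report. The following is only a minimal sketch, assuming the `host: Image version: x` line format shown above; the function name group_by_image_version is made up for this post and is not part of any Exadata tooling:

```shell
#!/bin/bash
# Hypothetical helper (not an Oracle tool): reads lines of the form
#   "host: Image version: X"
# on stdin and prints one line per distinct version, followed by the
# hosts that reported it, in the order they appeared.
group_by_image_version () {
    awk -F': ' '/Image version/ {
        # $1 = host, $2 = "Image version", $3 = the version string
        hosts[$3] = (hosts[$3] == "" ? $1 : hosts[$3] " " $1)
    }
    END {
        for (v in hosts) print v ": " hosts[v]
    }' | sort
}

# Possible usage on a live system:
#   dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l root \
#       "imageinfo | grep 'Image version'" | group_by_image_version
```

More than one line of output means the nodes do not agree on their image version.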
So what happened here? Did those last four nodes not get updated when installing the latest CellOS QDPFE? After a quick investigation, it turns out that all the correct packages are installed:
[root@dm0105 ~]# for i in `yum info | grep Name | grep exadata |cut -d: -f2`; do yum info $i ; done | grep 'Version\|Name'
Name : exadata-applyconfig
Version : 11.2.3.2.1.130109
Name : exadata-asr
Version : 11.2.3.2.1.130109
Name : exadata-base
Version : 11.2.3.2.1.130109
Name : exadata-commonnode
Version : 11.2.3.2.1.130109
Name : exadata-exachk
Version : 11.2.3.2.1.130109
Name : exadata-firmware-compute
Version : 11.2.3.2.1.130109
Name : exadata-ibdiagtools
Version : 11.2.3.2.1.130109
Name : exadata-ipconf
Version : 11.2.3.2.1.130109
Name : exadata-onecommand
Version : 11.2.3.2.1.130109
Name : exadata-oswatcher
Version : 11.2.3.2.1.130109
Name : exadata-sun-computenode
Version : 11.2.3.2.1.130109
Name : exadata-validations-compute
Version : 11.2.3.2.1.130109
[root@dm0101 rpm-extraxt]#
So according to yum everything is installed here, and rpm -qa confirms this, so there is no mismatch there:
[root@dm0105 ~]# rpm -qa | grep exadata
exadata-asr-11.2.3.2.1.130109-1
exadata-validations-compute-11.2.3.2.1.130109-1
exadata-base-11.2.3.2.1.130109-1
exadata-exachk-11.2.3.2.1.130109-1
exadata-oswatcher-11.2.3.2.1.130109-1
exadata-ipconf-11.2.3.2.1.130109-1
exadata-firmware-compute-11.2.3.2.1.130109-1
exadata-applyconfig-11.2.3.2.1.130109-1
exadata-onecommand-11.2.3.2.1.130109-1
exadata-commonnode-11.2.3.2.1.130109-1
exadata-sun-computenode-11.2.3.2.1.130109-1
exadata-ibdiagtools-11.2.3.2.1.130109-1
[root@dm0101 ~]#
It looks like imageinfo is displaying incorrect information here. How come, and where does it get its information from? Well, imageinfo gets its information from a file called image.id:
[root@dm01db05 ~]# cat /opt/oracle.cellos/image.id
id:1357731445
kernel:2.6.32-400.11.1.el5uek
kernelstock:2.6.32-300.19.1.el5uek
version:11.2.3.2.1.130109
type:production
label:OSS_11.2.3.2.1_LINUX.X64_130109
cellrpmver:OSS_11.2.3.2.1_LINUX.X64_130109
cellswver:OSS_11.2.3.2.1_LINUX.X64_130109
mode:install
created:2013-01-09 03:37:26 -0800
node:COMPUTE
f331a2bfa27cc7371d936009289c23b6 *commonos.tbz
22212b4436b57fba6d53be4d7a6fd975 *dbboot.tbz
c7cf016f2180ab820010a787b77c59fb *dbfw.tbz
fd9d8b4f4a30588ca39f66015d5edc4e *dbrpms.tbz
eaa1a7973efcdcd2fe40b09828ba0e01 *debugos.tbz
290909dec5b3995aed2298f20d345aea *exaos.tbz
cfda22b7599e47bdc686acf41fd5d67c *kernel.tbz
18db43d6e94797710ffa771702be33e8 *ofed.tbz
4831ba742464caaa11c1c0153b823886 *sunutils.tbz
activated:2013-11-24 04:41:03 +0100
status:success
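Since image.id is a plain key:value file, a quick way to check whether it agrees with the rpm database is to pull out the version field and compare it with the installed package. This is only a sketch: the helper name image_id_field is invented for this post, and the comparison in the trailing comment assumes that the exadata-sun-computenode package version is the right thing to compare against.

```shell
#!/bin/bash
# Hypothetical helper: print the value of one "key:value" field
# from an image.id style file (first match only).
image_id_field () {
    local file=$1 key=$2
    # Anchoring on "key:" keeps "kernel" from matching "kernelstock".
    sed -n "s/^${key}://p" "$file" | head -n 1
}

# Possible usage on a compute node (assumption: the package version
# should equal the version recorded in image.id):
#   img=$(image_id_field /opt/oracle.cellos/image.id version)
#   pkg=$(rpm -q --qf '%{VERSION}\n' exadata-sun-computenode)
#   [ "$img" = "$pkg" ] || echo "image.id says $img, rpm says $pkg"
```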
As you can see, image.id contains all the information about the current version of your Exadata software stack, as well as a list of md5 checksums of the .tbz files in /opt/oracle.cellos/iso/cellbits. We checked the md5 checksums of these files as well, so now we know 100% that we are on the correct level. So how does this file get updated? It seems to be out of date, or at least not in line with our system. It is generated when you patch your compute node to another level, and it is updated by the exadata-sun-computenode rpm. Let's peek into that rpm and see what it deploys. First I am going to mount an ISO file as my yum repository:
[root@dm0101 ~]# mount -o loop /u01/app/oracle/stage.2288/112_latest_repo_130109.iso /var/www/html/yum/unknown/EXADATA/dbserver/11.2/latest.2288/
[root@dm0101 ~]# cp /var/www/html/yum/unknown/EXADATA/dbserver/11.2/latest.2288/x86_64/exadata-sun-computenode-11.2.3.2.1.130302-1.x86_64.rpm
Next step is to extract the files from the RPM and see what we get:
[root@dm0101 ~]# mkdir /tmp/exadata-sun-computenode-
[root@dm0101 ~]# cd /tmp/exadata-sun-computenode-
[root@dm0101 exadata-sun-computenode-]# rpm2cpio ../exadata-sun-computenode-11.2.3.2.1.130302-1.x86_64.rpm | cpio -idv
./etc/init.d/pciibmap
./etc/init.d/workaround_13083530
./etc/yum.repos.d/Exadata-computenode.repo.sample
./etc/yum/pluginconf.d/installonlyn.conf
./opt/oracle.SupportTools/corecontrol
./opt/oracle.SupportTools/dbserver_backup.sh
./opt/oracle.SupportTools/sundiag.sh
./opt/oracle.cellos/exadata.computenode.post.sh
./opt/oracle.cellos/tmpl/image.id.11.2.3.2.1.130302.in
194 blocks
[root@dm0101 exadata-sun-computenode-]#
Aha! There is a file called ./opt/oracle.cellos/tmpl/image.id.11.2.3.2.1.130302.in in that RPM, but it is not complete: it does not yet contain the md5 checksum information:
[root@dm0101 exadata-sun-computenode-]# cat ./opt/oracle.cellos/tmpl/image.id.11.2.3.2.1.130302.in
id:1362281283
kernel:2.6.32-400.21.1.el5uek
kernelstock:2.6.32-300.19.1.el5uek
version:11.2.3.2.1.130302
type:production
label:OSS_11.2.3.2.1_LINUX.X64_RELEASE
cellrpmver:OSS_11.2.3.2.1_LINUX.X64_130302
cellswver:OSS_11.2.3.2.1_LINUX.X64_130302
mode:install
created:2013-03-02 19:28:04 -0800
node:COMPUTE
[root@dm0101 exadata-sun-computenode-]#
To find out how that gets filled in, we need to look at the script that is executed when we install this RPM. The output below is a bit truncated for readability:
[root@dm0101 rpm-extraxt]# rpm --scripts -qp exadata-sun-computenode-11.2.3.2.1.130302-1.x86_64.rpm
.
..
fi
# Don't start post script for the fresh imaging (on zero boot)
# Don't install any files (remove them)
if [ -f /.post.imaging ]; then
  rm -f /opt/oracle.cellos/tmpl/image.id.11.2.3.2.1.130302.in
  rm -f /opt/oracle.cellos/exadata.computenode.post.sh
else
  /sbin/service exachkcfg postrpm 11.2.3.2.1.130302 exadata-sun-computenode
fi
..
.
So when we install this RPM, the install script (among other things not shown here) checks for /.post.imaging. On a freshly imaged node it removes the image.id template and the shell script exadata.computenode.post.sh; otherwise it calls exachkcfg postrpm, which executes the new version of exadata.computenode.post.sh. Now exadata.computenode.post.sh is quite a big script: it sets up your compute node properly and then finally reboots it. In this huge script there is a function that calculates the md5 checksums of the cellbits directory and appends them to the image.id file:
md5sum_iso_cellbits () {
  local imageid_out=$1
  [ "$imageid_out" == '' ] && imageid_out=$IMAGE_ID_DEF
  print_log "Please wait. Calculating checksums for $imageid_out ..." INFO
  # Remove any existing md5 sums from the output file
  perl -ne '/[a-f\d]\s+\*.+/ or print $_' -i $imageid_out
  local tmp_lst=/tmp/msd5sum.cellbits.lst
  pushd /opt/oracle.cellos/iso/cellbits
  md5sum -b * > $tmp_lst
  cat $tmp_lst >> $imageid_out
  rm -f $tmp_lst
  popd
}
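To get a feel for what this function does without touching a live node, the same two steps (drop the old checksum lines, append fresh ones) can be replayed on scratch copies, together with a checker similar in spirit to the manual md5 comparison we did earlier. This is a simplified reimplementation for illustration, not Oracle's code: the names refresh_checksums and verify_cellbits are hypothetical, and it uses grep where the original uses perl.

```shell
#!/bin/bash
# Hypothetical sketch: rewrite the checksum lines of an image.id style
# file from the contents of a directory (a stand-in for
# /opt/oracle.cellos/iso/cellbits).
refresh_checksums () {
    local image_id=$1 cellbits_dir=$2 tmp
    tmp=$(mktemp) || return 1
    # Keep every line that is not an "md5sum -b" style checksum line.
    grep -Ev '^[0-9a-f]{32} \*' "$image_id" > "$tmp"
    # Append fresh checksums for everything in the directory.
    (cd "$cellbits_dir" && md5sum -b -- *) >> "$tmp" || { rm -f "$tmp"; return 1; }
    cat "$tmp" > "$image_id"
    rm -f "$tmp"
}

# Hypothetical sketch: check the checksum lines recorded in the file
# against the directory. Returns non-zero on any mismatch, or when the
# file has no checksum lines at all.
verify_cellbits () {
    local image_id=$1 cellbits_dir=$2
    grep -E '^[0-9a-f]{32} \*' "$image_id" \
        | (cd "$cellbits_dir" && md5sum -c --quiet -)
}

# Possible usage on scratch copies (paths are examples only):
#   cp /opt/oracle.cellos/image.id /tmp/image.id.test
#   verify_cellbits /tmp/image.id.test /opt/oracle.cellos/iso/cellbits
```

Running verify_cellbits against a copy of image.id is read-only with respect to the node, which makes it a reasonably safe first check before reinstalling anything.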
Finally, the function db_update_imageinfo () in exadata.computenode.post.sh creates a new, up-to-date version of the image.id file, which makes imageinfo happy again. To make a long blog post short, the solution to our misbehaving imageinfo (and imagehistory, by the way) was to run
yum --enablerepo=exadata_dbserver_11.2.3.2.1_x86_64_base reinstall exadata-sun-computenode-11.2.3.2.1.130109-1
and let the nodes reboot. This solved our problem with the imageinfo command. Unfortunately, we could not determine the root cause of the image.id problem.