Info about Exadata’s imageinfo and rpm’s

Recently we noticed that imageinfo was displaying the following information at a customers site. Imageinfo was not displaying data i suspected, so a quick little blogpost about what was going on there.

[root@dm01db01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l root "imageinfo | grep 'Image version'"
dm01db01: Image version: 11.2.3.2.1.130109
dm01db02: Image version: 11.2.3.2.1.130109
dm01db03: Image version: 11.2.3.2.1.130109
dm01db04: Image version: 11.2.3.2.1.130109
dm01db05: Image version: 11.2.3.2.0.120713
dm01db06: Image version: 11.2.3.2.0.120713
dm01db07: Image version: 11.2.3.2.0.120713
dm01db08: Image version: 11.2.3.2.0.120713

So what happened here, did those last four nodes not got updated when installing the latest CellOS QDPFE? Well it turns out that all the correct packages are installed after a quick investigation:

[root@dm0105 ~]# for i in `yum info | grep Name | grep exadata |cut -d: -f2`; do yum info $i ; done | grep 'Version\|Name'
Name       : exadata-applyconfig
Version    : 11.2.3.2.1.130109
Name       : exadata-asr
Version    : 11.2.3.2.1.130109
Name       : exadata-base
Version    : 11.2.3.2.1.130109
Name       : exadata-commonnode
Version    : 11.2.3.2.1.130109
Name       : exadata-exachk
Version    : 11.2.3.2.1.130109
Name       : exadata-firmware-compute
Version    : 11.2.3.2.1.130109
Name       : exadata-ibdiagtools
Version    : 11.2.3.2.1.130109
Name       : exadata-ipconf
Version    : 11.2.3.2.1.130109
Name       : exadata-onecommand
Version    : 11.2.3.2.1.130109
Name       : exadata-oswatcher
Version    : 11.2.3.2.1.130109
Name       : exadata-sun-computenode
Version    : 11.2.3.2.1.130109
Name       : exadata-validations-compute
Version    : 11.2.3.2.1.130109
[root@dm0101 rpm-extraxt]#

So according to yum everything is installed here, rpm -qa also confirms this so no miss-match there:

[root@dm0105 ~]# rpm -qa | grep exadata
exadata-asr-11.2.3.2.1.130109-1
exadata-validations-compute-11.2.3.2.1.130109-1
exadata-base-11.2.3.2.1.130109-1
exadata-exachk-11.2.3.2.1.130109-1
exadata-oswatcher-11.2.3.2.1.130109-1
exadata-ipconf-11.2.3.2.1.130109-1
exadata-firmware-compute-11.2.3.2.1.130109-1
exadata-applyconfig-11.2.3.2.1.130109-1
exadata-onecommand-11.2.3.2.1.130109-1
exadata-commonnode-11.2.3.2.1.130109-1
exadata-sun-computenode-11.2.3.2.1.130109-1
exadata-ibdiagtools-11.2.3.2.1.130109-1
[root@dm0101 ~]#

It looks like imageinfo is displaying incorrect information here, how come and where does it gets its information from? Well, imageinfo gets its information out of a file called image.id:

root@dm01db05 ~]# cat /opt/oracle.cellos/image.id
id:1357731445
kernel:2.6.32-400.11.1.el5uek
kernelstock:2.6.32-300.19.1.el5uek
version:11.2.3.2.1.130109
type:production
label:OSS_11.2.3.2.1_LINUX.X64_130109
cellrpmver:OSS_11.2.3.2.1_LINUX.X64_130109
cellswver:OSS_11.2.3.2.1_LINUX.X64_130109
mode:install
created:2013-01-09 03:37:26 -0800
node:COMPUTE
f331a2bfa27cc7371d936009289c23b6 *commonos.tbz
22212b4436b57fba6d53be4d7a6fd975 *dbboot.tbz
c7cf016f2180ab820010a787b77c59fb *dbfw.tbz
fd9d8b4f4a30588ca39f66015d5edc4e *dbrpms.tbz
eaa1a7973efcdcd2fe40b09828ba0e01 *debugos.tbz
290909dec5b3995aed2298f20d345aea *exaos.tbz
cfda22b7599e47bdc686acf41fd5d67c *kernel.tbz
18db43d6e94797710ffa771702be33e8 *ofed.tbz
4831ba742464caaa11c1c0153b823886 *sunutils.tbz
activated:2013-11-24 04:41:03 +0100
status:success

you can see that this file contains all the informations about the current version of your Exadata software stack as well as a list of md5 checksums of the .tbz files in /opt/oracle.cellos/iso/cellbits. We checked the md5 checksums of these files as well so now we now 100% that we are on the correct level. Now how does this file gets updated, it seems that it is out-of-date or at least not inline with our system? Well it is being generated when you are patching your compute node to another level and it is being updated by the exadata-sun-computenode rpm. Lets peek a bit into that rpm and what it is deploying, first i am going to mount an ISO file as my yum repository:

[root@dm0101 ~]# mount -o loop /u01/app/oracle/stage.2288/112_latest_repo_130109.iso /var/www/html/yum/unknown/EXADATA/dbserver/11.2/latest.2288/
[root@dm0101 ~]# cp /var/www/html/yum/unknown/EXADATA/dbserver/11.2/latest.2288/x86_64/exadata-sun-computenode-11.2.3.2.1.130302-1.x86_64.rpm 

Next step is to extract the files from the RPM and see what we get:

[root@dm0101 ~]# mkdir /tmp/exadata-sun-computenode-
[root@dm0101 ~]# cd /tmp/exadata-sun-computenode-
[root@dm0101 exadata-sun-computenode-]# rpm2cpio ../exadata-sun-computenode-11.2.3.2.1.130302-1.x86_64.rpm | cpio -idv
./etc/init.d/pciibmap
./etc/init.d/workaround_13083530
./etc/yum.repos.d/Exadata-computenode.repo.sample
./etc/yum/pluginconf.d/installonlyn.conf
./opt/oracle.SupportTools/corecontrol
./opt/oracle.SupportTools/dbserver_backup.sh
./opt/oracle.SupportTools/sundiag.sh
./opt/oracle.cellos/exadata.computenode.post.sh
./opt/oracle.cellos/tmpl/image.id.11.2.3.2.1.130302.in
194 blocks
[root@dm0101 exadata-sun-computenode-]#

Aha! there is a file called ./opt/oracle.cellos/tmpl/image.id.11.2.3.2.1.130302.in in that RPM, but it is not showing everything, it still doesn’t got the md5 checksum information in it:

[root@dm0101 exadata-sun-computenode-]# cat ./opt/oracle.cellos/tmpl/image.id.11.2.3.2.1.130302.in
id:1362281283
kernel:2.6.32-400.21.1.el5uek
kernelstock:2.6.32-300.19.1.el5uek
version:11.2.3.2.1.130302
type:production
label:OSS_11.2.3.2.1_LINUX.X64_RELEASE
cellrpmver:OSS_11.2.3.2.1_LINUX.X64_130302
cellswver:OSS_11.2.3.2.1_LINUX.X64_130302
mode:install
created:2013-03-02 19:28:04 -0800
node:COMPUTE
[root@dm0101 exadata-sun-computenode-]#

To find out how that gets filled we need to take a look into the script that gets executed when we install this RPM. The output bellow is a bit truncated for readability:

[root@dm0101 rpm-extraxt]# rpm --scripts -qp exadata-sun-computenode-11.2.3.2.1.130302-1.x86_64.rpm
.
..
fi
# Don't start post script for the fresh imaging (on zero boot)
# Don't install any files (remove them)
if [ -f /.post.imaging ]; then
  rm -f /opt/oracle.cellos/tmpl/image.id.11.2.3.2.1.130302.in
  rm -f /opt/oracle.cellos/exadata.computenode.post.sh
else
  /sbin/service exachkcfg postrpm 11.2.3.2.1.130302 exadata-sun-computenode
fi
..
.

So when we install this RPM it does (among other things not shown here) remove my image.id file and a shell script called exadata.computenode.post.sh and it then executes a new version of exadata.computenode.post.sh. Now exadata.computenode.post.sh is quiet a big script and it does setup your compute node properly and then finally it reboots your compute node. In this huge exadata.computenode.post.sh script there is a function that is being called that actually calculates the md5 checksums of my cellbits directory and places that info into a variable:

md5sum_iso_cellbits ()
{
  local imageid_out=$1

  [ "$imageid_out" == '' ] && imageid_out=$IMAGE_ID_DEF
  print_log "Please wait. Calculating checksums for $imageid_out ..." INFO
  # Remove any existing md5 sums from the output file
  perl -ne '/[a-f\d]\s+\*.+/ or print $_' -i $imageid_out
  local tmp_lst=/tmp/msd5sum.cellbits.lst
  pushd /opt/oracle.cellos/iso/cellbits
  md5sum -b * > $tmp_lst
  cat $tmp_lst >> $imageid_out
  rm -f $tmp_lst
  popd
}

Finally the function db_update_imageinfo () in exadata.computenode.post.sh creates a new and up-to-date version of my image.id file and it makes imageinfo happy again. To make a long blogpost short, the solution to our misbehaving image info (and also imagehistory btw.) was to run

yum --enablerepo=exadata_dbserver_11.2.3.2.1_x86_64_base reinstall exadata-sun-computenode-11.2.3.2.1.130109-1

and let the nodes reboot. This solved our problem with the imageinfo command, unfortunately we could not determine the root cause for the image.id problem.