Getting started with PostgreSQL and Systemtap

Lets say wanted to understand how understand Postgres, beyond the regular trace file. Or you have this nasty performance issue that makes no sense to you and the regular logging and tracing are not giving the results, where do you go? Well, you could always try running perf and pipe the export to Brendan Gregg’s flamegraph tool to get a better understanding. If that doesn’t work for you, don’t worry PostgreSQL has you covered. PostgreSQL facilitates the use of dynamic tracing within the database. Dynamic tracing supports on PostgreSQL supports 2 utilities: Dtrace and Systemtap.

PostgreSQL dynamic tracing comes with probes/trace points already build in to the code, however in most cases these probes are not compiled into PostgreSQL so you need explicitly tell your configure script you want to have these probes compiled with PostgreSQL. To compile PostgreSQL for Systemtap you need to give the configure script the --enable-dtrace flag.

Before you start compiling your PostgreSQL deployment, you must make sure that you have the development tools installed as well as a set of header files and systemtap dtrace compatibility packages

dnf -y group install "Development Tools"
dnf -y install wget libtermcap-devel readline-devel systemtap-sdt-devel systemd-devel

Then next step would be to download the PostgreSQL as normal, in this case I am downloading 14.4

wget https://ftp.postgresql.org/pub/source/v14.4/postgresql-14.4.tar.gz
tar -xvf postgresql-14.4.tar.gz
cd postgresql-14.4/

Now we have to source code we can start compiling, besides the already mentioned I am configuring PostgreSQL with some additional flags as personal preference and for use of use. --enable-debug compiles PostgreSQL with all debugging symbols to make life easier if you want attache a debugger at some point. The --enable-profiling flag is enabled just in case I want to do some profiling. Enabling the –with-systemd does the obvious thing, personal preference to let PostgreSQL be controlled through systemd.

./configure --enable-dtrace --enable-debug --enable-profiling --with-systemd
make
make install

After this we have our postgres binaries ready and we can continue configuring PostgreSQL as you normally would do by creating a postgres user, creating a postgres systemd service, create your initDB etc.

To test if your PostgreSQL now has the probes configured you can run a quick test. Open one session as root and run the systemtap online below and make sure it points to your postgres binary and keep this session open. In a second session run a simple query, eg. a select from pg_stat_activity. After this query is done you should see a backtrace in your root session. You can cancel out of this stap command now if you want.

[root@postgres-host ~]# stap --ldd -ve 'probe process("/usr/local/pgsql/bin/postgres").mark("transaction__start") { printf ("PID: %d, CPU: %d, BACKTRACE^ %s\n", pid(), cpu(), print_ubacktrace()) }'
Pass 1: parsed user script and 480 library scripts using 295156virt/95944res/17692shr/79972data kb, in 150usr/20sys/171real ms.
Pass 2: analyzed script: 1 probe, 3 functions, 0 embeds, 0 globals using 296740virt/98392res/18544shr/81556data kb, in 10usr/0sys/6real ms.
Pass 3: using cached /root/.systemtap/cache/69/stap_69a99e95568af9ea57019817040be984_2398.c
Pass 4: using cached /root/.systemtap/cache/69/stap_69a99e95568af9ea57019817040be984_2398.ko
Pass 5: starting run.
 0x574bb0 : StartTransaction+0x1a0/0x2d0 [/usr/local/pgsql/bin/postgres]
 0x576025 : StartTransactionCommand+0x55/0x60 [/usr/local/pgsql/bin/postgres]
 0x7d7255 : start_xact_command+0x95/0xc0 [/usr/local/pgsql/bin/postgres]
 0x7d9030 : exec_simple_query+0x50/0x620 [/usr/local/pgsql/bin/postgres]
 0x7daaad : PostgresMain+0x14ad/0x2320 [/usr/local/pgsql/bin/postgres]
 0x75d40a : ServerLoop+0xd9a/0xfa0 [/usr/local/pgsql/bin/postgres]
 0x75e1a4 : PostmasterMain+0xb14/0xd80 [/usr/local/pgsql/bin/postgres]
 0x4ec195 : main+0x445/0x4b0 [/usr/local/pgsql/bin/postgres]
 0x7feef3ea7cf3 : __libc_start_main+0xf3/0x1c0 [/usr/lib64/libc-2.28.so]
 0x4ec22e : _start+0x2e/0x30 [/usr/local/pgsql/bin/postgres]
PID: 5755, CPU: 3, BACKTRACE^
^CPass 5: run completed in 20usr/40sys/8911real ms.

This systemtap oneliner dumps a user backtrace for all sessions, in this particular sampling window we only captured my PostgreSQL session. We can verify it is the same, the statement that I did in the other session that triggered this backtrace tells us the PID of that session:

postgres=# select pg_backend_pid();
 pg_backend_pid
----------------
           5755
(1 row)

The PID that pg_backend_pid() gives is 5755, which is the same as what systemtap gives. The simple one-liner marks a static probe in the PostgreSQL, in this case its marks the transaction_start call in PostgreSQL. We then print the backtrace for the stack for the process with print_ubacktrace function.

We can expand a bit on this by adding variables, operators such as counts so make systemtap a bit more flexible in its use cases. Have a look at the following simple systemtap script:

global count_calls
global calls_pids
global hist


probe process("/usr/local/pgsql/bin/postgres").function(@1)

  { 
    count_calls[pid()]++
  }

probe end

{ 
  printf("\n")
  foreach (count in count_calls)
    printf( "PID: %d - Call count: %d\n", count, count_calls[count] ) 
}

This scripts probes a user-space function in the PostgreSQL process. The function that we are calling is @1 which is means we are expecting a command line argument here. Sytemtap makes a difference between string arguments and integer arguments. A string argument is being revered to with a @, and integer needs a $. The script then continues by counting the times the function name that was given through the command line argument in variable count_calls with the PID that executed that function.

}[root@postgres-host vagrant]# stap test.stp StartTransaction
^C
PID: 6733 - Call count: 3
[root@postgres-host vagrant]# 

These are just some of the basic things you can with Systemtap to monitor and trace your PostgreSQL environment. These scripts can be expanded with additional build-in functions or even expand with embedded your custom C functions into Systemtap if you run Systemtap in guru (-g) mode but that is for another blogpost. Have fun playing around with Systemtap and PostgreSQL.

Advertisement

PostgreSQL TOAST tables and referencing rows

When inserting data into a PostgreSQL table, data will be inserted into that row as long as it fits inside the fixed page size. In most cases this will be an 8KB page size but may differ per deployment.

If your chunk of data doesn’t fit into a page buffer Postgres will first try to compress this data using the algorithm from default_toast_compression. This compression kicks in when a row exceeds the TOAST_TUPPLE_THRESHOLD until it hits the limit set in TOAST_TUPLE_TARGET (which normally is the same value). If compression can’t keep the data within the TOAST_TUPLE_TARGET it will move the data out-of-line

Data for that row will then be moved to a so called TOAST (The Oversized-Attribute Storage Technique) table. A table that needs to store out-of-line TOAST data will have its own unique TOAST table. The TOAST table will be in the pg_toast schema with the following naming convention: pg_toast_<oid of table>. Lets dive into this a bit deeper to find out how this works and start with creating test table T1 with data that will definitely be bigger then the TOAST_TUPPLE_TAGET:

test=# create table t1 (toast text);
CREATE TABLE
test=#insert into t1 select (SELECT string_agg(substring('0123456789bcdfghjkmnpqrstvwxyz', round(random() * 30)::integer, 1), '') FROM generate_series(1, 100000)) FROM generate_series(1, 100);
INSERT 0 100

Now we have our T1 test table, we can check to see if this created our T1 TOAST table. the pg_class table has reltoastrelid column that we can use to verify if there is a TOAST table for T1 and what the name is of the TOAST table:

test=# select oid, reltoastrelid::regclass from pg_class where relname='t1';
  oid  |      reltoastrelid
-------+-------------------------
 32975 | pg_toast.pg_toast_32975
(1 row)

Great, we have our TOAST table with the OID of the T1 table, lets have a quick look on what is in there. TOAST tables have 3 columns: chunk_id, chunk_seq and chunk_data. The last column contains the actual data for the out-of-line data. Chunk_id is the OID that uniquely identifies TOASTed data and chunk_seq is a sequence number for all chunks belonging to the same chunk_id. Lets get some information about data in these columns to find out how Postgres actually handles out-of-line data in TOAST tables:

test=# select count(*) from pg_toast.pg_toast_32975;
 count
-------
  5000
(1 row)

test=# select count(distinct chunk_id) from pg_toast.pg_toast_32975;
 count
-------
   100
(1 row)

test=# select max(chunk_seq) from pg_toast.pg_toast_32975;
 max
-----
  49
(1 row)

For the T1 table there are 5000 rows with of data stored in this TOAST table. There are 100 unique chunk_id which corresponds to the 100 rows of data we generated for the T1 test table. We have 50 chunk segments for a chunk_id (chunk_seq starts at 0) for every chunk_id. We can say this because we know that all of our 100 rows have the exact same size (which we generated when we created the T1 table).

In the case of table T1 it easy to find what row in t1 correlates to a row in the TOAST table. We know how the data was inserted, all rows have data in the pg_toast table. But what to do when the data for rows are variable in length, which row then is what chunk_id in the pg_toast table. For this we need to look at table T1 and figure out what row is pointing to a specific chunk_id. This data is stored in the line pointers of the heap page. We can get this information with the pageinspect module:

test=# create extension pageinspect;
CREATE EXTENSION
test=# \dx+ pageinspect
                Objects in extension "pageinspect"
                        Object description
-------------------------------------------------------------------
 function brin_metapage_info(bytea)
 function brin_page_items(bytea,regclass)
 function brin_page_type(bytea)
 function brin_revmap_data(bytea)
 function bt_metap(text)
 function bt_page_items(bytea)
 function bt_page_items(text,bigint)
 function bt_page_stats(text,bigint)
 function fsm_page_contents(bytea)
 function get_raw_page(text,bigint)
 function get_raw_page(text,text,bigint)
 function gin_leafpage_items(bytea)
 function gin_metapage_info(bytea)
 function gin_page_opaque_info(bytea)
 function gist_page_items_bytea(bytea)
 function gist_page_items(bytea,regclass)
 function gist_page_opaque_info(bytea)
 function hash_bitmap_info(regclass,bigint)
 function hash_metapage_info(bytea)
 function hash_page_items(bytea)
 function hash_page_stats(bytea)
 function hash_page_type(bytea)
 function heap_page_item_attrs(bytea,regclass)
 function heap_page_item_attrs(bytea,regclass,boolean)
 function heap_page_items(bytea)
 function heap_tuple_infomask_flags(integer,integer)
 function page_checksum(bytea,bigint)
 function page_header(bytea)
 function tuple_data_split(oid,bytea,integer,integer,text)
 function tuple_data_split(oid,bytea,integer,integer,text,boolean)
(30 rows)

Now we have the tools to have a look at the line pointers for a row. The heap_page_items function returns a bunch of columns. For this specific query we don’t really need to use all the columns for finding the corresponding pg_toast row. lp, lp_off, lp_flags and lp_len definitions can be found in itemid.h. All other columns are described in htup_details.h and htup.h. For now we are only interested in the pointers for the first line pointer:

test=# select lp, t_data from heap_page_items (get_raw_page ('t1', 0)) where lp=1;
 lp |                 t_data
----+----------------------------------------
  1 | \x01122580010021800100d4800000d2800000
(1 row)

To make something out of this pointer we need to know how it is defined, the TOAST pointer structure definition can be found in postgres.h

/*
 * struct varatt_external is a traditional "TOAST pointer", that is, the
 * information needed to fetch a Datum stored out-of-line in a TOAST table.
 * The data is compressed if and only if the external size stored in
 * va_extinfo is less than va_rawsize - VARHDRSZ.
 *
 * This struct must not contain any padding, because we sometimes compare
 * these pointers using memcmp.
 *
 * Note that this information is stored unaligned within actual tuples, so
 * you need to memcpy from the tuple into a local struct variable before
 * you can look at these fields!  (The reason we use memcmp is to avoid
 * having to do that just to detect equality of two TOAST pointers...)
 */
typedef struct varatt_external
{
	int32		va_rawsize;		/* Original data size (includes header) */
	uint32		va_extinfo;		/* External saved size (without header) and
								 * compression method */
	Oid			va_valueid;		/* Unique ID of value within TOAST table */
	Oid			va_toastrelid;	/* RelID of TOAST table containing it */
}

The first 4 bytes are a reference to the toastrelid, meaning that d2 80 00 00 points to the TOAST table. Reversing this gives is 0x80d2 which 32978 in decimal. When we verify this, it is indeed the pointer to the T1 TOAST table:

test=# select reltoastrelid, reltoastrelid::regclass from pg_class where relname='t1';
 reltoastrelid |      reltoastrelid
---------------+-------------------------
         32978 | pg_toast.pg_toast_32975
(1 row)

The next 4 bytes are d4 80 00 00, reversed this is 0x80d4. In postgres.h this points to  va_valueid, the  Unique ID of value within TOAST table. 0x80d4 is 32980 in decimal , 32980 is the chunk_id in the toast table for T1:

test=# select chunk_id, count(chunk_seq) seq_count from pg_toast.pg_toast_32975 where chunk_id=32980 group by chunk_id;
 chunk_id | seq_count
----------+-----------
    32980 |        50
(1 row)

The remaining 2 bytes are for the pointer, the pointer for a TOAST row is described in postgres.h which consists of a header and a datum tag:

{
	uint8		va_header;		/* Always 0x80 or 0x01 */
	uint8		va_tag;			/* Type of datum */
	char		va_data[FLEXIBLE_ARRAY_MEMBER]; /* Type-specific data */
} varattrib_1b_e;

To summarise, finding which row relates to what row in a toast table is possible. It does need some work and some additional extensions to be installed.

Docker and the VirtualBox host-only networks

This blogpost is mostly written for myself as a reminder on how I solved this annoying issue I had after upgrading VirtualBox in my MacBook Pro.

I have recently migrated VirtualBox from release 6.1.26 to release 6.1.30 on my MacBook Pro running MacOS 12.0.1. When trying to launch Docker (version 20.10.11) containers I started to get a bunch of errors. The error occurs in multiple of the Docker command line tools, for example in the docker-machine command. The output shows that docker-machine has issues reaching the host-only network of VirtualBox:

Starting "default"…
(default) Check network to re-create if needed…
(default) Creating a new host-only adapter produced an error: /usr/local/bin/VBoxManage hostonlyif create failed:
(default) 0%…
(default) Progress state: NS_ERROR_FAILURE
(default) VBoxManage: error: Failed to create the host-only adapter
(default) VBoxManage: error: VBoxNetAdpCtl: Error while adding new interface: failed to open /dev/vboxnetctl: No such file or directory
(default) VBoxManage: error: Details: code NS_ERROR_FAILURE (0x80004005), component HostNetworkInterfaceWrap, interface IHostNetworkInterface
(default) VBoxManage: error: Context: "RTEXITCODE handleCreate(HandlerArg *)" at line 95 of file VBoxManageHostonly.cpp
(default)
(default) This is a known VirtualBox bug. Let's try to recover anyway…
Error setting up host only network on machine start: The host-only adapter we just created is not visible. This is a well known VirtualBox bug. You might want to uninstall it and reinstall at least version 5.0.12 that is is supposed to fix this issue

This is not the first time I have had network issues similar to this one. The fix in these cases was simple, remove all the host-only networks and recreate your docker machine. Unfortunately this time around, that fix didn’t work, we are still getting the issue with the host-only network after we removed all existing host-only networks:

docker-machine create --driver virtualbox --virtualbox-memory "2048" default
Running pre-create checks…
Creating machine…
(default) Copying /Users/klaas-jan.jongsma/.docker/machine/cache/boot2docker.iso to /Users/klaas-jan.jongsma/.docker/machine/machines/default/boot2docker.iso…
(default) Creating VirtualBox VM…
(default) Creating SSH key…
(default) Starting the VM…
(default) Check network to re-create if needed…
(default) Found a new host-only adapter: "vboxnet0"
Error creating machine: Error in driver during machine creation: Error setting up host only network on machine start: /usr/local/bin/VBoxManage hostonlyif ipconfig vboxnet0 --ip 192.168.99.1 --netmask 255.255.255.0 failed:
VBoxManage: error: Code E_ACCESSDENIED (0x80070005) - Access denied (extended info not available)
VBoxManage: error: Context: "EnableStaticIPConfig(Bstr(pszIp).raw(), Bstr(pszNetmask).raw())" at line 242 of file VBoxManageHostonly.cpp

After scanning through the VirtualBox documentation on host-only networks, you will see that for Solaris, Linux and MacOS the allowed ip range for host-only networks has changed. VirtualBox will now only accept IP addresses to be assigned in the 192.168.56.0/21 range. The errors above show that Docker is trying to create and assign a 192.168.99.1/24 address and mask.

There are now 2 obvious solutions, one would be changing the way how docker creates your machine so it fits in the “new” address space that VirtualBox now uses:

docker-machine create --driver virtualbox --virtualbox-memory "2048" --virtualbox-hostonly-cidr 192.168.56.1/21 default

We can also solve this at the other side of the problem, that is changing the behaviour of VirtualBox. In order to do this we need to create the file networks.conf in /etc/vbox. In the network.confs we can tell VirtualBox what networks we are allowing:

sudo mkdir /etc/vbox
sudo vi /etc/vbox/networks.conf

cat /etc/vbox/networks.conf
* 10.0.0.0/8 192.168.0.0/16
* 2001::/64

Exploring the iTerm2 Python API

A few months ago, iTerm2 released a new version that included a Python API. At first I didn’t pay much attention to it until I recently tried to configure sessions using Profiles, realising that this was way easier to do when I could actually script all of this. This reminded me that there is a Python API build in into iTerm2 which could just that.

The Python API is not enabled by default, you must enable it by going to preference > magic and check “Enable Python API” and agree with the warning it gives on what this API can do.

After you have enabled the Python API you can use the menu items under the script pulldown menu. The first time you are going to use the API, iTerm2 will download a Python Runtime environment for you. After start building your scripts, the iTerm2 documentation also includes a bunch of example scripts for inspiration. iTerm2 uses a virtualenv to separate all the scripts, this also means that any non-default libraries should be added to the virtualenv.

So lets get going, I often find myself in a situation where I am opening a new session, cd to a working dir, startup vagrant, ssh into the box, open a new session and ssh as a different user into that box. I want to automate that process so I don’t have to repeat those steps every time. Here is the complete script, we will break down the most important parts:

#!/usr/bin/env python3
import iterm2
import os, subprocess

# Variables
vagrantDir = '/my/path/to/my/vagrantfile'

# extend path to find vagrant and vbox binaries
my_path=os.environ.get('PATH')
my_path = '/usr/local/bin/:' + str(my_path)

# Check Vagrant status
def vagrant_stat():
    p = subprocess.check_output(['vagrant', 'status', '--machine-readable'], cwd=vagrantDir, env={"PATH": my_path})
    p1 = p.decode("utf-8")
    return p1

async def main(connection):
    app = await iterm2.async_get_app(connection)

    # Get the current Terminal window
    window = app.current_terminal_window

    # Make sure that we are in a iTerm2 window
    if window is not None:

	    # Create a new tab, split panes into 2 sessions, set profiles
	    tab = await window.async_create_tab()
	    split = await tab.current_session.async_split_pane(vertical=True)

	    # Startup vagrant box if it is not running
	    vagrant_action = vagrant_stat()
	    if (vagrant_action.find('poweroff') != -1):
	    	subprocess.call(['vagrant', 'up', '--machine-readable'], cwd=vagrantDir, env={"PATH": my_path})

	    # Change the prompt to the VagrantFile and SSH into the box
	    for session in tab.sessions:
	    	await session.async_send_text(f"cd {vagrantDir}\n")
	    	await session.async_send_text("vagrant ssh\n")

	    # Sudo to oracle on the left window:
	    await tab.sessions[0].async_send_text("sudo su - oracle\n")

	    # Sudo to root on the right window:
	    await tab.sessions[1].async_send_text("sudo -i\n")

    else:
        # You can view this message in the script console.
        print("No current window")

iterm2.run_until_complete(main)

The first few lines are importing some libraries, with iterm2 (available on PyPi) being the obvious one, setting some variables and making sure that python can find tools look vagrant and vboxmanage. Also a helper function if I need to check later on if the vagrant box is up-and-running.

async def main(connection):
    app = await iterm2.async_get_app(connection)

    # Get the current Terminal window
    window = app.current_terminal_window

Here we begin our main function, as you can see this uses this uses python’s asyncio libraries. Most of iTerm2’s functions are using async, which can be handy as we can continue even if we are still waiting for other steps to finish.

# Create a new tab, split panes into 2 sessions, set profiles
     tab = await window.async_create_tab()
     split = await tab.current_session.async_split_pane(vertical=True)

After we have defined our current iTerm2 window we can call the async_create_tab() function to create a new tab in the current window. We then will split that tab into 2 vertical session with async_split_pane()

 # Startup vagrant box if it is not running
	    vagrant_action = vagrant_stat()
	    if (vagrant_action.find('poweroff') != -1):
	    	subprocess.call(['vagrant', 'up', '--machine-readable'], cwd=vagrantDir, env={"PATH": my_path})

We then want to check if the vagrant box is actually up-and-running, if it is not then we issue a subprocess.call() to startup the box. I am using subprocess.call() here to make sure that python waits until the vagrant command is done.

# Change the prompt to the VagrantFile and SSH into the box
for session in tab.sessions:
    	await session.async_send_text(f"cd {vagrantDir}\n")
    	await session.async_send_text("vagrant ssh\n")

Vagrant is now up and we can now ssh into the box. By looping through all the sessions we can send text into the sessions as it was a user typing into the iTerm2 window.

# Sudo to oracle on the left window:
await tab.sessions[0].async_send_text("sudo su - oracle\n")

# Sudo to root on the right window:
await tab.sessions[1].async_send_text("sudo -i\n")

Finally I want to sudo the session in the vagrant box to oracle and the other session to root.

iterm2.run_until_complete(main)

Now are main function is defined, we can call the main function and let iTerm2 configure our tab, start the vagrant box, set the working dir, ssh into the box and sudo to oracle and root.

Of course this is a very simple example of what is possible with the Python API.pytho

Oracle stress testing using Apache JMeter

When looking for an Oracle stress test tool to do performance measurements, one of the first tools to popup is Domic Giles Swingbench tool. Now there is absolutely nothing wrong with Swingbench, it has helped me lots in the past but the last view years i have used Apache JMeter as a stress test tool for non-oracle data platforms. JMeter is extremely configurable stress testing platform that not only supports JDBC calls but a whole variety of other tests too like HTTP(S), Java, TCP, IMAP/POP, OS commands etc. It also has a whole range of build-in functions and variables that makes it easy to script and scale-out. JMeter also supports distributed testing, meaning that you have one command JMeter, controlling multiple other clients that can connect to your application.

JMeter exists of a couple of building blocks that together will make your stress test. All these building blocks can be used in a modulair fashion, making your stress test flexible in setup. At the root level there is the Test Plan, you can add one or more of the following components to this plan:

  • Thread Groups: Basically the beginning of a stress test, this is where you define how many threads (users) your stress will startup, the startup delay between the tests, how often it will loop through the tests etc.
  • Config Elements: This is where you define how samplers are configured. In our Oracle RDBMS example it is the place where we configure JDBC.
  • Samplers: The units that do the actual, in our case this is where we define what call we are going to make over JDBC.
  • Controllers: Allows us to control the flow on a test, for example the onlyonce controller lets us run a test only once instead of the loop number. Controllers also gives us the possibilities to randomness, loops etc.
  • Timers: Does what it appears to be doing, making delays in a loop before we go on to a new element
  • Assertions: Allow you to assert facts about responses received from the server being tested.
  • Pre/post processors: Gives you the possibility to prepare an environment and clean it up afterwards.
  • Listeners: Displays the data that you get back from a test. Listeners can be Graphing listeners, debugging, Summary Reports etc.

Installing JMeter is easy, go to the download page and download the most recent (version 5.1.1 at the time of writing). Unpack it, go to the /bin directory and start JMeter:

 klaasjan@exabook > ~/bin/apache-jmeter-5.1.1/lib > ../bin/jmeter
WARNING: package sun.awt.X11 not in java.desktop
================================================================================
Don't use GUI mode for load testing !, only for Test creation and Test debugging.
For load testing, use CLI Mode (was NON GUI):
   jmeter -n -t [jmx file] -l [results file] -e -o [Path to web report folder]
& increase Java Heap to meet your test requirements:
   Modify current env variable HEAP="-Xms1g -Xmx1g -XX:MaxMetaspaceSize=256m" in the jmeter batch file
Check : https://jmeter.apache.org/usermanual/best-practices.html
================================================================================
Warning: Nashorn engine is planned to be removed from a future JDK release

Now we have JMeter up-and-runnign and we can start building our custom Oracle stress test. In this example in am running JMeter on OSX and have a vagrant box with a single instance oracle database. Vagrant has port 1521 exposed to OSX so i can connect to Oracle at localhost:1521. We also need the Oracle JDBC driver jar file on our machine. I am using the Oracle Instant Client for this example.

First step is to tell our Test Plan where our Oracle JDBC jar file is located, this can be done at the bottom of the Test Plan screen bu extending the class path of JMeter:

Add Oracle JDBC Jar to JMeter class path

Because all tests run under at thread group, this is the first item we need to create.

I have configured mine to use 50 users, with a wait of 10 seconds after each session starts. I am looping through the test a 100 times.

Now we have our threads configured we need to tell JMeter what and how we are going to test. In our case it will be over a JDBC connection, so we first need to a Config Element that configures our JDBC connection

On the JDBC configuration page we need to configure a few things, for new set “Variable Name for created pool” to something like “oracle_db”. This is a variable that we need reference later on so we know what our JDBC configuration is. At the bottom make sure that you fill in your JDBC connection URL to your database and set the JDBC driver class to oracle.jdbc.OracleDriver. Finally set the username/password you want to use.

The last step is add a JDBC sampler by right clicking on the Thread Group and go to add -> Sampler -> JDBC Requests. At the line “Variable Name of Pool declared in JDBC Connection Configuration”, fill in the variable name that you have given it, in my example oracle_db. Leave the query type as it is for now and add a simple query, something like select count(*) from dual. Don’t add a trailing semi-colon add the end of the query, that is not needed and if you do, your tests will fail.

Now finally we need to add a listener so we can actually see some results. Do this by right clicking on Test Plan, then add -> listener -> Summary Report.

We can now run this very basic JMeter test. Now click on the play button and wait for the results to come in.

Now doing a very simple test as this might not be very interesting but as soon as you start expanding your tests it becomes more interesting. When making a bigger JMeter test like this, which is a bit more complex you can see the features and power of JMeter. You can download the linked Gist file and open it in JMeter to watch it. This test will do the following steps:

  • Create a tables jmeter_1 until jmeter_n, where n is the amount of threads configured.
  • Each thread will then loop through its own table doing
    • an insert append of 1000 records into that table
    • a select count(*) on that table
  • Jmeter then will delete all the jmeter_n tables.
  • Create graphs and reports of your test

For setting up, removing my test tables i am using the setup and teardown Thread groups in this example. I also want to have the number of threads the same in all thread groups so i have all the tables for my test and are also removing them. I have created a variable in JMeter so i can set the number of threads in all Thread groups. Under TestPlan.user_defined_variables you can see all configured variables for this test.

  <elementProp name="threads" elementType="Argument">
    <stringProp name="Argument.name">threads</stringProp>
    <stringProp name="Argument.value">${__P(threads,1)}</stringProp>
    <stringProp name="Argument.metadata">=</stringProp>
  </elementProp>

At line 2 there is a variable defined called threads, i am going to use this to reference this in the configuration of all thread groups. You can see on line 3 that this variable has a value called ${__P(threads,1)}, this means that the thread variable is defined by a build in function __P. With this function i can define the variable of threads through the command line. It has 2 parameters, the first one being the name referenced from the command line. In this example we can set it with -Jthreads=N. The 2nd variable is the default value, if it is not being called from the command line, use 1 as a default. I can now set the number of threads with this variable in every thread group by giving it the value ${threads}

<stringProp name="ThreadGroup.num_threads">${threads}</stringProp>

We have now set the number of threads for creating and removing the tables as well as running the tests the same over all thread groups. Next step is to make sure that every thread is using its own table. JMeter has a build-in variable that keeps track of running thread, we can use that in our create table statement:

<stringProp name="ThreadGroup.num_threads">${threads}</stringProp>

By referencing ${__threadNum} in the create table statement, JMeter will create a table called jmeter_n where n=threadNum. If we set ${threads} to 50 then every thread will create its own table. The teardown thread will do the same after the test is done but then with a drop table statement.

We can then use the same ${__threadNum} variable in our test code, for example in a PL/SQL block

<JDBCSampler guiclass="TestBeanGUI" testclass="JDBCSampler" testname="JDBC: Fill table" enabled="true">
  <stringProp name="dataSource">oracle_db</stringProp>
  <stringProp name="query">declare
sql_stmt    varchar(200);
begin
for i in 1..1000 loop
sql_stmt := &apos;insert /*+ append_values */ into jmeter_${__threadNum} select lpad( &apos;&apos;x&apos;&apos;,&apos;||i||&apos;,&apos;&apos;x&apos;&apos;) from dual&apos;;
execute immediate sql_stmt;
commit;
end loop;
end;</stringProp>
  <stringProp name="queryArguments"></stringProp>
  <stringProp name="queryArgumentsTypes"></stringProp>
  <stringProp name="queryTimeout"></stringProp>
  <stringProp name="queryType">Callable Statement</stringProp>
  <stringProp name="resultSetHandler">Store as String</stringProp>
  <stringProp name="resultVariable"></stringProp>
  <stringProp name="variableNames"></stringProp>
</JDBCSampler>

Every thread will stay in its own table for the inserts as well as the selects. Also, at line 15 in the fragment above, we must set the queryType to “Callable Statement” for PL/SQL to function properly.

We can now run this test from the command line:

../bin/apache-jmeter-5.1.1/bin/jmeter -n -t db_test.jmx -o reports -l results.txt -Jthreads=50 -Jloop=10

We have now started a test that will span 50 threads, looping through the test 10 times, write the results of the tests to results.txt and create a HTML based report in the reports directory for further analysis.

A GoldenGate profile & environment script

I have been working a lot with Oracle’s GoldenGate the last year or so and i am loving the product more and more. I really enjoy it’s modulair format where you can actually form it into exactly the thing that you need. However one thing about working with GoldenGate is something that always annoyed me a bit: There is no good way of finding out where the binaries are if the GoldenGate processes are not running. Coming from an Oracle database background, i am used to an oratab file or a cluster registry to see where a database is running (or not running).

I often have to logon to a server with GoldenGate installed on it, i forget where it is running and find myself doing something like a ps -ef | grep mgr to see what the $OGG_HOME is. It becomes a little bit more of a challenge when all processes from GoldenGate are down and you can’t really remember the $OGG_HOME anymore. One option would be to take a look into the inventory.xml:

<INVENTORY>
<VERSION_INFO>
   <SAVED_WITH>11.2.0.3.0</SAVED_WITH>
   <MINIMUM_VER>2.1.0.6.0</MINIMUM_VER>
</VERSION_INFO>
<HOME_LIST>
<HOME NAME="OraGI12Home1" LOC="/u01/app/12.1.0.2/grid" TYPE="O" IDX="1">
   <NODE_LIST>
      <NODE NAME="rac1"/>
      <NODE NAME="rac2"/>
   </NODE_LIST>
</HOME>
<HOME NAME="OraDB12Home1" LOC="/u01/app/oracle/product/12.1.0.2/dbhome_1" TYPE="O" IDX="2">
   <NODE_LIST>
      <NODE NAME="rac1"/>
      <NODE NAME="rac2"/>
   </NODE_LIST>
</HOME>
<HOME NAME="OraHome1" LOC="/u01/app/oracle/product/12.1.2/oggcore_12c" TYPE="O" IDX="3"/>
</HOME_LIST>
<COMPOSITEHOME_LIST>
</COMPOSITEHOME_LIST>
</INVENTORY>

In the example above it is obvious that the GoldenGate software was installed in the 3rd home by looking at the name of the directory, there is of course nothing that stops someone from installing GoldenGate in an a completely anonymous path. The Oracle Inventory does not specify what type of software is installed in a specific home, it just records that a home is installed in that location.

So i decided to create a small script that keeps track of your GoldenGate environment on a server. After setting up this script you will see the following when you have logged on to your server:

=Alias MgrUp OGG_HOME
ogg1 No /u01/app/oracle/product/12.1.2/oggcore_11g
ogg2 No /u01/app/oracle/product/12.1.2/oggcore_12c

The script has created 2 aliases ogg1 and ogg2, each alias will setup the GoldenGate environment for that specific home. The setup of the environment consists of a couple things:

  • It sets $OGG_HOME to the home that was just selected
  • Expands $PATH with $OGG_HOME so you can run the tools directly
  • Does an check if it can find the shared libraries of the Oracle RDBMS and immediately warns if no RDBMS home has been set. It specifically checks if the libnnz*.so libraries (GoldenGate needs this for cipher/wallet support) are there.
  • Setting up a bunch of handy aliases:
    • ggstart: Starts the MGR process
    • gg-startall: Starts all processes that are configured
    • ggstop: Stops the MGR process
    • gg-stopall: Stops all processes that are configured
    • gginfo: Shows the output of info all from ggsci
    • ggversion: Shows some version information of the system
    • ggcom: Generic wrapper for ggsci, handy for scripting. Usage is like this: ggcom “info all”
    • ggreprofile: Reloads the oggprofile script for you add an additional home or made some modifications to the script.
    • rgg: Wraps ggsci with rlwrap so you have command line history in ggsci (please oracle update ggsci)
    • rlogdump: rlwrap alias for logdump, same as with rgg.
    • gglog: Opens the Golden Gate logfile for viewing
    • gghome: Goes to the $OGG_HOME

How does this script work? After it is copied to a directory on your server, change variable OGG_TOOLS_HOME in the top of the script to correct path:

### OGG Tools home location - CHANGE THIS ###
OGG_TOOLS_HOME=/u01/tools/ogg

Then add the full location of the script to the user profile that runs GoldenGate like this: . /u01/tools/ogg/oggprofile (note it is: point[space]{path]) and log out and in again. As soon as you have logged in again the script will check in the $OGG_TOOLS_HOME location if a file called oggtab exists, if it doesn’t exists it will create the file:

if ! [[ -f $OGG_TOOLS_HOME/oggtab ]] ; then touch $OGG_TOOLS_HOME/oggtab ; fi;

In the oggtab the profile script will keep track of which GoldenGate home it is has detected. It does this by comparing the contents of the oggtab with the running MGR processes. If it finds a new home it will add it to the oggtab:

V_OGG_LOC=`ps -ef | grep [\.]\/mgr | awk '{print $10}' | sed 's/\/dirprm\/mgr\.prm//'`

if ! [[ -z $V_OGG_LOC ]]; then 
	for OGG_HOME in $V_OGG_LOC;
	do
		grep -q -F  "$OGG_HOME" $OGG_TOOLS_HOME/oggtab || echo $OGG_HOME >> $OGG_TOOLS_HOME/oggtab
	done
fi;

Be aware that the script has now way of knowing if a home is deleted, if you remove an $OGG_HOME you must also manually remove the home from the oggtab file. With the oggtab file the script can loop through each of the homes and check if the MGR is up and build up the alias for the $OGG_HOME:

for TAB_HOME in `cat $OGG_TOOLS_HOME/oggtab`;
do
	if [[ " ${V_OGG_LOC[*]} " == *" $TAB_HOME "* ]]; then
		MGR_UP="Yes"
	else
		MGR_UP="No"
	fi
	
	printf " $(tput bold)%-16s$(tput sgr0) %-11s %-16s \n" $(echo ogg$HOME_COUNT) $(echo $MGR_UP) $(echo $TAB_HOME)
	alias ogg$HOME_COUNT="export OGG_HOME=$TAB_HOME; export PATH=$PATH:$OGG_HOME; export JAGENT_HOME=$V_JAGENT; echo \"GoldenGate home set to: $TAB_HOME\"; f_dbhome_set"
		if [ -n "$JAGENT_HOME" ]; then
			echo "Jagent location set to: $JAGENT_HOME"
		fi;
		((HOME_COUNT++))
done

The script prints out all the information and immediately shows you in the output (see the top of the post) with the alias name is, if the manager is up-and-running and what the path location is. Personally this scripts solves a bunch of small nuisances i have with GoldenGate and makes management a lot easier for me. I find myself making additional aliases for specific repeating tasks at customers with ggcom function quiet regularly.

You can find the GoldenGate environment script here

Can we trust the Exadata ISO image?

Recently i had to reimage a couple of Exadata compute nodes for different reasons like an almost simultaneous failure of 2 harddisks in the compute node.

After the compute got back online with a fresh image and restoring settings like cellip.ora, DNS, NTP and cellinit.ora we got to the point where we need to reinstall the Oracle components, this clusters has an 12.1.0.2 Grid Infrastructure, and 11.2.0.4. RDBMS home as well as an 12.1.0.2 home. Adding this new node back to the cluster and installing the new home should not provide much issue, the image versions of the compute nodes are the same:

[root@dm01dbadm01 ~]# dcli -g /root/dbs_group -l root imageinfo | grep “Image version: “
dm01dbadm01: Image version: 12.1.2.2.1.160119
dm01dbadm02: Image version: 12.1.2.2.1.160119
dm01dbadm03: Image version: 12.1.2.2.1.160119
dm01dbadm04: Image version: 12.1.2.2.1.160119
dm01dbadm05: Image version: 12.1.2.2.1.160119
dm01dbadm06: Image version: 12.1.2.2.1.160119
dm01dbadm07: Image version: 12.1.2.2.1.160119
dm01dbadm08: Image version: 12.1.2.2.1.160119
[root@dm01dbadm01 ~]#

Let’s run the cluster verification utility and see if there are indeed no issues:

[oracle@dm01dbadm01 [+ASM1] ~]$ cluvfy comp peer -refnode dm01dbadm02 -n dm01dbadm02 -orainv oinstall -osdba dba | grep -B 3 -A 2 mismatched
Compatibility check: Package existence for "glibc (x86_64)" [reference node: dm04dbadm02]
Node Name Status Ref. node status Comment
------------ ------------------------ ------------------------ ----------
dm01dbadm02 glibc-2.12-1.166.el6_7.3 (x86_64) glibc-2.12-1.166.el6_7.7 (x86_64) mismatched
Package existence for "glibc (x86_64)" check failed

Compatibility check: Package existence for "glibc-devel (x86_64)" [reference node: dm04dbadm02]
Node Name Status Ref. node status Comment
------------ ------------------------ ------------------------ ----------
dm01dbadm02 glibc-devel-2.12-1.166.el6_7.3 (x86_64) glibc-devel-2.12-1.166.el6_7.7 (x86_64) mismatched
Package existence for "glibc-devel (x86_64)" check failed

That is a bit of a nasty surprise, cluvfy has found some mismatches between the glibc versions between our reference node and our freshly imaged node. It looks like our re-imaged node has a lower release of glibc installed then the rest of the cluster:

[root@dm01dbadm01 ~]# dcli -g /root/dbs_group -l root "yum info glibc | grep ^Release"
dm01dbadm01: Release : 1.166.el6_7.7
dm01dbadm01: Release : 1.192.el6
dm01dbadm01: Release : 1.192.el6
dm01dbadm02: Release : 1.166.el6_7.3
dm01dbadm03: Release : 1.166.el6_7.7
dm01dbadm03: Release : 1.192.el6
dm01dbadm03: Release : 1.192.el6
dm01dbadm04: Release : 1.166.el6_7.7
dm01dbadm04: Release : 1.192.el6
dm01dbadm04: Release : 1.192.el6
dm01dbadm05: Release : 1.166.el6_7.7
dm01dbadm05: Release : 1.192.el6
dm01dbadm05: Release : 1.192.el6
dm01dbadm06: Release : 1.166.el6_7.7
dm01dbadm06: Release : 1.192.el6
dm01dbadm06: Release : 1.192.el6
dm01dbadm07: Release : 1.166.el6_7.7
dm01dbadm07: Release : 1.192.el6
dm01dbadm07: Release : 1.192.el6
dm01dbadm08: Release : 1.166.el6_7.7
dm01dbadm08: Release : 1.192.el6
dm01dbadm08: Release : 1.192.el6

[root@dm01dbadm01 ~]# dcli -g /root/dbs_group -l root "yum info glibc | grep ^Version | head -1"
dm01dbadm01: Version : 2.12
dm01dbadm02: Version : 2.12
dm01dbadm03: Version : 2.12
dm01dbadm04: Version : 2.12
dm01dbadm05: Version : 2.12
dm01dbadm06: Version : 2.12
dm01dbadm07: Version : 2.12
dm01dbadm08: Version : 2.12
[root@dm01dbadm01 ~]#

The glibc release is lower on our new node then on every node in the rack. This customer has been using Exadata for a long time and it quiet experienced in maintaining them. We have a good insight in what additional software was installed and we are sure that no installed components have updated the glibc release to this higher version. The most logical explanation would that this was update during dbnodeupdate.sh patching. After some digging around we found out that the new version was indeed installed by dbnodeupdate.sh due to this glibc vulnerability (Oracle support account is needed):

glibc vulnerability (CVE-2015-7547) patch availability for Oracle Exadata Database Machine (Doc ID 2108582.1)

After manually installing the updated glibc, clufvy was happy and we managed to install Grid Infrastructure again. Unfortunately our issues didn’t stop there, after successfully installing GI and RDBMS 11.2.0.4 we went tried installing the 12c RDBMS home:

[oracle@dm01dbadm01 [oracle] addnode]$ ./addnode.sh -silent -waitforcompletion "CLUSTER_NEW_NODES={dm01dbadm02}"
Starting Oracle Universal Installer...

Checking Temp space: must be greater than 120 MB. Actual 13549 MB Passed
Checking swap space: must be greater than 150 MB. Actual 24176 MB Passed
[FATAL] [INS-30160] Installer has detected that the nodes [dm01dbadm02] specified for addnode operation have uncleaned inventory.
ACTION: Ensure that the inventory location /u01/app/oraInventory is cleaned before performing addnode procedure.
[oracle@dm01dbadm01 [oracle] addnode]$

Wait a minute, i am the only person working on this node and 15 minutes ago i managed to install an 11g home and now it is suddenly corrupted? A check confirmed that we did not have malformed XML in our inventory.xml or any non-printable ASCII characters in it. To completely rule out a corrupted XML we rebuild the Oracle Inventory, all without success.  Time for some debugging the OUI:

[oracle@dm01dbadm01 [oracle] addnode]$ ./addnode.sh -silent -waitforcompletion "CLUSTER_NEW_NODES={dm01dbadm02}" -debug -logLevel finest

Hidden somewhere in the output there are these few lines:

[main] [ 2017-02-08 11:33:17.523 CET ] [RuntimeExec.runCommand:207] runCommand: Waiting for the process
[Thread-158] [ 2017-02-08 11:33:17.523 CET ] [StreamReader.run:61] In StreamReader.run
[Thread-157] [ 2017-02-08 11:33:17.523 CET ] [StreamReader.run:61] In StreamReader.run
[Thread-158] [ 2017-02-08 11:33:17.601 CET ] [StreamReader.run:65] ERROR>bash: /usr/bin/ksh: No such file or directory
[main] [ 2017-02-08 11:33:17.602 CET ] [RuntimeExec.runCommand:209] runCommand: process returns 127
[main] [ 2017-02-08 11:33:17.603 CET ] [RuntimeExec.runCommand:226] RunTimeExec: output>
[main] [ 2017-02-08 11:33:17.603 CET ] [RuntimeExec.runCommand:235] RunTimeExec: error>
[main] [ 2017-02-08 11:33:17.603 CET ] [RuntimeExec.runCommand:238] bash: /usr/bin/ksh: No such file or directory
[main] [ 2017-02-08 11:33:17.604 CET ] [RuntimeExec.runCommand:257] Returning from RunTimeExec.runCommand
[main] [ 2017-02-08 11:33:17.604 CET ] [UnixSystem.pathExists:1013] pathExists..RuntimeExec returned 127

The OUI is trying to find the korn shell at /usr/bin/ksh and a double check on our new image confirms that the ISO gave us the korn shell in /bin/ksh:

[root@dm02dbadm02 ~]# ls -l /bin/ksh*
lrwxrwxrwx 1 root root 21 okt 30 02:35 /bin/ksh -> /etc/alternatives/ksh
-rwxr-xr-x 1 root root 1501800 sep 22 2015 /bin/ksh93
[root@dm02dbadm02 ~]# ls -l /etc/alternatives/ksh
lrwxrwxrwx 1 root root 10 okt 30 02:35 /etc/alternatives/ksh -> /bin/ksh93
[root@dm02dbadm02 ~]# ls -l /usr/bin/ksh
ls: cannot access /usr/bin/ksh: No such file or directory

It looks that the OUI for 12c is hard coded to look for ksh in /usr/bin, having ksh in /bin will result in the 12c OUI complaining about an issue with the inventory instead of throwing an error on not being able to locate ksh. After consulting with Oracle Support, we heard that:

After ol5 to ol6 migration, we need to ensure two things:
a. ol6’s ksh rpm package is installed. And /bin/ksh exists.
b. /usr/bin/ksh exists. Otherwise, create a softlink to /bin/ksh.

Support is referring to an issue that appeared when Exadata moved from OEL5 to OEL6 which is quiet some while ago. It is surprising that a fresh image has this bug making it impossible to install 12c without fixing the ksh location. The easy fix is to make a softlink to ksh in /usr/bin:

[root@dm01dbadm02 ~]# ln -s /bin/ksh /usr/bin/ksh
[root@dm01dbadm02 ~]# ls -l /usr/bin/ksh
lrwxrwxrwx 1 root root 8 feb 8 18:37 /usr/bin/ksh -> /bin/ksh

Et voila… our addnode.sh works and we have our 12c home back. What worries me is that we can’t trust the ISO image of an Exadata release to be up-to-date with what is being delivered by the regular patch cycles. Both the CVE issue with glibc and the ksh issue have already been patched.

So please Oracle make the Exadata ISO great again.

Golden Gate Activity Logging Tracing

A lot of customers of mine who are starting with Oracle Golden Gate have issues when it comes to troubleshooting Golden Gate. It is the most common question i get: “I have setup Golden Gate, it was working fine but now process X abandoned… now what do i do?”.  The OGG error log is a good starting point but is not always showing the real issue. There are a lot different way to obtain more information on why you are running into an issue. In this blogpost i want to introduce a maybe not so well known feature called Golden Gate Activity Logging Tracing and yes that is how they are really calling this feature.

The most obvious way to enable tracing are the TRACE(1)/TRACE2 commands which are fairly well documented in OGG documentation. You can either send the command directly to a running process like this:

GGSCI (rac1) 3> send extract EXT1 trace /tmp/trace_me.trc

Sending trace request to EXTRACT EXT1 ...
Trace file /tmp/trace_me.trc opened.

If we then do an insert into a table that is being monitored by this Extract process will write the following piece of data in /tmp/trace_me.trc:

12:20:26.259 (397290) * --- entering DataSource::readLCR() --- *
12:20:28.161 (399192) Extract pointer info reused [0], [file 0][src 0][tgt 0][tab PDB1.PDB1ADMIN.T1]. File [PDB1.PDB1ADMIN.T1], targ_idx [4294967295], object type [2], num_extracts [1]
12:20:28.163 (399194) exited DataSource::readLCR (stat=0, seqno=0, rba=0 io_type=5)
12:20:28.163 (399194) processing record for PDB1.PDB1ADMIN.T1
12:20:28.163 (399194) Writing seqno [21], rba [1465], table [PDB1.PDB1ADMIN.T1]
12:20:28.164 (399195) * --- entering DataSource::readLCR() --- *

The TRACE parameter shows what the extract is reading from the database and writing into the OGG trail file. This gives us some useful information we could use for troubleshooting OGG. To disable tracing from our extract we send it the TRACE OFF command:

GGSCI (rac1) 5> send extract EXT1 trace off

Sending trace request to EXTRACT EXT1 ...
Closing all trace files..

Keep in mind though that on a busy system with lots DML tracing can generate a huge amount of data. Besides sending the TRACE/TRACE2 command to a process you can add it to a parameter file directly to start tracing as soon as the the extract or replicat has finished parsing the parameter file. The actual tracing output from TRACE is limited, the main purpose is mainly to see where Golden Gate is spending most of its time on:

SUMMARY STATISTICS

General statistics:
0.00% Checking messages (includes checkpointing)
0.00% Checking periodic tasks
0.00% Waiting for more data
0.00% Converting ASCII data to internal
10.45% Reading input records
0.00% Writing output records (replicate_io)
0.00% Mapping columns
0.00% Outputting data records
0.00% Performing SQL statements
0.00% Executing BATCHSQL statements
0.00% Executing SQL statements
0.00% Preparing SQL statements
0.00% Commit operations
0.00% Rollback operations
0.00% Checkpointing

The difference between TRACE and TRACE2 is minimal, it showing you how long your process in spending in de different code segments:

17:00:07.961 (2814715) * --- entering DataSource::readLCR() --- *
17:00:07.961 (2814715) Reading input records 10.357% (execute=291.531,total=2814.713,count=15230)
17:00:07.961 (2814715) Checking messages (includes checkpointing) 0.005% (execute=0.143,total=2814.713,count=15230)

If you want tracing in Golden Gate you should look elsewhere, in the past Golden Gate had features such as tltrace and traceini but they are deprecated or being deprecated. Golden Gate has introduced Activity Logging Tracing in release 11.1.1.0.0. This feature has been out for a long while but is only documented in DocID 1204284.1 and can be very handy at some points. Activity Logging Tracing is enabled by default in OGG kernel but must be enabled which it can do on 2 ways. Activity Logging Tracing can generate a dump file as soon as one of your OGG binaries  abends. The other way to enable Activity Logging Tracing is by placing an XML file in $OGG_HOME that has this syntax: gglog-<process_name>.xml where process_name is can by the name of the group you want trace or the name of the binary. Doc ID 1204284.1 is not very clear about this but you can actuall trace almost every OGG binary including ggsci and server. Some examples to clarify how to name your Activity Logging Tracing xml file:

  • If you want to trace your extract EXT1 name the file gglog-EXT1.xml
  • If you want to trace an replicat called REP1 call it gglog-REP1.xml
  • For tracing every extract name it gglog-extract.xml
  • Similar for tracing ggsci call it gglog-ggsci.xml
  • If you want to trace everything name the file gglog.xml however this will generate a huge, huge, huge amount of data.

As soon as you place your xml file in $OGG_HOME Golden Gate will start tracing and depending on your XML file  contents these files can become very big very quickly. Activity Logging Tracing will generate a file gglog__.log, so tracing extract EXT1 will generate the file gglog-EXT1_extract.log and tracing ggsci will generate gglog-ggsci_ggsci.log. This naming behaviour can be changed in the xml-file if you want the trace files to be generated on eg. a different filesystem:

<param name="File" value="gglog-%I_%A.log"/>

The default logging format is OGG is log4j and the documentation gives some clue on the general working of this XML file. When a process sees the XML files it calls the functions form libgglog.so and starts processing whatever is configured in the XML file. Doc ID 1204284.1 gives you some hints on what can by used for tracing. The example file gglog-full.xml shows that with only adding this:

<level value="all"/>

Golden Gate will immediately starts tracing everything, but be warned this will output a lot of trace information. Personally i tend to use de gglog-debug.xml as a starting off point for Golden Gate tracing as it already gives some more fine grained control. As an example, the values below gives you a fine grained-control on the redo generation tracing:

<logger name="er.redo.ora">
<level value="all"/>
</logger>
<logger name="er.redo.ora.data">
<level value="all"/>
</logger>
<logger name="er.redo.ora.rtc">
<level value="off"/>
</logger>
<logger name="er.redo.ora.rtcfm">
<level value="off"/>
</logger>
<logger name="er.redo.ora.thread">
<level value="all"/>
</logger>
<logger name="er.redo.ora.thread.checkpoint">
<level value="off"/>
</logger>
<logger name="er.redo.ora.thread.blocks">
<level value="off"/>
</logger>
<logger name="er.redo.ora.thread.logswitch">
<level value="all"/>
</logger>
<logger name="er.redo.ora.thread.unpack">
<level value="all"/>

Enabling all of them enables you to trace redo information in realtime. It dumps data in an output something similar to this:

IXAsyncTrans 2907 oracle/ltwtcapture.c | >>>>>> lcrcnt=1031 received lcr type Internal ROW
IXAsyncTrans 1254 oracle/ltwtcapture.c | kngl_flag 0x00 kngl_xflag 0x0000 krvx_dflag 0x0002
IXAsyncTrans 1079 oracle/ltwtcapture.c | lcrvsn: [5] unpacked scn: [0x0000000000477cee]
IXAsyncTrans 1848 oracle/ltwtcapture.c | Ignore opcode 1 (orig op: 7)
IXAsyncTrans 2950 oracle/ltwtcapture.c | received commit, return
IXAsyncTrans 455 oracle/IXAsyncTrans.cpp | Queueing committed record with tid '2.30.1940'
IXAsyncTrans 580 oracle/IXAsyncTrans.cpp | Queueing result (DataBuffer,452,0x0000000004456b80)
IXAsyncTrans 2115 oracle/redooraix.c | Try to get record from logmining server.
IXAsyncTrans 2968 oracle/ltwtcapture.c | end of lcr buffer, return nodata
IXAsyncTrans 2973 oracle/ltwtcapture.c | Release lcr buffer
IXAsyncTrans 2151 oracle/redooraix.c | no lcr returned
IXAsyncTrans 2115 oracle/redooraix.c | Try to get record from logmining server.
main 325 oracle/IXAsyncTrans.cpp | Data buffer 0x0000000004456b80 returned for 0x1c4 bytes
main 10848 oracle/redoora.c | Processing next transaction with xid '2.30.1940'
main 10626 oracle/redoora.c | Processing first record for committed transaction 2.30.1940, scn 4685038
main 4530 oracle/redooraix.c | Try to get committed lcr from COM 0x00000000031f5750 with TID 0002.01e.00000794
main 1705 oracle/redooraix.c | maintain_rc_count(-1), xid=0x0002.01e.00000794 op=INSERT lcr_flag=0x8 xflag=0x0 row_cnt=0 chunk_cnt=0 txn_flag=0x0 groupRBA=0 groupthread=0
main 11032 oracle/redooraix.c | retrieved item# 1 with op 2 (INSERT), still have 0 row, 0 chunk in COM, source COM
main 5006 oracle/redooraix.c | Retrieved item# 1 from COM with TID 0x0002.01e.00000794, skip 0, more_record 0, tind 3
main 5015 oracle/redooraix.c | Exhausted COM, no more items, delete tran now.
main 10751 oracle/redoora.c | first record from commit txn with scn 4685038, seqno 53, rba 32348688, thread_id 1, jts_time_stamp 212351410734000000
main 7753 oracle/redoora.c | FORMAT XOUT LCR:
main 1638 oracle/redooraix.c | === NEW column list (num_columns=1) ===
main 1650 oracle/redooraix.c | Col[1](id,dty,len,uid,sid,iid,flag,flg,flg2,flg3,type): 0 0 1 0 1 1 0 0 0 0 1
main 1664 oracle/redooraix.c | 000000: 58 |X |
main 1638 oracle/redooraix.c | === OLD column list (num_columns=0) ===
main 6837 oracle/redooraix.c | XOUT_INSERT: format insert
main 5566 oracle/redooraix.c | Format col[1] NEW (id,dty,len,key,flag,flg3,type): 0 1 1 1 0x0 0x0 1
main 6396 oracle/redooraix.c | AFC/CHR after format value: 000000: 58 |X |
main 7010 oracle/redooraix.c | After format lcr, len 9
main 7012 oracle/redooraix.c | 000000: 00 00 05 00 00 00 01 00 58 |........X |
main 5388 oracle/redooraix.c | After format record at 53 - 32346988, get trail record with len 9
main 5389 oracle/redooraix.c | 000000: 00 00 05 00 00 00 01 00 58 |........X |
main 349 oracle/IXAsyncTrans.cpp | Data buffer 0x0000000004456b80 released
main 567 oracle/IXAsyncTrans.cpp | Queueing free buffer 0x0000000004456b80
IXAsyncTrans 2863 oracle/ltwtcapture.c | Receive 336 bytes from LCRCaptureReader
IXAsyncTrans 2907 oracle/ltwtcapture.c | >>>>>> lcrcnt=1032 received lcr type Internal ROW

There are more loggers then mentioned by Oracle in Doc ID 1204284.1 and as far i know they are not publicly available. Some of the loggers are quiet easy to figure out what they are doing like tracing DDL in a extract process for example:

<logger name="gglog.ddl.std">
<level value="all"/>

But mostly it is just guess work and/or a lot of testing to see what they actually do (and what they are). In general, you don’t really need them, in most cases the files with loggers as they are enough to troubleshoot most of the issues.

Creating a RAC cluster using Ansible (part 2)

In my previous 2 blogposts i explained how to setup a Vagrantfile that deploys 2 virtualbox machines that can be used as a basis for a RAC cluster and how you could use Ansible to deploy RAC on those VM’s. In this post i want to dive a bit deeper into how i setup Ansible and why, keep in mind that this just one way of doing this.

The Github repository containing all the files can be found here

I am not going to go over the contents of the commons role as it probably speaks for itself on what it does. After the common role, we first need to setup networking which we need for interconnects. Because we have added extra network devices and we later on need to make sure that the device we configured as interconnect interface always keeps being the interconnect. Reason for this is that we have configured Virtualbox so that these interfaces are on their own network. To keep the device persistency we configure udev. This will work because in our Vagrantfile we have set the MAC addresses of the interfaces to a fixed value:

- name: Setup udev for network devices
  replace: dest=/etc/udev/rules.d/70-persistent-net.rules regexp='ATTR.*{{ item.device }}' replace='ATTR{address}=="{{ item.mac|lower }}", ATTR{type}=="1", KERNEL=="eth*", NAME="{{ item.device }}'
  with_items: "{{ network_ether_interfaces }}"
  when: network_ether_interfaces is defined
  register: udev_net

The array the is being referenced at the with_items line here is a host specific setting so it will take this value from the host_vars/ file. This means that the line above will make an udev rule that is specific for every network interface on the host that ansible is running on.

Ansible uses jinja2 as the format to dynamically format files. We user this format for our network configuration files. With the ansible template command, ansible will read the ethernet.j2 file and the jinja engine will use the template to create the correct file. In case of the action below, the ethernet.j2 will be uploaded as a ifcfg- file and it will do so on every host as long it can find data for that host in host_vars directory:

- name: Create the network configuration for ethernet devices
  template: src=ethernet.j2 dest=/etc/sysconfig/network-scripts/ifcfg-{{ item.device }}
  with_items: "{{ network_ether_interfaces }}"
  when: network_ether_interfaces is defined
  register: ether_result

The jinja2 code is quiet simple to read, the ethernet.j2 file looks like this:

# {{ ansible_managed }}
{% if item.bootproto == 'static' %}
DEVICE={{ item.device }}
BOOTPROTO=static
{% if item.address is defined %}
IPADDR={{ item.address }}
{% endif %}
{% if item.onboot is defined %}
ONBOOT={{ item.onboot }}
{% endif %}
{% if item.peerdns is defined %}
PEERDNS={{ item.peerdns }}
{% endif %}
{% if item.defroute is defined %}
DEFROUTE={{ item.defroute }}
{% endif %}
{% if item.netmask is defined %}
NETMASK={{ item.netmask }}
{% endif %}
{% if item.gateway is defined %}
GATEWAY={{ item.gateway }}
{% endif %}
{% if item.mac is defined %}
HWADDR={{ item.mac }}
{% endif %}
{% endif %}

{% if item.bootproto == 'dhcp' %}
DEVICE={{ item.device }}
BOOTPROTO=dhcp
{% endif %}

It is basically divided into to part, the first part tells jinja2 what to do when there is static device configuration and the other one for DHCP enabled devices. Jinja2 will create a line for every item found in the host_vars. The results of this actions are being registered by Ansible with the line “register: ether_result” We are using these results in the next action:

- name: bring up network devices
  shell: ifdown {{ item.item.device }}; ifup {{ item.item.device }}
  with_items: "{{ ether_result.results }}"
  when: ether_result is defined and item.changed

Here we are only restarting those interfaces which are registered in the ether_result action and are changed. A more complex use of ansible is in the template for the hosts file. The hosts file is the basis for DNSMasq which is being used as a simpler alternative for Bind DNS. The template for the hosts file is being created with information from the ansible facts. These facts are being gathered automatically by Ansible as soon as a playbook begins but we have changed things already in this run so right after we have brought up the network interfaces we have gathered our facts again, with these updated facts we can now build our hosts file.

127.0.0.1 localhost
{% for host in groups['all'] %}
{{ hostvars[host]['ansible_eth2']['ipv4']['address'] }} {{ hostvars[host]['ansible_hostname'] }}
{{ hostvars[host]['ansible_eth2']['ipv4']['address'] | regex_replace('(^.*\.).*$', '\\1') }}{{ hostvars[host]['ansible_eth2']['ipv4']['address'].split('.')[3] | int + 10 }} {{ hostvars[host]['ansible_hostname'] }}-vip
{{ hostvars[host]['ansible_eth1']['ipv4']['address'] }} {{ hostvars[host]['ansible_hostname'] }}-priv
{% endfor %}
{% set count = 1 %}
{% for i in range(1,4) %}
{{ hostvars[inventory_hostname]['ansible_eth2']['ipv4']['address'] | regex_replace('(^.*\.).*$', '\\1') }}{{ count | int +250 }} rac-cluster-scan
{% set count = count + 1 %}
{% endfor %}

The for-loop will go through all the facts and get the IP4 address and generates the correct hosts file entry for every node in the cluster. It then takes take the IP4 address and add 10 to the last octet the create the address for the vip interface. The eth1 address i used for the interconnect in our cluster. The last part the the file is a loop the generate 3 additional ip4 addresses based on the address of eth2 and adds 250 to it plus the integer of the loop. These are the our SCAN addresses of our cluster. Now we have the hosts file setup we can install dnsmasq and have DNS ready. We are pinging the interfaces just to make sure they are up, if the ping failes ansible will stop the runbook.

Our network is now setup as we want it to be and we can go on to configure storage. Vagrant already created four shared disks for us which are presented to both virtual machines. We now have to make sure that we have device persistence storage which we can do with both ASMLib and Udev. For both methods there need to be partition, i am using sfdisk as an easy way to create partitions on all 4 disks:

- name: Create disk partitions
  shell: echo "0," | sfdisk -q {{item.1}}
  with_indexed_items: "{{ device_list.stdout_lines }}"
  when: "{{ item.0 }} > 0"
  become: true
  run_once: true
  register: sfdisk_output

The with_indexed_items command gives me an index number which is used to make sure we are not destroying the partitions on /dev/sda where the OS is installed on. Because the OS is installed on the the first disk, we can start sfdisk at index 1. When ansible needs to install asmlib, it then installs the needed RPM’s and uploads and runs a shell script. The reason for this script is that for ASMLib there is some interactivity needed to configure it. This can be solved with regular shell scripting with a here document, as far as i know there is no such thing in Ansible. For Udev we can more or less copy what we did the make our network interfaces persistent.

After we have installed a toolset we can go start with the installation of the Grid Infrastructure. The ASM diskstring and prefix depends on if we have configured it earlier with udev or with asmlib. In order to use the correct valuables i am adding these values to the ansibles facts as custom fact. Depending on which role it finds Ansible well load these values into the facts:

- set_fact: asm_diskstring="/dev/oracleasm/disks"
  when: "'configure_asmlib' in role_names"

- set_fact: asm_diskstring="/dev/ASMDISK*"
  when: "'configure_udev' in role_names"

Now we can create our SSH keys on both nodes, download the public key back to our Virtual Box host. After we have uploaded the public keys to all the hosts we can add the to our know_hosts using ssh-keyscan.

We upload the Grid Infrastructure zipfiles, extract them and install the cvuqdisk rpm file on all nodes. For the installation of Grid Infrastructure software itself we network_ether_interfaces array to set the client and interconnect interfaces correctly. The response file is made out of a jinja2 template so we can easily customise some settings depending on our needs.  Because we are doing a silent install from a response file we need to run the configToolAllCommands which is normally an interactive part when you use the OUI. Finally we create the FRA diskgroup and we can clean up everything we uploaded and now longer need.

The installation of the RDBMS software with Ansible is now very straightforward. It is just a case of adjusting the limits.conf, creating the response file using the jinja2 template and install the software like your regular silent install. The same goes for the actual creation of the database, there are the checks, the response file creation and the database creation itself. For database creation right now i use the response file method, as soon as i have some time i will switch this out for database creation using a template to have more fine-grain control on the created database.

If you take just the Vagrant file and the ansible file and start installing from scratch it will take you a couple of hours (about 3) to download the vagrant box, the software, creating the machines and the provisioning with Ansible.

  • The Github repository can be found here
  • The blogpost about Vagrant and the vagrantfile can be found here
  • Creating a RAC cluster using Ansible (part 1) can be found here

Creating a RAC cluster using Ansible (part 1)

In my previous blog post i explained how Vagrant can be used to create a 2-node VM environment for deploying a RAC cluster on it. In this post i want to dive into how to actually do the RAC install using ansible so you can easily create a RAC test environment on eg. your laptop.

I have created a git repo with the files that i am using for this blog post, this git repo can be found here

I am not going over over the basic of Ansible in this blogpost, i am assuming you have a basic understanding of Ansible and the YAML markup language. If you need a quick refresh on how Ansible works you can take a look here or here for example.

When trying to automate a RAC deployment you need to think about which steps to take and in what order. Also out-of-the-box Vagrant will do a serial deployment of your cluster which will for obvious reasons not work, the solutions for this is in my previous blogpost. It explains on how to setup Vagrant so ansible can do a “parallel” deployment. I will use a slightly modified version of the Vagrantfile i created in that blogpost.

If you go through the new vagrantfile you will see a few changes, the most obvious one are all the new parameters on top they can be used to easily tweak the RAC setup to your liking. You can change between things like CDB or not, memory settings, domain name etc. These parameters are all being passed over to ansible:

ansible.extra_vars = {
devmanager: "#{devmanager}",
db_create_cdb: "#{db_create_cdb}",
db_pdb_amount: "#{db_pdb_amount}",
db_name: "#{db_name}",
db_total_mem: "#{db_total_mem}",
gi_first_node: "#{host_prefix}1",
gi_last_node: "#{host_prefix}#{servers}",
domain_name: "#{domain_name}"
}

The final change i made to my vagrantfile is at the network section, i am providing the VM’s with a fix MAC address so i can refer to those later on when using udev as the device manager. I will get back to the MAC addresses later (in the next post) when we get to the part that configures device persistency

vb.customize ['modifyvm', :id, '--macaddress2', "080027AAAA#{rac_id}1"]
vb.customize ['modifyvm', :id, '--macaddress3', "080027AAAA#{rac_id}2"]

Vagrant is now able to create 2 VM’s that we can use for our RAC cluster. As you can imagine building 2 node RAC cluster with ansible will result to a rather large playbook that will be a bit cumbersome to work with. Also you might want to re-use certain parts later on for different deployments, in order to solve these issues Ansible has to ability to create roles where every role is defined set of actions all with its own set of files, templates and variables. For this RAC playbook i  have created a set of roles to handle specific parts of the installation:

  • common – setups some common linux setting we want to have no matter what
  • configure_network – setups linux network and configures stuff like DNS
  • configure_udev – If selected in the vagrantfile this will configure the system for udev
  • configure_asmlib – If selected in the vagrantfile this will configure asmlib
  • tools_install – installs a toolset with several scripts, install rlwrap etc.
  • rac_install_gi – installs and configures Oracle Grid Infrastructure
  • rac_install_db – installs and configures the Oracle Database home
  • rac_create_db – Creates the RAC database

The reason to use roles is to be keep a better overview of your ansible scripts but even more you can reuse roles for other playbooks. These roles are being called in the main Ansible file that is defined in the vagrantfile at line 88:

ansible.playbook   = "ansible/rac_gi_db.yml"

This YAML file kicks of the entire installation of our RAC cluster, if look in rac_gi_db.yml you will notice that there only a few lines there. This file is referencing the different roles. If Ansible is seeing role common, it expects that there is a subdirectory called roles, which should have a subdirectory in it with the role name, with another nested subdirectory tasks in it. In that subdirectory Ansible will look for a file called main.yml and runs that. So to visualize this, Ansible expects the following directory structure as a minimum for a role:

rac_gi_db.yml/roles/<em></em>/tasks/main.yml

The Ansible documentation itself has a more in-depth explanation on this. Another thing you might notice are the roles for configuring either asmlib or udev:

- { role: configure_udev, when: devmanager == "udev" }
- { role: configure_asmlib, when: devmanager == "asmlib" }

Like mentioned above, a role is a set of subdirectories that holds the information needed to perform the tasks of the specific role. Directories must have a specific name in order for Ansible to recognise them. Roles can have the following subdirectories, depending on your need.

  • defaults/ – where variables local this role are stored
  • files/ – needed if you want to upload files with ansible to a host. Eg. zipfiles for an Oracle GI install
  • tasks/ – This directory contains at least a file named main.yml which is holds all the tasks that this role has to execute.
  • templates/ – As the name suggests, it contains templates to be converted into files by Ansible. For Example a template for a responsefile that needs to be created with the correct database name and settings or a hosts file with the correct host names. These files are in the Jinja2 format.

If you look at line 11 in the vagrantfile you can see that there is a variable devmanager defined which is being parsed into Ansible at line 90. So depending on the value of the variable devmanager Ansible will either configure udev or asmlib. Global variables are furthermore set in the groupvars/all.yml file, these variables are being passed over too all roles in the group ‘all’.  In this ansible setup variables are being set at different levels, i made it so that the most important variables can be set in top of the vagrant file in lines 11 to 16 this can of course be expanded if needed. Besides passing variable direct to Ansible, specific roles can have their own variables, these are variables are set in rac_gi_db.yml/roles//defaults/main.yml. These are variables local to the execution of this specific role and can’t be referenced outside it. The final group of variables we need are host specific variable like ip address and network configurations in general. For these settings we have a file per host, were every host has its own file in the sub directory host_vars.

Now we know how Ansibles gets the parameters, we know what the roles are doing we can can actually start deploying. As soon as your start vagrant and it is done creating the initial VM’s (covered in my previous blogpost) it will start ansible and go through the roles and runs all the YAML files in the tasks directories. At one point you will find yourself in the need of getting some configuration files during installs but these files are dependent on eg. IP addresses, hostnames, database name etc. For example you can you use this task to create a dnsmasq config based on variables in an array called ether_result.results (which we created somewhat earlier in the task.

- name: Create dnsmasq configuration
template: src=dnsmasq.conf.j2 dest=/etc/dnsmasq.conf
with_items: "{{ ether_result.results }}"

In order to get these dynamic configurations we need to write these files with a language called Jinja2. You can take your normal configuration file and replace the variable with something that Jinja2 can work with. As an example let’s look at the dnsmasq.conf file in Jinja form:

# {{ ansible_managed }}
#/etc/dnsmasq.conf
domain-needed
bogus-priv

domain={{ domain_name }}
expand-hosts
local=/{{ domain_name }}/

server=10.0.2.3

interface=eth0
{% for msg in ether_result.results %}interface={{ msg.item.device }}
{% endfor %}

As you can see it is almost your regular dnsmasq.conf file but with some changes. At the top we have a variable  {{ ansible_managed }} which will be replaced with a string containing source file name, creation date and from which host it was created. The {{ domain_name }} variable is being passed on to ansible from vagrant and is actually set in the Vagrantfile.  the last lines of the files are a jinja2 loop going through whatever is in ether_result.results and finding every msg in msg.item.device, which will contain all the names of the network devices i have configured and placing it after the string “interface=”. After ansible is done with the task above you will end up with a file that looks something like this:

# Ansible managed: /my/path/t0/dnsmasq.conf.j2 modified on 2016-08-09 12:24:40 by klaasjan on exabook.local
#/etc/dnsmasq.conf
domain-needed
bogus-priv

domain=mydomain.local
expand-hosts
local=/mydomain.local/

server=10.0.2.3

interface=eth0
interface=eth1
interface=eth2

If everything is configured correctly, installation of the software can begin. So we need to move some zip files from the your host to the guest vm. Ansible does this with the copy command:

- name: Copy RDBMS zipfiles
copy: src={{ item }} dest={{ ora_stage }} owner={{ oracle_user }} group={{ oracle_group }} mode=0644
with_items: "{{ rdbms_zips }}"
when: inventory_hostname == "{{ gi_first_node }}" and rdbms_check.stdout != rdbms_home

The command above will copy the files in the variable {{ item }}, which is being read from the array {{ rdbms_zips }} on the line “with_items:”, it then copies these files to the whatever location is set in {{ ora_stage }} and with the correct user and group permissions. The final line means that it will only copy these files to the first node (we do not need them twice) and when there is not already and oracle home installed.

I mentioned above that ansible will look for the file item.rdbms_zips in the ./files subdirectory. So before installing make sure that all the needed files are there. To make this a bit easier for myself i have created a small script called rebuild.sh which will download these files for you (you need to have an MOS account). It will do so using Maris Elsin excellent getMOSPatch command, which is a jar files now and no longer a shell script. After downloading rebuild.sh will do what the name suggests and remove and rebuild your vagrant RAC environment. Use at your own risk, if it breaks a network config for you in VirtualBox or something in Vagrant… i have warned you.

In the next blogpost i will explain in somewhat more detail what the different steps are doing.