Golden Gate Activity Logging Tracing

A lot of my customers who are starting out with Oracle Golden Gate run into problems when it comes to troubleshooting it. It is the most common question I get: “I have set up Golden Gate, it was working fine but now process X abandoned… now what do I do?”. The OGG error log is a good starting point, but it does not always show the real issue. There are a lot of different ways to obtain more information on why you are running into an issue. In this blogpost I want to introduce a perhaps not so well known feature called Golden Gate Activity Logging Tracing, and yes, that is really what the feature is called.

The most obvious way to enable tracing is with the TRACE/TRACE2 parameters, which are fairly well documented in the OGG documentation. You can either send the command directly to a running process like this:

GGSCI (rac1) 3> send extract EXT1 trace /tmp/trace_me.trc

Sending trace request to EXTRACT EXT1 ...
Trace file /tmp/trace_me.trc opened.

If we then do an insert into a table that is being captured by this Extract, the process will write something like the following into /tmp/trace_me.trc:

12:20:26.259 (397290) * --- entering DataSource::readLCR() --- *
12:20:28.161 (399192) Extract pointer info reused [0], [file 0][src 0][tgt 0][tab PDB1.PDB1ADMIN.T1]. File [PDB1.PDB1ADMIN.T1], targ_idx [4294967295], object type [2], num_extracts [1]
12:20:28.163 (399194) exited DataSource::readLCR (stat=0, seqno=0, rba=0 io_type=5)
12:20:28.163 (399194) processing record for PDB1.PDB1ADMIN.T1
12:20:28.163 (399194) Writing seqno [21], rba [1465], table [PDB1.PDB1ADMIN.T1]
12:20:28.164 (399195) * --- entering DataSource::readLCR() --- *
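
For reference, the trace above was triggered by nothing more than a trivial test insert; a sketch of what that could look like (the exact statement and value are an assumption, only the table name comes from the trace):

SQL> insert into pdb1admin.t1 values ('X');
SQL> commit;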

The TRACE parameter shows what the Extract is reading from the database and writing into the OGG trail file. This gives us some information we can use for troubleshooting OGG. To disable tracing for our Extract we send it the TRACE OFF command:

GGSCI (rac1) 5> send extract EXT1 trace off

Sending trace request to EXTRACT EXT1 ...
Closing all trace files..

Keep in mind though that on a busy system with a lot of DML, tracing can generate a huge amount of data. Besides sending the TRACE/TRACE2 command to a running process, you can also add it to the parameter file directly, so that tracing starts as soon as the Extract or Replicat has finished parsing its parameter file (a parameter-file sketch follows the statistics below). The actual tracing output from TRACE is limited; its main purpose is to show where Golden Gate is spending most of its time:

SUMMARY STATISTICS

General statistics:
0.00% Checking messages (includes checkpointing)
0.00% Checking periodic tasks
0.00% Waiting for more data
0.00% Converting ASCII data to internal
10.45% Reading input records
0.00% Writing output records (replicate_io)
0.00% Mapping columns
0.00% Outputting data records
0.00% Performing SQL statements
0.00% Executing BATCHSQL statements
0.00% Executing SQL statements
0.00% Preparing SQL statements
0.00% Commit operations
0.00% Rollback operations
0.00% Checkpointing
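
As mentioned above, TRACE can also be put directly in the parameter file so that tracing starts right after the parameter file has been parsed. A minimal sketch of what that could look like for an Extract (the group name, credential alias, trail and trace file location are just placeholders, not from an actual configuration):

EXTRACT EXT1
USERIDALIAS oggadmin
EXTTRAIL ./dirdat/aa
-- start writing TRACE output as soon as the parameter file has been parsed
TRACE /tmp/ext1_trace.trc
TABLE PDB1.PDB1ADMIN.T1;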

The difference between TRACE and TRACE2 is minimal; TRACE2 shows you how long your process is spending in the different code segments:

17:00:07.961 (2814715) * --- entering DataSource::readLCR() --- *
17:00:07.961 (2814715) Reading input records 10.357% (execute=291.531,total=2814.713,count=15230)
17:00:07.961 (2814715) Checking messages (includes checkpointing) 0.005% (execute=0.143,total=2814.713,count=15230)

If you want more detailed tracing in Golden Gate you have to look elsewhere. In the past Golden Gate had features such as tltrace and traceini, but they are deprecated or being deprecated. Golden Gate introduced Activity Logging Tracing in release 11.1.1.0.0. The feature has been around for a long while, but it is only documented in Doc ID 1204284.1 and it can be very handy at times. Activity Logging Tracing is built into the OGG kernel by default, but it has to be activated, which can be done in two ways. Activity Logging Tracing can generate a dump file as soon as one of your OGG binaries abends. The other way to enable Activity Logging Tracing is by placing an XML file in $OGG_HOME that follows this naming convention: gglog-<process_name>.xml, where process_name can be the name of the group you want to trace or the name of the binary. Doc ID 1204284.1 is not very clear about this, but you can actually trace almost every OGG binary, including ggsci and server. Some examples to clarify how to name your Activity Logging Tracing XML file:

  • If you want to trace your Extract EXT1, name the file gglog-EXT1.xml
  • If you want to trace a Replicat called REP1, call it gglog-REP1.xml
  • For tracing every Extract, name it gglog-extract.xml
  • Similarly, for tracing ggsci, call it gglog-ggsci.xml
  • If you want to trace everything, name the file gglog.xml; however, this will generate a huge, huge, huge amount of data.

As soon as you place your XML file in $OGG_HOME, Golden Gate will start tracing, and depending on your XML file contents these trace files can become very big very quickly. Activity Logging Tracing will generate a file named gglog-<process_name>_<program>.log, so tracing Extract EXT1 will generate the file gglog-EXT1_extract.log and tracing ggsci will generate gglog-ggsci_ggsci.log. This naming behaviour can be changed in the XML file, for example if you want the trace files to be generated on a different filesystem:

<param name="File" value="gglog-%I_%A.log"/>
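
For instance, to have the trace files written to a separate filesystem, the same parameter could be changed to something like this (the directory /u01/ogg_trace is just an illustrative path):

<param name="File" value="/u01/ogg_trace/gglog-%I_%A.log"/>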

The default logging format in OGG is log4j, and the documentation gives some clues on the general workings of this XML file. When a process sees the XML file it calls the functions from libgglog.so and starts processing whatever is configured in the XML file. Doc ID 1204284.1 gives you some hints on what can be used for tracing. The example file gglog-full.xml shows that by adding only this:

<level value="all"/>

Golden Gate will immediately start tracing everything, but be warned: this will output a lot of trace information. Personally I tend to use gglog-debug.xml as a starting point for Golden Gate tracing, as it already gives some more fine-grained control. As an example, the values below give you fine-grained control over the redo tracing:

<logger name="er.redo.ora">
<level value="all"/>
</logger>
<logger name="er.redo.ora.data">
<level value="all"/>
</logger>
<logger name="er.redo.ora.rtc">
<level value="off"/>
</logger>
<logger name="er.redo.ora.rtcfm">
<level value="off"/>
</logger>
<logger name="er.redo.ora.thread">
<level value="all"/>
</logger>
<logger name="er.redo.ora.thread.checkpoint">
<level value="off"/>
</logger>
<logger name="er.redo.ora.thread.blocks">
<level value="off"/>
</logger>
<logger name="er.redo.ora.thread.logswitch">
<level value="all"/>
</logger>
<logger name="er.redo.ora.thread.unpack">
<level value="all"/>
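
To put this into practice for a single process, one way of working (a sketch, assuming the sample XML files from Doc ID 1204284.1 are available locally) is to copy a sample under the group-specific name and adjust the loggers in it:

$ cd $OGG_HOME
$ cp /path/to/gglog-debug.xml gglog-EXT1.xml   # tracing of EXT1 starts once the file is in place
$ vi gglog-EXT1.xml                            # set the er.redo.ora.* loggers to the desired levels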

Enabling all of them lets you trace the redo information in real time. The output it dumps looks something like this:

IXAsyncTrans 2907 oracle/ltwtcapture.c | >>>>>> lcrcnt=1031 received lcr type Internal ROW
IXAsyncTrans 1254 oracle/ltwtcapture.c | kngl_flag 0x00 kngl_xflag 0x0000 krvx_dflag 0x0002
IXAsyncTrans 1079 oracle/ltwtcapture.c | lcrvsn: [5] unpacked scn: [0x0000000000477cee]
IXAsyncTrans 1848 oracle/ltwtcapture.c | Ignore opcode 1 (orig op: 7)
IXAsyncTrans 2950 oracle/ltwtcapture.c | received commit, return
IXAsyncTrans 455 oracle/IXAsyncTrans.cpp | Queueing committed record with tid '2.30.1940'
IXAsyncTrans 580 oracle/IXAsyncTrans.cpp | Queueing result (DataBuffer,452,0x0000000004456b80)
IXAsyncTrans 2115 oracle/redooraix.c | Try to get record from logmining server.
IXAsyncTrans 2968 oracle/ltwtcapture.c | end of lcr buffer, return nodata
IXAsyncTrans 2973 oracle/ltwtcapture.c | Release lcr buffer
IXAsyncTrans 2151 oracle/redooraix.c | no lcr returned
IXAsyncTrans 2115 oracle/redooraix.c | Try to get record from logmining server.
main 325 oracle/IXAsyncTrans.cpp | Data buffer 0x0000000004456b80 returned for 0x1c4 bytes
main 10848 oracle/redoora.c | Processing next transaction with xid '2.30.1940'
main 10626 oracle/redoora.c | Processing first record for committed transaction 2.30.1940, scn 4685038
main 4530 oracle/redooraix.c | Try to get committed lcr from COM 0x00000000031f5750 with TID 0002.01e.00000794
main 1705 oracle/redooraix.c | maintain_rc_count(-1), xid=0x0002.01e.00000794 op=INSERT lcr_flag=0x8 xflag=0x0 row_cnt=0 chunk_cnt=0 txn_flag=0x0 groupRBA=0 groupthread=0
main 11032 oracle/redooraix.c | retrieved item# 1 with op 2 (INSERT), still have 0 row, 0 chunk in COM, source COM
main 5006 oracle/redooraix.c | Retrieved item# 1 from COM with TID 0x0002.01e.00000794, skip 0, more_record 0, tind 3
main 5015 oracle/redooraix.c | Exhausted COM, no more items, delete tran now.
main 10751 oracle/redoora.c | first record from commit txn with scn 4685038, seqno 53, rba 32348688, thread_id 1, jts_time_stamp 212351410734000000
main 7753 oracle/redoora.c | FORMAT XOUT LCR:
main 1638 oracle/redooraix.c | === NEW column list (num_columns=1) ===
main 1650 oracle/redooraix.c | Col[1](id,dty,len,uid,sid,iid,flag,flg,flg2,flg3,type): 0 0 1 0 1 1 0 0 0 0 1
main 1664 oracle/redooraix.c | 000000: 58 |X |
main 1638 oracle/redooraix.c | === OLD column list (num_columns=0) ===
main 6837 oracle/redooraix.c | XOUT_INSERT: format insert
main 5566 oracle/redooraix.c | Format col[1] NEW (id,dty,len,key,flag,flg3,type): 0 1 1 1 0x0 0x0 1
main 6396 oracle/redooraix.c | AFC/CHR after format value: 000000: 58 |X |
main 7010 oracle/redooraix.c | After format lcr, len 9
main 7012 oracle/redooraix.c | 000000: 00 00 05 00 00 00 01 00 58 |........X |
main 5388 oracle/redooraix.c | After format record at 53 - 32346988, get trail record with len 9
main 5389 oracle/redooraix.c | 000000: 00 00 05 00 00 00 01 00 58 |........X |
main 349 oracle/IXAsyncTrans.cpp | Data buffer 0x0000000004456b80 released
main 567 oracle/IXAsyncTrans.cpp | Queueing free buffer 0x0000000004456b80
IXAsyncTrans 2863 oracle/ltwtcapture.c | Receive 336 bytes from LCRCaptureReader
IXAsyncTrans 2907 oracle/ltwtcapture.c | >>>>>> lcrcnt=1032 received lcr type Internal ROW

There are more loggers than the ones mentioned by Oracle in Doc ID 1204284.1, and as far as I know they are not publicly documented. For some of the loggers it is quite easy to figure out what they are doing, like tracing DDL in an Extract process for example:

<logger name="gglog.ddl.std">
<level value="all"/>

But mostly it is guesswork and/or a lot of testing to see what they actually do (and what they are). In general you don’t really need them; in most cases the example files, with the loggers as they are, are enough to troubleshoot most issues.
