End To End Metrics: A bridge between the developer and the DBA

performance_featureFirst time I heard about end-to-end metrics was in a presentation given by Cary Millsap. The name of the presentation was “Thinking Clearly About Performance”. The presentation was great. One of the quotes that I really enjoyed was “Performance does not happen by accident! It’s a feature.”. One of the main topics in the presentation was how you could – and should – instrument your code, for instance using end-to-end metrics. With the extended use of web application servers and connection pooling, most DBAs have experienced the frustration and difficulty in how to make a good trace of the problem sessions. End-to-end metrics really solves this issue (and a lot more). So after Cary’s presentation I was really thrilled about this new feature in Oracle 10g. But the eager didn’t last very long. I very quicly realized that even though being a fantastic feature, no developers in our company knew about it, and clearly enough, was not implementing it into their code. So I started on a long and burdensome process of educating our developers.

The first hazle was to convince our developers that they actually needed to know something about Oracle. They knew SQL … well, at least they thought they knew SQL. The Oracle stuff was left to the DBA, if they was (un)lucky enough to have a DBA in their project. You might wonder why i added the “(un)” – as in unlucky. This is actually another story that I’m planning to blog about. But in short; if I needed a brain surgery, and the surgant was my usual doctor I visit everytime I got a sore throat, I would not carry out the surgery. The same way I see DBAs being used whenever there is a need for Oracle knowledge. Very few take the time to ask “what kind of Oracle knowledge do we need?”, and which ressource actually has this knowledge. To me it seams like most developers and project leaders think that every DBA knows what’s worth knowing about Oracle. This is usually not the case. Well … let’s get back to the end-to-end metrics.

After being involved in several performance taskforces, people started to realize that they might be doing things a little wrong, and that some more knowledge about our databases might be smart. But even after spending weeks lecturing about Oracle (included end-to-end metrics) in different development departments, the metrics didn’t show up in the code. Our developers are very skilled, and was actually very excited about the end-to-end metrics, but this might not be enough to start using it. Both the DBAs and the developers work in a very busy environment, and sometimes we all need an extra push to start using a new feature. So after finally positioning myself just in the middle of our developers, working daily togheter with the developers, things started to get moving. Being seated right next to one of our senior developers, we tested and implemented the needed (end-to-end metrics) code into our company template code. The testing thought me that the end-to-end metrics is a lot more than enabling trace in the database. In addition I learned a lot from our developers. They represent another way of working, and I really think DBA’s could learn a lot from the developers.

In this article I’ll discuss some of the great advantages of using this kind of instrumentation in your Java code. Yes. This article discusses the java implementation, but this could also be implemented in any programming language using the DBMS_APPLICATION_INFO package in the PLSQL library. The advantage of using the end-to-end metrics features in the jdbc driver, is that the metric communication is piggy-backed into the existing network communication. This means that there is very little overhead using the end-to-end metric functionallity.

What is End-to-End Metrics?

End-to-end metrics is actually about leaving traces (not Oracle Trace) of the running code in the Oracle database. There is actually 4 tags in the Oracle database, where you can leave information about the application code running against the database in question. These tags are named “module”, “action”, “client_identifier” and “client_info”, and are set on the session level. The tags can – of course – be found by querying the v$session view. Oracle can then start trace or collect statistics on the basis of one or more of these tag values. This collection is not limited to one session, but will work on any session having the actual tags set.

How?

Now we’ll have a short look at how we can set this end-to-end metric tags. The general way is to call the DBMS_APPLICATION_INFO package:

SQL> BEGIN -- set tags
 2     DBMS_APPLICATION_INFO.SET_MODULE(
 3        module_name => 'demo', 
 4        action_name => 'cntObjects');
 5   END;
 6   /

PL/SQL procedure successfully completed.

SQL> SELECT module, action -- tags for current session
 2   FROM v$session
 3   WHERE sid = SYS_CONTEXT('userenv', 'SID');

MODULE     ACTION
---------- ----------
demo       cntObjects

For further instruction of using the DBMS_APPLICATION_INFO package, do a describe on the package and use the Oracle documentation. A quick search on google can also be a good start.

Then back to the Java implementation. The actual code for setting the metrics is very simple. All you need is a reference to your connection object:

String metrics [] = null;

metrics = new String[OracleConnection.END_TO_END_STATE_INDEX_MAX];
metrics[OracleConnection.END_TO_END_ACTION_INDEX] = "cntObjects";
metrics[OracleConnection.END_TO_END_MODULE_INDEX] = "demo";
metrics[OracleConnection.END_TO_END_CLIENTID_INDEX] = "lasse.jenssen(a)evry.com";

((OracleConnection) connection).setEndToEndMetrics(metrics, (short) 0);

How to actually use this piece of code in your application, is of course also a very interesting question? But the answer to this depends on your application. It might be that you want to use AOP techniques, or you might want to set this directly in your DAO’s. But these things the developers know how to handle, and is not anything the Oracle resources usually need to worry about.

When?
The simple answer to this is “ASAP” (As Soon As Possible). If we mean “when”, as in when to call this in the code, this is a developer issue. And most Java developers will know where and when to call this code. One issue is that you want to remove the tags when your calling code finishes, as in:

metrics[OracleConnection.END_TO_END_ACTION_INDEX] = "";
metrics[OracleConnection.END_TO_END_MODULE_INDEX] = "";
metrics[OracleConnection.END_TO_END_CLIENTID_INDEX] = "";

((OracleConnection) connection).setEndToEndMetrics(metrics, (short) 0);

If your code is the only code using the connection pool, you might not need to do this. This might work as long as all code running against these connections (or sessions) overwrite the old tags. The problem with not reseting the tags, is that you now need to interpret the meaning of waits for your client (“SQLNet message for client”). Are these waits caused by your application code or have your application code finished (just leaving the tags in the used session)? The best solution would definitly be if your code could reset the tags after finishing the execution.

Why use end-to-end metrics?

Oracle trace
So – why do you want to do this? Well. The first reason, which I learned from Cary Millsap (and allready mentioned), is the aid in tracing problem code. With the DBMS_MONITOR package you could either turn on trace for:

  • Service Name
  • Service Name, Module
  • Service Name, Module, Action

The service names in use could be found by querying either v$session or dba_services:

SQL> select * from dba_services;

NAME                                                                            
--------------------------------------------------------------------------------
SYS$BACKGROUND                                                                  
SYS$USERS                                                                       
ORCLXDB                                                                         
ORCL.edb.com                                                                    

4 rows selected.

SQL> select username, service_name from v$session where module='demo'

USERNAME             SERVICE_NAME                  
-------------------- ------------------------------
DEMOD                SYS$USERS                     

1 row selected.

To enable and disable trace, you could call the following procedures:

exec dbms_monitor.serv_mod_act_trace_enable(
      service_name =>'SYS$USERS',
      module_name =>'demo',         // If null then not set (action must be null)
      action_name =>'cntObjects',   // If null then not set 
      waits =>TRUE,                 // default: TRUE
      binds =>TRUE,                 // default: FALSE
      plan_stat => 'ALL_EXECUTIONS' // Alternativs: NEVER, FIRST_EXECUTION(default/ if null)
);

After you have ran the procedure you can verify by running:

SQL> select trace_type, primary_id service, qualifier_id1 module, qualifier_id2 action,
 2          waits, binds, plan_stats
 3   from SYS.DBA_ENABLED_TRACES;

TRACE_TYPE             SERVICE    MODULE  ACTION      WAITS   BINDS   PLAN_STATS
---------------------- ---------- ------- ----------- ------- ------- ----------
SERVICE_MODULE_ACTION  SYS$USERS  demo    cntObjects  TRUE    TRUE    ALL_EXEC  

1 row selected.

To turn of the trace, run the following command:

exec dbms_monitor.serv_mod_act_trace_disable(
      service_name =>'SYS$USERS',
      module_name =>'demo',       // If null then not set (action must be null)
      action_name =>'cntObjects'  // If null then not set          
);

If instance_name is used when enabling, then it must be used when disabling.

When turning on trace by using end-to-end metrics tags, you’ll probably end up with several different trace files. Remember – you have not turned on trace for a single session, but for any session running with the tags module=’demo’ and action=’cntObjects’. To collect all this trace information into one trace file, we use the trcsess utility:

oracle@edbwp003222:/u01/app/oracle/diag/rdbms/orcl/ORCL/trace$ grep cntObjects *
ORCL_ora_10616.trc:*** ACTION NAME:(cntObjects) 2013-03-20 16:51:42.408
ORCL_ora_29618.trc:*** ACTION NAME:(cntObjects) 2013-03-26 15:08:11.650
ORCL_ora_5689.trc:*** ACTION NAME:(cntObjects) 2013-04-01 09:17:52.316
oracle@edbwp003222:/u01/app/oracle/diag/rdbms/orcl/ORCL/trace$ trcsess output=mytrace.trc module=demo *
oracle@edbwp003222:/u01/app/oracle/diag/rdbms/orcl/ORCL/trace$ vi mytrace.trc 

*** [ Unix process pid: 10616 ]
*** 2013-03-20 16:51:42.408
*** 2013-03-20 16:51:42.408

=====================
PARSING IN CURSOR #140445102540864 len=64 dep=0 uid=167 oct=47 lid=167 tim=1363794702408320 hv=1969341072 ad='1b71f7b68' sqlid='g6wmnb1uq3hnh'
begin :1 := dbms_pickler.get_type_shape(:2,:3,:4,:5,:6,:7); end;
END OF STMT
PARSE #140445102540864:c=0,e=101,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=0,tim=1363794702408316
BINDS #140445102540864:
 Bind#0
...
WAIT #140445102512336: nam='SQL*Net message from client' ela= 809594 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1363794936847553
*** [ Unix process pid: 29618 ]
*** 2013-03-26 15:08:11.650
*** 2013-03-26 15:08:11.650

=====================
PARSING IN CURSOR #140167171049536 len=64 dep=0 uid=167 oct=47 lid=167 tim=1364306891650303 hv=1969341072 ad='1b713f2e0' sqlid='g6wmnb1uq3hnh'
...
WAIT #140167171049536: nam='SQL*Net message from client' ela= 816023 driver id=1952673792 #bytes=1 p3=0 obj#=102748 tim=1364307407097287
*** [ Unix process pid: 5689 ]
*** 2013-04-01 09:17:52.316
*** 2013-04-01 09:17:52.316

=====================
PARSING IN CURSOR #140072449544176 len=64 dep=0 uid=167 oct=47 lid=167 tim=1364800672316437 hv=1969341072 ad='1b653c5c0' sqlid='g6wmnb1uq3hnh'
...

The generated tracefile will now include the related trace information from the three trace files (ORCL_ora_10616.trc, ORCL_ora_29618.trc, ORCL_ora_5689.trc). We find the corresponding traces bye searching for “Unix process pid” in the generated trace file (see above). Now we can use the regular TKPROF utility to generate the trace report:

oracle@edbwp003222:/u01/app/oracle/diag/rdbms/orcl/ORCL/trace$ tkprof mytrace.trc out.txt 

TKPROF: Release 11.2.0.2.0 - Development on Mon Apr 1 09:52:14 2013

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.


oracle@edbwp003222:/u01/app/oracle/diag/rdbms/orcl/ORCL/trace$ vi out.txt

SQL ID: 8n44v1z19sc3u Plan Hash: 3119367398

SELECT /*+ LEADING( b1 ) USE_NL( b2 ) */      distinct b1.object_id 
FROM
 big_table b1, big_table b2 WHERE b1.object_id=b2.object_id   AND 
  b1.object_id IN (SELECT * FROM TABLE(CAST(:1 As NUMBER_LIST_TYPE)))


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        3      0.00       0.00          0          0          0           0
Execute      3      0.00       0.01          3         93          0           0
Fetch       57    911.69     917.61       3670  177746549          0         564
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       63    911.70     917.62       3673  177746642          0         564

Misses in library cache during parse: 2
Misses in library cache during execute: 2
Optimizer mode: ALL_ROWS
Parsing user id: 167  
Number of plan statistics captured: 3

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
       188        188        188  HASH UNIQUE (cr=59248850 pr=1223 pw=0 time=305869772 us cost=16455452 size=12 card=1)
       188        188        188   HASH JOIN  (cr=59248850 pr=1223 pw=0 time=11513166 us cost=16455451 size=2400 card=200)
       200        200        200    COLLECTION ITERATOR PICKLER FETCH (cr=0 pr=0 pw=0 time=83 us cost=29 size=400 card=200)
     64330      64330      64330    NESTED LOOPS  (cr=59248850 pr=1223 pw=0 time=318844493 us cost=16455421 size=643300 card=64330)
     64330      64330      64330     TABLE ACCESS FULL BIG_TABLE (cr=920 pr=612 pw=0 time=57642 us cost=258 size=321650 card=64330)
     64330      64330      64330     TABLE ACCESS FULL BIG_TABLE (cr=59247930 pr=611 pw=0 time=305604784 us cost=256 size=5 card=1)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                      57        0.00          0.00
  Disk file operations I/O                        2        0.00          0.00
  asynch descriptor resize                        8        0.00          0.00
  SQL*Net message from client                    57        1.00         56.57
  db file sequential read                         2        0.00          0.00
  direct path read                               42        0.00          0.02
  db file scattered read                         58        0.02          0.28
  latch: cache buffers chains                     1        0.00          0.00
 
...

********************************************************************************

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        6      0.00       0.00          0          0          0           0
Execute      6      0.03       0.04          5        111          0           3
Fetch       57    911.69     917.61       3670  177746549          0         564
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       69    911.72     917.66       3675  177746660          0         567

This time it was quite easy to find the bottlenecks. Actually there are two potential issues here.
Above we see that the SQL in question is ran 3 times (one execution per trace file). The elapsed time accounts for just about 100 % of all the non-recursive statements. Well … no wonder. This is a silly SQL made just for the purpose of taking some time, doing a nested loop over to table scans on relativly large tables.

Another thing to notice is the 56 seconds spent waiting for “SQL*Net message from client”. This is the database waiting for the client. If we didn’t reset the end-to-end metric tags this would be expected. The wait time after the client finishes would then be included in the trace. But in this occation the code did reset the tags. Therefore these waits should not be expected. We also notice that the database waited 57 times. This is an exact match to the fetch number for the query. So there is obviously something going on around every fetch from the client. And if we look inside the demo code, we’ll see the following:

public int cntObjects() throws Exception {
		etem.setEndToEndMetrics("demo", "cntObjects", "lassej");
		
		// Debug
	    ArrayDescriptor oracleCollection = ArrayDescriptor.createDescriptor("NUMBER_LIST_TYPE",conn);
	    PreparedStatement stmt = conn.prepareStatement(
	         "/* Count Objects */" +
	         "SELECT /*+ LEADING( b1 ) USE_NL( b2 ) */" +
	         "      distinct b1.object_id FROM big_table b1, big_table b2 " +
	         "WHERE b1.object_id=b2.object_id " +
	         "  AND b1.object_id IN (SELECT * FROM TABLE(CAST(? As NUMBER_LIST_TYPE)))");


	    int[] javaArray1 = new int[200];

	    for (int i=0;i<200;i++){
	       javaArray1[i] = i*10/3;
	    }

	    ARRAY jdbcArray1 = new ARRAY (oracleCollection, conn, javaArray1);
	                                                        // define our oracle array
	    stmt.setObject(1,jdbcArray1);                      // bind array to bind variable 

	    ResultSet r=stmt.executeQuery();                   // execute the query

	    System.out.println("Found values:");
	    System.out.println("-----------------------------------");

	    int obj_id = 0;
	    int count = 0;
	    
	    r.setFetchSize(fetchSize);
	    
	    while(r.next()){
	       count++;
	       obj_id = r.getInt(1);
	       Thread.currentThread().sleep(100);
	       System.out.println(obj_id);
	    }                                                  // browse the result

	    System.out.println("-----------------------------------");
	    System.out.println("Total: " + count);
	    
	    etem.resetEndToEndMetrics();
		
		return count;
	}

Ok – lets have a smal look at the out put from this code:

Counting objects .... (Waiting) 
Found values:
-----------------------------------
43
...
656
660
-----------------------------------
Total: 188

The query found a total of 188 rows. The number of fetches was 57 for 3 exections, which gives 19 fetches and waits (for client) each execution. This indicate a fetchsize of 10. If we look in the code we see that this is true:

public class DemoService {
	private Connection conn;
	private EndToEndMetrics etem;
	private int fetchSize = 10; 
        ...

Each fetch will then have approximatly 10 rows, giving a sleep for (100*10) 1000 ms = 1 sec. This fits perfectly with the wait time seen in the tracefiles. 1 second waited for every 19 fetches (=19 seconds), with a totol of 3 exections (3x19sec) gives a total of 57 seconds.

“Bridge between the developer and the DBA”/ Real-time monitoring

Another very good reason for using end-to-end metrics is that it ties the application code to the session. Many times I’ve spent quite some time togheter with our developers, trying to figur out what kind of code is running in the different sessions. With the end-to-end metrics tags set, I can easily locate which sessions are running the different code paths. The tags becomes a bridge between the developer (or application) and the DBA (Oracle resource). This gives a possibility of doing real-time monitoring. In addtion to the v$session view, the end-to-end metric tags are also found in the v$sql, dba_hist_sqlstat, v$active_session_history and dba_hist_active_sess_history views. The last three views requires a diagnostic pack license.

Historical View

The ASH views (dba_hist_sqlstat, v$active_session_history and dba_hist_active_sess_history) gives a historical view on performance metrics. Very often someone reports slowness in the database, and when this reaches the Oracle ressource, the problem has vanished. With the ASH views we have a possibility of looking back in time – for instance on the basis of our end-to-end metrics tags:

SQL> select snap_id, sql_id, fetches_total fetch#, disk_reads_total pio#, buffer_gets_total lio#, 
 2          rows_processed_total rows# 
 3   from dba_hist_sqlstat where module='demo' and action='cntObjects'

   SNAP_ID SQL_ID                  FETCH#       PIO#       LIO#      ROWS#
---------- ------------------- ---------- ---------- ---------- ----------
      8242 g6wmnb1uq3hnh                0          4        143          1
      8242 8n44v1z19sc3u               19       1838   59248905        188
      8182 g6wmnb1uq3hnh                0          8        143          1
      8182 8n44v1z19sc3u               19       1835   59248886        188

4 rows selected.

From the query above we see that only two of the exectuions are still kept in the history table. This view identify the same trouble query as the trace did (sql_id=8n44v1z19sc3u). But we do not see the wait for the client (SQL*Net message from client) in the same way as we did in the trace.

Evaluate SQL code made by ORM APIs like Hibernate

When using ORM APIs like Hibernate, the developer might not be responsible for writing the actual SQLs that are ran towards the database. Using metric tag the developer could easily find the SQL’s associated with his/her code by quering the v$sql (after running the code towards a test database):

SQL> col my_sql_text for a52
SQL> select sql_id, child_number, substr(sql_text,1,50) my_sql_text 
from v$sql where module='demo' and action='cntObjects'

SQL_ID              CHILD_NUMBER MY_SQL_TEXT                                         
------------------- ------------ ----------------------------------------------------
8n44v1z19sc3u                  0 /* Count Objects */SELECT /*+ LEADING( b1 ) USE_NL  
8n44v1z19sc3u                  1 /* Count Objects */SELECT /*+ LEADING( b1 ) USE_NL  
g6wmnb1uq3hnh                  0 begin :1 := dbms_pickler.get_type_shape(:2,:3,:4,:  

3 rows selected.

This show something that was a little hidden in the java code – the call for dbms_pickler.get_type_shape.
This is generated by the array descriptor call:

ArrayDescriptor oracleCollection = ArrayDescriptor.createDescriptor("NUMBER_LIST_TYPE",conn);

Looking into the Oracle database looking at the generated code, the developers sometimes get surprised. You should always know what’s going on in the database. Don’t trust a third party developer. He or she might have missed something about how Oracle works, or some new feature in a patch or new version of Oracle. Or you just didn’t know your ORM API to well. Eighter way it’s not going to look good in a performance taskforce, when the Oracle expert points at your code.

Enough preaching … Then to find the actuall execution plans the developer could follow up with:

SQL> select * from table(dbms_xplan.display_cursor('8n44v1z19sc3u',null,'TYPICAL'));

PLAN_TABLE_OUTPUT                                                                                
-------------------------------------------------------------------------------------------------
SQL_ID  8n44v1z19sc3u, child number 0                                                            
-------------------------------------                                                            
/* Count Objects */SELECT /*+ LEADING( b1 ) USE_NL( b2 ) */                                      
distinct b1.object_id FROM big_table b1, big_table b2 WHERE                                      
b1.object_id=b2.object_id   AND b1.object_id IN (SELECT * FROM                                   
TABLE(CAST(:1 As NUMBER_LIST_TYPE)))                                                             
Plan hash value: 3119367398                                                                      
-------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |           |       |       |    16M(100)|          |
|   1 |  HASH UNIQUE                        |           |     1 |    12 |    16M  (1)| 54:51:06 |
|*  2 |   HASH JOIN                         |           |   200 |  2400 |    16M  (1)| 54:51:06 |
|   3 |    COLLECTION ITERATOR PICKLER FETCH|           |   200 |   400 |    29   (0)| 00:00:01 |
|   4 |    NESTED LOOPS                     |           | 64330 |   628K|    16M  (1)| 54:51:06 |
|   5 |     TABLE ACCESS FULL               | BIG_TABLE | 64330 |   314K|   258   (1)| 00:00:04 |
|*  6 |     TABLE ACCESS FULL               | BIG_TABLE |     1 |     5 |   256   (1)| 00:00:04 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):                                              
---------------------------------------------------                                              
   2 - access("B1"."OBJECT_ID"=VALUE(KOKBF$))                                                    
   6 - filter("B1"."OBJECT_ID"="B2"."OBJECT_ID")                                                 

SQL_ID  8n44v1z19sc3u, child number 1
-------------------------------------                                                            
/* Count Objects */SELECT /*+ LEADING( b1 ) USE_NL( b2 ) */                                      
distinct b1.object_id FROM big_table b1, big_table b2 WHERE                                      
b1.object_id=b2.object_id   AND b1.object_id IN (SELECT * FROM                                   
TABLE(CAST(:1 As NUMBER_LIST_TYPE)))                                                             
Plan hash value: 3119367398                                                                      
-------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |           |       |       |    16M(100)|          |
|   1 |  HASH UNIQUE                        |           |     1 |    12 |    16M  (1)| 54:51:06 |
|*  2 |   HASH JOIN                         |           |   200 |  2400 |    16M  (1)| 54:51:06 |
|   3 |    COLLECTION ITERATOR PICKLER FETCH|           |   200 |   400 |    29   (0)| 00:00:01 |
|   4 |    NESTED LOOPS                     |           | 64330 |   628K|    16M  (1)| 54:51:06 |
|   5 |     TABLE ACCESS FULL               | BIG_TABLE | 64330 |   314K|   258   (1)| 00:00:04 |
|*  6 |     TABLE ACCESS FULL               | BIG_TABLE |     1 |     5 |   256   (1)| 00:00:04 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):                                              
---------------------------------------------------                                              
   2 - access("B1"."OBJECT_ID"=VALUE(KOKBF$))                                                    
   6 - filter("B1"."OBJECT_ID"="B2"."OBJECT_ID")                                                 

54 rows selected.

Using “null” for the cursor_number gives you the execution plan for all the sql_id cursors in your library cache. Should the developer do this for every SQL in the code? Probably not. But the developer should have an idea of which SQLs could end up as an bottleneck, and investigate these SQLs. This could be SQLs where the result set is expected to be very small, but the query is – for instance – executed thousands of times every minute. Or this could be SQLs that is not ran that often (and therefore the related oracle blocks are not expected to be in cache), but the result set is relativly large. Having focus on performance these are issues the developers should consider.

Statistic Collection

The end-to-end metric tags also gives the possibility of collecting cumulative statistics based on the tag values. The statistic collection is enabled by the dbms_monitor package, the same way we enabled trace:

exec dbms_monitor.serv_mod_act_stat_enable(
      service_name =>'SYS$USERS',
      module_name =>'demo',
      action_name =>'cntObjects'
);

This is verified by running the following query:

SQL> col primary_id heading 'SERVICE' for a15
SQL> col qualifier_id1 heading 'MODULE' for a15
SQL> col qualifier_id2 heading 'ACTION' for a15
SQL> select *
from SYS.DBA_ENABLED_AGGREGATIONS

AGGREGATION_TYPE                SERVICE         MODULE          ACTION         
------------------------------- --------------- --------------- ---------------
SERVICE_MODULE_ACTION           SYS$USERS       demo            cntObjects     

1 row selected.

If I start the demo code once more, we can follow the aggregation of the enabled statistics:

--  Before started
SQL> select aggregation_type, stat_name, value 
from V$SERV_MOD_ACT_STATS 
where module='demo' and action='cntObjects'
  and stat_name in (
       'physical reads',
       'session logical reads',
       'user calls',
       'DB time',
       'user I/O wait time',
       'application wait time')

AGGREGATION_TYPE                STAT_NAME                           VALUE
------------------------------- ------------------------------ ----------
SERVICE_MODULE_ACTION           user calls                              0
SERVICE_MODULE_ACTION           DB time                                 0
SERVICE_MODULE_ACTION           session logical reads                   0
SERVICE_MODULE_ACTION           physical reads                          0
SERVICE_MODULE_ACTION           application wait time                   0
SERVICE_MODULE_ACTION           user I/O wait time                      0

-- Then we start 

SQL> select ...

AGGREGATION_TYPE                STAT_NAME                           VALUE
------------------------------- ------------------------------ ----------
SERVICE_MODULE_ACTION           user calls                              2
SERVICE_MODULE_ACTION           DB time                           3574294
SERVICE_MODULE_ACTION           session logical reads             1046762
SERVICE_MODULE_ACTION           physical reads                          0
SERVICE_MODULE_ACTION           application wait time                   0
SERVICE_MODULE_ACTION           user I/O wait time                     30

-- still running

SQL> select ...

AGGREGATION_TYPE                STAT_NAME                           VALUE
------------------------------- ------------------------------ ----------
SERVICE_MODULE_ACTION           user calls                              2
SERVICE_MODULE_ACTION           DB time                          31589296
SERVICE_MODULE_ACTION           session logical reads             9164243
SERVICE_MODULE_ACTION           physical reads                          0
SERVICE_MODULE_ACTION           application wait time                   0
SERVICE_MODULE_ACTION           user I/O wait time                     30

-- Still running

SQL> select ...

AGGREGATION_TYPE                STAT_NAME                           VALUE
------------------------------- ------------------------------ ----------
SERVICE_MODULE_ACTION           user calls                              2
SERVICE_MODULE_ACTION           DB time                         167657699
SERVICE_MODULE_ACTION           session logical reads            48438332
SERVICE_MODULE_ACTION           physical reads                          0
SERVICE_MODULE_ACTION           application wait time                   0
SERVICE_MODULE_ACTION           user I/O wait time                     30

-- Finished

SQL> select ...;

AGGREGATION_TYPE                STAT_NAME                           VALUE
------------------------------- ------------------------------ ----------
SERVICE_MODULE_ACTION           user calls                             21
SERVICE_MODULE_ACTION           DB time                         204981329
SERVICE_MODULE_ACTION           session logical reads            59248871
SERVICE_MODULE_ACTION           physical reads                          0
SERVICE_MODULE_ACTION           application wait time                   0
SERVICE_MODULE_ACTION           user I/O wait time                     30

For a long time we saw the “user calls” was left on 2. This was the parse and execution calls. The next 19 user calls was the client fetches. I did hope I would see some “application wait time” tied to the wait on “SQL*Net message from client”, but I clearly didn’t see any such wait. This I’ve to investigate furter.

Summary
End-to-end metrics are really a bridge between the developer and the DBA. Using these metrics really makes it easier for the DBA to do real-time monitoring of application code. I believe that Oracle resources (DBAs or not) should be seated next to the developers. Both roles have a lot to learn from each other, and could really work together to help generating one of the most important feature of an application: PERFORMANCE. It’s definitly a feature, and does not happen by accident!

In addition, one way or another, Murphy’s law will show up in your system. Then you better have some instrumentation ready in your application, making it easy to find your bottlenecks. Most probably they are generated by your code – one way or another. Then I really hope you have your end-to-end metric tags available!

Post a Comment

Your email is never published nor shared. Required fields are marked *