Oracle and Hibernate: Not always best friends

I’m definitely not a Java or Hibernate expert. But I have worked with Oracle for the last 15 years, since version 7.3.4, and I’ve spent the last 10 years working mainly with performance issues in Oracle databases and in the code accessing them. Lately, many of the projects I’ve been working on have used Java, Spring and Hibernate. I’m definitely not an opponent of Hibernate; I can see how such APIs lower the cost and effort of writing database code. But I also think it’s important to know the limitations of the Hibernate API when it comes to performance. In this article I’ll explain some of my views on the topic. Please bear in mind that I’m not a Hibernate or Java expert, and feel free to comment and correct me if I’m mistaken.

When working on small projects, with small applications for a small group of people, you might be able to afford the big mistakes. When working on enterprise solutions, where performance is very important and you really need the application to scale, you simply can’t.

Hibernate problem: Bind values and varying IN-lists
One of the issues I’ve seen is the way Hibernate handles bind values in varying IN-lists. In an Oracle OLTP system, bind variables are essential for making the application scale. In Java and JDBC, using binds means using a PreparedStatement. First, what happens when we do not use binds:

// Two updates with literal values: each is a brand-new SQL text to Oracle
Statement simple = conn.createStatement();
String mySQL1 = "UPDATE emp SET sal=1000 WHERE empno=7369";
String mySQL2 = "UPDATE emp SET sal=1920 WHERE empno=7499";

int res = simple.executeUpdate(mySQL1);
res = res + simple.executeUpdate(mySQL2);

System.out.println("Number of rows updated: " + res);

Oracle uses a hash of the SQL text to identify the execution plan in the library cache. When binds are not used, the different literals cause Oracle to hard parse every “new” statement. If we look at the library cache, this is what we’ll see:

SQL> select sql_text from v$sql s where lower(sql_text) like 'update emp%' and module like 'JDBC%'

SQL_TEXT
--------------------------------------------------------------------------------
UPDATE emp SET sal=1920 WHERE empno=7499
UPDATE emp SET sal=1000 WHERE empno=7369

With heavy execution of such statements you will end up flushing the shared pool, and you will get contention on internal resources in Oracle (in this case the library cache latches). Such applications will never scale.
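You can get an indication of whether this is happening by looking at the parse statistics (a quick check; I’m assuming the standard v$sysstat statistic names here):

SQL> select name, value from v$sysstat
  2  where name in ('parse count (total)', 'parse count (hard)');

If the hard parse count grows nearly as fast as the total parse count while the application is running, the application is parsing literals like in the example above.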
We could rewrite this using a prepared statement:

// Same SQL text for both updates; only the bind values change
PreparedStatement stmt =
    conn.prepareStatement("UPDATE emp SET sal=? WHERE empno=?");
stmt.setInt(1, 1500);
stmt.setInt(2, 7521);
res = stmt.executeUpdate();

stmt.setInt(1, 3570);
stmt.setInt(2, 7566);
res = res + stmt.executeUpdate();

System.out.println("Number of rows updated: " + res);

If we look in the library cache again we’ll find:

SQL> select sql_text from v$sql s where lower(sql_text) like 'update emp%' and module like 'JDBC%'

SQL_TEXT
--------------------------------------------------------------------------------
UPDATE emp SET sal=1920 WHERE empno=7499
UPDATE emp SET sal=1000 WHERE empno=7369
UPDATE emp SET sal=:1 WHERE empno=:2

Now it doesn’t matter how many times we run the update: as long as the statement hasn’t aged out and been flushed from the library cache, Oracle will get a hit in the cache and reuse the same execution plan. This is a soft parse instead of a hard parse.
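As a side note, from JDBC even the repeated soft parse can largely be avoided with the Oracle driver’s implicit statement cache. This is just a sketch, assuming conn comes from the ojdbc driver; the cache size of 50 is an illustrative value:

import oracle.jdbc.OracleConnection;

OracleConnection oconn = (OracleConnection) conn;
oconn.setImplicitCachingEnabled(true);
oconn.setStatementCacheSize(50);

PreparedStatement stmt =
    oconn.prepareStatement("UPDATE emp SET sal=? WHERE empno=?");
stmt.setInt(1, 2000);
stmt.setInt(2, 7369);
stmt.executeUpdate();
// close() returns the statement to the cache; the next prepare of the same
// SQL text reuses the open cursor instead of parsing again
stmt.close();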

So what is the problem with Hibernate? Doesn’t Hibernate use bind variables? Yes, it does. But one problem area is varying IN-lists: you might need to run queries with different numbers of items in the IN-list. For example, take a look at these two queries:

SELECT * FROM EMP WHERE EMPNO IN (1000, 1010, 1020);
SELECT * FROM EMP WHERE EMPNO IN (1000, 1010, 1020, 1030, 1040);

Using binds this will become:

SELECT * FROM EMP WHERE EMPNO IN (:1, :2, :3);
SELECT * FROM EMP WHERE EMPNO IN (:1, :2, :3, :4, :5);

This is actually how Hibernate solves varying IN-lists. Yes, it does use binds. But if the queries use different numbers of items in the list, you’ll end up hard parsing every variation in the number of items. If these queries run very often, you may end up with contention on the library cache latches and an application that just doesn’t scale. This IN-list problem can be solved using different features in Oracle. Tom Kyte describes this on his blog.
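Besides the collection-based techniques Tom Kyte describes, a simple workaround is to pad the IN-list up to a few fixed sizes, so only a handful of distinct SQL texts ever reach the library cache. A rough sketch (the bucket sizes are arbitrary, the list is assumed to fit in the largest bucket, and repeating a value in an IN-list is harmless):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.List;

static ResultSet queryByEmpno(Connection conn, List<Integer> empnos)
        throws SQLException {
    // Round the list size up to the next bucket: at most three SQL texts
    int[] buckets = {5, 10, 20};
    int size = buckets[buckets.length - 1];
    for (int b : buckets) {
        if (empnos.size() <= b) { size = b; break; }
    }
    StringBuilder sql = new StringBuilder("SELECT * FROM emp WHERE empno IN (?");
    for (int i = 1; i < size; i++) sql.append(", ?");
    sql.append(")");

    PreparedStatement stmt = conn.prepareStatement(sql.toString());
    for (int i = 0; i < size; i++) {
        // Fill the unused bind positions by repeating the last real value
        stmt.setInt(i + 1, empnos.get(Math.min(i, empnos.size() - 1)));
    }
    return stmt.executeQuery();
}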

Hibernate problem: Default behavior when using sequences
The default behavior when Hibernate needs a sequence value for an INSERT is to run the following two statements:

SELECT my_seq.nextval FROM dual;
INSERT INTO mytable VALUES(:my_seq,....);

This might not look so bad, but having to run two statements against the Oracle database (even though both are soft parses) is not very efficient. When running thousands of inserts, this really starts to matter. It is possible in Hibernate to use the RETURNING INTO clause, but since it is not the default, it is rarely used:

INSERT INTO mytable(col1,...) VALUES(myseq.nextval,....) RETURNING col1 INTO :b1;

Here :b1 is an out variable carrying the sequence value that was inserted into the table.
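From plain JDBC, one way to get the same effect is an anonymous PL/SQL block with an OUT bind. A minimal sketch (mytable, col1 and myseq follow the example above; col2 is just an assumed second column for illustration):

import java.sql.CallableStatement;
import java.sql.Types;

// One network round trip: the INSERT and the retrieval of the generated
// key happen inside the same anonymous block
CallableStatement cs = conn.prepareCall(
    "BEGIN INSERT INTO mytable(col1, col2) VALUES (myseq.nextval, ?) " +
    "RETURNING col1 INTO ?; END;");
cs.setString(1, "some value");
cs.registerOutParameter(2, Types.NUMERIC);
cs.execute();
long newId = cs.getLong(2); // the sequence value that was just inserted
cs.close();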

Hibernate problem: Doing batch jobs
This summer I worked on a project experiencing bad performance in a batch job. The job loaded data (with SQL*Loader) into a staging table and then processed it row by row. The processing was done by Java code running on an application server on the same network, which immediately introduces network latency. When doing several inserts and updates, Hibernate batched them up and inserted/updated 50 rows at a time. When processing thousands of rows, this still leaves network latency that really could be avoided. It also turned out that when using a sequence for the insert, Hibernate did not cache anything, so there was a network round trip for every request for a new sequence number; the default behavior in Hibernate is to query the DUAL table (see the example above).
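If the processing has to stay in Java and Hibernate, the sequence round trips can at least be reduced by letting Hibernate allocate sequence values in blocks. A hedged sketch using JPA-style annotations (the entity name, sequence name and allocationSize of 50 are illustrative, and with the pooled optimizer the sequence must be created with a matching INCREMENT BY):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.SequenceGenerator;

@Entity
public class StagingRow {

    // With allocationSize = 50 Hibernate asks the database for a sequence
    // value once per block of 50 inserts instead of once per row
    @Id
    @SequenceGenerator(name = "stg_seq", sequenceName = "STG_SEQ", allocationSize = 50)
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "stg_seq")
    private Long id;

    // ... other columns ...
}

Combined with Hibernate’s hibernate.jdbc.batch_size setting, this cuts down the number of round trips considerably, even though it still doesn’t compete with doing the processing inside the database.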

I would definitely suggest using PL/SQL for this kind of processing instead of doing it somewhere out on the network. In this case, however, the solution architects had made a strategic decision that PL/SQL should not be used; the argument was to keep the solution database independent. This was an enterprise solution with a lot of logic written in Java code, and a rewrite from Java to PL/SQL would have been very expensive and time consuming: the developers made a rough estimate of one year for 2-3 developers. Bad design decisions can turn out to be disastrous for a project and a system. In this case we managed to make some changes in the Java code and in the database that gave the needed reduction in batch times, but because of the design choices the solution is not as scalable as it could have been.
