Well, well, well. Statement batching. One of my favorite topics when it comes to Hibernate. I’ve seen so many applications doing this wrong. Almost everybody thinks using statement batching in Hibernate is pretty much a property switch.
hibernate.jdbc.batch_size=50
That’s it right? Magically everything will work.
Searching for the truth
A lot of devs wouldn’t even check whether batching is working after the property change but even if some do, they don’t know how to define “working”.
The initial thought is to check how many statements have been sent out to the database. One could check the logs for that using the org.hibernate.sql
logger.
In a Spring app, putting the following line in the application.properties
will do exactly that.
logging.level.org.hibernate.SQL=DEBUG
A simple test case for batching could look the following.
txRunner.executeInTransaction(entityManager -> { Product p1 = new Product("p1"); Product p2 = new Product("p2"); Product p3 = new Product("p3"); Product p4 = new Product("p4"); Product p5 = new Product("p5"); entityManager.persist(p1); entityManager.persist(p2); entityManager.persist(p3); entityManager.persist(p4); entityManager.persist(p5); });
The log would look like this in case of batching disabled:
2021-10-13 10:59:55.119 DEBUG 18004 --- [Test worker] org.hibernate.SQL: insert into products (name, id) values (?, ?) 2021-10-13 10:59:55.174 DEBUG 18004 --- [Test worker] org.hibernate.SQL: insert into products (name, id) values (?, ?) 2021-10-13 10:59:55.178 DEBUG 18004 --- [Test worker] org.hibernate.SQL: insert into products (name, id) values (?, ?) 2021-10-13 10:59:55.179 DEBUG 18004 --- [Test worker] org.hibernate.SQL: insert into products (name, id) values (?, ?) 2021-10-13 10:59:55.179 DEBUG 18004 --- [Test worker] org.hibernate.SQL: insert into products (name, id) values (?, ?)
Now let’s turn on statement batching with a batch size of 5. Modifying the application.properties
.
spring.jpa.properties.hibernate.jdbc.batch_size=5 logging.level.org.hibernate.SQL=DEBUG
Restarting the test would produce the following logs:
2021-10-13 11:04:57.705 DEBUG 6012 --- [Test worker] org.hibernate.SQL: insert into products (name, id) values (?, ?) 2021-10-13 11:04:57.730 DEBUG 6012 --- [Test worker] org.hibernate.SQL: insert into products (name, id) values (?, ?) 2021-10-13 11:04:57.731 DEBUG 6012 --- [Test worker] org.hibernate.SQL: insert into products (name, id) values (?, ?) 2021-10-13 11:04:57.731 DEBUG 6012 --- [Test worker] org.hibernate.SQL: insert into products (name, id) values (?, ?) 2021-10-13 11:04:57.731 DEBUG 6012 --- [Test worker] org.hibernate.SQL: insert into products (name, id) values (?, ?)
Ehm. What?
Even though you can’t see it – the log is identical to the non-batched version – batching is working behind the scenes. It’s just not visible on Hibernate level because if you didn’t know, statement batching is a JDBC level optimization, meaning that Hibernate will not do the heavylifting to group the statements but will defer this to the underlying JDBC driver.
Question is, then how to verify? One way is to use a JDBC datasource proxy which will add logging to the right place.
I usually use the ttddyy/datasource-proxy implementation for this purpose. Easy to configure, no big deal. I’ve already written an article about it Configuring datasource-proxy in Spring Boot.
After configuration, the log will be a tiny bit different.
2021-10-13 11:22:53.185 INFO 9108 --- [Test worker] n.t.d.l.l.SLF4JQueryLoggingListener: Name:dataSource, Connection:4, Time:7, Success:True, Type:Prepared, Batch:True, QuerySize:1, BatchSize:5, Query:["insert into products (name, id) values (?, ?)"], Params:[(p1,[B@60486fc2),(p2,[B@45ab8e1e),(p3,[B@7a2a88cf),(p4,[B@694ef9a1),(p5,[B@4adbbd35)]
2 parameters are important in the log message. Batch: True
and BatchSize: 5
. That clears up the air. Statement batching is working.
That’s settled. In this article I won’t go into details for the exotic cases when you have versioned entities or when you need to do statement reordering or anything like that.
The impact of naive batching
What if you have to insert more than 5 entities in a single transaction? How about a million records?
txRunner.executeInTransaction(entityManager -> { for (int i = 1; i <= 1_000_000; i++) { Product p = new Product("p" + i); entityManager.persist(p); } });
Batching in itself will work fine, but what happens to the application?
This is a memory consumption chart taken from JVisualVM while I started this test. Clearly, the heap gets filled up in the first few seconds and stays that way.
The CPU chart looks similar. It gets to the highest point in the first few seconds along with the GC activity then gradually decreases when the app is unable to allocate memory.
Eventually the application fails with
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Test worker"
Who would’ve thought?
That’s right. Just turning on batching and leaving the application as it is could be enough. Emphasis on the could. There are certain cases where there won’t be any issue, but as you can see, even in this simple case, there’s a much bigger problem which could cause service outage in case the application fails completely.
The solution
The solution is not that complicated though. The thing is, this happens because JPA stores the states of entities internally to make change recording possible. And that’s why the memory keeps growing and growing.
A proper implementation would produce a normal CPU and memory chart.
As you can see, there’s definitely a solution and I executed the same test scenario, persisting 1 million entities.
On the memory chart, you can clearly see the memory consumption increasing and when the GC kicks in, it frees up the memory.
txRunner.executeInTransaction(entityManager -> { for (int i = 1; i <= 1_000_000; i++) { if (i % 50 == 0) { entityManager.flush(); entityManager.clear(); } Product p = new Product("p" + i); entityManager.persist(p); } });
For further details, check out my Effective Hibernate with Spring course. I’m explaining every concept in detail when it comes to batching, connecting it back to the roots of entity state transitions.
If you liked the article, share it and make sure to follow me on Twitter and Facebook.