During quite a few projects I’ve ran into issues with the performance of fetching collections in Hibernate. In most cases these performance problems could be fixed by switching from the default fetching strategy to a more suitable alternative. To explain the different ways of fetching collections I’ve created an ‘explained by example’ guide. For those not familiar with the subject it could provide a good place to start tweaking Hibernate performance.
The way Hibernate fetches data when accessing a collection is highly customizable. Most commonly a decision is made to either load the collection when loading the entity (FetchType.EAGER) or to delay the loading until the collection is used (FetchType.LAZY).
It is also possible to specify how the data should be fetched by setting the Hibernate FetchMode. This will customize the amount of queries generated and how much data will be retrieved. Therefore it might be an important tool to optimize your Hibernate application.
A customer has zero or more invoices and each invoice has a total amount. This example will retrieve all customers, get their invoices, and calculate the total amount over all customers.
You can explicitly set the Hibernate FetchMode in the Customer class by annotating the invoices collection:
Running the example gives the following output:
The Hibernate FetchMode SELECT generates a query for each Invoice collection loaded. In total that gives 1 query to load the Customers and 50 additional queries to load the Invoice collections.
This behavior is commonly named the N + 1 select problem. Executing 1 query will trigger N additional queries, where N is the amount of results returned from the first query.
SELECT has an optional configuration annotation called BatchSize:
This changes the output to:
In this example BatchSize reduces the total amount of queries to 3. One to load the Customers and two additional queries to load the Invoice collections for all customers. To put it simple: When an Invoice collection is loaded for a specific Customer, Hibernate will try to load the Invoice collection for up to 25 additional Customer entities which are currently in the session. The example has 50 Customers so loading the 50 collections of Invoices takes 2 queries.
Hibernate FetchMode JOIN tries to load the Customers and the Invoice collections in one query
The amount of queries has been reduced to 1 by joining the Customers and Invoice collections. FetchMode JOIN always triggers an EAGER load so the Invoices are loaded when the Customers are. But when we look at the result there seems to be something wrong. There where 71 Customers found and the total amount seems to be different as well. This can be explained by the fact that FetchMode JOIN returns duplicate results when an entity has more then one record in the joined collection! In this example the Customer named Sylvia Ringer has 3 Invoices so she is included 3 times in the result. You’ll have to remove the duplicates yourself (e.g. storing the result in a Set).
The final Hibernate FetchMode available is the SUBSELECT
A SUBSELECT generates one query to load the Customers and one additional query to fetch all the Invoice collections. It is important to notice that all Invoices are loaded for which there is a corresponding Customer in the database. So even Invoice collections for who there are no matching Customers in the session will be retrieved.
Which FetchMode to use depends heavily on the application, environment and typical usage. The following guideline should be seen as a rough indication of where to start. Try to play with the setting to see what works best in your application / environment:
Use this when you want a quick response time when working on a single entity. SELECT creates small queries and only fetches the data which is absolutely needed. The use-case in our example could be an application to which displays one Customer with its Invoices.
BatchSize is useful when working with a fixed set of data. When you have a batch processing 10 Customers at a time a BatchSize of 10 will drastically reduced the number of queries needed.If the BatchSize is not set too high the query will most likely return a manageable amount of data in a reasonable time.
As indicated you’ll have to worry about duplicated results. On the other hand JOIN creates the least amount of queries. In a high latency environment a single JOIN could be considerable faster then multiple SELECTS. Keep in mind that joining too much data could put a strain on the database.
If you’ve got an entity of which you know that there aren’t that many of them, and almost all of them are in the session, then SUBSELECT should be a good choice. Just keep in mind that all collections are fetched, even if the parent is not in the session. A SUBSELECT when having a single Customer in session while there are 1000+ in the database will be wasteful.