Howdy, and welcome to our first guest post by Petri Kainulainen!
Petri is passionate about software development and continuous improvement. He specializes in software development with the Spring Framework and is the author of a Spring Data book. He writes a lot about Java, Spring, and related technologies over at http://www.petrikainulainen.net so go and check it out!
In this post Petri is discussing the 5 major issues when building web applications using Spring. Your mileage may vary, so feel free to share your thoughts in the comment section.
Spring Framework has excellent support for writing web applications. It has a battle-tested MVC framework, and Spring Boot makes developers more productive than ever before.
This sounds like a match made in heaven, but the reality is somewhat cloudier. Spring web applications have a reputation for being monolithic beasts that are hard to maintain and understand. Sadly, this reputation is somewhat deserved.
This blog post introduces the five biggest mistakes that we can make when we are designing the architecture of a Spring web application. We will also learn how we can avoid making these mistakes.
Let’s get started.
1 - We Don’t Modularize Our Applications
I have seen my share of architecture diagrams and other design documents which state that our application is divided into X modules and describe the responsibilities of each module. The problem is that even though these documents were written in good faith, they don’t necessarily describe the current state of our codebase. Sadly, the truth often is that our application doesn’t have proper modules. Instead, it is just a big ball of mud.
I think that there is one major reason why Spring web applications end up like this:
The traditional layered approach suggests that the layers should be viewed as physical layers which are visible in the package structure of our application.
In other words, our application uses a technical package structure (i.e. packages such as controller, service, repository, and model).
Sometimes these packages are called modules, but that doesn’t remove the problems that are caused by this approach.
These problems include:
Every class must be public. Because our classes are used by classes that are not found in the same package, we cannot restrict their visibility. This means that we cannot declare a proper public API, and unless we use a pedantic (and heavy) code review process, the odds are that sooner or later our classes are used in a less desirable way.
Our code has circular dependencies. Typically these circular dependencies are found in the service layer for two reasons:
- Services consume other services.
- Our service layer is essentially one big module, which gives us the illusion that we don’t have to manage the dependencies between our services carefully.
I think that circular dependencies are a code smell that should be avoided because they cause many problems and force us to use field injection (or setter injection) for mandatory dependencies instead of constructor injection.
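The constructor-injection point can be sketched in plain Java (the class and method names here are illustrative, not from the post). The dependency is final and mandatory, so it can never be null or reassigned, and a circular dependency between two such beans fails fast at startup instead of being papered over by field injection:

```java
import java.util.Objects;

class OrderService {

    // In a Spring application this would be another @Service bean.
    interface PaymentService {
        void charge(String orderId);
    }

    // final: the dependency is mandatory and immutable.
    private final PaymentService paymentService;

    OrderService(PaymentService paymentService) {
        this.paymentService = Objects.requireNonNull(paymentService, "paymentService");
    }

    void placeOrder(String orderId) {
        paymentService.charge(orderId);
    }
}
```

With field injection, a missing or circular dependency would surface only when the field is first used; here it surfaces the moment the object is constructed.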
Our entities form a “huge net” that contains our entire database. This means that we have to be very careful with eager fetching because an innocent looking query can launch a monstrous amount of SQL queries. Needless to say, this will destroy the performance of our application.
However, this approach has another and a bit more devious downside: because we can navigate from one entity to another entity, and we cannot restrict access to any of them, the odds are that sooner or later our service classes will take advantage of this. We can solve these problems by following two simple rules:
- We should use a functional package hierarchy instead of a technical package hierarchy. Oliver Gierke has written an excellent blog post that describes how we can do this.
- We should use aggregates. An aggregate is a cluster of entities that are treated as a single unit, and each entity of an aggregate can only be accessed through the aggregate root object. We can get started by reading a blog post titled: Designing and Storing Aggregates in Domain-Driven Design.
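The aggregate rule can be sketched in plain Java (an assumed Order/OrderLine example, not from the post). The entities inside the aggregate are reachable only through the aggregate root, which also enforces the invariants, so no service class can wander through a "huge net" of freely navigable entities:

```java
import java.util.ArrayList;
import java.util.List;

class Order {

    // An entity that belongs to the aggregate; it is only created and
    // exposed through the root.
    record OrderLine(String sku, int quantity) {}

    private final List<OrderLine> lines = new ArrayList<>();

    // All modifications go through the root, which enforces invariants.
    void addLine(String sku, int quantity) {
        if (quantity < 1) {
            throw new IllegalArgumentException("quantity must be positive");
        }
        lines.add(new OrderLine(sku, quantity));
    }

    // A defensive copy: callers can read the lines but cannot mutate them.
    List<OrderLine> lines() {
        return List.copyOf(lines);
    }
}
```

In a real application we would also keep Order and OrderLine in the same functional package and make OrderLine package-private, so the compiler enforces the access rule.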
2 - We Select Entities Inside a Read-Only Transaction
There is one quite common “pattern” that is found within Spring web applications. This pattern has three steps:
- We get information from the database.
- We transform the returned entity into a DTO.
- We return the created DTO.
This might be a valid approach if we need every property value (or most of them) of the returned entity OR we are inside a read-write transaction.
However, if we need only a few property values AND we are inside a read-only transaction, this is an awful solution because
- We will select more information than we need.
- We might run into the N+1 selects problem.
- We cannot use index-only scans.
In other words, querying entities has a negative effect on the performance of our web application. If you want more information about these problems, you should read the following blog posts:
- The Vietnam of Computer Science (check out the section: The Partial-Object Problem and the Load-Time Paradox)
- Nested Loops, ORMs and the N+1 Problem in ORMs
- The Top Two Problems Caused By ORM Tools
It is possible that developers query entities because they don’t care about the performance effects of this approach. However, I claim that developers query entities inside a read-only transaction because it is easy. They think that taking a performance hit is acceptable because it allows them to produce more code in the same amount of time.
Sadly, this is not true. If we use the right tools, querying DTOs is dead simple. For example, we can:
- Create our database queries by using JdbcTemplate and map the query results by using BeanPropertyRowMapper.
- Create our database queries by using jOOQ.
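A minimal sketch of the DTO approach (the table and column names are assumed, not from the post). With Spring’s JdbcTemplate the whole query is one call, e.g. jdbcTemplate.query(SQL, (rs, n) -> new TodoDto(rs.getLong("id"), rs.getString("title"))); here the row mapping is shown against plain maps so the sketch runs without a database:

```java
import java.util.List;
import java.util.Map;

class TodoQueries {

    // The DTO holds only the columns the view actually needs.
    record TodoDto(long id, String title) {}

    // Selecting only the needed columns avoids fetching the whole entity
    // graph and lets the database answer from an index alone.
    static final String SQL = "SELECT id, title FROM todo_item";

    // The same mapping JdbcTemplate would do per row, shown on the
    // generic rows that queryForList(...) returns.
    static List<TodoDto> mapRows(List<Map<String, Object>> rows) {
        return rows.stream()
                .map(row -> new TodoDto((Long) row.get("id"), (String) row.get("title")))
                .toList();
    }
}
```

The point is that the DTO code is no longer than the entity-based version, so "it is easy" stops being an excuse.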
3 - We Love Reinventing the Wheel
During the last few years, principles like KISS (Keep it simple, stupid) and YAGNI (You aren’t gonna need it) have become really popular. I think that the main reason for this is that developers are tired of writing over-engineered enterprise applications. They want to write simpler code that is easier to read and maintain.
That is an honorable goal. However, sometimes libraries or (gasp) frameworks can help us reach that goal.
Let’s assume that we have to implement a scheduled job that fetches information from an external API, makes some modifications to it, and saves it to a relational database. Because the requirements are fairly simple, we decide to write a simple scheduled job that fulfils these requirements.
We deploy the job to the production environment and after it has been running for a while, our customer has a few change requests:
- Sometimes the scheduled job fails because a timeout occurs when it reads information from the external API. If this happens, our scheduled job must try to read the information again from the external API.
- If any error occurs (an exception is thrown), our scheduled job fails and nothing is written to the database. Because some errors are recoverable, our customer wants that our scheduled job fails only if an irrecoverable error occurs.
- Our customer wants his employees to be able to monitor the scheduled job. They want to follow its progress, track the errors that occur, and see the end status of the job (success or failure).
Because we are in a hurry and our customer needs these changes ASAP, we decide to make the required changes to the existing job. In other words, we basically have to rewrite it.
After we have finished these changes and deployed them to the production environment, our customer complains that the scheduled job is too slow. We manage to make the scheduled job a lot faster, but we have to make drastic changes to it.
The end result is that the code of the scheduled job is really messy and no one wants to touch it.
What went wrong?
When we decided to write the scheduled job, we made the right call because it was the simplest thing that could possibly work. However, when our customer asked us to make extra changes to it, we made the wrong call. We should have used Spring Batch, which provides the requested features (and many more) out of the box.
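To make the trade-off concrete, here is roughly the retry logic we end up hand-rolling when the customer asks for "try again on timeout" (a generic sketch, not Spring Batch code). Spring Batch gives the same behavior declaratively, via a fault-tolerant step configured with retry and skip policies:

```java
import java.util.function.Supplier;

class Retry {

    // Call the supplier up to maxAttempts times; return the first
    // successful result, or rethrow the last failure if every attempt
    // fails (i.e. the error turned out to be irrecoverable).
    static <T> T withRetry(int maxAttempts, Supplier<T> call) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e; // treat as recoverable until attempts run out
            }
        }
        throw last;
    }
}
```

This is only one of the requested features; once we also need skipping recoverable records, restartability, and progress monitoring, we are rewriting a large part of a batch framework ourselves.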
We must understand that architecture design is a continuous process. If our requirements change in a radical way, we must be ready to re-evaluate our decisions.
4 - We Create Reports From the Database Tables That Store the Data of the Application
One very common situation is that no one pays real attention to reporting until it is time to release the application. Because it is too late to create all of the reports, typically we have to implement only a few critical reports ASAP (the rest are done later).
Because the pressure is high, and we REALLY need to get these reports out before the application is released, we do the simplest thing that could possibly work. We create these critical reports by selecting data from the database tables that store the data of our application.
This approach has three problems:
- Our database is (hopefully) normalized. Reports present data in a tabular format that is denormalized and typically requires data from multiple database tables. If we create these reports from the application’s database by querying entities, we are doomed. On the other hand, even if we use SQL and select only the columns we need, our reports are not as fast as they could be.
- Our database doesn’t necessarily store historical information (such as the price of product X on day Y). If our reports require this kind of information (and often they do) and we create these reports from the application’s database, our only option is to add this information to the database. The problem is that our code becomes a lot more complex only because we have to preserve historical information that our application itself doesn’t need.
- If we need to do some calculations before the fetched data is shown to the user AND we query entities, we are doomed. We have to make these calculations in our code, and this is slower than making them in the database. If we use SQL, we might be OK, but our query might be slower than it could be (this depends on the calculation).
It would be easy to claim that creating reports from the application’s database is an anti-pattern that should be avoided at all costs. However, that would be really bad advice. Although creating reports from the application’s database is a bad choice if the customer has complex reporting requirements, it can be a decent choice in certain situations.
The correct choice depends on the customer’s requirements, budget, schedule, and the number of available developers.
- If our customer has very complex requirements and a high budget, we should consider creating a proper reporting database.
- If our customer has simple requirements or a low budget, we probably cannot create a proper reporting database, but we can either use views or create denormalized reporting tables that are populated by triggers.
- If we are in a hurry, and we don’t have “enough” developers, we should do the simplest thing that could possibly work, but we should ensure that our customer understands that we will have to rewrite the reporting logic later.
When we implement our reporting solution, we should remember these “rules”:
- We should always use SQL. If we use the right tools, it doesn’t require any more work than querying entities, and it helps us to avoid performance problems.
- We should do the calculations and other pre-processing required by our reports in the database. This helps us to avoid performance problems because all the heavy lifting is done before the data is presented to the user.
- We should never add history information to the database tables that are used by our application only because our reports require it. If that information isn’t used by our application’s business logic, we should store it to the denormalized reporting tables or move it to the reporting database.
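The "do the calculations in the database" rule can be illustrated with an assumed revenue report (the table and column names are hypothetical). The SQL below returns one pre-aggregated row per product; the in-memory version is shown only as the contrast we want to avoid, because it forces us to fetch every row first:

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class RevenueReport {

    // Preferred: the database aggregates and returns one row per product.
    static final String REVENUE_PER_PRODUCT = """
            SELECT product_id, SUM(quantity * unit_price) AS revenue
            FROM order_line
            GROUP BY product_id
            """;

    record OrderLine(long productId, int quantity, BigDecimal unitPrice) {}

    // Anti-pattern: aggregate in application code over fully fetched rows.
    static Map<Long, BigDecimal> aggregateInMemory(List<OrderLine> lines) {
        return lines.stream().collect(Collectors.groupingBy(
                OrderLine::productId,
                Collectors.reducing(
                        BigDecimal.ZERO,
                        line -> line.unitPrice().multiply(BigDecimal.valueOf(line.quantity())),
                        BigDecimal::add)));
    }
}
```

Both produce the same numbers, but the SQL version ships a handful of aggregated rows over the wire instead of the entire order_line table.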
5 - We Treat Error Handling as a Second Class Citizen
It is (sadly) quite common that Spring web applications handle errors by following these rules:
- If the application is a regular web application, it renders an error page that has a general error message.
- If the application has a REST API, it simply returns the HTTP status code 500. This means that the client cannot show a proper error message to the user. It has to show a general error message.
- The developers aren’t notified about the error in any way.
The benefit of this approach is that it is very easy to implement. However, it has some major flaws:
- A user doesn’t want to see a general error message. He or she wants to know what went wrong. If the users of our application don’t get this information, they might become pissed off and simply stop trusting our application. I know from experience that users tell jokes about these general error messages. Do we really want them to think this way?
- If we are implementing a REST API that is consumed by other programs, the developers of these programs lose faith in our API because it fails without giving a proper error message. This can have a negative effect on our reputation and cause permanent damage to our business, because our customers might switch to a competitor that provides better error messages.
- Because the developers aren’t notified about the error when it happens, figuring out the root cause is either impossible or requires a lot of work, because the only way to find the root cause is to investigate the log files. This is a very boring job, and that is why developers don’t like to do it. When you think about it, it is ironic that the developers did this to themselves.
We can avoid this mistake by following these rules:
- We shouldn’t show a general error message to the user UNLESS we don’t know what went wrong. We shouldn’t reveal the technical details of the error to the users, but we should definitely let them know what went wrong. Also, if our application supports multiple languages, we should translate these error messages as well.
- If we are implementing a REST API, we should learn how to use HTTP status codes and always return an error message that describes what went wrong. Also, if our REST API supports multiple languages AND it is consumed by a single page web application, we should translate these error messages because they might be shown to the users of that application.
- We should always notify the developers about the error. This notification should contain the stack trace, the date and time of the error, and the action that triggered the error. This saves us a lot of time, and the best part is that we don’t have to spend so much time reading log files. We can send these error reports by using our logging library (most popular libraries have email appenders), or we can use external tools such as New Relic.
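The rules above boil down to mapping each exception to a specific status code and a human-readable message, and reserving the generic message for the truly unknown cases. A minimal sketch in plain Java (in a Spring application this mapping would live in a @ControllerAdvice class with @ExceptionHandler methods; the record and the exception choices here are illustrative):

```java
import java.util.NoSuchElementException;

class ApiErrors {

    // The error body returned to the client: status, machine-readable
    // code, and a message that says what actually went wrong.
    record ApiError(int status, String code, String message) {}

    static ApiError toError(Exception e) {
        if (e instanceof NoSuchElementException) {
            return new ApiError(404, "NOT_FOUND", e.getMessage());
        }
        if (e instanceof IllegalArgumentException) {
            return new ApiError(400, "BAD_REQUEST", e.getMessage());
        }
        // Only the truly unknown cases fall back to the generic message,
        // and these are exactly the ones that should also trigger a
        // developer notification.
        return new ApiError(500, "INTERNAL_ERROR", "Something went wrong");
    }
}
```

If the API supports multiple languages, the message field would hold a translated message (or a message key the client resolves), as discussed above.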
There are many other mistakes that we can make when we are implementing a web application that uses Spring Framework. I selected these mistakes because some of them have caused me serious pain in the past, and some of them are causing me serious pain right now. In other words, the list is biased.
Do you think that I forgot something?