On Integration: Why I enjoy working with databases

October 18, 2006

Status: This article is currently pretty dry. I’d like feedback on how to make it more eloquent.

In my previous blog post, I promised to write more about using databases as the main integration strategy. In the current post, I plan to cover maybe the most important question: “Why?”

Imagine an application where every time it wants to communicate with another system, it reads or writes to the database. For now, let’s ignore how this would work, and how it would evolve, which will be the subject of later posts. What advantages does this offer?

The alternative is usually to integrate with another system through a variety of means. In Java, the most common ones are Web Services, RMI, EJBs (which offer their own quirks in addition to those of RMI), sockets, and various tricks using the file system.

The most important issue to me is invariably productivity. When I work with databases, I can generally use Object-Relational Mapping tools. This is a very productive way of accessing database data in an application. RMI offers similar advantages, but you will have to build lazy loading on top of the domain model if you want a rich model where the objects are interconnected. Web Services generally have some bindings to Java, but in my experience, these are really inadequate. Either the Java side suffers, for example by forcing you to have getters and setters, to use arrays instead of collections, or to use strings as the main data type, or the XML side suffers by having non-specific types (if you use collections). Sockets, of course, are very unproductive. They give up productivity for simplicity.

The data that is managed by the remote service will generally come from a database anyway. This means that the data access code will have to be developed somewhere in any case. With remoting, a remoting layer has to be developed in addition.

To maintain sustainable productivity, we need unit tests. In my experience, unit testing is hard to do well for both Web Services and RMI, and EJBs are of course out of the question. As my regular readers know, using a test database for standalone unit testing is quite simple. As an added bonus, tests that use the database will essentially have verified the integration. When I use a remoting protocol, I always run into strange problems very late in the test process.

Both unit testing and productivity benefit from the fact that dealing with databases is something we’ve done for a long time. The tools and techniques for doing so are very mature compared to other methods of integration.

Second, there is the problem of reliability. If you use a single database, everything you do can happen within one transaction. Either all the work is committed, or all of it is rolled back. This vastly simplifies your logic if you care about correctness. For distributed systems, this is in theory solved by the two-phase commit protocol. However, my experience is that this adds so much complexity to a solution that the system can metaphorically collapse under its own weight. As a result, most solutions I’ve seen (and, I suspect, most solutions I haven’t) simply ignore this problem. This means that the odd resource error that occurs might very well have very unpredictable results.
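To make the all-or-nothing point concrete, here is a minimal sketch using plain JDBC. The class name, the `orders` and `shipping_requests` tables, and their columns are all made up for illustration; the only assumption is that both the "local" update and the message to the other system live in the same database, so one transaction covers both.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical example: the class, tables and columns are illustrative only.
final class SharedDatabaseTransaction {

    // Marks an order as shipping and leaves a request row for another
    // application to pick up. Both statements commit together or not at all.
    static void shipOrder(Connection connection, long orderId) throws SQLException {
        boolean previousAutoCommit = connection.getAutoCommit();
        connection.setAutoCommit(false); // start an explicit transaction
        try {
            try (PreparedStatement update = connection.prepareStatement(
                    "UPDATE orders SET status = 'SHIPPING' WHERE id = ?")) {
                update.setLong(1, orderId);
                update.executeUpdate();
            }
            try (PreparedStatement insert = connection.prepareStatement(
                    "INSERT INTO shipping_requests (order_id) VALUES (?)")) {
                insert.setLong(1, orderId);
                insert.executeUpdate();
            }
            connection.commit(); // both changes become visible at once
        } catch (SQLException | RuntimeException e) {
            connection.rollback(); // neither change survives a failure
            throw e;
        } finally {
            connection.setAutoCommit(previousAutoCommit);
        }
    }
}
```

Note what is missing: there is no recovery code for the case where the order was updated but the other system never heard about it. With a single database, that state cannot be committed.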

A remote layer will also introduce another place where things can go wrong. Many developers end up coding recovery routines for dealing with these kinds of errors. In my experience, this is some of the most error-prone code you can write.

Third, performance-wise it is hard to beat the database. Most other methods will eventually hit the database anyway, and as a general rule, adding more steps to a solution seldom makes it faster. There are some issues with scalability, however, that I will address in a later post.

Last, and maybe most importantly, I have never seen a standard interface for dealing with remote services. Solutions generally end up having half a dozen or more different policies for accessing different backend systems. There is one thing we can always be sure of, though: there’ll always be a database among these backend systems, no matter what else you have to talk to. Every extra communication mechanism you remove will reduce the shoestring-and-paperclip factor of your system.

By using a single data source as the place for communicating with other systems, we will reduce complexity and improve testability, performance and reliability.

I hope that in this post, I have demonstrated why, in an ideal world, you would want to use a single database as your primary integration mechanism. However, the world is rarely ideal. Database schemas change; more load is added than a single database can handle; you have to understand a forest of database schemas; and some applications should not be allowed to access all the data. In my next blog post, I will talk about how to solve these problems with databases without giving up the single database vision. Stay tuned for evolution, scalability, security, reuse, and understandability.

Comments:

chwlund - Oct 19, 2006

puh! I had written a long comment to this post and then suddenly the math failed and I lost my whole comment… so this will be some kind of summary:

first I would like to say that I really enjoy reading your blog: exciting and insightful ideas that seem to be based on many, many years of experience. really worth listening to!

but I really can’t wait for the “how” part of this series, so here are some questions:

-how is it possible to trigger a business function in another application using your shared database strategy? I guess that all applications have access to all the data in the database, but what if an application A wants another application B to do some calculations on a specific set of data? how does application A trigger this calculation in app B? (I presume that you want to avoid the nightmare of stored procedures)

-how can you avoid the applications becoming tightly coupled? when the apps share and communicate through a shared database, they have to know and use the inner data structures of the other applications. this will make them strongly coupled. if you change the structures of one application, this might affect three others… isn’t this really entering the world of spaghetti?

-will it not be hard to change the technology that one of these applications is based upon? when the database layers are merged together and there are no explicit system borders in the database layers, how can you manage to replace the technology… aren’t the system borders in a database just too vague and implicit? isn’t it much better to move the system borders up, for instance into a service layer?

my philosophy is that applications that solve different problems and that are explicitly separate applications should (if they have to communicate at all) know as little about each other as absolutely possible! and I don’t see how you achieve this goal with the shared database strategy!

Johannes Brodwall - Oct 19, 2006

Too bad about you losing your comment. I hate it when that happens.

The issues you take up are important, and I will let the feedback decide what I write about next. Specifically:

I say: “The world is flat”. You say “a large flat world is confusing”. This is true. The world should not be flat, but from certain viewpoints, it should appear flat. I’ll talk more about this later. This is how I want to address complexity and coupling.
I had not originally thought about writing about triggering business functionality, but I’ll include our current work on this in a future post. (This was actually what triggered the whole Single Database Vision idea for me.) And, like you, I am not a proponent of stored procedures.