Hitchhiker's Guide to Software Architecture and Everything Else - by Michael Stal

Sunday, February 01, 2004

The Secrets of Building Efficient Distributed Systems

The recipe is so easy. Just take a book on RMI, CORBA, or .NET Remoting. Add a little bit of EJB, OSGi, COM+, or CCM. That's all. Sorry, one ingredient is still missing: don't forget to add XML Web services. It's cool, and your boss simply expects it from you. Then mix everything together. This is all it takes to build a distributed system. At least this is what many books and articles make you believe.

Unfortunately, I've seen a lot of systems that were built exactly this way. And in most cases it is not the developer's fault. Vendors and standards organizations primarily focus on transparency issues. Consider, for example, the claim that CORBA hides all system details from you. Hence, it should make no difference whether you are building a conventional system or a distributed architecture. I am not making this up, and it is a serious pitfall. In distributed systems there is no central control as in conventional systems. Operational requirements such as scalability or response times are much more difficult to meet. The same holds for non-functional requirements such as flexibility (adaptability, extensibility, removability, ...) or security. These cross-cutting concerns cannot be centralized since they affect multiple tiers and possibly multiple layers within these tiers.

Take security as an example. If you need to connect an external component or service B with your own infrastructure A, you must ensure that the foreign infrastructure B lets you use a secure protocol and implements authentication functionality you are willing to trust. Secure communication and authentication only work if all parties participate in providing them.

Take scalability as another example. Sure, scale-up activities are constrained to a single node. However, to scale out, multiple nodes must cooperate to provide the same service. This might be easy for stateless services but turns out to be much more challenging for stateful services.

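To make the stateless versus stateful difference concrete, here is a minimal sketch in Python. Everything in it (the `Node` and `RoundRobinBalancer` classes, the shopping-cart example) is hypothetical and only meant to illustrate the point, not to model a real cluster.

```python
from itertools import cycle


class Node:
    """One server in a hypothetical cluster; it holds only its own local state."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # per-node state, invisible to the other nodes

    def to_upper(self, text):
        # Stateless: the result depends only on the request itself.
        return text.upper()

    def add_to_cart(self, user, item):
        # Stateful: the result depends on what this particular node saw before.
        cart = self.sessions.setdefault(user, [])
        cart.append(item)
        return list(cart)


class RoundRobinBalancer:
    """Naive load balancer that hands each request to the next node in turn."""

    def __init__(self, nodes):
        self._nodes = cycle(nodes)

    def next_node(self):
        return next(self._nodes)


balancer = RoundRobinBalancer([Node("a"), Node("b")])

# Stateless calls give the same answer no matter which node serves them.
stateless_results = [balancer.next_node().to_upper("ping") for _ in range(2)]

# Stateful calls diverge: each node only knows the items it received itself.
first = balancer.next_node().add_to_cart("alice", "book")
second = balancer.next_node().add_to_cart("alice", "pen")
```

On a single node the second call would return both items. Here each node ends up with a cart containing one item, so the nodes would have to share or replicate session state, or the balancer would have to route each user to a fixed node, before the service scales out correctly.
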
In addition, implementing scale-out functionality such as load-balancing clusters might influence other parts of your architecture. If you make a system flexible, this could decrease performance, and vice versa. In some cases, however, you can increase performance and flexibility at the same time. Take the Strategy pattern as an example and consider the capability of a system to use the most efficient algorithm in every runtime context. Here flexibility and performance are two sides of the same coin.

In general, there are a lot of such requirements. Some of them can be treated as a unit while others conflict. The same two requirements can conflict in one context and reinforce each other in another. Thus, priorities and contexts are important when deciding between different tradeoffs in your architecture. As you can see, things can get quite complex here. That is the main reason why we still need a methodology for meeting operational and non-functional requirements. The question is whether there can be one central approach for all requirements or whether we need to partition requirements into groups and come up with a separate methodology for each. Efforts such as ATAM are very interesting here, but they can only be considered a nice start. In the meantime, you as architects and developers should give these issues extra consideration and effort. Don't focus on functional aspects only, but keep in mind that non-functional and operational requirements are the major cause of trouble when implementing distributed systems today.
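
The Strategy pattern remark above can be sketched as follows in Python. This is only an illustration under stated assumptions: the two sorting strategies, the `choose_strategy` function, the `Sorter` class, and the threshold of 16 elements are all made up for the example and would need to be measured and tuned in a real system.

```python
def insertion_sort(items):
    """Simple quadratic sort; tends to be fast for very small inputs."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


def merge_sort(items):
    """O(n log n) sort; the better choice once inputs grow."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


SMALL_INPUT_THRESHOLD = 16  # assumed cut-over point, not a measured value


def choose_strategy(items):
    """Pick a strategy from the runtime context (here: just the input size)."""
    if len(items) < SMALL_INPUT_THRESHOLD:
        return insertion_sort
    return merge_sort


class Sorter:
    """Context object: callers use sort() and never name an algorithm."""

    def __init__(self, chooser=choose_strategy):
        self._chooser = chooser  # the selection policy itself is replaceable

    def sort(self, items):
        strategy = self._chooser(items)
        return strategy(items)
```

Because callers depend only on `Sorter.sort`, new algorithms or a different selection policy can be plugged in without touching client code (flexibility), while each call still runs the algorithm judged best for its input (performance).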