Friday, January 26, 2007

Against Web Services

I wrote this up last year, as we were just beginning a phase to re-architect a core piece of the system. New resources had just been brought into the team, and so lots of ideas were flying around, some very good, and some not so good.

One big idea being pushed was to change everything to run as Web Services, in an attempt to create a sort of SOA architecture. Now, in principle I certainly wouldn't be against trying out the next big architectural paradigm. But in this particular case, we were dealing with a 10-year-old legacy system, with much more fundamental issues to work out. So I was convinced that trying to throw a new architectural paradigm into the mix - especially one that was, at the time, really a very cutting edge, not yet industry-proven idea - seemed to really be deviating off the right path.

Why we shouldn't use Web Services in this phase of Re-Architecture:

I've been trying to avoid going in to a long drawn out discussion about these points, because, as you'll see when you get to the bottom of this, the answer should be pretty obvious to anyone who fully understands the problems we've faced in the past and are trying to address now. In fact, when an answer is this obvious, it is very hard to come up with a comprehensive argument for it, because most of the assumptions are just taken for granted. But I've tried to really sit down here and enumerate all of the important points to consider here, and to draw a reasoned conclusion from that.

I'm not really looking for a point-by-point debate on each of these points - for me, the real point is that there are clearly a number of debatable problems with the WebServices approach for our team, and so the conclusion is that the fact that there are so many enumerable problems here, and probably many more I haven't considered, should lead us to conclude that there's no reason to introduce so much risk into the project without some clear advantage.

I'll try to break my discussion up into two main categories: Support and Technical.

Support Reasons:

Because WebServices run in the IIS ASP.NET context, they are managed as part of IIS. This means that everyone on the development/testing team would have to become an IIS expert overnight. Given that I don't think we have a great breadth of experience with IIS, and especially with ASP.NET, this is a major learning curve for the team, and would require significant time for everyone to get trained up.

The need to ramp up on IIS with ASP.NET would extend to support people as well. We would have to convert our entire application support team into a web management team. This would mean significant restructuring of and turnover in the support team, probably having to replace or augment a good chunk of it. Currently our support people are experts primarily in AutoSys, and secondly in MQ - but from the user side only, not the system management side - because:

MQ is supported (and centrally hosted/managed) by the Middleware team. AutoSys is supported, hosted, and managed by the Scheduling Management team. The WebServices would need to be hosted on our own servers, and thus supported (at a system management level) by our own support team. There is no central managed hosting of WebServices, because with WebServices we are essentially combining both our communication and application management technologies with our actual application components, so that it is all running in one process on a single machine. The only support offered to help us with WebServices would come from the Web Support team - but clearly this is not our first-line support to guarantee reliability of the system - this would be a sort of second line support that we could call out specifically if we are diagnosing an IIS problem - similar to the kind of support we get when we are diagnosing a DB problem. But the DBAs certainly do not replace the role of our first-line support people in supporting our application - they perform tasks very specific to the DB. And even the DB is something hosted and managed centrally for us. Similarly, the Web Support team would not give us any kind of comprehensive management of IIS - they are just there to help troubleshoot when we have issues managing our own infrastructure. They are ceretainly not going to perform the level of monitoring and performance tuning on our own servers, that the middleware team would provide on the MQ servers. In this case, we are talking purely about taking ownership of all aspects of the system, and not taking advantage of any centralized management or hosting for any of it, other than managing the physical boxes. Thus, we now become responsible for (1) guaranteeing reliability of our communication infrastructure; (2) the application management infrastructure. With MQ and AutoSys, both of these are services provided to us by IT, by groups who are much more focused on the performance of those infrastructural pieces. We would essentially be giving up those benefits.

There is no out-of-the-box centralized process management solution for WebServices, in a way that would be analogous to AutoSys. Our support people clearly need a mechanism for centrally viewing and managing all services across all boxes (as do our testing and development people, really). It would also not be possible to use AutoSys in any meaningful way with WebServices, as a process management tool. So we would have to build something to meet this need for WebServices. Given the complexities introduced by being integrated with the IIS ASP.NET process (see more detail in my "Technical Reasons" comments), it would be very challenging to put together a framework/tool to manage the individual services being hosted in a WebServices envrionment.

There are too many risks and unknowns in the WebServices approach for our team right now. The focus of our current project should be making the application better, by taking the next evolutionary step in re-architecting the way we manage our systems, leveraging all of the collective experience we've gained and problems we've found with our previous approaches. The focus should certainly not be embarking out on an crusade into unknown territory to try out a new technology for the fun of it. Certainly, that could be a future phase. But do we really want to be spending so much of our time right now on learning this entirely new technology, dealing with it, supporting it, and preparing ourselves for the risk involved with all the unknowns? We now have extensive experience with MQ, and know exactly what is involved in running our current applications on MQ and porting further applications to MQ. However, we have absolutely no experience with WebServices, and we have not spent any time thinking about all of the unknowns that may come up with the development, testing, and support phases. Of course, there will be new ground that the application will cover for the first time, and there will be new support responsibilities - but shouldn't those be well-justified as providing something necessary? Here we are discussing introducing what would surely be the most risky and radically new change to our infrastructure, and there is no clear benefit to the problems we are specifically trying to address, which could not also be achieved by something much less risky and much more well known, such as MQ.

Technical Reasons:

There is no inherent guaranteed communication protocol when using Web Services - it is essentially like TCP in this regard - a method gets called directly, and if your server is not available (due to reboot, restart, crash, or even a temporary 10-millisecond network glitch), an error occurs, or the transaction may be lost in the middle if the server crashes, etc. It is, of course, possible to implement a strategy to overcome this limitation, with things like retries, re-requests, re-synchronization, custom heartbeats, etc. - however, this starts to look very similar to a lot of the inherent database problems we are trying to address now - think about all of the issues that have to be addressed in the DataLoader in order to provide a level of guaranteed processing to the database - there are similar concepts there, like retries. So, the point is, you can overcome these limitations with a lot of work, but why would you want to if you have another solution that avoids the problem altogether?

Everything runs in the IIS ASP.NET context. This introduces quite a few issues for the kind of services we run:- In order to restart one service, you typically have to restart all services on that server. For example, that means it would not be possible to restart the ServerA, without also restarting the ServerB and ServerC, assuming you run all those services on the same server. This would cause a major halt in the workflow if you are trying to restart something while you have processing underway. This is a common scenario for us in current systems (restarting a particular service for some reason), and I think we would be nieve to imagine this is going to magically go away.- A crash/corruption in one of our services, could potentially corrupt the state of all other running services. Again this would result in restarting of one service potentially necessitating a restart of all services.- To get around this problem, you would have to manage multiple instances for different services. This means an even more complex infrastructure to manage on each server.

Running in the ASP.NET context introduces a number of technical uncertainties in this company's environment. I personally have a small amount of experience with this, having built our existing web applications - and I would wager the rest of the team has absolutely no experience with this and thus no idea what may come up. To provide a few specific examples of where we have spent extensive time with our existing web apps working around these limitations:
  • Accessing databases requires a specific security context, meaning you either have to tweak the ASP.NET user context, or you have to implement custom impersonation. Our web app has extensive impersonation logic to deal with this issue, and I would not wish to have to deal with any of that in a component of this application, if it can be avoided.
  • Accessing MQ from a WebService would similarly have security issues, and similarly require impersonation, and potentially conflicting credentials with those needed for the database access.
  • Accessing other services such as remoting, windows management instrumentation, or remote processes requires further work involving complex machine configurations and low-level windows local policy settings manipulation.
  • Simple things like accessing certain files in certain locations (i.e D: drive shares, dynamic loading of dlls in different locations) can become problematic depending on permissions settings. In setting up our existing web applications on a new machine, a fair amount of time is usually dedicated just to resolving some of the dynamic plugin DLLs and dependencies, as loading them in the ASP.NET context is more problematic than loading them in a normal .exe process context.
  • Differences between Win2K and Win2003 and between workstation and server (in terms of ASP.NET permissioning models - the two systems don't even use the same users!) mean that things developed on our local machines typically have numerous problems when we port them to servers. As an example, WebBatch still does not fully work on any Win2003 server because we haven't been able to address some of the policies and permissioning issues for certain pieces of functionality - and we've spent several days on that problem alone.
So I don't mean to say I anticipate having all of these exact same issues again, but just to point out the kind of uncertainties that pop up when you are dealing with the complexity of a managed permissioning context model like IIS and ASP.NET. It's not like normal .NET applications where you just xcopy to deploy and then double click to run, and there are hardly ever issues. It's back to a model like VB where a significant number of tweaks are necessary just to get components installed correctly and up and running on a new system, and the things you have to do are different between our win2k dev workstations and the win2003 servers. Who knows if these problems will creep up when we are trying to integrate with our databases, or any other random integration point we haven't considered? We can't possibly anticipate where we will have the big problematic areas when we try to deploy application components, but why would we possibly want to add this many unkowns to the project if it's not necessary?

WebServices in ASP.NET also introduce a much more complicated set of security issues. They normally run on the HTTP protocol, which is the most notoriously insecure, open, exploitable, and broadly-permissioned communications protocol. They also run on IIS, which is the most notoriously insecure, open and exploited web server. This is exactly the opposite of what we should be thinking of for an internal system with tightly coupled services, which should be communicating in a simple and secure manner. Clearly, this company has numerous levels of firewalls and security layers, but the fundamental point is still there - this is introducing an exploitable hole in the architecture where there was none before. Managing the security of each WebService on each individual server is going to become a very complex task, and will equate to a full-time job for another resource on the team. On the other hand, MQ is fully and simply secured, and the security is managed by the middleware team. I'm not that worried that we are at major risk of putting an exploitable system out there - but this would certainly be another on the long list of things we have to think about and plan for, and it's still not clear what the advantage would be.

There are significant barriers to taking a "generic" load balancing solution with WebServices and converting it to an "intelligent" load balancing solution, once we arrive at the need to do so. WebServices are very good at working with hardware balancers to balance load purely based on generic load decisions like cpu and memory utilization, etc. However, they are very bad at working with a more intelligent software load balancing scheme where you might want to have configurable control over exactly which request goes to a particular instance of a service on a particular machine. WebServices may be one standard out there for generically load balanced solutions, but they are certainly not a standard out there for intelligently balanced solutions, and I've never heard of anyone trying to implement any kind of intelligent balancing with a WebService. This is an area where MQ excels - a solution can be built initially for a generic scenario, and then adapted to an intelligent scenario later, and MQ works well in both.

There is no inherent transactional nature in WebServices, in the sense that we do transactions with MQ. We are able to use MQ transactions to synchronize a potentially complex (but quick) operation like receiving a request into the data loader via a message, begining a transaction to read that request off the queue, writing the data to the database (in a database transaction), and then completing the transaction on the queue in a way that is synchronized with the success of the database transaction - i.e. if the database transaction fails, roll back the transaction on the queue, or even roll back to the end of the queue or to a different queue to redirect to work somewhere else. This is exactly what we already do, for example with xml data loader in stress testing. Transactional mechanisms could certainly be implemented in a custom way on top of a raw WebService protocol, but they are not inherent in the technology, and so this is just one more thing we would have to worry about implementing (and implementing correctly, since it is such a fundamental piece of our component interoperation).
* We subsequently uncovered that there was some work being done on an experimental approach for Web Service enterprise transactions - but this was apparently still in a very experimental stage at the time.

WebServices do not offer us anything in terms of performance improvement over any other technology - we are using MQ successfully in other systems, and while we do have perf issues in those systems related to DB, dataloader, etc.; I certainly don't think we've ever noticed any significant performance issues resulting from MQ. We know that other teams have, but this is more due to improper usage, rather than any inherent MQ problem. Thus, the whole idea of using WebServices to address a performance problem we don't actually have should not really come into consideration - we know where the perf issues are with this application, and they are certainly not in any kind of communications layer. We shouldn't be trying to address problems that don't exist. The whole issue of SOAP performance vs. MQ performance is debatable in any case, but this should not be our guiding factor - it's like comparing using flat files vs. using adatabase based purely on performance aspects. These are two entirely different ways to address communication, and are designed to meet different needs. Assertions that WebServices should be applied to "anything you can think of" are incorrect - there is a place to use them, and there is a place not to use them.

WebServices is not the only "standard". The idea that WebServices is a "standard" or that it is built on "standards" is being thrown around as a single justification to use it. MQ is also a standard in this company, and also throughout the industry - actually it is a much more well established standard than web services, as it has been around much longer, and I'm sure that if you did a comprehensive survey in the company, you would find many more projects using MQ than using WebServices. In any case, the fact that something is a standard does not make it right to solve every problem. Again, each standard or technology has it's place, and there are also areas that are not good for some standards. The only right way to use the argument of something being a "standard" is to argue about it being the best standard to solve a *specific problem* - here we have not identified any specific problem that WebServices might be better than other standards at solving, and in fact, given all the evidence I've presented above, I think it's safe to say the opposite - WebServices, while being a nice standard, is not the right standard to solve the problems we are trying to solve.

WebServices is not the only way to implement load balancing, nor is it the best way. There are many ways to implement load balancing for MQ, Remoting, SmartSockets, or any other communcation protocol. One of the simplest possible ways is exactly what our other application are already doing with MQ - just sharing a single request queue across N instances - this results in an easy, no-hassle, and optimally efficient load balancing mechanism. Each server is guaranteed to pick up a new message when and only when they are freed up - so optimal load balancing is achieved by simply letting the individual clients decide on their own when to pull a message. This is not to say that the the model we have used for our other applications is the model that we must necessarily use for our own load balancing, but just to show that if our goal is either simplicity or optimal balancing, there are even better ways to achieve this than what would be offered with a WebServices / hardware load balancer solution - because that is already more complex. Additionally, if there are additional goals to accomplish with load balancing, that might be better solved by a hardware balancer, then as we have already discussed a hardware balancer can also be used with MQ, for example. There is nothing inherent about WebServices that should make it a special case for LoadBalancing that cannot be achieved as effectively with some other communication mechanism.

Summary: They don't offer anything special to us! So why the extra hassle? - if we have identified a number of support and technical issues that may or may not be addressable, but there is no clear advantage over any other communcation protocol, why are we even considering all of the additional risk to the project? There is most likely an answer to each point that has been brought up above - whether that is to extend the standard WebServices model with something additional for support, or to write code to implement a custom guaranteed protocol on top of the infrastructure, or to apply further tweaks to the design - but the point is, why would we want to spend one iota of our time or have any of the risk introduced by this new and unfamiliar territory, when there is no clear benefit other than using something new and cool?

As an additional point:

Other teams in this area of the company do not think this would be a good idea. I've been around to chat with the tech leads of a number of the major projects in this area, and while these projects do have diverse opinions about technologies and approaches, the consensus regarding WebServices is very consistent - it would not be appropriate for any kind of internal component-to-component communication, and should only be considered for external connection points between disparate systems. Even in the case of external connection points, it should not be considered as the only possibility, but rather as one of a number of possible solutions, depending on the nature of the connection. For example, the connection point between two of our internal sub-systems is probably more tightly coupled than would be appropriate for WebServices. But all teams agree - appropriate technologies for internal tightly-coupled connection points (i.e. component to component) inside a system could be MQ, SmartSockets, Remoting, or raw TCP - but certainly not WebServices. Here's a summary of teams I've spoken with, and the approaches they are using:
  • 3 projects using SmartSockets for internal communications
  • 2 projects using MQ for internal communications, and 5 projects using MQ for external communications
  • 1 project using DCOM for internal communications
  • 5 projects using a file-feed approach to external communications
  • 3 projects connecting to external systems directly through the DB, but with intentions to move to MQ
  • 2 projects using WebServices internally, and 1 using them externally
The interesting thing to note is that all tech leads were very consistent in their views on when it would be appropriate to use WebServices vs. when to use MQ, and all agreed that for the discussion of WebServices as a framework for components in our application, it really would not make any sense. This opinion was even shared among those teams that are using WebServices extensively. It is also interesting to note that those are the two projects that also have a heavy focus on a ASP.NET website GUI, and the WebServices they are building are relatively simple components that are not really meant to be distributed, but rather to provide the business logic behind their web site. Obviously, because they are already fully dependent on ASP.NET for the rest of their application, they don't see any disadvantage from a support or development perspective in respect to continuing to use ASP.NET for underlying business components - and in fact this is a benefit for them. However, for an application that is not already very tied to ASP.NET via a web user interface as the primary piece of functionality, they all agree that there would need to be a very compelling reason to move to WebServices.

Interestingly, even on the discussion of WebServices as an external communications mechanism, most teams agree that this decision still needs to be contemplated vs. the current accepted company standard for this type of interaction, which is still MQ. In fact, for teams that are currently still using Feed and DB mechanisms for connecting to other systems, rather than looking at WebServices as the best alternative for those, they are all considering MQ. One other project in particular is also fairly determined on that approach, and they are receiving a lot of guidance directly from a key individual in the CTO group. They are confident that if we go to the CTO group with this WebServices idea, we would probably not have an easy time convincing them.


Finally, I would offer a few external references (just from the first few results that pop up in google) to show that these are not just ideas I'm inventing, but are actually well known issues in delineating between a Web Services approach and an MQ approach. Additionally, since some of these articles are older than others, this demonstrates that this should not be a new school of thought on the frontier of innovation - some of these concepts are well known and have been around for quite some time now:,295199,sid63_gci984269,00.html
When wouldn't I recommend Web services in this scenario? Web Services are currently lacking two key capabilities that can be a strong point of a message queue server. First, they have no inherent support for transactions. Most message queue systems have good transaction support and even support transactions across multiple data sources. If this type of cross-application transaction support is key to your implementation, then choose a message queue service.

Second, message queues allow guaranteed delivery. In their current incarnation, Web Services do not. Since messages can be sent asynchronously (fire and forget) with a message queue service, it is vital that the message be guaranteed to eventually arrive at its destination. Hence the idea of message "queues".

In the end, you need to carefully delineate your business needs and prioritize them. Include not only the technical details,
but also keep in mind maintenance, support and Total Cost of Ownership.

Remember that orthogonal component plumbing provided by either the platform or yourself, is an implementation issue - one not relevant to a specific architectural technology. It applies equally to traditional business logic within a process, as to business logic that implements an "asmx" type web service.

The choice of whether you "scale up" or "scale out" will depend on your particular circumstance and requirements.
If your business logic is simple, stateless and largely based on interacting with the data tier, you could scale hrough an application farm. But if your requirements are processing intensive and complex, the most performing solution is likely to be a long living logic block that maintains state - one to which you might pply enterprise services - one that might itself be the implementation for a service.

Microsoft does not propose that business logic call other logic in the same process through XML based messaging. How efficient would that be? Rather, a service oriented architecture remedies the failure of distributed object technologies (like COM, Corba or RMI) to provide seamless program integration outside of a process (or application domain).

Web services are frequently described as the new incarnation of distributed object technology. This is a serious misconception, made by people from industry and academia alike, and this misconception seriously limits a broader acceptance of the true web services architecture. Even though the architects of distributed systems and internet systems alike have been vocal about the fact that these technologies hardly have any relationship, it appears to be difficult to dispel the myth that they are tied together
A first thing to realize however is that the current state of web services technology is very limited compared to distributed object systems. The latter is a well-established technology with very broad support, strong reliability guarantees, and many, many support tools and technologies. For example, web services toolkit vendors have only just started to look at the reliability and transactional guarantees that distributed object systems have supported for years.
For the transport layer, it ultimately will come down to what you are most comfortable with in your enterprise. The advantage of sticking to SOAP over HTTP is that the tool support (both development and management) will be much stronger. While web services are in theory supposed to be transport-neutral, HTTP is certainly the "first among equals" choice. MQ is a good choice when you need to support guaranteed messaging or if your own testing has demonstrated that you have higher scalability and/or throughput for the volumes you are seeing.

That's because issues like security, non-repudiation (both parties are sufficiently authenticated so the transaction cannot be disavowed), redundancy (a message will continually be sent until its receipt is verified), transport (http vs. MQ Series, etc), transaction semantics (do both functions or neither), and workflow (gather information X from this application and information Y from the next) have yet to be adequately solved by Web-services architects, says Carey.

No comments: