3 Steps to Avoiding the Worst: How to ensure business continuity on-prem, off-prem or in the cloud

Tiffany Chester January 19, 2019

In today’s connected world, customers expect seamless experiences. They take it for granted that your customer-facing and staff-facing systems will be available and work properly all the time and that, for customer-facing systems at least, they will be accessible from any connected device: desktop, tablet, smartphone or smartwatch.

Modern seamless customer experiences, however, typically require multiple backend business systems to work together in harmony. This poses significant architectural challenges for an organization’s technical leadership team. Some systems are legacy information systems, which often have no second instance to take over if the primary system goes down. Some systems are in the cloud, where the organization has little option during an outage other than to tell customers they are “waiting as quickly as they can” for a fix.

In this article, we’ll explore three steps that can help you avoid the worst of the problems that can occur in such an inter-connected environment. 

Step 1: Deploy middleware 

With the proliferation of specialized systems that each excel in their respective niches, the monolithic approach to business systems is dead. You will never be able to find a single enterprise application that has a better embedded social experience than Facebook, better CRM than Salesforce, better email distribution than Mailchimp, better support for finding new talent than LinkedIn, and better user management than Active Directory.

You need to use leading edge applications and services that provide best-of-breed experiences for your staff and your customers, and this necessitates the use of multiple systems to accomplish your business goals.  

A lack of connectivity and data sharing across applications leads to costly and error-prone manual processes, such as having your staff update data by hand in multiple systems. Bad data is a leading cause of application defects and failures, and duplicate data is a leading cause of data integrity issues. The resulting problems often do not become apparent until much later, when it is difficult to identify the source of the issue, to determine which of the conflicting data sets is correct, and to resolve the discrepancies manually in each system.

In short, a lack of connectivity across your applications costs your business money, and degrades your customers’ experience when errors happen. Worst case, the customer goes elsewhere, costing you future revenue as well.

A typical path for an organization to take to avoid this is to leverage the application programming interfaces (APIs) available from the various systems that make up its IT landscape. Often, organizations engage vendors to customize their applications to read from and/or write to the APIs of other systems in the ecosystem. This, however, causes a number of problems.

First of all, without middleware, each application needs to be modified to support every communication protocol required by every other application to which it connects. Some of these APIs use SOAP, some REST, some raw TCP/IP; some use file transfers, JMS or FTP; some use straight HTTP and HTML; others use local DLLs or .NET assemblies. This is a costly and risky exercise.

Secondly, once the applications are modified to consume one another’s APIs, this tight coupling introduces fragility. When system A needs information from system B and system B is not available (it has crashed, been taken offline for service, etc.), system A is unable to complete its tasks and also fails. Because systems C, D and E also communicate with system B, those systems become unavailable as well. This results in your IT department “rebooting the entire enterprise” when a system failure occurs, which can be tremendously disruptive, not only to the staff who are trying to use these systems to perform their jobs, but also to your IT staff, who are constantly fire-fighting rather than working on your new initiatives.

By introducing a middleware messaging system, point-to-point communications can be eliminated, and even prohibited as a matter of enterprise architecture principle. The best and most reliable infrastructures leverage a middle tier that is able to shield applications from the failures of others, provide a single point of reference for troubleshooting issues between applications and identifying root causes of failures, and adapt protocols so that in most cases, existing applications don’t need to be modified to communicate with one another.
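To make the idea of protocol adaptation concrete, here is a minimal Python sketch of a middle tier that accepts the same request in two wire formats and hands the target service one canonical message. Everything in it is an assumption for illustration: the `MiddleTier` class, the decoder functions and the `crm.update_customer` service name are hypothetical, and a real middleware product would provide far more (transports, security, persistence).

```python
import json
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Hypothetical canonical message: a flat dict of field name -> string value.
Message = Dict[str, str]


def decode_json(payload: str) -> Message:
    """Decode a REST-style JSON body into the canonical form."""
    return {key: str(value) for key, value in json.loads(payload).items()}


def decode_xml(payload: str) -> Message:
    """Decode a flat XML body, as a SOAP-style caller might send it."""
    root = ET.fromstring(payload)
    return {child.tag: (child.text or "") for child in root}


class MiddleTier:
    """Routes messages to named services, adapting formats on the way in."""

    def __init__(self) -> None:
        self._decoders: Dict[str, Callable[[str], Message]] = {
            "json": decode_json,
            "xml": decode_xml,
        }
        self._services: Dict[str, Callable[[Message], Message]] = {}

    def register(self, name: str, handler: Callable[[Message], Message]) -> None:
        """Publish a service under a name; callers never see its location."""
        self._services[name] = handler

    def send(self, service: str, fmt: str, payload: str) -> Message:
        """Decode the caller's format, then invoke the named service."""
        message = self._decoders[fmt](payload)
        return self._services[service](message)


tier = MiddleTier()
tier.register("crm.update_customer", lambda m: {"status": "ok", "id": m["id"]})
```

With this in place, a JSON caller and an XML caller can both reach `crm.update_customer` through `tier.send(...)` without either application knowing the other's format, which is the point of the adaptation described above.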

Step 2: Intercept point-to-point communications 

Now that you have an available middleware solution in place, you can begin improving the availability of your applications immediately. Without re-engineering anything, you can start to use the middle tier to intercept any existing direct connections between your applications. Without doing anything more, this brings you many of the benefits of decoupling, including:

  1. A single place to look to identify issues with systems that may be producing errors, not responding, or not sending outbound messages as expected; a single support process starting point 
  2. Ability to receive and distribute notifications of failures on critical systems in a unified way
  3. Ability to redirect service consumers away from broken upstream systems; for example, diverting an external service to an alternate external service, without chasing down all of the individual connections to that service across your IT infrastructure. In some cases, it may even be possible to divert traffic to a “mock service” or static data set in the event of a failure of the actual upstream service, so that other applications remain operational, albeit potentially accessing stale or cached data. 
  4. Ability to record and replay traffic – this can allow you to easily reproduce errors, and provide evidence to your vendors that certain data sets cause their system to error, crash or return incorrect results. 
  5. Ability to replace systems with alternate or newer systems, without affecting applications consuming services from the replaced application. The middle tier now acts as a service façade, effectively separating the upstream application’s implementation from its interface, making the application a “black box” from the perspective of any consuming applications.
  6. Standardized error reporting and monitoring 
  7. Standardized connectivity patterns 
  8. An evolving ecosystem of available services that can be reliably consumed by other applications to automate and orchestrate business processes
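Several of the benefits listed above (redirecting consumers to a fallback, recording and replaying traffic, and collecting errors in one place) can be sketched in a few lines of Python. This is an illustrative sketch only: `ServiceFacade`, its attributes and the handler signature are hypothetical names, not the API of any particular middleware product.

```python
from typing import Callable, Dict, List, Optional, Tuple

Request = Dict[str, str]
Response = Dict[str, str]
Handler = Callable[[Request], Response]


class ServiceFacade:
    """Sits between consumers and an upstream service (hypothetical sketch)."""

    def __init__(self, primary: Handler, fallback: Optional[Handler] = None) -> None:
        self.primary = primary
        self.fallback = fallback  # e.g. a mock service or cached data set
        self.recorded: List[Tuple[Request, str]] = []  # (request, outcome)
        self.errors: List[str] = []  # one place to look when things break

    def call(self, request: Request) -> Response:
        """Try the primary service; divert to the fallback if it fails."""
        try:
            response = self.primary(request)
        except Exception as exc:
            self.errors.append(f"{type(exc).__name__}: {exc}")
            if self.fallback is None:
                self.recorded.append((dict(request), "failed"))
                raise
            self.recorded.append((dict(request), "fallback"))
            return self.fallback(request)
        self.recorded.append((dict(request), "primary"))
        return response

    def replay(self, target: Handler) -> List[Response]:
        """Re-send every recorded request to a target, e.g. a test instance."""
        return [target(dict(request)) for request, _ in self.recorded]
```

A consumer calls `facade.call(...)` exactly as it would call the upstream service. When the upstream raises, the consumer still gets an answer from the fallback (possibly stale), the failure lands in `facade.errors`, and the recorded requests can later be replayed against a vendor's test system to reproduce the problem.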

Step 3: Build facades for disconnected applications and batch processes 

Now that you have a flourishing ecosystem of APIs available on your middle-tier platform, you can begin automating various business functions within your organization, sharing data between applications in ways that weren’t possible before. XSLT transforms can be used to adapt the various XML message formats so that applications can consume services for which they were never originally designed.

You can now begin thinking about connecting the other applications in your enterprise that probably didn’t share data or services in any automated fashion previously. This typically involves building an adapter for the disconnected system, which, for example, talks to the application in some native way and exposes that capability as an XML, REST or SOAP web service.

Even the most resistant applications, like legacy or mainframe systems, usually have a way of getting data into or out of them. Maybe they can import a file located somewhere on the SAN. Maybe a screen-scraper is available that can input data and extract resulting information from ASCII terminals or HTML screens.  

By building an adapter for these applications, you are effectively turning them into black boxes, and you are likely decreasing the urgency of upgrading or replacing them, because they can now be consumed through their service façades. So long as the implementations work, that is enough for now to bring those systems into the broader ecosystem, without requiring custom integration code in every system that wants to interface with them.
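As a rough illustration of such an adapter, the Python sketch below wraps a legacy system that can only import files from a watched directory. The class name, the CSV layout and the "write to a temporary name, then rename" convention are all assumptions made for the example; a real legacy system will dictate its own file format and pickup rules.

```python
import csv
import uuid
from pathlib import Path
from typing import Dict, List, Sequence


class LegacyFileAdapter:
    """Exposes a file-import-only legacy system as a callable service.

    Hypothetical sketch: assumes the legacy system polls drop_dir for
    *.csv files containing a header row followed by one data row.
    """

    def __init__(self, drop_dir: Path, columns: Sequence[str]) -> None:
        self.drop_dir = Path(drop_dir)
        self.columns: List[str] = list(columns)

    def submit(self, record: Dict[str, str]) -> Path:
        """Validate the record and drop it where the legacy system looks."""
        missing = [name for name in self.columns if name not in record]
        if missing:
            raise ValueError(f"missing fields: {missing}")

        final_path = self.drop_dir / f"import-{uuid.uuid4().hex}.csv"
        temp_path = final_path.with_suffix(".tmp")
        with temp_path.open("w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(self.columns)
            writer.writerow([record[name] for name in self.columns])
        # Rename last so the legacy system never picks up a half-written file.
        temp_path.rename(final_path)
        return final_path
```

Consuming applications call `submit(...)` through the middle tier and never learn that the implementation behind the façade is a file drop, which is what makes it possible to replace the legacy system later without touching its consumers.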

As you are doing this, you should begin to consider asynchronous messaging patterns and reliable delivery. In some cases, this can be added to the existing connections you intercepted in step 2 above. For example, for a cloud service that sends you information, your middle tier can add the incoming information to a JMS queue, and respond immediately with a “success” response. If the receiving information system inside your IT infrastructure is up and ready to receive the data, the information is delivered immediately from the JMS queue. If the system is unavailable for any reason, the middle tier will continue trying to deliver the message until the system comes back online.
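The store-and-forward behaviour described here can be sketched as follows. This is a simplified, in-memory stand-in for a JMS queue, written in Python for brevity: the `StoreAndForward` class is hypothetical, it is not durable across restarts, and a real deployment would rely on the broker's persistent queues and redelivery settings instead.

```python
from collections import deque
from typing import Callable, Deque, Dict

Message = Dict[str, str]


class StoreAndForward:
    """Acknowledge immediately, deliver when the receiver is reachable."""

    def __init__(self, deliver: Callable[[Message], None]) -> None:
        self._deliver = deliver  # raises ConnectionError while receiver is down
        self._pending: Deque[Message] = deque()

    def accept(self, message: Message) -> str:
        """Queue the message, attempt delivery, and always report success."""
        self._pending.append(message)
        self.flush()
        return "success"

    def flush(self) -> int:
        """Deliver queued messages in order; stop at the first failure."""
        delivered = 0
        while self._pending:
            try:
                self._deliver(self._pending[0])
            except ConnectionError:
                break  # receiver still down; keep the message for next time
            self._pending.popleft()
            delivered += 1
        return delivered

    @property
    def pending(self) -> int:
        return len(self._pending)
```

The sender always receives "success" from `accept`, and a scheduler (not shown) would call `flush()` periodically until the backlog drains. Because a message is removed only after `deliver` returns, a failure midway can cause a repeat delivery, so receivers in this style of design should tolerate duplicates.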

Likewise, if one of your core systems in-house delivers information to a service hosted in the cloud, your middle tier can accept the message from the core system, and continue attempting to deliver the message to the cloud service until it becomes available. You will further gain invaluable insight into the uptime of the cloud service, and can compare your records of outages to the service-level agreements you have with the external cloud service provider. 
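The comparison against a service-level agreement is simple arithmetic once the middle tier has logged when deliveries started failing and when they succeeded again. The Python sketch below assumes outages are recorded as non-overlapping (start, end) pairs and uses a hypothetical 99.9% availability target; substitute the figure from your own agreement.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# (first failed delivery, first successful retry); assumed non-overlapping.
Outage = Tuple[datetime, datetime]


def availability(outages: List[Outage], period_start: datetime,
                 period_end: datetime) -> float:
    """Fraction of the period during which the service was reachable."""
    period_seconds = (period_end - period_start).total_seconds()
    if period_seconds <= 0:
        raise ValueError("period_end must be after period_start")
    down_seconds = 0.0
    for start, end in outages:
        # Count only the part of each outage that falls inside the period.
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        if clipped_end > clipped_start:
            down_seconds += (clipped_end - clipped_start).total_seconds()
    return 1.0 - down_seconds / period_seconds


def meets_sla(outages: List[Outage], period_start: datetime,
              period_end: datetime, target: float = 0.999) -> bool:
    """Compare measured availability with the (hypothetical) SLA target."""
    return availability(outages, period_start, period_end) >= target


month_start = datetime(2019, 1, 1)
month_end = month_start + timedelta(days=30)
one_outage = [(datetime(2019, 1, 10, 9, 0), datetime(2019, 1, 10, 10, 12))]
```

For scale: a 30-day period contains 43,200 minutes, so a 99.9% target allows 43.2 minutes of downtime. The single 72-minute outage in `one_outage` works out to roughly 99.83% availability, which would miss that target.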

Overall, you should continue looking for ways to implement asynchronous messaging patterns wherever you can. For example, imagine a client application that currently uses synchronous request/response semantics. Rather than sending a request and blocking while waiting for a response, the system can send an outbound request, which is acknowledged immediately by the middle tier. Later, when the upstream information system becomes available, it can send a response back to the application, with a message saying something like “here’s that data you requested.”
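One common way to implement this pattern is to tag each request with a correlation identifier so that the late response can be matched to the original request. The Python sketch below is a hypothetical, single-process illustration (`AsyncGateway` and its methods are invented names); in practice the request and response would travel over separate queues managed by your middleware.

```python
import uuid
from collections import deque
from typing import Callable, Deque, Dict, Tuple

Payload = Dict[str, str]
Callback = Callable[[str, Payload], None]  # (correlation id, response)


class AsyncGateway:
    """Acknowledges requests at once and delivers responses later."""

    def __init__(self) -> None:
        self._outbox: Deque[Tuple[str, Payload]] = deque()
        self._callbacks: Dict[str, Callback] = {}

    def request(self, payload: Payload, on_response: Callback) -> str:
        """Queue the request and return immediately with its correlation id."""
        correlation_id = uuid.uuid4().hex
        self._callbacks[correlation_id] = on_response
        self._outbox.append((correlation_id, payload))
        return correlation_id

    def process(self, upstream: Callable[[Payload], Payload]) -> int:
        """Run when the upstream system is available; returns requests handled."""
        handled = 0
        while self._outbox:
            correlation_id, payload = self._outbox[0]
            response = upstream(payload)  # if this raises, the request stays queued
            self._outbox.popleft()
            callback = self._callbacks.pop(correlation_id)
            callback(correlation_id, response)  # "here's that data you requested"
            handled += 1
        return handled
```

The client keeps working after `request(...)` returns and handles the answer whenever its callback fires. The trade-off is that the client must be written to cope with responses that arrive late, which is why this change is easiest when you already have the opportunity to customize the application.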

If you have the opportunity to customize some of the applications you are connecting to your infrastructure, this is the ideal time to consider moving to asynchronous messaging patterns. These will help ensure that applications that require data or services from other applications can keep functioning as well as possible, even when upstream services are temporarily unavailable.

Summary 

To summarize, from an operational standpoint, some of the largest problems in enterprise IT occur when systems are reliant on other systems, but either can’t get data into or out of those systems automatically at all, or fall over like a line of dominoes when one of those critical systems fails. An even larger problem can occur when organizations attempt to modify their existing applications to communicate directly with one another, which causes an explosion of costs, complexity and risk. And the worst of the problems: winding up in a billion-dollar megaproject initiated to replace multiple systems wholesale, without leveraging a lower-risk evolutionary approach to making the existing systems operate together in harmony. There are many examples of these types of projects, which typically spiral out of budgetary control and, after countless delays and scope and cost increases, fail to meet stakeholders’ requirements in the end. They are then discarded as the organization starts over, if the organization can survive such a disastrous undertaking in the first place.
