In many cases, a “lift-and-shift” migration, in which an entire IT infrastructure and its application are moved from an on-premises environment to the cloud without being redesigned, sounds like a bad idea. But sometimes it can’t be avoided. And sometimes it pays off magnificently.
This was the case with one of ClearScale’s clients in the healthcare space. The company has consistently made innovation a top priority, which is one reason why it has become the world’s #1 healthcare provider. The client had a king-sized problem that required some hard choices, including considering a “lift-and-shift” migration.
Over many years, a large team of developers had built a heterogeneous, mission-critical application so complex that a single instance spanned more than a hundred machines. The company wanted to run this application in new cloud environments on Amazon Web Services (AWS). It also needed to keep running the application on-premises as well as in the cloud, which meant that the underlying architecture couldn’t be changed.
To add to the complexity of the problem, this application was built with an Oracle database as a central component. Because the database was custom-tuned, tightly tied in with the other components, and had unusual CPU and storage load patterns, it couldn’t be migrated to Amazon RDS for Oracle or to another database engine.
The ClearScale Solution
As a first step, ClearScale made a preliminary estimate of the monthly AWS costs, which turned out to be much higher than what the client would spend on-premises. Because the client’s on-premises environment was over-provisioned, ClearScale began by right-sizing its number of cores and amount of RAM.
The team containerized the small, underutilized machines in the environment and bundled similar services onto single machines. Because AWS lacked a choice of inexpensive x86 virtual machines (VMs) that could guarantee capacity for a sustained component load, the machines chosen for this consolidation were those whose load held steady at single-digit CPU utilization.
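The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the host names, the sample values, and the 10% ceiling used to express "single-digit" are hypothetical and not taken from the client's environment.

```python
def is_consolidation_candidate(cpu_samples, ceiling=10.0):
    """Return True when every CPU sample (percent) stays below `ceiling`.

    An empty sample list is never a candidate, since there is no evidence
    that the load is steady.
    """
    return bool(cpu_samples) and max(cpu_samples) < ceiling


def pick_candidates(utilization_by_host, ceiling=10.0):
    """Return the sorted names of hosts whose load stays in single digits."""
    return sorted(
        host
        for host, samples in utilization_by_host.items()
        if is_consolidation_candidate(samples, ceiling)
    )


# Hypothetical CPU utilization samples (percent), one list per machine.
samples = {
    "app-01": [3.0, 4.5, 2.8, 5.1],    # steady and low
    "app-02": [6.0, 48.0, 7.2, 5.5],   # low on average, but spiky
    "batch-01": [1.2, 0.9, 1.5, 1.1],  # steady and low
}

candidates = pick_candidates(samples)
print(candidates)
```

Checking the maximum rather than the average keeps spiky machines such as the hypothetical app-02 out of the candidate list, which matches the requirement that the load be steady and not merely low on average.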
The ClearScale team then explored storage optimization, specifically looking into a combination of systems that could dramatically reduce costs. The storage grade was lowered carefully, as the team examined any risks associated with performance and reliability.
Some of the pricey Provisioned IOPS EBS volumes were replaced by faster, more cost-efficient NVMe drives attached directly to the instances. Less expensive options were found for storage-centric, read/write-heavy applications, such as SQL and NoSQL databases. The team managed to avoid using pricier cloud-based storage options, even for disk back-ends that required high-end systems.
Next, the ClearScale team performed a series of tests. They built a baseline configuration, then ran the expected workload against it, converging on the ideal price/performance ratio over several test iterations. It helped that the client already had an environment built specifically to simulate a peak-hours load.
For the first test, the ClearScale team chose not to over-optimize the environment and cut out only the most visible “fat”. They set up Amazon CloudWatch metrics with fine-grained, detailed monitoring to capture how the environment behaved under load.
The team defined the criteria for under- and over-utilization for both the CPU and storage tiers, taking into account specific load patterns and how critical each component was to the functioning of the system as a whole. They also defined safe and semi-safe zones for those metrics, along with rules for mapping each component into one of the zones. Depending on which zone a component landed in, the team chose whether to keep, increase, or decrease its performance for the next test iteration.
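A minimal sketch of such a zone-based decision follows. The article does not state the actual thresholds or rules, so everything here is an assumption: the zone boundaries in `CPU_ZONES`, the zone names, and the rule that a critical component in the semi-safe zone gets more capacity are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zones:
    """Hypothetical utilization boundaries, in percent of peak capacity."""
    under: float      # below this, the component is under-utilized
    safe: float       # up to this, the component is in the safe zone
    semi_safe: float  # up to this, semi-safe; above it, over-utilized


def classify(peak_pct, zones):
    """Map a peak utilization percentage onto one of four zones."""
    if peak_pct < zones.under:
        return "under-utilized"
    if peak_pct <= zones.safe:
        return "safe"
    if peak_pct <= zones.semi_safe:
        return "semi-safe"
    return "over-utilized"


def next_step(zone, critical):
    """Choose the sizing action for the next test iteration.

    Assumed rule: critical components are not left in the semi-safe zone.
    """
    if zone == "under-utilized":
        return "decrease"
    if zone == "over-utilized":
        return "increase"
    if zone == "semi-safe" and critical:
        return "increase"
    return "keep"


CPU_ZONES = Zones(under=20.0, safe=60.0, semi_safe=80.0)  # assumed values

# Hypothetical components: (name, peak CPU percent, is it critical?)
for name, peak, critical in [("db", 72.0, True), ("cache", 12.0, False)]:
    zone = classify(peak, CPU_ZONES)
    print(name, zone, next_step(zone, critical))
```

Separating the classification from the action makes it easy to use different boundaries for the CPU and storage tiers while keeping one decision rule for both.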
After the tests were complete, additional cost optimization was achieved by using reserved instances, which saved on average ~40% of EC2 costs with a one-year commitment and ~60% with a three-year commitment. Because some components stored numerous logs locally, those logs were offloaded to ELK clusters for centralized storage and management, saving additional costs on terabytes of storage volumes.
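To make the reserved instance arithmetic concrete, the short Python sketch below applies the average discounts quoted above to a monthly EC2 bill. The on-demand figure of 10,000 per month is a hypothetical placeholder, not a number from the project, and real discounts vary by instance type, region, and payment option.

```python
def reserved_monthly_cost(on_demand_monthly, discount):
    """Return the monthly EC2 cost after applying a fractional discount."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be a fraction in [0, 1)")
    return on_demand_monthly * (1.0 - discount)


# Average discounts quoted in the article, as fractions of EC2 cost.
AVG_DISCOUNT = {"1-year": 0.40, "3-year": 0.60}

# Hypothetical on-demand EC2 spend per month, for illustration only.
ON_DEMAND_MONTHLY = 10_000.0

costs = {
    term: reserved_monthly_cost(ON_DEMAND_MONTHLY, discount)
    for term, discount in AVG_DISCOUNT.items()
}

for term, cost in costs.items():
    print(f"{term} commitment: {cost:,.2f} per month")
```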
In the end, the “lift-and-shift” strategy proved to be a successful one for the client. Not only did the environment pass all the peak load tests, it also had headroom for a future load increase.
As for cutting costs, the ClearScale team reduced the client’s AWS expenses nearly sixfold, even before applying reserved instance discounts. The team also identified a few internal architectural changes to the application that could reduce cloud costs even more.