
Migrating HDP Cluster to Amazon EMR to Save Costs and Ease the Upgrade Process

March 04 2019

Big data has transformed the way companies conduct business. Through the careful analysis of large amounts of information, organizations can be empowered with modern decision-making capabilities that make it easier to draw conclusions and carry out actions based on accurate, relevant, and up-to-date data. Companies that embrace big data gain access to tools that can significantly reduce the cost of doing business.

Using big data to help an organization lower costs requires a detailed understanding of how, when, and where money is being spent. Without an accurate method to assess the true price of a product or service, associated costs can continue to rise, compounding an already difficult and expensive problem. Organizations can curb runaway or unnecessary spending by turning to detailed analytics platforms to identify cost centers, pinpoint wasteful areas or opportunities for efficiency, and ultimately develop plans that improve the bottom line.

A client in the financial services industry addresses these inefficiencies through comprehensive performance management tools. Its core analytic and operating system functions create a unique workflow engine for organizations, helping its clients operate with high efficiency. Serving the needs of multiple interested parties, its cloud-based SaaS platform was specifically engineered to use data to model, predict, and manage costs with near 100% accuracy.

The Challenge

A significant aspect of this client’s analytics efforts involves a Hadoop cluster that executes large numbers of batch jobs to generate insights from comprehensive data sets. This client currently operates a Hortonworks Data Platform (HDP) cluster self-hosted on EC2 nodes, but to reduce costs and shorten the lengthy Hortonworks upgrade process, the client is looking to migrate their Hadoop cluster to Amazon Elastic MapReduce (Amazon EMR).

One of the most distinctive aspects of the migration was the requirement to integrate Apache Sentry, a framework to enable, monitor, and manage Hadoop data security, with a Kerberized Amazon EMR cluster that uses a cross-realm trust. Because this client handles vast amounts of data containing private, sensitive information, they wanted Sentry’s data security framework to use agents to sync policies and users, and to enable plugins that run within the same process as the Hadoop component. This client approached ClearScale, an AWS Premier Consulting Partner, to draft a proposal to migrate their existing Hortonworks platform to Amazon EMR.
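To make the cross-realm trust requirement more concrete, the following is a minimal sketch of an Amazon EMR security configuration document that enables Kerberos with a cluster-dedicated KDC and a trust to an external realm. The realm, domain, and server names are hypothetical placeholders, not details from this engagement, and the sketch does not cover the Apache Sentry integration itself.

```python
import json


def kerberos_security_configuration(realm, domain, admin_server, kdc_server,
                                    ticket_lifetime_hours=24):
    """Build an EMR security configuration that enables Kerberos with a
    cluster-dedicated KDC and a cross-realm trust to an external realm."""
    return {
        "AuthenticationConfiguration": {
            "KerberosConfiguration": {
                "Provider": "ClusterDedicatedKdc",
                "ClusterDedicatedKdcConfiguration": {
                    "TicketLifetimeInHours": ticket_lifetime_hours,
                    "CrossRealmTrustConfiguration": {
                        "Realm": realm,
                        "Domain": domain,
                        "AdminServer": admin_server,
                        "KdcServer": kdc_server,
                    },
                },
            }
        }
    }


# Hypothetical values for illustration only.
config = kerberos_security_configuration(
    realm="CORP.EXAMPLE.COM",
    domain="corp.example.com",
    admin_server="kdc.corp.example.com",
    kdc_server="kdc.corp.example.com",
)

# The resulting JSON is the kind of document supplied when creating
# an EMR security configuration.
print(json.dumps(config, indent=2))
```

In a setup like this, users would continue to authenticate against the existing corporate realm, while the cluster's own KDC issues tickets for the Hadoop services.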

The ClearScale Solution

ClearScale began by analyzing the client’s existing data infrastructure. Starting with its Hortonworks implementation, ClearScale audited the amounts of data being utilized, the types of data processing that occurred, and the way teams within the organization developed and built applications for the Hortonworks platform.

Upon conclusion of the audit, ClearScale built a custom architecture design to meet the client’s specific needs. The architecture was designed around the Amazon EMR managed cluster platform to cost-effectively process and analyze vast amounts of data, an Amazon RDS for MySQL database as a metastore for the data, and S3 for cloud-based storage. To meet the client’s specific data security needs, the Apache Sentry framework was integrated with Hadoop, and Kerberos was added for network authentication. Each component was chosen to address the client’s requests for lower maintenance, scalability, and security at a lower cost.
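As an illustration of how an external metastore is typically wired into an EMR cluster, the sketch below builds a `hive-site` configuration classification that points at a MySQL database on RDS. It assumes the metastore in question is a Hive metastore, which the case study does not state explicitly. The host name, database name, and credentials are hypothetical, and a real deployment would pull the password from a secrets store instead of passing it around in code.

```python
import json


def external_metastore_configuration(host, user, password,
                                     database="hive", port=3306):
    """Build an EMR 'hive-site' classification that stores Hive metadata
    in an external MySQL database instead of on the cluster itself."""
    jdbc_url = (
        f"jdbc:mysql://{host}:{port}/{database}"
        "?createDatabaseIfNotExist=true"
    )
    return [
        {
            "Classification": "hive-site",
            "Properties": {
                "javax.jdo.option.ConnectionURL": jdbc_url,
                "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
                "javax.jdo.option.ConnectionUserName": user,
                "javax.jdo.option.ConnectionPassword": password,
            },
        }
    ]


# Hypothetical endpoint and credentials for illustration only.
configurations = external_metastore_configuration(
    host="metastore.example.us-east-1.rds.amazonaws.com",
    user="hive_user",
    password="REPLACE_ME",
)

print(json.dumps(configurations, indent=2))
```

Keeping table metadata in RDS and data in S3 means the cluster holds no long-lived state, so a cluster can be terminated and recreated without losing table definitions or data.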

EMR Architecture Design


The Benefits

This client can now take advantage of the nearly unlimited, expandable storage capabilities of S3, which offers both industry-leading scalability and data availability. Amazon EMR’s highly scalable infrastructure also makes it possible to set up clusters using task-based On-Demand Instances or Spot Instances, flexible options that can save significant costs on particular workflows. Compared to Hortonworks, the flexibility of deployments through Amazon EMR allows for easily deployable development systems and upgrade testing.
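The mix of On-Demand and Spot capacity described above can be sketched as a set of EMR instance group definitions. This is an illustrative layout only: the instance type, group names, and node counts are assumptions and do not reflect the client's actual cluster.

```python
import json


def instance_groups(task_count, instance_type="m5.xlarge"):
    """Return EMR instance group definitions: On-Demand primary and core
    nodes for stability, plus Spot task nodes that only add compute."""
    return [
        {"Name": "Primary", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
         "InstanceType": instance_type, "InstanceCount": 1},
        {"Name": "Core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
         "InstanceType": instance_type, "InstanceCount": 2},
        {"Name": "Task (Spot)", "InstanceRole": "TASK", "Market": "SPOT",
         "InstanceType": instance_type, "InstanceCount": task_count},
    ]


# Hypothetical sizing for illustration only.
groups = instance_groups(task_count=4)
print(json.dumps(groups, indent=2))
```

Task nodes do not store HDFS data, so placing only the task group on Spot capacity limits the impact of an interruption to lost compute, not lost data.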

The result is a system that’s more secure, flexible, and cost-efficient. As one of the best choices for big data processing and analysis, Amazon EMR helps this client do more with less, streamlining their processing needs to be as efficient and effective as possible.

Because this client chose to partner with ClearScale, the swift transition of their data operations was handled by a trusted AWS Premier Consulting Partner with experts available to address any issues that arose. With its data operations fully addressed, this client can now focus on delivering quality analytics that save large organizations time and money.

Get in touch today to speak with a Cloud expert and discuss how we can help:

Call us at 1-800-591-0442
Send us an email: sales@clearscale.net
