Pharmaceutical research is data-intensive. It takes tremendous computational power to parse and analyze the large data sets involved. The cloud delivers that power and helps accelerate research processes and, ultimately, research findings. That’s why one pharmaceutical company was using some off-the-shelf CloudFormation templates to create its own powerful informatics environment on the AWS Cloud.

However, the company determined that it needed help optimizing that environment. A key driver was its reliance on a set of pipelines that process output data to generate analyses.

The Challenge – Accelerate Data Analysis

The company’s IT team had experimented with deploying the specialized software for its drug discovery and development process on Amazon EC2 instances. The problem was that some of the processes could take many hours and, in some cases, had to be executed several times. To accelerate data analyses, the company needed to automate as many steps as possible, from initial data ingestion through storing the output in an Aurora database.

Because the data could include electronically protected health information (ePHI), whatever solution was implemented also had to meet specific privacy and security requirements of the Health Insurance Portability and Accountability Act (HIPAA).

ClearScale, an AWS Premier Consulting Partner with extensive experience in the healthcare industry, was tasked with the job. Specifically, ClearScale would work with the pharmaceutical research company to develop two new data processing pipelines and refine the previously developed one with the goal of speeding up the data processing side of the company’s research.

The ClearScale Solution – AWS Data and Analytics

ClearScale’s multi-tiered solution features a combination of AWS services and best practices to meet the pharmaceutical research company’s specific needs. The following is not a comprehensive list of the solution components, but key elements include Amazon Elastic Container Service (Amazon ECS) at the application tier. This highly scalable, high-performance container orchestration service supports Docker containers and makes it easy to run and scale containerized applications on a cluster of Amazon Elastic Compute Cloud (Amazon EC2) instances.

It’s used in conjunction with Amazon CloudWatch, a monitoring and management service. A CloudWatch agent is installed on each EC2 instance using HashiCorp’s Terraform, an infrastructure-as-code tool for building, changing, and versioning infrastructure. The CloudWatch agent provides an automated way to send log data from the Amazon EC2 instances to CloudWatch Logs.
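To make the log-shipping step more concrete, here is a minimal sketch in Python of the kind of configuration the CloudWatch agent reads to decide which log files to send. It is an illustration only, not the configuration used in this project: the function name, the log group prefix, and the file paths in the usage example are assumptions, and the source does not say how the agent was actually configured.

```python
import json


def build_agent_log_config(log_files, log_group_prefix="/pipeline"):
    """Return a CloudWatch agent config dict that ships the given log files.

    log_files maps a short name (hypothetical, used in the log group name)
    to the path of a log file on the EC2 instance.
    """
    collect_list = [
        {
            "file_path": path,
            "log_group_name": f"{log_group_prefix}/{name}",
            # The agent replaces the {instance_id} placeholder at runtime.
            "log_stream_name": "{instance_id}",
        }
        for name, path in sorted(log_files.items())
    ]
    return {"logs": {"logs_collected": {"files": {"collect_list": collect_list}}}}


if __name__ == "__main__":
    # Paths below are assumptions chosen for illustration.
    config = build_agent_log_config(
        {
            "ecs-agent": "/var/log/ecs/ecs-agent.log",
            "processing": "/var/log/pipeline/processing.log",
        }
    )
    print(json.dumps(config, indent=2))
```

A file like this could be rendered once and distributed to every instance by whatever provisioning tool is in use, so that all instances log to the same groups with one stream per instance.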

Also included is AWS Lambda, a compute service that allows for running code without provisioning or managing servers. It automatically scales applications by running code in response to each trigger.

At the database tier is Amazon Aurora (Aurora), a relational database engine. It’s fully managed by Amazon Relational Database Service (RDS), which automates time-intensive administration tasks such as hardware provisioning, database setup, and backups. The storage tier employs Amazon Elastic File System (Amazon EFS), an elastic file system that scales on-demand without disrupting applications.

At the integration tier is Amazon Simple Queue Service (SQS). It’s a fully managed message queuing service that decouples and scales the fleet of worker nodes. It’s complemented by Amazon Simple Notification Service (SNS), a highly available, secure, fully managed publish/subscribe messaging service for decoupling microservices, distributed systems, and serverless applications.
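The decoupling described above can be sketched in a few lines of Python. This is a hedged illustration rather than the project’s code: the function names, the message fields (`dataset_path`, `status`), and the subject line are assumptions. The clients are passed in as arguments; in a real deployment they would typically be created with `boto3.client("sqs")` and `boto3.client("sns")`.

```python
import json


def enqueue_job(sqs_client, queue_url, dataset_path):
    """Put one processing job on the queue; a worker picks it up later."""
    body = json.dumps({"dataset_path": dataset_path})
    return sqs_client.send_message(QueueUrl=queue_url, MessageBody=body)


def notify_done(sns_client, topic_arn, dataset_path, status):
    """Publish a completion notice to every subscriber of the topic."""
    message = json.dumps({"dataset_path": dataset_path, "status": status})
    return sns_client.publish(
        TopicArn=topic_arn,
        Subject=f"Pipeline job {status}",
        Message=message,
    )
```

Because the producer only knows the queue URL and the worker only knows the topic ARN, either side can be scaled or replaced without changing the other.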

Architecture Diagram

The HIPAA Component

The solution also includes a number of AWS services to help satisfy HIPAA requirements for security and privacy. Among them is AWS Identity and Access Management (IAM). The service allows for centrally managing users, security credentials, and permissions that control which AWS resources users and applications can access.

AWS Key Management Service (KMS) is used to create and control the encryption keys used to encrypt data. Amazon EC2 Systems Manager Parameter Store provides secure, hierarchical storage for configuration data management and secrets management. AWS CloudTrail enables governance, compliance, operational auditing, and risk auditing of the pharmaceutical research company’s AWS account.
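As an illustration of how hierarchical configuration and secrets can be read at runtime, the sketch below loads every parameter under a path prefix into a dictionary. It is an assumption-laden example, not the project’s implementation: the function name and the example prefix in the comment are hypothetical. The client is passed in; in practice it would typically be `boto3.client("ssm")`, whose `get_parameters_by_path` call returns results in pages.

```python
def load_settings(ssm_client, path_prefix):
    """Read every parameter under path_prefix into a flat dict.

    WithDecryption=True asks Parameter Store to return SecureString
    values (which are encrypted with a KMS key) in decrypted form.
    """
    settings = {}
    kwargs = {"Path": path_prefix, "Recursive": True, "WithDecryption": True}
    while True:
        page = ssm_client.get_parameters_by_path(**kwargs)
        for param in page["Parameters"]:
            # e.g. "/pipeline/prod/db/password" -> "db/password"
            key = param["Name"][len(path_prefix):].lstrip("/")
            settings[key] = param["Value"]
        token = page.get("NextToken")
        if not token:
            return settings
        # More results remain; request the next page.
        kwargs["NextToken"] = token
```

Keeping secrets out of container images and task definitions in this way means access can be controlled with IAM policies on the parameter path, and each read can be recorded by CloudTrail.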

The Data Pipeline Solution in Action

In very simple terms, the ClearScale solution works like this. Amazon CloudWatch triggers an ingestion task to look for new data on the company’s corporate network. If new data is found, it’s fetched and copied to Amazon EFS storage. SQS, used as a messaging queue, triggers a Lambda function. The Lambda function starts the processing task. Amazon ECS scales out the resources to meet the needs of the processing task. After the processing is complete, results are written to an Aurora database. An SNS notification is sent, and the ECS resources are scaled in.
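The middle of that flow, where a queued message causes a Lambda function to start a processing task, might look roughly like the Python sketch below. It is a minimal illustration under stated assumptions, not ClearScale’s implementation: the environment variable names, their default values, the `dataset_path` message field, and the use of the ECS `run_task` call to start the task are all assumptions. Scaling the cluster out and in is not shown.

```python
import json
import os


def build_run_task_request(job, cluster, task_definition, container_name):
    """Translate one queued job into keyword arguments for ECS run_task."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "count": 1,
        "overrides": {
            "containerOverrides": [
                {
                    "name": container_name,
                    # Tell the container which data set to process.
                    "environment": [
                        {"name": "DATASET_PATH", "value": job["dataset_path"]}
                    ],
                }
            ]
        },
    }


def handler(event, context, ecs_client=None):
    """Lambda entry point: start one ECS task per SQS record in the event."""
    if ecs_client is None:
        import boto3  # provided by the AWS Lambda Python runtime

        ecs_client = boto3.client("ecs")

    started = []
    for record in event.get("Records", []):
        job = json.loads(record["body"])  # SQS delivers the message as a string
        request = build_run_task_request(
            job,
            # Hypothetical names; a real deployment would set these variables.
            cluster=os.environ.get("ECS_CLUSTER", "pipeline-cluster"),
            task_definition=os.environ.get("ECS_TASK_DEFINITION", "processing-task"),
            container_name=os.environ.get("ECS_CONTAINER", "processor"),
        )
        response = ecs_client.run_task(**request)
        started.extend(task["taskArn"] for task in response.get("tasks", []))
    return {"started": started}
```

Accepting the ECS client as an optional argument keeps the handler easy to exercise locally with a stand-in object, while the default path creates a real client when the function runs in Lambda.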

The Results – Rapid Data Analysis

With ClearScale’s solution deployed, tested, and successfully operating, the pharmaceutical research company can now conduct the analysis portion of its research much faster. Its researchers, data analysts, and IT team no longer have to devote time to tedious manual processes. Instead, they can focus on more strategic endeavors. That all translates to quicker research findings and faster time-to-market for potential treatments.

The secure, reliable infrastructure and the ability to scale compute resources as needed also provide greater flexibility and potential cost savings for future research projects.

What Can ClearScale Do for Your Company?

If you’re interested in learning more about ClearScale’s work with data pipelines, analytics, and similar topics, you’ll find more case studies here.

Interested in ClearScale’s other areas of expertise or what we can do for your company? Let us know.