Business Need

General Electric Health Care collects several types of source data from hospital equipment, including sensor data, PHI data, and equipment-generated data, and preserves it in an AWS cloud environment for analytical use. The source data arrives in different file formats, and the historical data must be available in a centralized location for processing, enrichment, and visualization.

Approach & Solution

The previous infrastructure could no longer handle demand, so the healthcare system switched to AWS, combining multiple services and products to deliver fast, reliable, and secure data storage and processing.

AWS Services Used

  • Amazon S3: Stores the source data, processed data, and enriched data.
  • Amazon EMR: Handles the periodic workloads that process, aggregate, and enrich the data, using applications such as Spark, Sqoop (to import data from multiple source systems into S3), Hive (to store the processed data), Oozie (to schedule the workloads), and Zeppelin (to query data stored in S3).
  • Amazon Redshift: Serves as the warehouse that preserves the historical data.
  • AWS IoT: Collects the sensor data and stores it in S3 as RAW data for further processing on EMR.
  • Amazon EC2: Hosts custom software for data visualization.

The approach is to bring in data from different source systems in different file formats (flat files, JSON, sensor data, and structured data) and use S3 as centralized storage for RAW, cleansed, and enriched data. EMR serves as the big data analytics and application platform, processing the data in S3 with the applications it provides. Amazon Redshift serves as the data warehouse that stores all processed historical loads.
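
One way to picture the centralized S3 layout is as a key-naming convention that separates RAW, cleansed, and enriched data by zone, source system, and file format. The sketch below is only an illustration under assumptions: the zone prefixes, the extension-to-format mapping, the date partition, and the example source name "mri-scanner" are hypothetical and are not taken from the actual solution.

```python
from datetime import date
from pathlib import PurePosixPath

# Hypothetical zone prefixes mirroring the RAW, cleansed, and enriched stages.
ZONES = ("raw", "cleansed", "enriched")

# Hypothetical mapping from file extension to a source format label.
FORMAT_BY_SUFFIX = {
    ".csv": "flat",
    ".txt": "flat",
    ".json": "json",
}


def classify_format(filename: str) -> str:
    """Return the format label for a file name, or 'unknown' if unmapped."""
    suffix = PurePosixPath(filename).suffix.lower()
    return FORMAT_BY_SUFFIX.get(suffix, "unknown")


def build_key(zone: str, source: str, filename: str, day: date) -> str:
    """Build an object key: <zone>/<source>/<format>/dt=YYYY-MM-DD/<file>."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    fmt = classify_format(filename)
    return f"{zone}/{source}/{fmt}/dt={day.isoformat()}/{filename}"


if __name__ == "__main__":
    # Example with made-up values; prints a key under the raw zone.
    print(build_key("raw", "mri-scanner", "readings.json", date(2020, 1, 15)))
```

Keeping the zone as the leading prefix means a periodic EMR job could read from one prefix and write to the next, and a date partition keeps historical loads easy to select. Both choices are assumptions made for this sketch.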
