OwlDQ Analytics on AWS

Business need

Healthcare is one of General Electric's business verticals. Its healthcare systems bring in different types of source data from hospital equipment, including sensor data, PHI (protected health information), and data generated by the equipment itself, and preserve it in an AWS cloud environment for analytical use. Before any analytics can be performed, the data has to be validated, so OwlDQ is the tool used to validate data between source and destination.

Approach & solution

OwlDQ is a web application that can connect to source and destination data stores and run Spark-based jobs to compare and score the data. The tool helps the business visualize data quality across different data stores.
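
To make "compare and score" concrete, here is a minimal, hypothetical sketch of a source-to-destination comparison. It is not OwlDQ's algorithm (the post describes OwlDQ as ML-driven and rule-free); it simply assumes both sides are keyed by a primary key, counts missing rows, extra rows and mismatched cells, and turns them into an illustrative 0-100 score. The names `compare_and_score` and `ComparisonResult` and the scoring formula are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Mapping

# Hypothetical record shape: each row is a mapping of column name -> value.
Row = Mapping[str, object]


@dataclass
class ComparisonResult:
    missing_in_target: int  # keys present in source but absent from target
    extra_in_target: int    # keys present in target but absent from source
    mismatched_cells: int   # cells whose values differ for a shared key
    score: float            # illustrative 0-100 score, higher is better


def compare_and_score(source: Dict[Hashable, Row],
                      target: Dict[Hashable, Row]) -> ComparisonResult:
    """Compare two keyed datasets and compute a simple, illustrative quality score."""
    src_keys, tgt_keys = set(source), set(target)
    missing = len(src_keys - tgt_keys)
    extra = len(tgt_keys - src_keys)

    mismatched = 0
    total_cells = 0
    for key in src_keys & tgt_keys:
        s_row, t_row = source[key], target[key]
        # A column present on only one side counts as a mismatch.
        for col in set(s_row) | set(t_row):
            total_cells += 1
            if s_row.get(col) != t_row.get(col):
                mismatched += 1

    # Assumed formula: share of row-level and cell-level checks that passed.
    problems = missing + extra + mismatched
    checks = len(src_keys | tgt_keys) + total_cells
    score = 100.0 if checks == 0 else 100.0 * (1 - problems / checks)
    return ComparisonResult(missing, extra, mismatched, round(score, 2))


if __name__ == "__main__":
    # Hypothetical equipment readings: one row lost and one value drifted in transit.
    source = {1: {"device": "MRI-01", "temp": 36.5},
              2: {"device": "CT-02", "temp": 37.0},
              3: {"device": "XR-03", "temp": 35.9}}
    target = {1: {"device": "MRI-01", "temp": 36.5},
              2: {"device": "CT-02", "temp": 37.4}}
    print(compare_and_score(source, target))
```

In a real deployment this kind of comparison would run as a distributed Spark job rather than over in-memory dictionaries; the sketch only shows the shape of the check.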

AWS Services Used:

  • Amazon EMR: runs the OwlDQ data validation jobs.
  • Amazon RDS: metastore for the OwlDQ web application.
  • Amazon EC2: hosts the web application and submits the workloads to the EMR cluster.
  • Elastic Load Balancing (ELB): load balancer for the web application.

The approach is to host a web application that automates data quality checks without the need for hand-written rules. OwlDQ applies data science and machine learning techniques to the problem of data quality. It creates Spark workloads and submits them to the EMR cluster to run the analytical jobs, then publishes reports on data quality across the different data stores.

The reports can be viewed by connecting to OwlDQ from a web browser. An EC2 instance hosts the web application, and ELB routes traffic to it based on load. RDS (PostgreSQL) stores the metadata written by the workloads. OwlDQ can connect to multiple AWS services, such as S3, RDS, Redshift, and EBS, among others.
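
OwlDQ handles its own job creation and submission, so the following is only a general sketch of how a Spark workload can be submitted from an EC2 host to a running EMR cluster, using the boto3 EMR client's `add_job_flow_steps` call with `command-runner.jar` and `spark-submit`. The function names, the S3 script path, the job arguments, and the cluster ID are hypothetical placeholders, not OwlDQ's actual mechanism. The EC2 instance role would also need permission for the `elasticmapreduce:AddJobFlowSteps` action.

```python
from typing import Any, Dict, List, Optional


def build_spark_step(name: str, script_s3_uri: str,
                     script_args: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build an EMR step that runs spark-submit through command-runner.jar."""
    args = ["spark-submit", "--deploy-mode", "cluster", script_s3_uri]
    args.extend(script_args or [])
    return {
        "Name": name,
        # Keep the cluster running even if this validation job fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {"Jar": "command-runner.jar", "Args": args},
    }


def submit_step(emr_client: Any, cluster_id: str, step: Dict[str, Any]) -> str:
    """Submit one step to a running cluster and return its step ID.

    emr_client is expected to be a boto3 EMR client, e.g. boto3.client("emr").
    """
    response = emr_client.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]


# Usage (hypothetical bucket, script, arguments and cluster ID):
#   import boto3
#   emr = boto3.client("emr")
#   step = build_spark_step("dq-compare",
#                           "s3://example-bucket/jobs/dq_compare.py",
#                           ["--run-date", "2022-08-11"])
#   step_id = submit_step(emr, "j-EXAMPLECLUSTER", step)
```

Splitting step construction from submission keeps the step definition testable without AWS credentials; only `submit_step` touches the EMR API.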

Posted by wissenadmin | 11 August 2022