Case Study: AWS Relational Database Service

Category: Case Studies

Author: Wissen Team

Date: July 10, 2023

Approach & Solution:

The client's web application connects to source and destination data stores and runs Spark-based jobs to compare and score the data. The tool helps the business visualize data quality across different data stores.
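To make "compare and score" concrete, here is a minimal, hypothetical sketch of keyed reconciliation between a source and a destination dataset. It is not OwlDQ's algorithm (the case study does not describe its internals); the function name `score_reconciliation`, the rows-as-tuples representation, and the score definition (matching rows as a percentage of all distinct keys) are assumptions made for illustration. In an actual Spark job, the same logic would typically be expressed as a full outer join on the key column.

```python
from dataclasses import dataclass
from typing import Mapping, Tuple

# Hypothetical representation: each row is a tuple of column values, keyed by primary key.
Row = Tuple[object, ...]


@dataclass
class ReconciliationResult:
    missing_in_target: int  # keys present only in the source
    extra_in_target: int    # keys present only in the target
    mismatched: int         # keys present in both, but with differing values
    score: float            # matching rows as a percentage of all distinct keys


def score_reconciliation(source: Mapping[str, Row], target: Mapping[str, Row]) -> ReconciliationResult:
    """Compare two keyed datasets and compute simple data-quality metrics."""
    source_keys = set(source)
    target_keys = set(target)
    common = source_keys & target_keys
    matched = sum(1 for key in common if source[key] == target[key])
    all_keys = source_keys | target_keys
    # An empty comparison is treated as a perfect match.
    score = 100.0 * matched / len(all_keys) if all_keys else 100.0
    return ReconciliationResult(
        missing_in_target=len(source_keys - target_keys),
        extra_in_target=len(target_keys - source_keys),
        mismatched=len(common) - matched,
        score=score,
    )


if __name__ == "__main__":
    source = {"1": ("alice", 10), "2": ("bob", 20), "3": ("carol", 30)}
    target = {"1": ("alice", 10), "2": ("bob", 25), "4": ("dave", 40)}
    print(score_reconciliation(source, target))
    # → ReconciliationResult(missing_in_target=1, extra_in_target=1, mismatched=1, score=25.0)
```

Only key "1" matches exactly out of four distinct keys, so the score is 25.0; a real scoring model would likely weight column-level differences rather than treating any mismatch as a whole-row failure.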

AWS Cloud Architecture for the Web Application with RDS PostgreSQL:

AWS Services Used:

  • Amazon EMR
  • Amazon RDS (PostgreSQL)
  • Amazon EC2
  • Elastic Load Balancing (ELB)

The approach is to host a sophisticated web application that automates data-quality checks without the need for manually defined rules. Owl applies the latest advances in data science and machine learning to the problem of data quality. OwlDQ creates Spark workloads, submits them to the EMR cluster to run the analytical jobs, and publishes reports on data quality across the different data stores. The reports can be viewed in a web browser connected to OwlDQ. ELB routes traffic to the web application based on load, an EC2 instance hosts the web application, and RDS (PostgreSQL) stores the metadata written by the workloads. OwlDQ can connect to multiple AWS services (such as S3, RDS, Redshift, and EBS).
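The step of submitting Spark workloads to EMR can be sketched with boto3's EMR client. This is a minimal illustration of the general EMR mechanism (a step that runs `spark-submit` through `command-runner.jar`), assuming the cluster already exists; it is not how OwlDQ itself submits jobs, which the case study does not specify. The function names, step name, script URI, and job arguments are hypothetical.

```python
from typing import Any, Dict, List


def build_spark_step(name: str, script_uri: str, script_args: List[str]) -> Dict[str, Any]:
    """Build an EMR step definition that runs spark-submit through command-runner.jar."""
    return {
        "Name": name,
        # CONTINUE leaves the cluster running even if this step fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri, *script_args],
        },
    }


def submit_spark_step(emr_client: Any, cluster_id: str, step: Dict[str, Any]) -> str:
    """Submit one step with a boto3 EMR client and return the new step ID."""
    response = emr_client.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]
```

In use, the client would come from `boto3.client("emr")` and the cluster ID is the cluster's `j-` identifier; for example, `build_spark_step("dq-compare", "s3://example-bucket/jobs/dq_compare.py", ["--source", "orders_src"])` (all hypothetical values) produces a step that `submit_spark_step` can queue. Keeping the step builder separate from the API call makes the definition easy to inspect and test without AWS credentials.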

Performance Issues:

Our RDS-backed infrastructure could run a few thousand jobs, but the new requirement was to run jobs against huge tables with millions of records, so more compute bandwidth and increased memory for PostgreSQL 13.4 were recommended. The team was initially asked to separate the web application from the agent and to try installing it on the EMR nodes. Queue prioritization and overlapping jobs were a major challenge. The team also faced hung jobs and had to restart OwlDQ services frequently, because jobs would get stuck in the ACCEPTED state and never progress. As a regular workaround, the platform team cleared dead tuples with VACUUM, but deadlocks remained a challenge.
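The dead-tuple workaround can be made observable before resorting to manual cleanup. A minimal sketch, assuming the metadata store is reachable through a DB-API driver such as psycopg2: the two queries use PostgreSQL's standard `pg_stat_user_tables` and `pg_stat_database` statistics views, while the helper name `tables_needing_vacuum` and the 0.2 dead-tuple ratio threshold are illustrative assumptions, not values from the case study.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Per-table live/dead tuple counts from PostgreSQL's statistics collector.
DEAD_TUPLE_QUERY = """
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
"""

# Cumulative number of deadlocks detected in the current database.
DEADLOCK_QUERY = """
SELECT datname, deadlocks
FROM pg_stat_database
WHERE datname = current_database();
"""

# One row of DEAD_TUPLE_QUERY: (table name, live tuples, dead tuples, last autovacuum time).
StatRow = Tuple[str, int, int, Optional[datetime]]


@dataclass
class BloatedTable:
    name: str
    dead_ratio: float  # dead tuples / (live + dead tuples)


def tables_needing_vacuum(rows: Iterable[StatRow], threshold: float = 0.2) -> List[BloatedTable]:
    """Return tables whose dead-tuple ratio is at or above the threshold, worst first."""
    flagged = []
    for name, live, dead, _last_autovacuum in rows:
        total = live + dead
        # Empty tables are never flagged.
        ratio = dead / total if total else 0.0
        if ratio >= threshold:
            flagged.append(BloatedTable(name, ratio))
    return sorted(flagged, key=lambda t: t.dead_ratio, reverse=True)
```

With psycopg2, `cur.execute(DEAD_TUPLE_QUERY)` followed by `tables_needing_vacuum(cur.fetchall())` lists the tables most worth a manual `VACUUM (ANALYZE)`, and a `deadlocks` count from `DEADLOCK_QUERY` that keeps rising between checks indicates that lock contention, not just bloat, is still occurring.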

Remedial Solution:

The remedy was to turn off the auto-disable feature on the RDS instances during an outage window. The team made this change with guidance from AWS experts, and the OwlDQ team found it a better solution, with markedly faster performance. In addition, the Spark jobs and the dashboard were set up on a separate EC2 environment by separating the web application from the OwlDQ agent, and the RDS instance was moved to the db.m6g.2xlarge class with an additional 100 GB of storage. Together, these changes delivered a 32% performance improvement and a 15% cost reduction.
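The instance-class and storage change can be scripted with boto3's RDS client. This is a minimal sketch: the case study does not say how the change was applied, the instance identifier used below is hypothetical, and the auto-disable setting it mentions is not modeled. Setting `ApplyImmediately` to `False` defers the modification to the next maintenance window, which fits performing the change in a planned outage window.

```python
from typing import Any, Dict


def build_resize_request(
    instance_id: str,
    current_gib: int,
    extra_gib: int = 100,
    instance_class: str = "db.m6g.2xlarge",
) -> Dict[str, Any]:
    """Build keyword arguments for the RDS ModifyDBInstance call."""
    return {
        "DBInstanceIdentifier": instance_id,
        "DBInstanceClass": instance_class,
        # AllocatedStorage is the new total size in GiB, not an increment.
        "AllocatedStorage": current_gib + extra_gib,
        # False defers the change to the next scheduled maintenance window.
        "ApplyImmediately": False,
    }


def get_allocated_storage(rds_client: Any, instance_id: str) -> int:
    """Read the instance's currently allocated storage (GiB) with a boto3 RDS client."""
    response = rds_client.describe_db_instances(DBInstanceIdentifier=instance_id)
    return response["DBInstances"][0]["AllocatedStorage"]


def apply_resize(rds_client: Any, request: Dict[str, Any]) -> Dict[str, Any]:
    """Send the ModifyDBInstance request and return the API response."""
    return rds_client.modify_db_instance(**request)
```

With `rds = boto3.client("rds")`, a call such as `apply_resize(rds, build_resize_request("owl-metastore", get_allocated_storage(rds, "owl-metastore")))` would queue the change for a hypothetical `owl-metastore` instance. Reading the current size first matters because RDS expects the new total, and allocated storage can be increased but not reduced.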