For a leading aviation client, Amazon EMR clusters ran all ingestion, analytics, and data science jobs, each on its own individual cluster. The clusters handled both scheduled jobs and ad hoc jobs driven by business needs. The cluster build process could use up to 300 EC2 instances per month, including auto scaling nodes, so the client was spending heavily on EMR EC2 instances. The customer's challenge was to optimize cost and improve Spark response time.
Solution and Approach:
AWS Graviton processors, designed by AWS to deliver the best price performance for cloud workloads such as EMR, were adopted as the solution. Graviton supports business agility by connecting applications to a modern, agile approach to software development and infrastructure, with easy scalability, connectivity, and analytics, and a shorter time to market. Apache Spark is performance-optimized on Amazon EMR release 5.28.0 and later. This feature enabled us to migrate the customer's existing configuration to run Apache Spark on Amazon EMR with Graviton. The move to Graviton resulted in a 30% cost reduction and a 15% increase in Spark performance.
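As an illustration of what such a migration can look like in practice, the following is a minimal sketch that builds the request for an EMR cluster on Graviton-based instances using the boto3 `run_job_flow` call. It is not the customer's actual configuration. The cluster name, the `m6g.xlarge` instance type, the `emr-6.1.0` release label, the node count, the region, and the default IAM role names are all assumptions chosen for the example, and the helper function names are hypothetical.

```python
from typing import Any, Dict


def build_graviton_cluster_config(
    name: str,
    release_label: str = "emr-6.1.0",   # assumed release; pick one that supports your instance type
    instance_type: str = "m6g.xlarge",  # assumed Graviton2-based instance type
    core_count: int = 2,                # assumed number of core nodes
) -> Dict[str, Any]:
    """Return keyword arguments for the boto3 EMR client's run_job_flow call."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "Primary",
                    "InstanceRole": "MASTER",
                    "Market": "ON_DEMAND",
                    "InstanceType": instance_type,
                    "InstanceCount": 1,
                },
                {
                    "Name": "Core",
                    "InstanceRole": "CORE",
                    "Market": "ON_DEMAND",
                    "InstanceType": instance_type,
                    "InstanceCount": core_count,
                },
            ],
            # Keep the cluster running after steps finish so ad hoc jobs can be submitted.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Default EMR role names; assumed to exist in the target account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(config: Dict[str, Any], region: str = "us-east-1") -> str:
    """Submit the request to EMR and return the new cluster ID (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the config can be built and inspected without boto3 installed

    emr = boto3.client("emr", region_name=region)
    response = emr.run_job_flow(**config)
    return response["JobFlowId"]
```

In a setup like this, moving an existing cluster definition to Graviton mostly comes down to changing the instance type and confirming that the chosen EMR release supports it. Keeping the request as a plain dictionary makes it easy to compare the old and new definitions before launching anything.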
The EMR cluster architecture is shown below for reference:
By migrating to the Graviton platform, the customer reduced the cost of the cluster by 30% and increased Spark performance by 15%.
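To make these percentages concrete, the sketch below applies them to a baseline. It assumes that the 30% figure applies directly to the monthly cluster cost and that a 15% performance increase means 15% more work per unit of time, so a job's runtime is divided by 1.15. Both readings are assumptions, the function names are hypothetical, and any baseline values passed in are the reader's own rather than figures from this engagement.

```python
def projected_monthly_cost(baseline_cost: float, cost_reduction: float = 0.30) -> float:
    """Cluster cost after applying a fractional cost reduction (0.30 means 30%)."""
    return baseline_cost * (1.0 - cost_reduction)


def projected_runtime(baseline_minutes: float, performance_gain: float = 0.15) -> float:
    """Job runtime after a fractional throughput gain (0.15 means 15% more work per minute)."""
    return baseline_minutes / (1.0 + performance_gain)


if __name__ == "__main__":
    # Hypothetical baseline values, for illustration only.
    print(round(projected_monthly_cost(1000.0), 2))
    print(round(projected_runtime(60.0), 2))
```

Under this reading, a 15% throughput gain shortens a job by roughly 13%, not 15%, which is worth keeping in mind when setting expectations for response time.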