Amazon EMR is a web-based service that simplifies big data processing, providing a managed Hadoop framework that makes it easy, fast, and cost-effective to distribute and process vast amounts of data across dynamically scalable Amazon EC2 instances.
Customer Reviews
Rishab S.
Advanced user of Amazon EMR
EMR can run code on Spark or Hadoop clusters, which improves performance and reduces total job execution time.
Execution time drops to a few minutes, compared with several hours when running on EC2 or other compute servers.
It is easy to choose between Hadoop-based and Spark-based EMR clusters, and EMR can be used together with other AWS services. For example, we can build an orchestration that combines EMR with several other tasks using the AWS Data Pipeline service.
Spinning up an EMR cluster takes time, sometimes fifteen to thirty minutes when using it for a task for the first time. This happens mostly whenever a new cluster is started.
Once the cluster is running, adding new tasks is easy, but the initial setup takes time.
We also have to think carefully about the use case and the code EMR will be used for. At times EC2 can perform the same processing in a comparable amount of time, and in those cases using EMR might end up increasing the overall cost of the project.
We use it to move processing that takes several hours on EC2 onto a Spark-based EMR cluster, where it completes within minutes instead of hours.
Ease of use and the ability to choose either Hadoop or Spark.
Compared with the EC2 instance, processing time decreases from 6-9 hours to 30-40 minutes, and by even more in some cases.