Browsed by
Month: April 2019

AWS EMR Tutorial – Part 1

Hello! In the previous post we set up a Hadoop environment by hand. And YES! It IS a hassle, worth the effort only if you need your own tuned version of the environment. So, starting with this post, I’ll introduce a more convenient way to get a Hadoop environment. We’ll test MRjob or PySpark on AWS EMR. In part 1 we’ll launch an EMR cluster and use it very naively (static instances and HDFS). From part 2 we’ll use EMR more correctly (?) (using AWS CLI and…
