Category: Cloud Computing

AWS EMR Tutorial – Part 1

Hello! In the previous post, we set up a Hadoop environment by hand. And YES! It IS a hassle, unless you need your own tuned version of the environment. So, starting with this post, I'll introduce a more convenient way to use Hadoop: we'll test mrjob or PySpark on AWS EMR. In part 1, we'll launch an EMR cluster and use it very naively (static instances, with data on HDFS). From part 2, we'll use EMR more correctly (?) (using AWS CLI and…
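An mrjob job is written as a class whose `mapper` and `reducer` methods Hadoop (or EMR) runs as separate phases. As a rough, self-contained illustration of that shape, here is a word count in plain Python that simulates the map, shuffle, and reduce phases locally. It does not use mrjob or EMR at all; the function names `mapper`, `reducer`, and `run_local` are hypothetical stand-ins chosen for this sketch.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    # Map phase: emit (word, 1) for every whitespace-separated token, lowercased.
    for word in line.split():
        yield word.lower(), 1


def reducer(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    # Reduce phase: sum all partial counts collected for one key.
    return word, sum(counts)


def run_local(lines: Iterable[str]) -> Dict[str, int]:
    # Shuffle phase: group mapper output by key, as Hadoop does between map and reduce.
    grouped = defaultdict(list)
    for key, value in chain.from_iterable(mapper(line) for line in lines):
        grouped[key].append(value)
    # One reducer call per distinct key, in sorted key order.
    return dict(reducer(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    sample = ["Hadoop on EMR", "EMR runs Hadoop and Spark"]
    print(run_local(sample))
    # prints {'and': 1, 'emr': 2, 'hadoop': 2, 'on': 1, 'runs': 1, 'spark': 1}
```

On a real cluster, the grouping step is done by the framework across machines; the mapper and reducer logic stays the same.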

Hadoop 101: Multi-node installation using AWS EC2

In this post, we will build a multi-node Hadoop cluster using three EC2 instances (one master and two slaves). (I will assume that you know how to use AWS. If you don’t, please check this link.) Running MapReduce tasks properly requires enough memory, so we will use the t2.medium instance type. (If you are a student and need some free credit, check this link.)
AWS EC2 t2.medium ×3 (1 for the name node, 2 for data nodes)
Ubuntu…
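With one name node and two data nodes, the cluster-specific part of a typical Hadoop setup boils down to a few small files: `core-site.xml` pointing `fs.defaultFS` at the master, `hdfs-site.xml` setting `dfs.replication`, and a list of worker hosts (`workers` in Hadoop 3, `slaves` in Hadoop 2). The sketch below only generates their contents. The hostnames `master`, `slave1`, `slave2` and port `9000` are assumptions for illustration, not values from the post, and the helper names are hypothetical.

```python
from typing import Dict, List
from xml.etree import ElementTree as ET


def hadoop_site_xml(properties: Dict[str, object]) -> str:
    # Build a Hadoop *-site.xml body: <configuration><property><name/><value/></property>...
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


def cluster_configs(master: str, data_nodes: List[str], port: int = 9000) -> Dict[str, str]:
    # Assumption: HDFS is served from the master host on the given port.
    core = hadoop_site_xml({"fs.defaultFS": f"hdfs://{master}:{port}"})
    # Replication cannot usefully exceed the number of data nodes (Hadoop's default is 3).
    hdfs = hadoop_site_xml({"dfs.replication": min(len(data_nodes), 3)})
    # One data node hostname per line ("workers" in Hadoop 3, "slaves" in Hadoop 2).
    workers = "\n".join(data_nodes) + "\n"
    return {"core-site.xml": core, "hdfs-site.xml": hdfs, "workers": workers}


if __name__ == "__main__":
    for filename, body in cluster_configs("master", ["slave1", "slave2"]).items():
        print(f"--- {filename} ---\n{body}")
```

The same files would then be copied to every node, typically under `$HADOOP_HOME/etc/hadoop/`.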