WordCount MapReduce example using Hive, locally and on EMR

Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems.

In short, Hive lets you run Hadoop MapReduce jobs using SQL-like statements.

Here is a WordCount example I did using Hive. It first shows how to run the job on your local machine, then how to run it on Amazon EMR.
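
For readers new to WordCount, the computation itself can be sketched in a few lines of plain Python. This is not Hive or Hadoop code, only an illustration of the map and reduce steps that a Hive query expresses declaratively (split each line into words, group by word, count each group). The function names `map_phase`, `reduce_phase`, and `word_count` are made up for this sketch, and splitting on whitespace is an assumption about how words are delimited.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_phase(pairs):
    """Group the pairs by word and sum the counts in each group."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


def word_count(lines):
    """Run the map step, then the reduce step, over the input lines."""
    return reduce_phase(map_phase(lines))


if __name__ == "__main__":
    # Hypothetical sample input; a real job would read files from HDFS or S3.
    sample = ["hello hive", "hello hadoop"]
    for word, n in sorted(word_count(sample).items()):
        print(word, n)
```

On a real cluster the grouping happens in the shuffle between mappers and reducers; here a dictionary stands in for that step.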

Local

1. Install Hive.

First you need to install Hadoop on your local machine; here is a post on how to do it. After you have installed Hadoop, you can follow this official tutorial to install Hive.
