Before reading this post, please go through my previous post, “How MapReduce Algorithm Works”, to get some idea about the MapReduce algorithm. That post has already explained, in theory, how MapReduce performs word counting.

And if you are not familiar with basic HDFS commands, please go through my post “Hadoop HDFS Basic Developer Commands” to get some basic knowledge about how to execute HDFS commands in the Cloudera environment.

In this post, we are going to develop the same WordCount program using the Hadoop 2 MapReduce API and test it in the Cloudera environment.

MapReduce WordCounting Example

We need to write the following three programs to develop and test MapReduce WordCount example:

  1. Mapper Program
  2. Reducer Program
  3. Client Program

NOTE:-
To develop MapReduce Programs, there are two versions of MR API:

  1. One from Hadoop 1.x (MapReduce Old API)
  2. Another from Hadoop 2.x (MapReduce New API)

In Hadoop 2.x, the old MapReduce API is deprecated, so we are going to concentrate on the new MapReduce API to develop this WordCount example.

The Cloudera environment already provides an Eclipse IDE set up with the Hadoop 2.x API, so it is very easy to develop and test MapReduce programs with it.

To develop WordCount MapReduce Application, please use the following steps:

  • Open the default Eclipse IDE provided by the Cloudera environment.
  • We can use an already created project or create a new Java project.
  • For simplicity, I’m going to use the existing “training” Java project. It already has all the required Hadoop 2.x JARs on its classpath, so it is a ready-to-use Eclipse Java project.
  • Create WordCount Mapper Program
  • Create WordCount Reducer Program
  • Create WordCount Client Program to test this application

Let us start developing these three programs in the next sections.

Mapper Program

Create a “WordCountMapper” Java Class which extends Mapper class as shown below:
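A minimal sketch of such a Mapper could look like the following (the whitespace tokenizing logic and field names are my own illustration, not necessarily the original listing; the imports come from the Hadoop 2.x MapReduce API):

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the incoming line into words and emit (word, 1) for each one.
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}
```

Note that the input key (the line’s byte offset) is ignored here; only the line text matters for word counting.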

Code Explanation:

    • Our WordCountMapper class extends the Hadoop 2 MapReduce API class “Mapper”.
    • The Mapper class is defined with generic type parameters as Mapper<LongWritable, Text, Text, IntWritable>.

Here <LongWritable, Text, Text, IntWritable>

    1. The first two, <LongWritable, Text>, represent the input data types of our WordCount Mapper program.

For Example:- In our example, we give the Mapper a file (a huge amount of data, in any format). The Mapper reads each line from this file paired with a unique number: the key is the byte offset at which the line starts, and the value is the line itself, e.g. (0, "Hello World").

In Hadoop MapReduce API, it is equal to <LongWritable, Text>.

    2. The last two, <Text, IntWritable>, represent the output data types of our WordCount Mapper program.

For Example:- In our example, the WordCount Mapper program emits one (word, 1) pair per word, e.g. (Hello, 1) and (World, 1).

In Hadoop MapReduce API, it is equal to <Text, IntWritable>.

  • We have overridden the Mapper’s map() method and provided our mapping function logic there.

Reducer Program

Create a “WordCountReducer” Java Class which extends Reducer class as shown below:
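A minimal sketch of such a Reducer (again my own illustration of the summing logic, using the Hadoop 2.x MapReduce API):

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum all the counts emitted for this word across all mappers.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        // Emit the word together with its total count.
        context.write(key, new IntWritable(sum));
    }
}
```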

Code Explanation:

    • Our WordCountReducer class extends the Hadoop 2 MapReduce API class “Reducer”.
    • The Reducer class is defined with generic type parameters as Reducer<Text, IntWritable, Text, IntWritable>.

Here <Text, IntWritable, Text, IntWritable>

    1. The first two, <Text, IntWritable>, represent the input data types of our WordCount Reducer program.

For Example:- In our example, our Mapper program produces <Text, IntWritable> output, which becomes the input of the Reducer program.

In Hadoop MapReduce API, it is equal to <Text, IntWritable>.

    2. The last two, <Text, IntWritable>, represent the output data types of our WordCount Reducer program.

For Example:- In our example, the WordCount Reducer program emits one (word, total-count) pair per word, e.g. (Hello, 5).

In Hadoop MapReduce API, it is equal to <Text, IntWritable>.

  • We have overridden the Reducer’s reduce() method and provided our reduce function logic there.

Client Program

Create a “WordCountClient” Java Class with main() method as shown below:
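A minimal sketch of such a client program (the job name and the exact set of setter calls are one reasonable configuration, not necessarily the original; input and output paths are taken from the command line):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountClient {

    public static void main(String[] args) throws Exception {
        // Create the job and point it at the JAR containing our classes.
        Job job = Job.getInstance(new Configuration(), "WordCount");
        job.setJarByClass(WordCountClient.class);

        // Wire up our Mapper and Reducer.
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        // Declare the output key/value types.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Input and output HDFS paths come from the command line.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submit the job and wait; exit non-zero if it fails.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```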

Code Explanation:

  • The Hadoop 2 MapReduce API has a “Job” class in the “org.apache.hadoop.mapreduce” package.
  • The Job class is used to create MapReduce jobs to perform our word-counting task.
  • The client program uses the Job object’s setter methods to set all the MapReduce components: Mapper, Reducer, input data types, output data types, etc.
  • These jobs will perform our WordCount mapping and reducing tasks.

NOTE:-

  • As we discussed in my previous post, the MapReduce algorithm uses three functions: a Map function, a Combine function, and a Reduce function.
  • By observing these three programs, we can see that we have developed only two functions: Map and Reduce. Then what about the Combine function?
  • We have simply not specified a Combine function here; combining is an optional optimization, and the job runs fine without it.
  • We will discuss how to develop a Combine function in my coming posts.

Now we have developed all the required components (programs). It’s time to test them.

Test MapReduce WordCounting Example

Our WordCounting project final structure looks like this:

[Image: WordCount project structure in Eclipse]

Please use the following steps to test our MapReduce Application.

    • Create our WordCount application JAR file using Eclipse IDE.
    [Image: exporting the WordCount JAR from Eclipse]
    • Execute the following “hadoop” command to run our WordCounting Application

Syntax:-
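In general form, the command looks like this (all four arguments are placeholders to be filled in):

```shell
hadoop jar <jar-file> <client-class> <input-path> <output-path>
```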

Let us assume that we have already created the “/ram/mrv1/output” folder structure in the Hadoop HDFS file system. If you have not done that, please go through my previous post, “Hadoop HDFS Basic Developer Commands”, to create it.

Example:-
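Assuming the JAR was exported as “wordcount.jar” and the input data lives under “/ram/mrv1/input” (both names are my assumptions, since the original listing is a screenshot), the command would look something like:

```shell
hadoop jar wordcount.jar WordCountClient /ram/mrv1/input /ram/mrv1/output
```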

NOTE:-
Just for readability, I’ve split the command across multiple lines. Please type it as a single line, as shown below:

[Image: console log of the WordCount job run]

By going through this log, we can observe how the Map and Reduce tasks work to solve our word-counting problem.

  • Execute the following “hadoop” command to view the output directory content
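For example, using the output path from above:

```shell
hadoop fs -ls /ram/mrv1/output/
```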

It shows the content of “/ram/mrv1/output/” directory as shown below:

[Image: output directory listing]

  • Execute the following “hadoop” command to view our WordCounting Application output
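Reducer output files are conventionally named part-r-00000, part-r-00001, and so on, so the command would look something like:

```shell
hadoop fs -cat /ram/mrv1/output/part-r-00000
```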

This command displays the WordCount application’s output. As my output file is too big, I’m not able to show the full output here.

NOTE:-
Here we have used some Hadoop HDFS commands to run and test our WordCounting Application. If you are not familiar with HDFS commands, please go through my “Hadoop HDFS Basic Developer Commands” post.

That’s all about the Hadoop 2.x MapReduce WordCount example. We will develop some more useful MapReduce programs in my coming posts.

Please drop me a comment if you like my post or have any issues/suggestions.

By admin
