Install Hadoop on Ubuntu With Examples

In this lesson, we will see how we can get started with Apache Hadoop by installing it on our Ubuntu machine. Installing and running Apache Hadoop can be tricky and that’s why we’ll try to keep this lesson as simple and informative as possible.

In this installation guide, we will make use of an Ubuntu 17.10 (GNU/Linux 4.13.0-37-generic x86_64) machine:

(Screenshot: Ubuntu version)

Also, if you just want to quickly explore Hadoop, read Cloudera Hadoop VMware Single Node Environment Setup.

Prerequisites for Installing Hadoop on Ubuntu

Before we can start installing Hadoop, we need to update Ubuntu with the latest software patches available:
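On Ubuntu this is typically done with apt, for example:

sudo apt-get update
sudo apt-get upgrade -y   # apply the available package upgrades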

Next, we need to install Java on the machine, as Java is the main prerequisite for running Hadoop. Hadoop 3.x requires Java 8 or later, so let’s install Java 8 for this lesson:
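A straightforward way is to install OpenJDK 8 from the Ubuntu repositories (Oracle JDK 8 works as well, the package choice is up to you):

sudo apt-get install -y openjdk-8-jdk
java -version   # confirm that Java 8 is now available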

To install Hadoop, make a directory and move inside it:
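The directory name and location are just a choice for this lesson, for example a folder named hadoop-install in the home directory:

mkdir ~/hadoop-install   # hypothetical working directory, pick any name you like
cd ~/hadoop-install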

Installing Hadoop on Ubuntu

Now that we’re ready with the basic setup for Hadoop on our Ubuntu machine, let’s download the Hadoop installation files so that we can move on to configuring them:
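One way to fetch the release is with wget from the Apache archive (the mirror URL may differ, so copy the link for your chosen version from the Hadoop downloads page):

wget https://archive.apache.org/dist/hadoop/common/hadoop-3.0.1/hadoop-3.0.1.tar.gz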

We’re going to use Hadoop version 3.0.1 for this lesson. You can find the latest Hadoop release here. Once the file is downloaded, run the following command to extract the archive:
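The release ships as a gzipped tarball, so tar can extract it:

tar xzf hadoop-3.0.1.tar.gz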

This might take a few moments as the archive is quite large. Once it finishes, Hadoop should be extracted in your current directory:

(Screenshot: Hadoop unarchived in the current directory)

Adding a Hadoop User Account

We will create a separate Hadoop user on our machine to keep HDFS separate from our original file system. First, let’s create a user group on our machine:
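For example, a group named hadoop:

sudo addgroup hadoop   # omit sudo if you are already running as root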

You should see something like this:

(Screenshot: adding the hadoop user group)

Now we can add a new user to this group:
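Something along these lines creates the user and places it in the hadoop group in one step (jdhadoopuser is the username used throughout this lesson):

sudo adduser --ingroup hadoop jdhadoopuser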

Note that I am running all commands as the root user. We now have a user called jdhadoopuser in the hadoop group.

Finally, we’ll give the jdhadoopuser user sudo (root) access. To do this, open the /etc/sudoers file with this command:
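A safe way to open the sudoers file is with visudo, which validates the syntax before saving:

sudo visudo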

Now, enter this as the last line in the file:
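The entry grants jdhadoopuser full sudo rights, in the same format as the existing root entry:

jdhadoopuser ALL=(ALL:ALL) ALL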

The file should now look like this:

(Screenshot: sudoers entry for jdhadoopuser)

Hadoop Single Node Setup: Standalone Mode

Hadoop in standalone (single-node) mode means that Hadoop runs as a single Java process on the local machine. This mode is usually used only for development and debugging, not for production. In this mode, we can run simple MapReduce programs that process small amounts of data.

Rename the extracted Hadoop directory to just hadoop:
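Assuming version 3.0.1 was extracted, the directory is named hadoop-3.0.1:

mv hadoop-3.0.1 hadoop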

Now, give ownership of this directory to jdhadoopuser:
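For example, recursively assigning the jdhadoopuser user and the hadoop group as owners:

sudo chown -R jdhadoopuser:hadoop hadoop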

A better location for Hadoop is the /usr/local/ directory, so let’s move it there:
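Moving the whole directory keeps its layout intact:

sudo mv hadoop /usr/local/hadoop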

Now, edit the .bashrc file to add Hadoop and Java to the PATH:
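Open the ~/.bashrc of the user that will run Hadoop (jdhadoopuser here) in whichever editor you prefer, for example nano:

nano ~/.bashrc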

Add these lines to the end of the .bashrc file:
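A minimal set of entries looks like this; the JAVA_HOME path assumes the OpenJDK 8 package installed earlier, so adjust it if your Java lives elsewhere:

export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # path used by the Ubuntu openjdk-8-jdk package
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Run source ~/.bashrc afterwards so the changes take effect in the current shell.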

Now, we also need to tell Hadoop where Java is installed. We can do this by setting the Java path in the hadoop-env.sh file. The location of this file can differ between Hadoop installations. To find where it is, run the following command just outside the hadoop directory:
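A simple find over the installation directory locates it; in Hadoop 3.x it normally sits under etc/hadoop/ inside the installation:

find /usr/local/hadoop -name hadoop-env.sh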

When I go to the directory the command points to, I can see the required file there:

(Screenshot: hadoop-env.sh file location)

Now, edit the file:
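With the installation under /usr/local/hadoop, that is:

nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh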

On the last line, enter the following and save it:
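Again assuming the OpenJDK 8 package from earlier (adjust the path to your own Java installation):

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64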

Testing Hadoop Installation on Ubuntu

We can now test the Hadoop installation by running a sample application that ships with Hadoop: the WordCount example JAR. Just execute the following command:
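A sketch of the command, assuming the installation lives under /usr/local/hadoop; the input file (Hadoop’s own LICENSE.txt) and the output directory name are just examples, and the output directory must not exist yet:

cd /usr/local/hadoop
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.1.jar wordcount LICENSE.txt output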

Once the command completes, you will see a file named part-r-00000 in the output directory:

(Screenshot: WordCount output file)

If you want, you can view the contents of this file with the following command:
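With the example output directory used above:

cat output/part-r-00000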

Since this example ran successfully, Hadoop has been installed correctly on your system!

Conclusion

In this lesson, we saw how we can install Apache Hadoop on an Ubuntu server and start executing sample programs with it. Read more Big Data Posts to gain deeper knowledge of available Big Data tools and processing frameworks.
