In this lesson, we will see how we can get started with Apache Hadoop by installing it on our Ubuntu machine. Installing and running Apache Hadoop can be tricky and that’s why we’ll try to keep this lesson as simple and informative as possible.
In this installation guide, we will use a machine running Ubuntu 17.10 (GNU/Linux 4.13.0-37-generic x86_64):
[Image: Ubuntu Version]
Also, if you just want to quickly explore Hadoop, read CloudEra Hadoop VMWare Single Node Environment Setup.
Prerequisite for Installing Hadoop on Ubuntu
Before we can start installing Hadoop, we need to update Ubuntu with the latest software patches available:
sudo apt-get update && sudo apt-get -y dist-upgrade
Next, we need to install Java on the machine, as Java is the main prerequisite for running Hadoop. Hadoop 3.x requires Java 8 or later, so let’s install Java 8 for this lesson:
sudo apt-get -y install openjdk-8-jdk-headless
To install Hadoop, make a directory and move inside it:
mkdir jd-hadoop && cd jd-hadoop
Installing Hadoop on Ubuntu
Now that we’re ready with the basic setup for Hadoop on our Ubuntu machine, let’s download the Hadoop installation files so that we can start configuring it:
wget https://mirror.cc.columbia.edu/pub/software/apache/hadoop/common/hadoop-3.0.1/hadoop-3.0.1.tar.gz
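Since the archive comes from a mirror, it may be worth checking its checksum before going further. The following is a minimal sketch, assuming the `sha512sum` tool from GNU coreutils is available; `verify_sha512` is a hypothetical helper name, and the expected digest has to be copied by hand from the checksum file that Apache publishes for the release.

```shell
# Compare a file's SHA-512 digest with an expected hex string.
# verify_sha512 is a hypothetical helper, not part of Hadoop.
verify_sha512() {
  file=$1
  expected=$2
  # sha512sum prints "<digest>  <filename>"; keep only the digest.
  actual=$(sha512sum "$file" | awk '{ print $1 }')
  # Compare case-insensitively, since published digests are sometimes upper-case.
  [ "$(printf '%s' "$actual" | tr 'A-F' 'a-f')" = "$(printf '%s' "$expected" | tr 'A-F' 'a-f')" ]
}

# Example (EXPECTED_DIGEST is a placeholder for the published value):
# verify_sha512 hadoop-3.0.1.tar.gz "$EXPECTED_DIGEST" && echo "checksum OK"
```

The function returns a non-zero status on a mismatch, so it can gate the extraction step in a script.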
We’re using Hadoop 3.0.1 in this lesson. Find the latest version of Hadoop here. Once the file is downloaded, run the following command to extract the archive:
tar xvzf hadoop-3.0.1.tar.gz
This might take a few moments, as the archive is large. Once it finishes, Hadoop should be extracted into your current directory:
[Image: Hadoop Unarchived]
Adding Hadoop user account
We will create a separate Hadoop user on our machine to keep Hadoop’s files separate from the rest of our file system. First, we create a user group on our machine:
addgroup hadoop
You should see something like this:
[Image: Ubuntu Adding User Group]
Now we can add a new user to this group:
useradd -G hadoop jdhadoopuser
Note that I am running all commands as the root user. Now, we have a user called jdhadoopuser in the hadoop group.
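To confirm that the membership took effect, a small check along these lines can help. It is only a sketch: `user_in_group` is a hypothetical helper name, and it relies on `id -nG`, which prints a user's group names separated by spaces.

```shell
# Succeed if the given user belongs to the given group.
# user_in_group is a hypothetical helper for illustration.
user_in_group() {
  # id -nG prints the user's group names separated by spaces.
  for g in $(id -nG "$1" 2>/dev/null); do
    [ "$g" = "$2" ] && return 0
  done
  return 1
}

# Example for the user and group created above:
# user_in_group jdhadoopuser hadoop && echo "jdhadoopuser is in hadoop"
```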
Finally, we’ll grant sudo privileges to jdhadoopuser. To do this, open the /etc/sudoers file with this command:
sudo visudo
Now, enter this as the last line in the file:
jdhadoopuser ALL=(ALL) ALL
At this point, the file should look like this:
[Image: Making root user]
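If this step is scripted rather than done interactively, the rule can be syntax-checked in a scratch file first. The sketch below assumes `visudo` is installed and uses its `-c` (check only) and `-f` (alternate file) options; the scratch file is an illustration and is never installed anywhere.

```shell
# Write the candidate rule to a scratch file instead of touching /etc/sudoers.
rule_file=$(mktemp)
printf '%s\n' 'jdhadoopuser ALL=(ALL) ALL' > "$rule_file"

# visudo -c only checks syntax; -f points it at the scratch file.
if command -v visudo >/dev/null 2>&1; then
  visudo -c -f "$rule_file" || echo "rule did not pass the syntax check" >&2
else
  echo "visudo not found; skipping syntax check" >&2
fi
```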
Hadoop Single Node Setup: Standalone Mode
In standalone mode, Hadoop runs on a single node as a single Java process. This mode is usually used only for debugging, not for production. With it, we can run simple MapReduce programs that process small amounts of data.
Rename the extracted hadoop-3.0.1 directory to just hadoop:
mv /root/jd-hadoop/hadoop-3.0.1 /root/jd-hadoop/hadoop
Now, give ownership of this directory to jdhadoopuser:
chown -R jdhadoopuser:hadoop /root/jd-hadoop/hadoop
A better location for Hadoop will be the /usr/local/ directory, so let’s move it there:
mv hadoop /usr/local/
cd /usr/local/
Now, edit the .bashrc file to add Hadoop and Java to the path using this command:
vi ~/.bashrc
Add these lines to the end of the .bashrc file:
# Configure Hadoop and Java Home
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$HADOOP_HOME/bin
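The JAVA_HOME value above is specific to the OpenJDK 8 package on a 64-bit machine. If your JDK lives elsewhere, the path can be derived from the `java` binary itself. The sketch below is one way to do that, assuming a layout where the binary sits under `bin/java` (optionally inside a `jre` subdirectory); `java_home_from_bin` is a hypothetical helper name.

```shell
# Derive a JAVA_HOME candidate from the full path of a java binary.
# java_home_from_bin is a hypothetical helper for illustration.
java_home_from_bin() {
  dir=${1%/bin/java}           # drop the trailing /bin/java
  printf '%s\n' "${dir%/jre}"  # drop a trailing /jre, if any
}

# On a machine with Java installed, resolve symlinks first, then derive the path.
if command -v java >/dev/null 2>&1; then
  java_home_from_bin "$(readlink -f "$(command -v java)")"
fi
```

If the printed directory differs from the one used in .bashrc, use the printed one instead.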
Now it is time to tell Hadoop where Java is installed as well. We can do this by setting the path in the hadoop-env.sh file. The location of this file can differ between Hadoop installations. To find it, run the following command from the directory that contains the hadoop directory:
find hadoop/ -name hadoop-env.sh
In the directory that the command reports, I can see the file we need:
[Image: Hadoop Env file]
Now, edit the file:
vi hadoop-env.sh
On the last line, enter the following and save it:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
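The same edit can be made without opening an editor, which is handy when the setup is scripted. This is a minimal sketch: `ensure_line` is a hypothetical helper that appends a line only if the file does not already contain it, so running it twice does not duplicate the export.

```shell
# Append a line to a file unless an identical line is already present.
# ensure_line is a hypothetical helper for illustration.
ensure_line() {
  file=$1
  line=$2
  # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string.
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (HADOOP_ENV_FILE is a placeholder for the path reported by find):
# ensure_line "$HADOOP_ENV_FILE" 'export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64'
```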
Testing Hadoop Installation on Ubuntu
We can now test the Hadoop installation by running a sample application that ships with Hadoop, a word-count example JAR. Just execute the following command:
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.1.jar wordcount /usr/local/hadoop/README.txt /root/jd-hadoop/Output
Once you execute the above command, we see the file part-r-00000 in the output directory:
[Image: Output file]
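One thing to watch when re-running the example: a MapReduce job refuses to start if its output directory already exists. A small guard like the following can catch that before the job is submitted. It is a sketch only; `output_path_is_free` is a hypothetical helper, and the path is the one used in the command above.

```shell
# Succeed only if the given output path does not exist yet.
# output_path_is_free is a hypothetical helper for illustration.
output_path_is_free() {
  [ ! -e "$1" ]
}

out_dir=/root/jd-hadoop/Output   # same output path as in the wordcount command
if output_path_is_free "$out_dir"; then
  echo "ok to run the job"
else
  echo "remove or rename $out_dir before re-running" >&2
fi
```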
If you want, you can view the content of this file with the following command:
cat part-r-00000
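To get a feel for what the wordcount example computes, here is a rough local equivalent built from standard Unix tools. It is not how Hadoop implements the job; it only approximates the result by splitting on whitespace, counting each distinct word, and printing tab-separated word and count pairs sorted by byte order. `local_wordcount` is a hypothetical name.

```shell
# Rough local approximation of the wordcount result: reads text on stdin,
# prints "<word><TAB><count>" lines sorted by word.
local_wordcount() {
  tr -s '[:space:]' '\n' |        # one word per line, squeezing repeated breaks
    grep -v '^$' |                # drop an empty first line from leading whitespace
    LC_ALL=C sort |               # byte-order sort so equal words are adjacent
    uniq -c |                     # prefix each distinct word with its count
    awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Example:
# local_wordcount < /usr/local/hadoop/README.txt | head
```

Comparing a few lines of this output with part-r-00000 is a quick sanity check that the job processed the whole input.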
Since this example ran, Hadoop has been successfully installed on your system!
Conclusion
In this lesson, we saw how we can install Apache Hadoop on an Ubuntu server and start executing sample programs with it. Read more Big Data Posts to gain deeper knowledge of available Big Data tools and processing frameworks.