Installing Apache Hive on Ubuntu and Running HQL Queries With Examples

In this lesson, we will see how to get started with Apache Hive by installing it on an Ubuntu machine and verifying the installation by running some Hive DDL commands. Installing and running Apache Hive can be tricky, so we'll keep this lesson as simple and informative as possible.

In this installation guide, we will use an Ubuntu 17.10 (GNU/Linux 4.13.0-37-generic x86_64) machine:

[Image: Ubuntu Version]

Prerequisites for Hive Installation

Before we can proceed to the Hive installation on our machine, we need to have a few other things installed first:

Java Setup

Before we can start installing Hive, we need to update Ubuntu with the latest software patches available:
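A typical update sequence looks like this:

    # Refresh the package index and apply available upgrades
    sudo apt-get update
    sudo apt-get upgrade -y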

Next, we need to install Java on the machine, as Java is the main prerequisite for running Hive and Hadoop. Hive versions 1.2 and later require Java 7 or newer. Let's install Java 8 for this lesson:
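One straightforward option is the OpenJDK 8 package from the Ubuntu repositories:

    # Install OpenJDK 8 and confirm that Java is available
    sudo apt-get install -y openjdk-8-jdk
    java -version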

Getting Started with Hive Installation

Once Java and Hadoop are installed based on the instructions presented above, we are ready to start downloading Hive.

You can find all Hive releases in the Apache Hive archives. Now, run the following set of commands to make a new directory and download the Hive installation archive from the mirror site:
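Mirror URLs change over time, so as a sketch, the Apache archive is a stable place to fetch the 2.3.3 release from:

    # Create a working directory and download the Hive 2.3.3 binary archive
    mkdir hive
    cd hive
    wget https://archive.apache.org/dist/hive/hive-2.3.3/apache-hive-2.3.3-bin.tar.gz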

With this, a new file, apache-hive-2.3.3-bin.tar.gz, will be downloaded to the system:

[Image: Downloading Hive]

Let us uncompress this file now:
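The archive can be extracted in place with tar:

    # Unpack the archive; this produces a directory named apache-hive-2.3.3-bin
    tar -xzf apache-hive-2.3.3-bin.tar.gz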

Now, the periods in the directory name can be awkward to work with in path variables in Ubuntu. To avoid these issues, rename the unarchived directory:
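For example (the target name apache_hive is just a convention; any simple name works):

    # Rename the extracted directory to something without periods
    mv apache-hive-2.3.3-bin apache_hive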

Once this is done, we need to add the Hive home directory to the PATH. Run the following command to edit the .bashrc file:
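Any editor works; with nano, for example:

    # Open .bashrc for editing
    nano ~/.bashrc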

Add the following lines in the .bashrc file and save it:
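Something along these lines, assuming Hive was extracted and renamed under ~/hive as above (adjust the path to your own layout):

    # Hive environment variables
    export HIVE_HOME=~/hive/apache_hive
    export PATH=$PATH:$HIVE_HOME/bin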

Now, to make environment variables come into effect, source the .bashrc file:
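Reloading the file in the current shell is enough:

    # Apply the new environment variables without logging out
    source ~/.bashrc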

Note that the path to Hadoop is already set in our file, so the overall configuration is done:
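The relevant section of the file would look roughly like this; the Hadoop paths are placeholders for wherever Hadoop lives on your machine:

    # Hadoop environment (already present from the Hadoop installation)
    export HADOOP_HOME=~/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    # Hive environment (added above)
    export HIVE_HOME=~/hive/apache_hive
    export PATH=$PATH:$HIVE_HOME/bin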

If you want to confirm that Hadoop is correctly working, just check its version:
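The version subcommand does the job:

    # Print the installed Hadoop version
    hadoop version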

[Image: Check Hadoop version]

Now, we need to set up the directories where Hive can store data in the Hadoop Distributed File System (HDFS). For this, we will make a new warehouse directory:
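Hive conventionally keeps table data under /user/hive/warehouse in HDFS and also needs a group-writable /tmp; a typical setup looks like this:

    # Create the Hive scratch and warehouse directories in HDFS
    hdfs dfs -mkdir -p /tmp
    hdfs dfs -mkdir -p /user/hive/warehouse
    # Make them group-writable so Hive can use them
    hdfs dfs -chmod g+w /tmp
    hdfs dfs -chmod g+w /user/hive/warehouse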

Once this is done, there is one last configuration step before we can launch the Hive shell: we need to inform Hive about the database it should use for its schema definitions. We execute the following command so that Hive can initialize the metastore schema:
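Assuming the embedded Derby database, which is the default for a simple single-user setup:

    # Initialize the metastore schema in an embedded Derby database
    $HIVE_HOME/bin/schematool -initSchema -dbType derby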

When we execute the command, we will see the following success output:

[Image: Hive metastore schema initialization]

Starting the Hive Shell

After all this configuration is done, Hive can be launched with a single, simple command:
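The launcher script is now on our PATH:

    # Start the Hive command-line shell
    hive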

If everything worked correctly, you should see the Hive shell appear as if by magic:

[Image: Starting the Hive shell]

Using the Hive Shell

Now that we have a Hive shell running, we will put it to use with some basic Hive DDL commands written in the Hive Query Language (HQL).

HQL: Creating a Database

As with any other database system, we can start using Hive only after we create a database. Let's do this now:
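The database name used below, demo_db, is just an example; substitute your own:

    -- Create a new database
    CREATE DATABASE demo_db;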

We will see the following output:

[Image: Create Database in Hive]

A better way to create a database is to first check that it doesn't already exist:
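The IF NOT EXISTS clause makes the statement safe to re-run:

    -- Succeeds even though demo_db was already created above
    CREATE DATABASE IF NOT EXISTS demo_db;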

We will see the same output here as well:

[Image: Create Database in Hive, if not exists]

Now we can show the databases that exist in Hive:
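A single statement lists them:

    -- List all databases known to the metastore
    SHOW DATABASES;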

This will result in the following:

[Image: Show Databases using HQL]

HQL: Creating Tables

We now have an active database in which we can create some tables. To do this, first switch to the database you want to use:
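Sticking with our example database:

    -- Make demo_db the current database for subsequent statements
    USE demo_db;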

Now, create a new table inside this DB with some fields:
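The columns below are purely illustrative; define whatever fields your data needs:

    -- A simple example table with a few columns
    CREATE TABLE people (
      id INT,
      name STRING,
      age INT
    );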

Once this table is created, we can show its schema as follows:
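For our example table:

    -- Print the column names and types of the table
    DESCRIBE people;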

We will see the following output:

[Image: Table metadata]

HQL: Inserting Data into Tables

To finish, let us insert a record into the table we just created:
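Hive supports single-row inserts (from version 0.14 onward), so for our example table the statement could be:

    -- Insert one example row; the values are arbitrary
    INSERT INTO TABLE people VALUES (1, 'Alice', 30);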

We will see a long output, as Hive, with the help of Hadoop, starts MapReduce jobs to carry out the data insertion into the warehouse we created. The output will look similar to this:

[Image: Insert Data into Hive]

Finally, we can see the data in Hive as follows:
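A plain SELECT over the example table:

    -- Retrieve every row from the example table
    SELECT * FROM people;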

 

[Image: Show all data in Hive]

Conclusion

In this lesson, we saw how to install Apache Hive on an Ubuntu server and start executing sample HQL queries on it. Read more of our Big Data posts to gain deeper knowledge of available Big Data tools and processing frameworks.
