In this lesson on Apache Hive commands, we will go through the most common commands in Hive in HQL and perform most basic operations like creating tables, altering their schema and much more.

In this guide, we will make use of Ubuntu 17.10 (GNU/Linux 4.13.0-37-generic x86_64) machine:

ubuntu-version

Ubuntu Version

Prerequisites for running Hive Commands

Before we can proceed to run Hive queries on our machine, we need to have some other things too:

Once these things are ready, continue with the lesson to learn about running hive commands.

Launching Hive shell

With a simple command, you should be able to start the hive shell:

hive

The Hive shell will open as:

launch-hive-command

Launch Hive shell

Running Database commands

We will start by mainly executing database commands. In a later section, we will move to DDL commands which manages the table schema and data.

Creating and Describe Database with metadata

With Hive, when you create a database, it is easy to assign useful metadata to a database as description, author and much more options. Let’s try some of these options here:


CREATE DATABASE IF NOT EXISTS journaldev
COMMENT "Study BigData at JournalDev"
LOCATION '/opt/hive/warehouse/jd_db'
with DBPROPERTIES ('createdby'='shubham', 'createdfor'='JournalDev');

Now that the database is created, we can see the metadata information by describing the database:


DESCRIBE DATABASE extended journaldev;

Let’s see the output for this command:

describe-database

Describe Metadata for Database

Altering Database

The metadata assigned to the database is not permanent. We can change it with simple Alter Database command with the following syntax:


ALTER (DATABASE) database_name SET DBPROPERTIES
(property_name=property_value, ...);

We can also modify the owner of a Hive Database with similar command:

ALTER (DATABASE) database_name SET OWNER [USER|ROLE] user_or_role;

Let’s try this here now:


ALTER DATABASE journaldev
SET OWNER ROLE admin;

Now when we describe the database, we can see that owner data has been changed:

describe-database

Alter Database role

Please note that we are writing Hive commands in capital letters just for easy differentiation. The commands are case insensitive.

Display all Databases

Just like SQL, we can all databases which exist in Hive till now:

SHOW DATABASES;

 

show-databases-1

Show Databases

Using a Database

When we want to run some DDL commands in a particular database, we must select it using the following command:

USE journaldev;

Once we use a specific database, only then we can run Table related commands in it.

Hive DDL Commands

Now that we have selected a specific database, we are ready to run Table related commands in the database. Our first example will be to define a schema for our table.

Defining Table schema

Let’s start working with Tables in Hive by defining the schema for our first table. Basic syntax looks almost something like SQL here too:


CREATE  TABLE [IF NOT EXISTS] [db_name.]table_name    --
  [(col_name data_type [COMMENT col_comment], ...)]
  [COMMENT table_comment]
  [LOCATION hdfs_path]

Let’s put above syntax to use by defining a new table:


CREATE TABLE IF NOT EXISTS journaldev.lessons (
ID BIGINT COMMENT 'ID for each lesson contributed',
title STRING COMMENT 'title which will be shown to users',
link STRING COMMENT 'link to access the lesson')
COMMENT 'This table stores data related to lessons'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY 'n'
STORED AS TEXTFILE
LOCATION '/opt/hive/warehouse/journaldev.db/lessons';

We will receive an OK response here as well:

create-table

 

Create table

Here, we defined a very simple schema for the Hive Table with metadata with comments and delimiters.

Truncate all Data in Table

We can easily empty a Hive Table by running a simple truncate command:

TRUNCATE TABLE db_name.table_name;

Delete table

If we wish to delete an entire table with its data, we can simply delete it:

DROP TABLE [IF EXISTS] table_name [PURGE];

Note that PURGE is an important option here. If PURGE is used, data cannot be retrieved back as without this option, data of the table will go to the .Trash/current-directory. When this option is used, data will completely deleted.

Hive DML Command

Inserting data into a Hive table is easy as well. We can use the following Insert command:


INSERT INTO TABLE journaldev.lessons
VALUES (20353, 'Installing Hive on Ubuntu', 'journaldev.com/20353/install-apache-hive-ubuntu-hql-queries'), (20358, 'Installing Hadoop on Ubuntu', 'journaldev.com/20358/install-hadoop-on-ubuntu');

We can see the data we inserted as:

SELECT * FROM journaldev.lessons;

Or we can limit the data to just 1 row:

SELECT * FROM journaldev.lessons LIMIT 1;

Conclusion

In this lesson, we learned various Apache Hive commands and ran them on our Ubuntu machine. We saw how similar Hive commands are to SQL and fast as well. Real power of Hive comes in when it is run as a cluster, just like Hadoop.

Read more Big Data posts here.

By admin

Leave a Reply

%d bloggers like this: