In this lesson on Apache Hive commands, we will go through the most common commands in Hive in HQL and perform most basic operations like creating tables, altering their schema and much more.
In this guide, we will make use of Ubuntu 17.10 (GNU/Linux 4.13.0-37-generic x86_64) machine:
Ubuntu Version
Prerequisites for running Hive Commands
Before we can proceed to run Hive queries on our machine, we need to have some other things too:
Once these things are ready, continue with the lesson to learn about running hive commands.
Launching Hive shell
With a simple command, you should be able to start the hive shell:
hive
The Hive shell will open as:
Launch Hive shell
Running Database commands
We will start by mainly executing database commands. In a later section, we will move to DDL commands which manages the table schema and data.
Creating and Describe Database with metadata
With Hive, when you create a database, it is easy to assign useful metadata to a database as description, author and much more options. Let’s try some of these options here:
CREATE DATABASE IF NOT EXISTS journaldev
COMMENT "Study BigData at JournalDev"
LOCATION '/opt/hive/warehouse/jd_db'
with DBPROPERTIES ('createdby'='shubham', 'createdfor'='JournalDev');
Now that the database is created, we can see the metadata information by describing the database:
DESCRIBE DATABASE extended journaldev;
Let’s see the output for this command:
Describe Metadata for Database
Altering Database
The metadata assigned to the database is not permanent. We can change it with simple Alter Database command with the following syntax:
ALTER (DATABASE) database_name SET DBPROPERTIES
(property_name=property_value, ...);
We can also modify the owner of a Hive Database with similar command:
ALTER (DATABASE) database_name SET OWNER [USER|ROLE] user_or_role;
Let’s try this here now:
ALTER DATABASE journaldev
SET OWNER ROLE admin;
Now when we describe the database, we can see that owner data has been changed:
Alter Database role
Please note that we are writing Hive commands in capital letters just for easy differentiation. The commands are case insensitive.
Display all Databases
Just like SQL, we can all databases which exist in Hive till now:
SHOW DATABASES;
Show Databases
Using a Database
When we want to run some DDL commands in a particular database, we must select it using the following command:
USE journaldev;
Once we use a specific database, only then we can run Table related commands in it.
Hive DDL Commands
Now that we have selected a specific database, we are ready to run Table related commands in the database. Our first example will be to define a schema for our table.
Defining Table schema
Let’s start working with Tables in Hive by defining the schema for our first table. Basic syntax looks almost something like SQL here too:
CREATE TABLE [IF NOT EXISTS] [db_name.]table_name --
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[LOCATION hdfs_path]
Let’s put above syntax to use by defining a new table:
CREATE TABLE IF NOT EXISTS journaldev.lessons (
ID BIGINT COMMENT 'ID for each lesson contributed',
title STRING COMMENT 'title which will be shown to users',
link STRING COMMENT 'link to access the lesson')
COMMENT 'This table stores data related to lessons'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY 'n'
STORED AS TEXTFILE
LOCATION '/opt/hive/warehouse/journaldev.db/lessons';
We will receive an OK response here as well:
Create table
Here, we defined a very simple schema for the Hive Table with metadata with comments and delimiters.
Truncate all Data in Table
We can easily empty a Hive Table by running a simple truncate command:
TRUNCATE TABLE db_name.table_name;
Delete table
If we wish to delete an entire table with its data, we can simply delete it:
DROP TABLE [IF EXISTS] table_name [PURGE];
Note that PURGE
is an important option here. If PURGE
is used, data cannot be retrieved back as without this option, data of the table will go to the .Trash/current-directory
. When this option is used, data will completely deleted.
Hive DML Command
Inserting data into a Hive table is easy as well. We can use the following Insert command:
INSERT INTO TABLE journaldev.lessons
VALUES (20353, 'Installing Hive on Ubuntu', 'journaldev.com/20353/install-apache-hive-ubuntu-hql-queries'), (20358, 'Installing Hadoop on Ubuntu', 'journaldev.com/20358/install-hadoop-on-ubuntu');
We can see the data we inserted as:
SELECT * FROM journaldev.lessons;
Or we can limit the data to just 1 row:
SELECT * FROM journaldev.lessons LIMIT 1;
Conclusion
In this lesson, we learned various Apache Hive commands and ran them on our Ubuntu machine. We saw how similar Hive commands are to SQL and fast as well. Real power of Hive comes in when it is run as a cluster, just like Hadoop.
Read more Big Data posts here.