In this tutorial, we’ll be discussing and implementing the Huffman Coding Algorithm in Java.

Huffman Coding

The Huffman Coding Algorithm was discovered by David A. Huffman in the 1950s. The purpose of the Algorithm is lossless data compression.

This algorithm is commonly used in JPEG Compression.

Now traditionally to encode/decode a string, we can use ASCII values. But this doesn’t compress it. The number of bits involved in encoding the string isn’t reduced.

Unlike ASCII codes, Huffman codes use lesser number of bits.

The principle of Huffman code is based on the frequency of each data item.

Every data item is assigned a variable length of prefix code (typically a binary string).

The length of each prefix code is based on the frequency such that the most commonly occurring data has the smallest prefix code.

No two different data items would have the same prefix. Hence the prefixes are known as Prefix Codes. This is important to make sure decoded string is the same as the original string.

The more often the data is used the lesser bits it takes up.

The Huffman algorithm is a greedy algorithm. Since at every stage it looks for the best available option.A Disadvantage of Huffman codes is that a minor change in any bit of the encoded string would break the whole message.

The basic idea of the Huffman Algorithm is building a Tree from bottom up.

The steps involved in building the Huffman Tree are:

  • Get Frequency of each character and store it in a Map.
  • Create a node for each character with its frequency and insert it into a Min Priority Queue.
  • Extract the two nodes with the minimum frequency from the priority queue.
  • Create a new node by merging these two nodes as children and with weight equal to the sum of the two nodes frequencies.
  • Add this back to the Priority Queue.
  • Repeat till the Priority Queue has only one node left. That node becomes the root of the Huffman Tree.

Once the tree is built, to find the prefix code of each character we traverse the tree as:

  • Starting at the top when you go left, append 0 to the prefix code string.
  • When you go right, append 1.
  • Stop when you have reached the Leaf nodes.
  • The string of 0 and 1s created till now is the prefix code of that particular Node in the tree.

During decoding, we just need to print the character of each leaf traversed by the above prefix code in the Huffman tree.

All Input characters are present only in the leaves of the Huffman tree.

An example of a Huffman tree is given below:

The Huffman Coding

The string to be encoded needs the prefix codes for all the characters built in a bottom-up manner.

The internal node of any two Nodes should have a non-character set to it.

Huffman Coding Algorithm Implementation

The code for the HuffmanNode class is given below:

It holds the character and its frequency.

The comparable interface is implemented. This would be used when comparing the values in the PriorityQueue in order to get the minimum value always.

The code for the class is given below:

So our string to be compressed and encoded/decoded is “ABA11$A”.

In encoding method we store the frequencies of each character in a map first.

Then we call the buildTree method using the approach discussed before.

We save the prefix code for each character in a HashMap inside the setPrefixCode method.

In the decode method, we just traverse the encoded string left or right based on 0 or 1. Once we reach the leaf, we append that leaf data to the StringBuilder and again start from the top for the remaining encoded string.

Following is the output of the above program:

Huffman Coding Algorithm Implementation Example

You can checkout complete code and more DS & Algorithm examples from our GitHub Repository.

Reference: Wikipedia

By admin

Leave a Reply