The Python tarfile module is used to read and write tar archives. Python provides excellent tools and modules for managing compressed files, including (but not limited to) file and directory compression with different mechanisms such as gzip, bz2 and lzma.
In this post, we will look at practical demonstrations of the Python tarfile module's functions. The module works much like the Python zipfile module. Let's get started.
Python tarfile module
The Python tarfile module provides functions to perform various operations, such as:
- read and write gzip, bz2 and lzma archives
- read and write POSIX.1-1988 (ustar) format
- read and write GNU tar format
Apart from these features, we can also handle directories and restore file information such as timestamps, access permissions and owner.
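The features listed above can be tried out in one short script. This is a minimal sketch, assuming a throwaway temporary directory holding a made-up project/docs/notes.txt tree (all names are hypothetical): it archives the directory with each compression scheme (gzip, bz2 and lzma/xz) in both the ustar and GNU formats, uses the filter argument of add() to reset the owner fields, and then reads the member names back.

```python
import os
import tarfile
import tempfile


def reset_owner(info):
    # add() calls this for every member; returning the TarInfo keeps the member
    info.uid = info.gid = 0
    info.uname = info.gname = "root"
    return info


results = {}
with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny directory tree to archive (hypothetical names)
    src = os.path.join(tmp, "project")
    os.makedirs(os.path.join(src, "docs"))
    with open(os.path.join(src, "docs", "notes.txt"), "w") as fh:
        fh.write("hello\n")

    for compression in ("gz", "bz2", "xz"):
        for fmt_name, fmt in (("ustar", tarfile.USTAR_FORMAT),
                              ("gnu", tarfile.GNU_FORMAT)):
            path = os.path.join(tmp, f"project-{fmt_name}.tar.{compression}")
            # add() recurses into directories by default
            with tarfile.open(path, f"w:{compression}", format=fmt) as out:
                out.add(src, arcname="project", filter=reset_owner)
            # Mode "r" detects the compression automatically
            with tarfile.open(path, "r") as t:
                names = sorted(t.getnames())
                owners = {m.uname for m in t.getmembers()}
            results[(compression, fmt_name)] = (names, owners)
            print(compression, fmt_name, names, owners)
```

Because add() recurses by default, a single call stores the directory and everything under it; passing recursive=False would store only the directory entry itself.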
Checking validity of TAR files
We will start with the simplest example: checking whether a file is a valid TAR file. We will use the is_tarfile() function to do this:
```python
import tarfile

for file_name in ['README.txt', 'example.tar.gz']:
    try:
        print(file_name, tarfile.is_tarfile(file_name))
    except IOError as err:  # e.g. the file does not exist
        print(file_name, err)
```
Running this example prints each file name followed by True or False, depending on whether it is a valid TAR archive.
Note that these files should exist in the directory you run this script in.
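To try is_tarfile() without preparing files by hand, here is a minimal sketch that creates its own inputs in a temporary directory (the file names and contents are made up for illustration). It checks a plain text file, a freshly written gzip archive and a path that does not exist; the missing file raises an OSError instead of returning False, which is why the original example wraps the call in try/except.

```python
import os
import tarfile
import tempfile

checks = {}
with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "README.txt")
    tar_path = os.path.join(tmp, "example.tar.gz")
    missing_path = os.path.join(tmp, "missing.tar")

    with open(text_path, "w") as fh:
        fh.write("just a text file\n")
    with tarfile.open(tar_path, "w:gz") as out:
        out.add(text_path, arcname="README.txt")

    for path in (text_path, tar_path, missing_path):
        try:
            checks[os.path.basename(path)] = tarfile.is_tarfile(path)
        except OSError as err:  # IOError is an alias of OSError in Python 3
            checks[os.path.basename(path)] = f"error: {err.strerror}"

print(checks)
```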
Reading TAR file metadata
In this section, we will read the metadata of a TAR file, such as the names of the files it contains, using the open() and getnames() functions:
```python
import tarfile

# Mode 'r' opens the archive for reading with transparent compression
with tarfile.open('example.tar.gz', 'r') as t:
    print("Files in TAR file:")
    print(t.getnames())
```
Running this example prints the names of all members in the archive as a Python list. Note that we only put a few sample files in this TAR for demonstration.
Let's dig a little deeper into each file's metadata before moving on to the next example. We will print its modification time, mode, type and size:
```python
import tarfile
import time

with tarfile.open('example.tar.gz', 'r') as t:
    # getmembers() returns one TarInfo object per archive member
    for info in t.getmembers():
        print(info.name)
        print('Modified:', time.ctime(info.mtime))
        print('Mode    :', oct(info.mode))
        print('Type    :', info.type)
        print('Size    :', info.size, 'bytes')
```
When we run this program, it prints the name of each member followed by its modification time, mode (in octal), type and size in bytes.
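The TarInfo objects returned by getmembers() also offer predicates such as isfile() and isdir(), and getmember() looks up a single member by name. Here is a minimal sketch that builds a small gzip archive in memory (the member names and contents are hypothetical) and then separates files from directories.

```python
import io
import tarfile

# Build a small gzip archive entirely in memory (hypothetical member names)
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as out:
    folder = tarfile.TarInfo("TarFolder")
    folder.type = tarfile.DIRTYPE
    folder.mode = 0o755
    out.addfile(folder)

    data = b"Hello from the archive\n"
    readme = tarfile.TarInfo("TarFolder/README.txt")
    readme.size = len(data)
    readme.mode = 0o644
    out.addfile(readme, io.BytesIO(data))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:gz") as t:
    # TarInfo predicates separate regular files from directories
    files = [m.name for m in t.getmembers() if m.isfile()]
    dirs = [m.name for m in t.getmembers() if m.isdir()]
    # getmember() raises KeyError if the name is not in the archive
    info = t.getmember("TarFolder/README.txt")
    summary = (info.name, info.size, oct(info.mode))

print("Files:", files)
print("Directories:", dirs)
print(summary)
```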
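Extracting Files from an Archive
Here, we will read files from the archive with extractfile(), which returns a file-like object for a member without writing anything to disk: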
```python
import tarfile

with tarfile.open('example.tar.gz', 'r') as t:
    for file_name in ['TarFolder/README.txt', 'TarFolder/tarfile_validity.py']:
        try:
            # extractfile() raises KeyError if the member does not exist
            f = t.extractfile(file_name)
        except KeyError:
            print('ERROR: Did not find %s in tar archive' % file_name)
        else:
            print(file_name, ':', f.readlines())
```
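Running this example prints the contents of each file as a list of lines (bytes objects), or an error message for any name that is not in the archive.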
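To write members to disk instead of reading them in memory, TarFile.extractall() can be used. This is a minimal sketch, assuming a temporary directory and a made-up archive built on the fly. On Python versions that provide extraction filters (3.12+, plus some security backports), it passes filter="data" to reject unsafe members such as absolute paths or ".." components; older versions fall back to plain extractall(), which should only be used on trusted archives.

```python
import io
import os
import tarfile
import tempfile


def build_archive(path):
    # Hypothetical contents, just so the sketch has something to extract
    with tarfile.open(path, "w:gz") as out:
        data = b"print('hello')\n"
        info = tarfile.TarInfo("TarFolder/tarfile_validity.py")
        info.size = len(data)
        out.addfile(info, io.BytesIO(data))


with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "example.tar.gz")
    dest = os.path.join(tmp, "extracted")
    build_archive(archive)

    with tarfile.open(archive, "r") as t:
        if hasattr(tarfile, "data_filter"):
            # Extraction filters available: refuse unsafe paths and modes
            t.extractall(dest, filter="data")
        else:
            # Older Python: only extract archives you trust
            t.extractall(dest)

    # Missing parent directories (TarFolder) are created automatically
    extracted = os.path.join(dest, "TarFolder", "tarfile_validity.py")
    with open(extracted, "rb") as fh:
        content = fh.read()

print(content)
```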
Adding Files to an Archive
Here, we will add files to an archive file:
```python
import tarfile

print('creating archive')
# Mode "w" overwrites any existing file and writes an uncompressed tar,
# despite the .gz extension; use mode "w:gz" for gzip compression
out = tarfile.open('example.tar.gz', mode="w")
try:
    print('adding README.txt')
    out.add('README.txt')
finally:
    print('closing tar archive')
    out.close()

print('Contents of archived file:')
with tarfile.open('example.tar.gz', 'r') as t:
    for member in t.getmembers():
        print(member.name)
```
Running this example prints the progress messages, followed by the contents of the new archive: a single member named README.txt.
Note that mode 'w' does not preserve the previous contents of the file. To add files to an existing archive, we can use mode 'a' instead; append mode works only with uncompressed archives.
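Since plain "w" does not compress anything, here is a minimal sketch comparing it with "w:gz" (the README.txt contents are made up for illustration). It checks the gzip magic number at the start of the compressed file, compares file sizes, and uses the arcname parameter of add() to choose the name stored inside the archive.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    readme = os.path.join(tmp, "README.txt")
    with open(readme, "w") as fh:
        fh.write("sample text\n" * 100)

    plain = os.path.join(tmp, "plain.tar")
    packed = os.path.join(tmp, "packed.tar.gz")

    # "w" writes an uncompressed tar; "w:gz" gzip-compresses it
    for path, mode in ((plain, "w"), (packed, "w:gz")):
        with tarfile.open(path, mode) as out:
            # arcname controls the member name stored in the archive
            out.add(readme, arcname="docs/README.txt")

    with open(packed, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"  # gzip magic number
    sizes = {os.path.basename(p): os.path.getsize(p) for p in (plain, packed)}
    with tarfile.open(packed, "r") as t:
        names = t.getnames()

print("gzip?", is_gzip, sizes, names)
```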
Appending Files to an Archive
Here, we will append a file to the existing archive by opening it in 'a' mode instead of 'w' mode:
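```python
import tarfile

print('opening archive for appending')
# Mode "a" appends to an existing archive (creating it if it does not exist).
# It works only with uncompressed archives; the previous example wrote
# example.tar.gz with plain "w", so no compression is involved here.
out = tarfile.open('example.tar.gz', mode="a")
try:
    print('adding README.txt')
    out.add('README.txt')
finally:
    print('closing tar archive')
    out.close()

print('Contents of archived file:')
with tarfile.open('example.tar.gz', 'r') as t:
    for member in t.getmembers():
        print(member.name)
```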
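Running this example prints the progress messages followed by the archive's members. Because README.txt was already in the archive, there are now two members, both named README.txt: appending adds a new entry rather than replacing an existing member with the same name.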
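When a name appears more than once, getmember() returns its last occurrence, which tarfile treats as the most up-to-date version. A minimal sketch, assuming a temporary uncompressed example.tar and a hypothetical helper add_bytes() that stores in-memory data as a member:

```python
import io
import os
import tarfile
import tempfile


def add_bytes(archive, name, data):
    # Hypothetical helper: store an in-memory bytes payload under `name`
    info = tarfile.TarInfo(name)
    info.size = len(data)
    archive.addfile(info, io.BytesIO(data))


with tempfile.TemporaryDirectory() as tmp:
    # Append mode requires an uncompressed archive
    path = os.path.join(tmp, "example.tar")

    with tarfile.open(path, "w") as out:
        add_bytes(out, "README.txt", b"version 1\n")
    with tarfile.open(path, "a") as out:
        add_bytes(out, "README.txt", b"version 2\n")

    with tarfile.open(path, "r") as t:
        names = t.getnames()
        # getmember() picks the last occurrence of a duplicated name
        latest = t.extractfile(t.getmember("README.txt")).read()

print(names)
print(latest)
```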
Reference: Python tarfile module API documentation.