Tar files are archive files commonly used for backups. The format was developed mainly to write data to devices, such as tape drives, that do not impose any file system on the data.
Tar archives get their name from "tape archive", after the tapes on which backups were generally written.
A tar archive, sometimes called a tarball, can bundle files that come from different file systems. It preserves file system metadata for each stored file, such as its name, permissions, ownership, and modification time.
Tar files can be created using the tar utility, which normally comes installed on Linux/Unix systems.
Python provides a module named tarfile which lets us read, write, and append files to tar archives.
In this tutorial, we explain how to work with tape archives (tar files) in Python using the tarfile module. The tutorial covers reading the contents of tar archives, creating tar archives, adding files to them, retrieving member details, compressing archives, appending files, and extracting files. Each topic is explained with a simple example.
Below we have printed the contents of the text file that we'll use throughout the tutorial. To begin with, we have also created a tar file containing this one text file using the Linux tar utility.
!cat zen_of_python.txt
!tar -cf zen_of_python.tar zen_of_python.txt
As a part of our first example, we'll explain how we can open an existing tar file and read its contents. The tarfile module provides useful methods for this.
Below, we have listed the signatures of the important methods used in this example. We have included such method definitions in all of our examples.
open(name=None,mode='r') - This function of the tarfile module takes as input the tar file name and the mode in which to open it. It returns an instance of type TarFile, which has a list of methods that can be used to work with members of the tar file. File open modes are generally of the form 'mode[:compression]', for example 'r', 'r:gz', 'w:bz2', or 'a'.
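As a rough illustration of the 'mode[:compression]' form, the sketch below writes a tiny archive with gzip compression ('w:gz') and reads it back with 'r:*', which asks tarfile to detect the compression itself. The temporary directory and the name sample.txt are made up for this illustration and are not part of the tutorial's files.

```python
import os
import tarfile
import tempfile

# Work in a throwaway directory; all names here are illustrative only.
workdir = tempfile.mkdtemp()
text_path = os.path.join(workdir, "sample.txt")
with open(text_path, "w") as fp:
    fp.write("Beautiful is better than ugly.\n")

archive_path = os.path.join(workdir, "sample.tar.gz")

# "w:gz" = write mode with gzip compression.
archive = tarfile.open(name=archive_path, mode="w:gz")
archive.add(text_path, arcname="sample.txt")
archive.close()

# "r:*" = read mode with transparent compression detection.
archive = tarfile.open(name=archive_path, mode="r:*")
mode_demo_names = archive.getnames()
archive.close()

print(mode_demo_names)
```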
Our code for this example starts by opening the archive that we created earlier, which returns a TarFile instance. It then uses this instance to read the text file's contents through the file object returned by the extractfile() method. We then print the contents of the text file and close the archive by calling the close() method.
import tarfile
compressed_file = tarfile.open(name="zen_of_python.tar")
print("Object Type : ",compressed_file)
file_obj = compressed_file.extractfile("zen_of_python.txt")
print("\nContents of File : \n")
print(file_obj.read().decode())
compressed_file.close()
As a part of our second example, we'll explain how we can create a tar file and add files to it. The TarFile instance provides the necessary methods for this.
Our code for this example starts by creating a new archive using the open() method with mode w. It then adds the file to the archive using the add() method and closes the archive. The second part of our code opens the archive in reading mode and reads the contents of the member that we added earlier. It then prints the contents and closes the archive.
Please note that a **TarFile** instance can also be used as a context manager (with statement). In that case, we don't need to call the close() method, because the archive is closed automatically when the block ends.
import tarfile
### Creating archive
compressed_file = tarfile.open(name="zen_of_python2.tar", mode="w")
compressed_file.add("zen_of_python.txt") ## Adding file to archive
compressed_file.close()
## Opening archive in read mode
compressed_file = tarfile.open(name="zen_of_python2.tar")
print("Object Type : ",compressed_file)
file_obj = compressed_file.extractfile("zen_of_python.txt")
print("\nContents of File : \n")
print(file_obj.read().decode())
compressed_file.close()
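The context-manager form mentioned above can be sketched as follows. This is a minimal illustration, not a rewrite of the tutorial's example: the temporary directory and the name notes.txt are assumptions made so that the snippet runs on its own.

```python
import os
import tarfile
import tempfile

# Illustrative input file in a throwaway directory.
workdir = tempfile.mkdtemp()
text_path = os.path.join(workdir, "notes.txt")
with open(text_path, "w") as fp:
    fp.write("Simple is better than complex.\n")

archive_path = os.path.join(workdir, "notes.tar")

# The archive is closed automatically when the with-block ends.
with tarfile.open(archive_path, mode="w") as archive:
    archive.add(text_path, arcname="notes.txt")

with tarfile.open(archive_path) as archive:
    contents = archive.extractfile("notes.txt").read().decode()

print(contents)
# The TarFile object keeps a `closed` flag that is set once it is closed.
print("Closed after with-block :", archive.closed)
```

The with statement also closes the archive when an exception is raised inside the block, which is the main reason to prefer it over a manual close() call.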
This example shows another way of adding a file to an archive, using a TarInfo instance. We'll be using the gettarinfo() and addfile() methods of TarFile for this purpose.
Our code for this example starts by creating a new archive by calling the open() method with w mode. It then creates a TarInfo instance for the text file using the gettarinfo() method of the TarFile instance. It then adds the text file to the archive using the addfile() method, giving it the TarInfo instance and a file object opened on the file in binary mode.
The second part of the code then opens the archive in reading mode and reads the contents of the text file written to it, to verify that the archive was created properly.
import tarfile
### Creating archive
compressed_file = tarfile.open(name="zen_of_python3.tar", mode="w")
tar_info = compressed_file.gettarinfo("zen_of_python.txt")
with open("zen_of_python.txt", "rb") as fp: ## Binary mode; the file is closed after it is added
    compressed_file.addfile(tar_info, fileobj=fp) ## Adding file to archive
compressed_file.close()
## Opening archive in read mode
compressed_file = tarfile.open(name="zen_of_python3.tar")
print("Object Type : ",compressed_file)
file_obj = compressed_file.extractfile("zen_of_python.txt")
print("\nContents of File : \n")
print(file_obj.read().decode())
compressed_file.close()
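The same addfile() approach also works when the data never exists as a file on disk. The sketch below is one possible way to do it, under the assumption that the content is available as bytes. It builds a TarInfo by hand instead of calling gettarinfo(), and the member name in_memory.txt is hypothetical.

```python
import io
import tarfile

# Bytes that we want to store as an archive member (illustrative content).
payload = "Readability counts.\n".encode("utf-8")

# A hand-built TarInfo needs at least a name and the exact size of the data.
info = tarfile.TarInfo(name="in_memory.txt")
info.size = len(payload)

# Write the archive into an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as archive:
    archive.addfile(info, fileobj=io.BytesIO(payload))

# Rewind the buffer and read the member back to verify it.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as archive:
    restored = archive.extractfile("in_memory.txt").read()

print(restored.decode())
```

Note that addfile() reads exactly info.size bytes from the file object, so the size has to be set before the call.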
As a part of our third example, we'll explain how we can use a TarInfo instance to retrieve details about members of the archive. We have created 3 sub-examples that do the same thing using different methods of the TarFile instance. Below is one helper from the tarfile module that we'll use alongside the TarFile and TarInfo methods in our examples.
is_tarfile(name) - This is a function of the tarfile module. It takes a file name or file-like object as input and returns True if it is a tar file, otherwise False.
Our code for this example defines a function that takes a TarInfo instance as input and prints important details about the file.
The code starts by opening the existing tar file.
It then retrieves the TarInfo instance for the text file by calling the next() method of the TarFile instance, which returns the next member of the archive.
It then calls the function that prints the details of the member passed to it, and finally closes the archive.
import tarfile
compressed_file = tarfile.open(name="zen_of_python.tar")
print("Is tarfile? : {}\n".format(tarfile.is_tarfile("zen_of_python.tar")))
print("Object Type : ",compressed_file)
def print_member_info(member):
    print("File Name : {}".format(member.name))
    print("File Size : {}".format(member.size))
    print("File Last Modification Time : {}".format(member.mtime))
    print("File Mode : {}".format(member.mode))
    print("User ID : {}".format(member.uid))
    print("User Name : {}".format(member.uname))
    print("Group ID : {}".format(member.gid))
    print("Group Name : {}".format(member.gname))
    print("Is File? : {}".format(member.isfile()))
    print("Is Directory? : {}".format(member.isdir()))
    print("Is Symbolic Link? : {}".format(member.issym()))
    print("Is Hard Link? : {}".format(member.islnk()))
    print("Is Character Device? : {}".format(member.ischr()))
    print("Is Block Device? : {}".format(member.isblk()))
    print("Is FIFO? : {}".format(member.isfifo()))
    print("Is Character Device/Block Device/FIFO? : {}\n".format(member.isdev()))
member = compressed_file.next()
print("\nMember Type : {}\n".format(member))
print_member_info(member)
compressed_file.close()
This sub-example does the same thing as the previous sub-example but uses different methods of TarFile.
Our code for this example starts by opening the existing archive in reading mode. It then retrieves the names of all members of the archive using the getnames() method. It then loops through the member names, retrieving the TarInfo for each member with getmember() and printing its details. Finally, it closes the archive.
import tarfile
compressed_file = tarfile.open(name="zen_of_python.tar")
print("Object Type : ",compressed_file)
print("\nList of File in Archive : {}".format(compressed_file.getnames()))
def print_member_info(member):
    print("File Name : {}".format(member.name))
    print("File Size : {}".format(member.size))
    print("File Last Modification Time : {}".format(member.mtime))
    print("File Mode : {}".format(member.mode))
    print("User ID : {}".format(member.uid))
    print("User Name : {}".format(member.uname))
    print("Group ID : {}".format(member.gid))
    print("Group Name : {}".format(member.gname))
    print("Is File? : {}".format(member.isfile()))
    print("Is Directory? : {}".format(member.isdir()))
    print("Is Symbolic Link? : {}".format(member.issym()))
    print("Is Hard Link? : {}".format(member.islnk()))
    print("Is Character Device? : {}".format(member.ischr()))
    print("Is Block Device? : {}".format(member.isblk()))
    print("Is FIFO? : {}".format(member.isfifo()))
    print("Is Character Device/Block Device/FIFO? : {}\n".format(member.isdev()))
for filename in compressed_file.getnames():
    member = compressed_file.getmember(name=filename)
    print("\nMember Type : {}\n".format(member))
    print_member_info(member)
compressed_file.close()
Our third sub-example performs the same operations as the previous two sub-examples but uses yet another method of TarFile.
Like the previous examples, our code starts by opening the archive in reading mode. It then uses the getmembers() method of TarFile to retrieve a list of TarInfo instances for the members of the archive. It then loops through the TarInfo instances, printing the details of each file. Finally, it closes the archive.
import tarfile
compressed_file = tarfile.open(name="zen_of_python.tar")
print("Object Type : ",compressed_file)
print("\nList of File in Archive : {}".format(compressed_file.getnames()))
def print_member_info(member):
    print("File Name : {}".format(member.name))
    print("File Size : {}".format(member.size))
    print("File Last Modification Time : {}".format(member.mtime))
    print("File Mode : {}".format(member.mode))
    print("User ID : {}".format(member.uid))
    print("User Name : {}".format(member.uname))
    print("Group ID : {}".format(member.gid))
    print("Group Name : {}".format(member.gname))
    print("Is File? : {}".format(member.isfile()))
    print("Is Directory? : {}".format(member.isdir()))
    print("Is Symbolic Link? : {}".format(member.issym()))
    print("Is Hard Link? : {}".format(member.islnk()))
    print("Is Character Device? : {}".format(member.ischr()))
    print("Is Block Device? : {}".format(member.isblk()))
    print("Is FIFO? : {}".format(member.isfifo()))
    print("Is Character Device/Block Device/FIFO? : {}\n".format(member.isdev()))
for member in compressed_file.getmembers():
    print("Member Type : {}\n".format(member))
    print_member_info(member)
compressed_file.close()
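The mtime and mode attributes printed above are raw numbers: a Unix timestamp and an integer holding the permission bits. A possible way to make them easier to read is sketched below. The helper describe_member() and the member example.txt are hypothetical names introduced only for this illustration, and the archive is built in memory with fixed values so the snippet runs on its own.

```python
import io
import tarfile
from datetime import datetime, timezone


def describe_member(member):
    """Return a few TarInfo attributes in a more readable form."""
    modified = datetime.fromtimestamp(member.mtime, tz=timezone.utc)
    return {
        "name": member.name,
        "size": member.size,
        "modified_utc": modified.isoformat(),  # timestamp as ISO 8601 text
        "permissions": oct(member.mode),       # permission bits in octal
    }


# Build a small in-memory archive with known metadata (illustrative values).
data = b"Explicit is better than implicit.\n"
info = tarfile.TarInfo(name="example.txt")
info.size = len(data)
info.mtime = 0        # the Unix epoch
info.mode = 0o644     # rw-r--r--

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as archive:
    archive.addfile(info, fileobj=io.BytesIO(data))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as archive:
    described = [describe_member(member) for member in archive.getmembers()]

print(described)
```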
As a part of our fourth example, we'll explain with simple sub-examples how we can create archives with different compression formats. We'll also compare archive sizes after trying all compression formats. The tarfile module lets us compress contents using the gzip, bzip2, and lzma compression methods.
Our code for this example is almost the same as the code for the second example, where we demonstrated how to create an archive and add members to it. The difference is that it opens the archive in w:gz mode in order to apply gzip compression. The code then writes the text file to the archive and closes it. The second part of the code opens the archive in r:gz mode and reads the contents of the file to verify it.
import tarfile
### Creating archive
compressed_file = tarfile.open(name="zen_of_python.tar.gz", mode="w:gz")
compressed_file.add("zen_of_python.txt") ## Adding file to archive
compressed_file.close()
## Opening archive in read mode
compressed_file = tarfile.open(name="zen_of_python.tar.gz", mode="r:gz")
print("Object Type : ",compressed_file)
file_obj = compressed_file.extractfile("zen_of_python.txt")
print("\nContents of File : \n")
print(file_obj.read().decode())
compressed_file.close()
Below we have listed both tar files in order to compare their sizes. We can see that the gzip-compressed tar file is considerably smaller than the uncompressed zen_of_python.tar file that we created with the tar utility at the beginning.
!ls -lrt zen_of_python.tar*
Our code for this sub-example is the same as the code for the previous sub-example, with the only change being that it uses bzip2 compression (w:bz2 and r:bz2 modes).
import tarfile
### Creating archive
compressed_file = tarfile.open(name="zen_of_python.tar.bz2", mode="w:bz2")
compressed_file.add("zen_of_python.txt") ## Adding file to archive
compressed_file.close()
## Opening archive in read mode
compressed_file = tarfile.open(name="zen_of_python.tar.bz2", mode="r:bz2")
print("Object Type : ",compressed_file)
file_obj = compressed_file.extractfile("zen_of_python.txt")
print("\nContents of File : \n")
print(file_obj.read().decode())
compressed_file.close()
Below we are again comparing tar file sizes for verification purposes.
!ls -lrt zen_of_python.tar*
Our third sub-example is the same as the previous two sub-examples, with the only change being that it uses lzma compression (w:xz and r:xz modes).
import tarfile
### Creating archive
compressed_file = tarfile.open(name="zen_of_python.tar.xz", mode="w:xz")
compressed_file.add("zen_of_python.txt") ## Adding file to archive
compressed_file.close()
## Opening archive in read mode
compressed_file = tarfile.open(name="zen_of_python.tar.xz", mode="r:xz")
print("Object Type : ",compressed_file)
file_obj = compressed_file.extractfile("zen_of_python.txt")
print("\nContents of File : \n")
print(file_obj.read().decode())
compressed_file.close()
Below we again list all tar files to compare, based on file size, how much each compression method reduced the archive.
!ls -lrt zen_of_python.tar*
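The size comparison can also be done from Python rather than with ls. The sketch below is one way to do it, assuming the gzip, bz2, and lzma modules are available in the Python build. It uses a made-up, highly repetitive text file (repeated.txt) so that the difference is easy to see, and the actual numbers will differ from those of the tutorial's text file.

```python
import os
import tarfile
import tempfile

# Illustrative input: a repetitive text file in a throwaway directory.
workdir = tempfile.mkdtemp()
text_path = os.path.join(workdir, "repeated.txt")
with open(text_path, "w") as fp:
    fp.write("Now is better than never.\n" * 2000)

# File suffix -> tarfile write mode for each compression method.
variants = [
    (".tar", "w"),
    (".tar.gz", "w:gz"),
    (".tar.bz2", "w:bz2"),
    (".tar.xz", "w:xz"),
]

sizes = {}
for suffix, mode in variants:
    archive_path = os.path.join(workdir, "repeated" + suffix)
    with tarfile.open(archive_path, mode=mode) as archive:
        archive.add(text_path, arcname="repeated.txt")
    sizes[suffix] = os.path.getsize(archive_path)

for suffix, size in sizes.items():
    print("{:<9} {:>8} bytes".format(suffix, size))
```

Which method produces the smallest archive depends on the data, so it is worth measuring on representative files before settling on one.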
As a part of our fifth example, we demonstrate how we can append files to an existing archive.
Our code for this example starts by opening the existing archive zen_of_python3.tar, which we created earlier, in append mode (a). Append mode works only with uncompressed archives. The code then adds one JPEG, one PDF, and one GIF file to the archive and closes it.
The second part of our code opens the archive in reading mode and lists its contents. The list() method prints a listing of the members of the archive.
import tarfile
### Opening archive in append mode
compressed_file = tarfile.open(name="zen_of_python3.tar", mode="a")
compressed_file.add("dr_apj_kalam.jpeg") ## Adding file to archive
compressed_file.add("Deploying a Django Application to Google App Engine.pdf") ## Adding file to archive
compressed_file.add("intro_ball.gif") ## Adding file to archive
compressed_file.close()
## Opening archive in read mode
compressed_file = tarfile.open(name="zen_of_python3.tar")
print("Object Type : ",compressed_file)
print("Archive Members : {}\n".format(compressed_file.getnames()))
compressed_file.list()
compressed_file.close()
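Since the JPEG, PDF, and GIF files used above may not be at hand, here is a small self-contained sketch of the same append workflow. It is only an approximation of the example: the archive growing.tar, the helper add_bytes(), and the member names are all hypothetical, and the members are created from in-memory bytes.

```python
import io
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
archive_path = os.path.join(workdir, "growing.tar")


def add_bytes(archive, name, data):
    """Add in-memory bytes to an open archive under the given member name."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    archive.addfile(info, fileobj=io.BytesIO(data))


# Create an uncompressed archive with one member.
with tarfile.open(archive_path, mode="w") as archive:
    add_bytes(archive, "first.txt", b"first\n")

# Reopen it in append mode and add a second member.
with tarfile.open(archive_path, mode="a") as archive:
    add_bytes(archive, "second.txt", b"second\n")

# Reading the archive back shows both members, in insertion order.
with tarfile.open(archive_path) as archive:
    appended_names = archive.getnames()

print(appended_names)
```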
As a part of our sixth example, we'll explain with simple sub-examples how we can extract files from an existing archive to the current path or to a different location. We'll be using the extractall(), extract(), and extractfile() methods of TarFile for this purpose.
Our code for this example starts by opening, in reading mode, an existing archive created in one of our previous examples. It then calls the extractall() method, giving it the name of the folder into which all members of the archive should be extracted. We then close the archive and verify the extraction by listing the folder.
import tarfile
### Opening archive in read mode
compressed_file = tarfile.open(name="zen_of_python3.tar", mode="r")
folder_name = "zen_of_python_all"
print("Extracting Files to Folder : {}".format(folder_name))
compressed_file.extractall(folder_name)
compressed_file.close()
!ls -lrt zen_of_python_all/
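When an archive comes from an untrusted source, extractall() deserves some care, because member names can point outside the target folder. Python 3.12, and recent patch releases of some earlier versions, accept a filter argument for this. The sketch below shows one cautious way to use it, falling back to a plain extractall() on interpreters without the feature. The in-memory archive and the member docs/zen.txt are made up for the illustration.

```python
import io
import os
import tarfile
import tempfile

# Build a small in-memory archive to extract (illustrative content).
data = b"Errors should never pass silently.\n"
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as archive:
    info = tarfile.TarInfo(name="docs/zen.txt")
    info.size = len(data)
    archive.addfile(info, fileobj=io.BytesIO(data))
buffer.seek(0)

target_dir = tempfile.mkdtemp()
with tarfile.open(fileobj=buffer, mode="r") as archive:
    if hasattr(tarfile, "data_filter"):
        # The "data" filter rejects members that would land outside target_dir
        # and ignores most file system specific metadata.
        archive.extractall(path=target_dir, filter="data")
    else:
        # Older interpreters: no filter support, so only use trusted archives.
        archive.extractall(path=target_dir)

extracted_path = os.path.join(target_dir, "docs", "zen.txt")
print(os.path.exists(extracted_path))
```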
Our code for this example starts by opening the existing archive in reading mode. It then calls the extractall() method, but this time includes logic to extract only the JPEG and GIF files from the archive and skip all other files. We select the members whose names end with .jpeg or .gif and pass that list to the members parameter of the extractall() method.
import tarfile
### Opening archive in read mode
compressed_file = tarfile.open(name="zen_of_python3.tar", mode="r")
folder_name = "zen_of_python_images"
print("Extracting Files to Folder : {}".format(folder_name))
## Keep only members whose names end with .jpeg or .gif
image_members = [member for member in compressed_file.getmembers() if member.name.endswith((".jpeg", ".gif"))]
compressed_file.extractall(path=folder_name, members=image_members)
compressed_file.close()
!ls -lrt zen_of_python_images/
Our code for this example is almost the same as the code from the previous example, with the only change being that we use the extract() method to extract individual members. We again loop through all members and extract only the JPEG and GIF files.
import tarfile
### Opening archive in read mode
compressed_file = tarfile.open(name="zen_of_python3.tar", mode="r")
folder_name = "zen_of_python_images2"
print("Extracting Files to Folder : {}".format(folder_name))
for file_name in compressed_file.getnames():
    if file_name.endswith((".jpeg", ".gif")): ## Extract only JPEG and GIF files
        compressed_file.extract(file_name, folder_name)
compressed_file.close()
!ls -lrt zen_of_python_images2/
Our code for this example is almost the same as the code from the previous examples. We loop through each member of the archive and retrieve its contents using the extractfile() method. We then write the contents to a new file that we create ourselves in a folder named after the archive. We list that folder at the end to verify the result.
import tarfile
import os
### Opening archive in read mode
compressed_file = tarfile.open(name="zen_of_python3.tar", mode="r")
folder_name = compressed_file.name[:-4]
os.makedirs(folder_name, exist_ok=True)
for member_name in compressed_file.getnames():
    file_content = compressed_file.extractfile(member_name)
    if file_content is None: ## Not a regular file (e.g. a directory), nothing to read
        continue
    with open(os.path.join(folder_name, member_name), "wb") as fp:
        fp.write(file_content.read())
compressed_file.close()
!ls -lrt zen_of_python3/
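One detail worth knowing when reading members by hand is that extractfile() returns None for members that are not regular files or links, such as directories. The sketch below illustrates this with a made-up in-memory archive holding a directory entry (assets) and one file inside it. Both names are hypothetical.

```python
import io
import tarfile

data = b"Flat is better than nested.\n"

# Build an in-memory archive with one directory entry and one regular file.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as archive:
    dir_info = tarfile.TarInfo(name="assets")
    dir_info.type = tarfile.DIRTYPE  # mark the member as a directory
    archive.addfile(dir_info)

    file_info = tarfile.TarInfo(name="assets/readme.txt")
    file_info.size = len(data)
    archive.addfile(file_info, fileobj=io.BytesIO(data))

buffer.seek(0)
results = {}
with tarfile.open(fileobj=buffer, mode="r") as archive:
    for member in archive.getmembers():
        file_obj = archive.extractfile(member)
        # Directories have no content, so extractfile() gives back None.
        results[member.name] = None if file_obj is None else file_obj.read()

for name, content in results.items():
    print(name, "->", content)
```

Checking for None before calling read() keeps a manual extraction loop from failing on archives that contain directories.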
This ends our small tutorial explaining how we can use Python tarfile module to work with tape archives.