Updated On : Aug-31,2021 Tags tape-archives, tar-files
tarfile - Simple Guide to Work with Tape Archives in Python

Tar files are archive files that are commonly used for backups. The name comes from "tape archive", because backups were traditionally written to tape. A tar archive, sometimes referred to as a tarball, can bundle files that come from different file systems, and it keeps file system information (such as names, permissions, ownership, and timestamps) about the files stored inside it. The format was developed mainly to write data to devices that do not enforce any kind of file system on data. Tar files can be created using the tar utility, which normally comes installed on Linux/Unix systems.

Python provides a module named tarfile which lets us read, write, and append files to tar archives. As a part of this tutorial, we'll explain the API of the tarfile module with simple examples.

Below we have printed the contents of the text file that we'll be using to explain tar files. To begin with, we have also created a tar file containing this one text file using the Linux tar utility.

In [16]:
!cat zen_of_python.txt
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
    
In [17]:
!tar -cf zen_of_python.tar zen_of_python.txt

Example 1: Read TAR Archive

As a part of our first example, we'll explain how we can open an existing tar file and read its contents. The tarfile module provides us with useful functions and methods for this.


open(name=None,mode='r') - This function takes as input the tar file name and the mode in which to open it. It returns an instance of TarFile, which has methods for working with the members of the tar file. File open modes are generally of the form 'mode[:compression]'. Below is a list of modes with which tar files can be opened.

  • 'r'/'r:*' - It opens the archive for reading with transparent compression.
  • 'r:' - It opens the archive for reading without compression.
  • 'r:gz' - It opens the archive for reading with gzip compression.
  • 'r:bz2' - It opens the archive for reading with bzip2 compression.
  • 'r:xz' - It opens the archive for reading with lzma compression.
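  • 'x'/'x:' - It creates a new archive exclusively without compression and raises FileExistsError if the file already exists.
  • 'x:gz' - It creates a new archive exclusively with gzip compression and raises FileExistsError if the file already exists.
  • 'x:bz2' - It creates a new archive exclusively with bzip2 compression and raises FileExistsError if the file already exists.
  • 'x:xz' - It creates a new archive exclusively with lzma compression and raises FileExistsError if the file already exists.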
  • 'w'/'w:' - It opens the archive for writing without compression.
  • 'w:gz' - It opens the archive for writing with gzip compression.
  • 'w:bz2' - It opens the archive for writing with bzip2 compression.
  • 'w:xz' - It opens the archive for writing with lzma compression.
  • 'a'/'a:' - It opens the archive for appending without compression.
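
A minimal sketch of two of the modes listed above, run in a throwaway temporary directory. The names sample.txt and sample.tar.gz are made up for illustration and are not part of this tutorial's data.

```python
import os
import tarfile
import tempfile

# Work in a temporary directory so nothing in the tutorial folder is touched.
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "sample.txt")      # hypothetical file
archive_path = os.path.join(workdir, "sample.tar.gz")  # hypothetical archive

with open(source_path, "w") as fp:
    fp.write("Simple is better than complex.\n")

# 'x:gz' creates a new gzip-compressed archive and refuses to overwrite one.
with tarfile.open(archive_path, mode="x:gz") as archive:
    archive.add(source_path, arcname="sample.txt")

try:
    tarfile.open(archive_path, mode="x:gz")
    exclusive_failed = False
except FileExistsError:
    exclusive_failed = True

# Plain 'r' (same as 'r:*') detects the gzip compression on its own.
with tarfile.open(archive_path, mode="r") as archive:
    member_names = archive.getnames()

print("Second exclusive create failed :", exclusive_failed)
print("Members read with mode 'r'     :", member_names)
```

The exclusive modes are useful when overwriting an existing backup by accident would be costly, while plain 'r' is the convenient choice when the compression of an incoming archive is not known in advance.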

Important Methods of TarFile Object

  • extractfile(member) - This method takes as input a member name as a string or a TarInfo instance and returns a file-like object for regular files and links. For other member types, such as directories, it returns None. The TarInfo is a special kind of object used by tarfile for maintaining information about members of the tar file, which we'll explain in upcoming examples.
  • close() - It closes opened archive.

Our code for this example starts by opening the archive that we created earlier and returning TarFile instance. It then uses this instance to extract text file contents using extractfile() method. We then print the contents of the text file and close the archive by calling close() method.

In [37]:
import tarfile

compressed_file = tarfile.open(name="zen_of_python.tar")

print("Object Type : ",compressed_file)

file_obj = compressed_file.extractfile("zen_of_python.txt")

print("\nContents of File : \n")
print(file_obj.read().decode())

compressed_file.close()
Object Type :  <tarfile.TarFile object at 0x7f59b8d7b550>

Contents of File :


The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Example 2: Create TAR Archive and Add Files

As a part of our second example, we'll explain how we can create a tar file and add files to it. The TarFile instance provides the necessary methods for it.


Important Methods of TarFile Object

  • add(name,arcname=None,recursive=True,filter=None) - This method accepts the name of the file that we want to add to the archive. The name can be a file name as well as a directory name.
    • The arcname parameter accepts a string specifying an alternative name to use for the member inside the archive.
    • The recursive parameter accepts a boolean value. The default value is True, which adds the contents of the directory, including all subdirectories, if the given name is a directory. If we set this parameter to False, only the directory entry itself is added.
    • The filter parameter accepts a function that takes as input a TarInfo instance and returns a modified TarInfo instance. If it returns None then that member will be ignored. We can use this parameter to filter out members that we don't want to add to the archive.
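
The arcname and filter parameters described above can be sketched as follows. The directory name project, the file names, and the helper skip_logs() are hypothetical and exist only for this illustration.

```python
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
project_dir = os.path.join(workdir, "project")    # hypothetical directory
os.makedirs(project_dir)
for file_name in ("notes.txt", "debug.log"):       # hypothetical files
    with open(os.path.join(project_dir, file_name), "w") as fp:
        fp.write("placeholder\n")

def skip_logs(tarinfo):
    """Drop *.log members and reset ownership for everything else."""
    if tarinfo.name.endswith(".log"):
        return None                # returning None leaves the member out
    tarinfo.uid = tarinfo.gid = 0
    tarinfo.uname = tarinfo.gname = "root"
    return tarinfo

archive_path = os.path.join(workdir, "project.tar")
with tarfile.open(archive_path, mode="w") as archive:
    # arcname replaces the on-disk path with "backup" inside the archive.
    archive.add(project_dir, arcname="backup", filter=skip_logs)

with tarfile.open(archive_path) as archive:
    stored_names = sorted(archive.getnames())

print(stored_names)
```

Because the directory is added recursively, the filter function is called once for the directory entry and once for each file below it.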

2.1

Our code for this example starts by creating a new archive using open() method with mode w. It then adds the file to the archive using add() method and closes it. The second part of our code opens the archive in reading mode and tries to read the contents of the member which we added earlier. It then prints the contents and closes the archive.


NOTE

Please note that a TarFile instance can be used as a context manager (with statement). We then don't need to call the close() method to close the archive.

In [39]:
import tarfile

### Creating archive
compressed_file = tarfile.open(name="zen_of_python2.tar", mode="w")

compressed_file.add("zen_of_python.txt") ## Adding file to archive

compressed_file.close()

## Opening archive in read mode
compressed_file = tarfile.open(name="zen_of_python2.tar")

print("Object Type : ",compressed_file)

file_obj = compressed_file.extractfile("zen_of_python.txt")

print("\nContents of File : \n")
print(file_obj.read().decode())

compressed_file.close()
Object Type :  <tarfile.TarFile object at 0x7f59b8de96a0>

Contents of File :


The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
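
As a small sketch of the context manager usage mentioned in the note of this sub-example, the same create-then-read flow can be written with with statements. The file names zen_line.txt and zen_line.tar are hypothetical and live in a temporary directory.

```python
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
text_path = os.path.join(workdir, "zen_line.txt")      # hypothetical file
archive_path = os.path.join(workdir, "zen_line.tar")   # hypothetical archive

with open(text_path, "w") as fp:
    fp.write("Readability counts.\n")

# The archive is closed automatically when each with block ends,
# even if an exception is raised inside the block.
with tarfile.open(archive_path, mode="w") as archive:
    archive.add(text_path, arcname="zen_line.txt")

with tarfile.open(archive_path) as archive:
    contents = archive.extractfile("zen_line.txt").read().decode()

print(contents)
```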

2.2

Our code for this example explains another way of adding file to archive using TarInfo instance. We'll be using gettarinfo() and addfile() methods of TarFile for this purpose.


Important Methods of TarFile Object

  • gettarinfo(name=None,arcname=None) - This method takes as input a file name and returns a TarInfo instance for the file. The details needed to create the TarInfo object are retrieved using the os.stat() function. We can provide an alternative file name to be used inside the archive using the arcname parameter.
  • addfile(tarinfo,fileobj=None) - This method takes as input a TarInfo instance and adds it to the archive. If we provide a binary file object through the fileobj parameter, tarinfo.size bytes are read from it and added to the archive as the member's contents.
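
One case where addfile() is handy is adding data that never existed as a file on disk. The sketch below builds a TarInfo by hand instead of calling gettarinfo(); the member name generated.txt is hypothetical, and the fileobj argument of tarfile.open(), which this tutorial does not otherwise cover, is used to keep the whole archive in memory.

```python
import io
import tarfile
import time

payload = b"Now is better than never.\n"

# Build the archive in memory; fileobj= makes open() write to the buffer.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as archive:
    info = tarfile.TarInfo(name="generated.txt")   # hypothetical member name
    info.size = len(payload)       # addfile() reads exactly this many bytes
    info.mtime = int(time.time())
    archive.addfile(info, fileobj=io.BytesIO(payload))

# Rewind the buffer and read the member back to verify it.
buffer.seek(0)
with tarfile.open(fileobj=buffer) as archive:
    restored = archive.extractfile("generated.txt").read()

print(restored.decode())
```

Setting info.size matters here: a hand-built TarInfo starts with a size of 0, so forgetting it would store an empty member.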

Our code for this example starts by creating new archive by calling open() method with w mode. It then creates TarInfo instance for text file using gettarinfo() method of TarFile instance. It then adds text file to archive using addfile() method giving it TarInfo instance and file object pointing to file contents in binary mode.

The second part of our code then opens the archive in reading mode and reads the contents of the text file written to it to verify that the archive was created properly.

In [96]:
import tarfile

### Creating archive
compressed_file = tarfile.open(name="zen_of_python3.tar", mode="w")

tar_info = compressed_file.gettarinfo("zen_of_python.txt")

with open("zen_of_python.txt", "rb") as fp: ## Closing the source file once it has been added
    compressed_file.addfile(tar_info, fileobj=fp) ## Adding file to archive

compressed_file.close()

## Opening archive in read mode
compressed_file = tarfile.open(name="zen_of_python3.tar")

print("Object Type : ",compressed_file)

file_obj = compressed_file.extractfile("zen_of_python.txt")

print("\nContents of File : \n")
print(file_obj.read().decode())

compressed_file.close()
Object Type :  <tarfile.TarFile object at 0x7f59aacce0f0>

Contents of File :


The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Example 3: Print Details about Members of TAR Archive

As a part of our third example, we'll explain how we can use TarInfo instance to retrieve details about members of the archive. We have created 3 sub examples that do the same thing but using different methods of TarFile instance. Below are few important methods of TarFile and TarInfo that we'll use in our examples.


is_tarfile(name) - This is a function of the tarfile module. It takes as input a file name or file-like object and returns True if it is a tar file that the module can read, else it returns False.
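
A quick sketch of is_tarfile() on one file that is a tar archive and one that is not. Both file names are hypothetical and are created in a temporary directory.

```python
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
plain_path = os.path.join(workdir, "plain.txt")      # hypothetical text file
archive_path = os.path.join(workdir, "plain.tar")    # hypothetical archive

with open(plain_path, "w") as fp:
    fp.write("Flat is better than nested.\n")

with tarfile.open(archive_path, mode="w") as archive:
    archive.add(plain_path, arcname="plain.txt")

# The check looks at the file contents, not at the file extension.
plain_is_tar = tarfile.is_tarfile(plain_path)
archive_is_tar = tarfile.is_tarfile(archive_path)

print("plain.txt is a tar file? :", plain_is_tar)
print("plain.tar is a tar file? :", archive_is_tar)
```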

Important Methods of TarFile Object

  • next() - This method returns the TarInfo instance for the next member of the archive, or None when there are no more members. We can call it repeatedly to retrieve TarInfo for all members of the archive.
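
Calling next() repeatedly can be sketched as a loop that stops when it returns None. The two-member archive is built in memory and its member names are made up for illustration.

```python
import io
import tarfile

# Build a small two-member archive in memory (member names are hypothetical).
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as archive:
    for member_name, text in (("a.txt", b"first\n"), ("b.txt", b"second\n")):
        info = tarfile.TarInfo(name=member_name)
        info.size = len(text)
        archive.addfile(info, fileobj=io.BytesIO(text))

buffer.seek(0)
seen = []
with tarfile.open(fileobj=buffer) as archive:
    member = archive.next()
    while member is not None:      # next() returns None after the last member
        seen.append((member.name, member.size))
        member = archive.next()

print(seen)
```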

Important Methods and Attributes of TarInfo Object

  • name - It returns member name.
  • size - It returns member size in bytes.
  • mtime - It returns last modification time of member.
  • mode - It returns permission bits.
  • uid - It returns user id of user who added this member.
  • uname - It returns user name of user who added this member.
  • gid - It returns group id of user who added this member.
  • gname - It returns group name of user who added this member.
  • isfile() - This method returns True if member is File else False.
  • isdir() - This method returns True if member is directory else False.
  • issym() - This method returns True if member is symbolic link else False.
  • islnk() - This method returns True if member is hard link else False.
  • ischr() - This method returns True if member is character device else False.
  • isblk() - This method returns True if member is block device else False.
  • isfifo() - This method returns True if member is FIFO else False.
  • isdev() - This method returns True if member is character device, block device or FIFO else False.

3.1

Our code for this example defines a function that takes as input a TarInfo instance and prints important details about the member using it. The code starts by opening the existing tar file. It then retrieves the TarInfo instance for the text file by calling the next() method of the TarFile instance. It then calls the function to print the details of the member passed to it and finally closes the archive.

In [78]:
import tarfile

compressed_file = tarfile.open(name="zen_of_python.tar")

print("Is tarfile? : {}\n".format(tarfile.is_tarfile("zen_of_python.tar")))
print("Object Type : ",compressed_file)

def print_member_info(member):
    print("File Name : {}".format(member.name))
    print("File Size : {}".format(member.size))
    print("File Last Modification Time : {}".format(member.mtime))
    print("File Mode : {}".format(member.mode))
    print("User ID : {}".format(member.uid))
    print("User Name : {}".format(member.uname))
    print("Group ID : {}".format(member.gid))
    print("Group Name : {}".format(member.gname))
    print("Is File? : {}".format(member.isfile()))
    print("Is Directory? : {}".format(member.isdir()))
    print("Is Symbolic Link? : {}".format(member.issym()))
    print("Is Hard Link? : {}".format(member.islnk()))
    print("Is Character Device? : {}".format(member.ischr()))
    print("Is Block Device? : {}".format(member.isblk()))
    print("Is FIFO? : {}".format(member.isfifo()))
    print("Is Character Device/Block Device/FIFO? : {}\n".format(member.isdev()))

member = compressed_file.next()
print("\nMember Type : {}\n".format(member))
print_member_info(member)

compressed_file.close()
Is tarfile? : True

Object Type :  <tarfile.TarFile object at 0x7f59aaf32208>

Member Type : <TarInfo 'zen_of_python.txt' at 0x7f59b8f63048>

File Name : zen_of_python.txt
File Size : 862
File Last Modification Time : 1617856505
File Mode : 420
User ID : 1001
User Name : sunny
Group ID : 1001
Group Name : sunny
Is File? : True
Is Directory? : False
Is Symbolic Link? : False
Is Hard Link? : False
Is Character Device? : False
Is Block Device? : False
Is FIFO? : False
Is Character Device/Block Device/FIFO? : False
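
The File Mode and File Last Modification Time values printed above are raw integers. As a small sketch, they can be turned into the more familiar octal permissions and a readable timestamp with the standard library. The two values are copied from the output above; OR-ing in stat.S_IFREG is an assumption made here because the mode attribute holds only the permission bits and we know the member is a regular file.

```python
import stat
from datetime import datetime, timezone

# Values copied from the sample output of this example.
mode = 420
mtime = 1617856505

octal_mode = oct(mode)                                    # permission bits in octal
permission_string = stat.filemode(stat.S_IFREG | mode)    # ls-style string
modified_utc = datetime.fromtimestamp(mtime, tz=timezone.utc)

print("Mode (octal)   :", octal_mode)
print("Mode (string)  :", permission_string)
print("Modified (UTC) :", modified_utc.isoformat())
```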

3.2

This sub-example performs the same function as the previous sub-example but using different methods of TarFile.


Important Methods of TarFile Object

  • getnames() - This method returns list of strings where each string represents one archive member.
  • getmember(member_name) - This method takes as input member name and returns TarInfo instance for that member.
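
One detail worth knowing about getmember() is that it raises KeyError when the archive has no member with the given name. A minimal sketch on an in-memory archive, with hypothetical member names:

```python
import io
import tarfile

payload = b"Explicit is better than implicit.\n"

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as archive:
    info = tarfile.TarInfo(name="present.txt")     # hypothetical member name
    info.size = len(payload)
    archive.addfile(info, fileobj=io.BytesIO(payload))

buffer.seek(0)
with tarfile.open(fileobj=buffer) as archive:
    names = archive.getnames()
    present_size = archive.getmember("present.txt").size
    try:
        archive.getmember("missing.txt")           # not in the archive
        lookup_failed = False
    except KeyError:
        lookup_failed = True

print("Members                  :", names)
print("Size of present.txt      :", present_size)
print("Lookup of missing failed :", lookup_failed)
```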

Our code for this example starts by opening the existing archive in reading mode. It then retrieves all members of the archive using getnames() method. It then loops through member names retrieving TarInfo for each member and printing details about a member. It then closes the archive at last.

In [65]:
import tarfile

compressed_file = tarfile.open(name="zen_of_python.tar")

print("Object Type : ",compressed_file)

print("\nList of File in Archive : {}".format(compressed_file.getnames()))

def print_member_info(member):
    print("File Name : {}".format(member.name))
    print("File Size : {}".format(member.size))
    print("File Last Modification Time : {}".format(member.mtime))
    print("File Mode : {}".format(member.mode))
    print("User ID : {}".format(member.uid))
    print("User Name : {}".format(member.uname))
    print("Group ID : {}".format(member.gid))
    print("Group Name : {}".format(member.gname))
    print("Is File? : {}".format(member.isfile()))
    print("Is Directory? : {}".format(member.isdir()))
    print("Is Symbolic Link? : {}".format(member.issym()))
    print("Is Hard Link? : {}".format(member.islnk()))
    print("Is Character Device? : {}".format(member.ischr()))
    print("Is Block Device? : {}".format(member.isblk()))
    print("Is FIFO? : {}".format(member.isfifo()))
    print("Is Character Device/Block Device/FIFO? : {}\n".format(member.isdev()))

for filename in compressed_file.getnames():
    member = compressed_file.getmember(name=filename)
    print("\nMember Type : {}\n".format(member))
    print_member_info(member)

compressed_file.close()
Object Type :  <tarfile.TarFile object at 0x7f59b8f469b0>

List of File in Archive : ['zen_of_python.txt']

Member Type : <TarInfo 'zen_of_python.txt' at 0x7f59b8e45430>

File Name : zen_of_python.txt
File Size : 862
File Last Modification Time : 1617856505
File Mode : 420
User ID : 1001
User Name : sunny
Group ID : 1001
Group Name : sunny
Is File? : True
Is Directory? : False
Is Symbolic Link? : False
Is Hard Link? : False
Is Character Device? : False
Is Block Device? : False
Is FIFO? : False
Is Character Device/Block Device/FIFO? : False

3.3

Our third sub-example also performs the same operations as our previous two sub-examples, but uses a different method of TarFile to do so.


Important Methods of TarFile Object

  • getmembers() - This method returns list of TarInfo instances for members of archive.

Our code for this example like previous examples starts by opening the archive in reading mode. It then uses getmembers() method of TarFile to retrieve list of TarInfo instances for members of archive. It then loops through TarInfo instances printing details of each file. At last, it closes the archive.

In [66]:
import tarfile

compressed_file = tarfile.open(name="zen_of_python.tar")

print("Object Type : ",compressed_file)

print("\nList of File in Archive : {}".format(compressed_file.getnames()))

def print_member_info(member):
    print("File Name : {}".format(member.name))
    print("File Size : {}".format(member.size))
    print("File Last Modification Time : {}".format(member.mtime))
    print("File Mode : {}".format(member.mode))
    print("User ID : {}".format(member.uid))
    print("User Name : {}".format(member.uname))
    print("Group ID : {}".format(member.gid))
    print("Group Name : {}".format(member.gname))
    print("Is File? : {}".format(member.isfile()))
    print("Is Directory? : {}".format(member.isdir()))
    print("Is Symbolic Link? : {}".format(member.issym()))
    print("Is Hard Link? : {}".format(member.islnk()))
    print("Is Character Device? : {}".format(member.ischr()))
    print("Is Block Device? : {}".format(member.isblk()))
    print("Is FIFO? : {}".format(member.isfifo()))
    print("Is Character Device/Block Device/FIFO? : {}\n".format(member.isdev()))

for member in compressed_file.getmembers():
    print("Member Type : {}\n".format(member))
    print_member_info(member)

compressed_file.close()
Object Type :  <tarfile.TarFile object at 0x7f59b8f46940>

List of File in Archive : ['zen_of_python.txt']
Member Type : <TarInfo 'zen_of_python.txt' at 0x7f59b8e66cc8>

File Name : zen_of_python.txt
File Size : 862
File Last Modification Time : 1617856505
File Mode : 420
User ID : 1001
User Name : sunny
Group ID : 1001
Group Name : sunny
Is File? : True
Is Directory? : False
Is Symbolic Link? : False
Is Hard Link? : False
Is Character Device? : False
Is Block Device? : False
Is FIFO? : False
Is Character Device/Block Device/FIFO? : False

Example 4: Create TAR Archive with Different Compression Formats

As a part of our fourth example, we'll explain with simple sub-examples how we can create archives with different compression formats. We'll also compare archive sizes after trying all compression formats. The tarfile module lets us compress contents using the gzip, bzip2, and lzma compression methods.

4.1

Our code for this example is almost the same as our code for the second example, where we demonstrated how we can create an archive and add members to it. The only change is that it opens the archive in w:gz mode in order to force gzip compression. The code then writes a text file to the archive and closes it. The second part of the code then opens the archive in r:gz mode and reads the contents of the file to verify it.

In [88]:
import tarfile

### Creating archive
compressed_file = tarfile.open(name="zen_of_python.tar.gz", mode="w:gz")

compressed_file.add("zen_of_python.txt") ## Adding file to archive

compressed_file.close()

## Opening archive in read mode
compressed_file = tarfile.open(name="zen_of_python.tar.gz", mode="r:gz")

print("Object Type : ",compressed_file)

file_obj = compressed_file.extractfile("zen_of_python.txt")

print("\nContents of File : \n")
print(file_obj.read().decode())

compressed_file.close()
Object Type :  <tarfile.TarFile object at 0x7f59aaf0d1d0>

Contents of File :


The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Below we have listed both tar files in order to compare sizes. We can notice that our gzip-compressed tar file is much smaller than the uncompressed tar file that we created with the tar utility at the beginning of this tutorial.

In [89]:
!ls -lrt zen_of_python.tar*
-rw-r--r-- 1 sunny sunny 10240 Apr  8 10:05 zen_of_python.tar
-rw-r--r-- 1 sunny sunny   598 Apr  8 10:50 zen_of_python.tar.gz

4.2

Our code for this sub-example is exactly the same as our code for the previous example with the only change that we are using bzip2 compression in this example.

In [90]:
import tarfile

### Creating archive
compressed_file = tarfile.open(name="zen_of_python.tar.bz2", mode="w:bz2")

compressed_file.add("zen_of_python.txt") ## Adding file to archive

compressed_file.close()

## Opening archive in read mode
compressed_file = tarfile.open(name="zen_of_python.tar.bz2", mode="r:bz2")

print("Object Type : ",compressed_file)

file_obj = compressed_file.extractfile("zen_of_python.txt")

print("\nContents of File : \n")
print(file_obj.read().decode())

compressed_file.close()
Object Type :  <tarfile.TarFile object at 0x7f59aaa1b898>

Contents of File :


The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Below we are again comparing tar file sizes for verification purposes.

In [91]:
!ls -lrt zen_of_python.tar*
-rw-r--r-- 1 sunny sunny 10240 Apr  8 10:05 zen_of_python.tar
-rw-r--r-- 1 sunny sunny   598 Apr  8 10:50 zen_of_python.tar.gz
-rw-r--r-- 1 sunny sunny   598 Apr  8 10:50 zen_of_python.tar.bz2

4.3

Our third sub-example is exactly the same as our previous two sub-examples but with the only change that it uses lzma compression.

In [92]:
import tarfile

### Creating archive
compressed_file = tarfile.open(name="zen_of_python.tar.xz", mode="w:xz")

compressed_file.add("zen_of_python.txt") ## Adding file to archive

compressed_file.close()

## Opening archive in read mode
compressed_file = tarfile.open(name="zen_of_python.tar.xz", mode="r:xz")

print("Object Type : ",compressed_file)

file_obj = compressed_file.extractfile("zen_of_python.txt")

print("\nContents of File : \n")
print(file_obj.read().decode())

compressed_file.close()
Object Type :  <tarfile.TarFile object at 0x7f59aacc3630>

Contents of File :


The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Below we are again comparing all tar files to check compression performed by different compression methods based on file sizes.

In [93]:
!ls -lrt zen_of_python.tar*
-rw-r--r-- 1 sunny sunny 10240 Apr  8 10:05 zen_of_python.tar
-rw-r--r-- 1 sunny sunny   598 Apr  8 10:50 zen_of_python.tar.gz
-rw-r--r-- 1 sunny sunny   598 Apr  8 10:50 zen_of_python.tar.bz2
-rw-r--r-- 1 sunny sunny   664 Apr  8 10:50 zen_of_python.tar.xz
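
The same size comparison can be sketched from Python with os.path.getsize(). The input file sample.txt and its repeated one-line content are made up for illustration, so the sizes it prints will differ from the listing above.

```python
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
text_path = os.path.join(workdir, "sample.txt")     # hypothetical input file
with open(text_path, "w") as fp:
    fp.write("Sparse is better than dense.\n" * 30)

# Map each write mode to the extension conventionally used for it.
modes = {"w": ".tar", "w:gz": ".tar.gz", "w:bz2": ".tar.bz2", "w:xz": ".tar.xz"}

sizes = {}
for mode, extension in modes.items():
    archive_path = os.path.join(workdir, "sample" + extension)
    with tarfile.open(archive_path, mode=mode) as archive:
        archive.add(text_path, arcname="sample.txt")
    sizes[mode] = os.path.getsize(archive_path)

for mode, size in sizes.items():
    print("{:<6} : {} bytes".format(mode, size))
```

An uncompressed archive is padded with zero bytes up to a multiple of 10240 bytes (20 blocks of 512 bytes), which is consistent with the 10240-byte listing above for an 862-byte text file. For inputs this small, the fixed overhead of each compression format dominates, so the ordering of the compressed sizes should not be read as a general ranking of the methods.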

Example 5: Append File to Existing TAR Archive

As a part of our fifth example, we are demonstrating how we can append files to an existing archive.

Our code for this example starts by opening the existing archive zen_of_python3.tar, which we created in sub-example 2.2, in 'a' (append) mode. It then adds one JPEG, one PDF, and one GIF file to the archive and closes it.

The second part of our code opens the archive in reading mode and lists the contents of the archive. The list() method prints members of the archive.

In [97]:
import tarfile

### Opening archive in append mode
compressed_file = tarfile.open(name="zen_of_python3.tar", mode="a")

compressed_file.add("dr_apj_kalam.jpeg") ## Adding file to archive
compressed_file.add("Deploying a Django Application to Google App Engine.pdf") ## Adding file to archive
compressed_file.add("intro_ball.gif") ## Adding file to archive

compressed_file.close()

## Opening archive in read mode
compressed_file = tarfile.open(name="zen_of_python3.tar")

print("Object Type : ",compressed_file)

print("Archive Members : {}\n".format(compressed_file.getnames()))
compressed_file.list()

compressed_file.close()
Object Type :  <tarfile.TarFile object at 0x7f59aacce748>
Archive Members : ['zen_of_python.txt', 'dr_apj_kalam.jpeg', 'Deploying a Django Application to Google App Engine.pdf', 'intro_ball.gif']

?rw-r--r-- sunny/sunny        862 2021-04-08 10:05:05 zen_of_python.txt
?rw-rw-r-- sunny/sunny       6471 2021-01-16 09:56:07 dr_apj_kalam.jpeg
?rwxrwxrwx sunny/sunny     547350 2019-09-01 07:27:18 Deploying a Django Application to Google App Engine.pdf
?rw-rw-r-- sunny/sunny       5015 2021-03-12 20:45:36 intro_ball.gif
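
Two properties of append mode are worth a short sketch: 'a' creates the archive if it does not exist yet, and it only works for uncompressed archives. The archive and file names below are hypothetical, and the helper write_text() exists only for this illustration.

```python
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
archive_path = os.path.join(workdir, "notes.tar")    # hypothetical archive

def write_text(file_name, text):
    """Create a small text file in the working directory and return its path."""
    path = os.path.join(workdir, file_name)
    with open(path, "w") as fp:
        fp.write(text)
    return path

first = write_text("first.txt", "Explicit is better than implicit.\n")
second = write_text("second.txt", "Errors should never pass silently.\n")

# Mode 'a' creates the archive when it does not exist yet ...
with tarfile.open(archive_path, mode="a") as archive:
    archive.add(first, arcname="first.txt")

# ... and adds to the end of it when it does.
with tarfile.open(archive_path, mode="a") as archive:
    archive.add(second, arcname="second.txt")

with tarfile.open(archive_path) as archive:
    appended_names = archive.getnames()

# There is no 'a:gz', 'a:bz2' or 'a:xz' mode, so this attempt is rejected.
try:
    tarfile.open(os.path.join(workdir, "notes.tar.gz"), mode="a:gz")
    compressed_append_ok = True
except (ValueError, tarfile.TarError):
    compressed_append_ok = False

print("Members after two appends   :", appended_names)
print("Appending with 'a:gz' works :", compressed_append_ok)
```

To add files to a compressed archive, one option is to extract it, then write a new compressed archive that contains the old and the new members.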

Example 6: Extract TAR Archive to a Folder

As a part of our sixth example, we'll explain with simple sub-examples how we can extract files from an existing archive to the current path or different location. We'll be using the below methods for our purpose.


Important Methods of TarFile Object

  • extract(member,path='') - This method takes as input a member name as a string or a TarInfo instance and extracts that member to the current working directory. We can provide a location using the path parameter if we want to extract the file to a different location.
  • extractall(path='.',members=None) - This method extracts all members of the tar file to the current directory. We can provide a location using the path parameter to extract files to a different location. We can also provide a subset of members as a list if we don't want to extract all files from the archive. The members parameter is documented as a subset of the list returned by getmembers(), that is, a list of TarInfo instances.
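
Archives from untrusted sources can contain members with absolute paths or ".." components that would be written outside the target folder. The sketch below is one hedged way to guard against that and is not part of the original examples: it relies on the extraction filter support that newer Python releases provide (detected with hasattr(tarfile, "data_filter")) and falls back to a manual path check otherwise. The member name docs/quote.txt is hypothetical.

```python
import io
import os
import tarfile
import tempfile

# Build a one-member archive in memory (the member name is made up).
payload = b"In the face of ambiguity, refuse the temptation to guess.\n"
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as archive:
    info = tarfile.TarInfo(name="docs/quote.txt")
    info.size = len(payload)
    archive.addfile(info, fileobj=io.BytesIO(payload))

destination = tempfile.mkdtemp()
buffer.seek(0)
with tarfile.open(fileobj=buffer) as archive:
    if hasattr(tarfile, "data_filter"):
        # Extraction filters are available: the 'data' filter rejects
        # members that would end up outside the destination, among others.
        archive.extractall(path=destination, filter="data")
    else:
        # Older interpreter: check each member path by hand before extracting.
        root = os.path.realpath(destination)
        safe_members = []
        for member in archive.getmembers():
            target = os.path.realpath(os.path.join(root, member.name))
            if os.path.commonpath([root, target]) == root:
                safe_members.append(member)
        archive.extractall(path=destination, members=safe_members)

extracted_path = os.path.join(destination, "docs", "quote.txt")
print("Extracted :", os.path.exists(extracted_path))
```

The manual fallback only checks member paths; it does not inspect link targets, so it should be treated as a partial safeguard.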

6.1

Our code for this example starts by opening an existing archive created in one of our previous examples in reading mode. It then calls extractall() method giving it folder name to extract all members of the archive. We then close the archive. We are also verifying extraction at the end.

In [137]:
import tarfile

### Opening archive in read mode
compressed_file = tarfile.open(name="zen_of_python3.tar", mode="r")

folder_name = "zen_of_python_all"

print("Extracting Files to Folder : {}".format(folder_name))

compressed_file.extractall(folder_name)

compressed_file.close()
Extracting Files to Folder : zen_of_python_all
In [139]:
!ls -lrt zen_of_python_all/
total 556
-rwxrwxrwx 1 sunny sunny 547350 Sep  1  2019 'Deploying a Django Application to Google App Engine.pdf'
-rw-rw-r-- 1 sunny sunny   6471 Jan 16 09:56  dr_apj_kalam.jpeg
-rw-rw-r-- 1 sunny sunny   5015 Mar 12 20:45  intro_ball.gif
-rw-r--r-- 1 sunny sunny    862 Apr  8 10:05  zen_of_python.txt

6.2

Our code for this example starts by opening the existing archive in reading mode. It then calls the extractall() method, but this time includes logic to extract only the JPEG and GIF files and skip all other members. We check for file names ending with .jpeg and .gif for this purpose and give the resulting list to the members parameter of the extractall() method.

In [140]:
import tarfile

### Opening archive in read mode
compressed_file = tarfile.open(name="zen_of_python3.tar", mode="r")

folder_name = "zen_of_python_images"

print("Extracting Files to Folder : {}".format(folder_name))

compressed_file.extractall(path=folder_name, members=[member for member in compressed_file.getmembers() if member.name.endswith((".jpeg", ".gif"))])

compressed_file.close()
Extracting Files to Folder : zen_of_python_images
In [141]:
!ls -lrt zen_of_python_images/
total 16
-rw-rw-r-- 1 sunny sunny 6471 Jan 16 09:56 dr_apj_kalam.jpeg
-rw-rw-r-- 1 sunny sunny 5015 Mar 12 20:45 intro_ball.gif

6.3

Our code for this example is almost the same as our code from the previous example with the only change that we are using extract() method to extract individual members. We are again looping through all members and extracting only JPEG and GIF files.

In [142]:
import tarfile

### Creating archive
compressed_file = tarfile.open(name="zen_of_python3.tar", mode="r")

folder_name = "zen_of_python_images2"

print("Extracting Files to Folder : {}".format(folder_name))

for file_name in compressed_file.getnames():
    if file_name.endswith((".jpeg", ".gif")):
        compressed_file.extract(file_name, folder_name)

compressed_file.close()
Extracting Files to Folder : zen_of_python_images2
In [143]:
!ls -lrt zen_of_python_images2/
total 16
-rw-rw-r-- 1 sunny sunny 6471 Jan 16 09:56 dr_apj_kalam.jpeg
-rw-rw-r-- 1 sunny sunny 5015 Mar 12 20:45 intro_ball.gif

6.4

Our code for this example is almost the same as our code from previous examples. We loop through each member of the archive and retrieve its contents using the extractfile() method. We then write the contents to a newly created file of the same name inside a folder named after the archive. We also list the extracted files at the end to verify them.

In [144]:
import tarfile
import os

### Opening archive in read mode
compressed_file = tarfile.open(name="zen_of_python3.tar", mode="r")

folder_name = compressed_file.name[:-4]
os.makedirs(folder_name, exist_ok=True)

for member_name in compressed_file.getnames():
    file_content = compressed_file.extractfile(member_name)
    with open(os.path.join(folder_name,member_name), "wb") as fp:
        fp.write(file_content.read())

compressed_file.close()
In [145]:
!ls -lrt zen_of_python3/
total 556
-rw-r--r-- 1 sunny sunny    862 Apr  8 11:18  zen_of_python.txt
-rw-r--r-- 1 sunny sunny   6471 Apr  8 11:18  dr_apj_kalam.jpeg
-rw-r--r-- 1 sunny sunny   5015 Apr  8 11:18  intro_ball.gif
-rw-r--r-- 1 sunny sunny 547350 Apr  8 11:18 'Deploying a Django Application to Google App Engine.pdf'
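
The loop above reads each member fully into memory and assumes every member is a regular file. A variation, sketched here under the assumption that the archive may also contain directories, skips non-file members (for which extractfile() returns None) and copies the data in chunks with shutil.copyfileobj(). The in-memory archive and its member names are hypothetical.

```python
import io
import os
import shutil
import tarfile
import tempfile

# Build an archive in memory holding one directory and one file (names made up).
payload = b"Beautiful is better than ugly.\n"
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as archive:
    folder = tarfile.TarInfo(name="poems")
    folder.type = tarfile.DIRTYPE
    archive.addfile(folder)
    info = tarfile.TarInfo(name="poems/zen.txt")
    info.size = len(payload)
    archive.addfile(info, fileobj=io.BytesIO(payload))

destination = tempfile.mkdtemp()
written = []
buffer.seek(0)
with tarfile.open(fileobj=buffer) as archive:
    for member in archive.getmembers():
        if not member.isfile():
            continue    # extractfile() returns None for directories
        target = os.path.join(destination, member.name)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        # Copy in chunks instead of reading the whole member into memory.
        with archive.extractfile(member) as source, open(target, "wb") as fp:
            shutil.copyfileobj(source, fp)
        written.append(member.name)

print("Files written :", written)
```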

This ends our small tutorial explaining how we can use tarfile module to work with tape archives. Please feel free to let us know your views in the comments section.

Sunny Solanki