Updated On : Oct-02,2022 Time Investment : ~30 mins

ZipFile - Simple Guide to Work with Zip Archives in Python

ZIP is a very commonly used archive format to compress data. It was created in 1989. It supports lossless data compression.

ZIP files end with the .zip or .ZIP extension.

The majority of operating systems nowadays provide built-in zip compression utilities.

Zip archives use different compression algorithms (DEFLATED, LZMA, BZIP2, etc) for compressing contents.

When zip archives are transferred on the internet, the MIME-type application/zip is used to represent it.

Python provides a module named zipfile with a bunch of functionalities to work with zip archives. It provides various tools to perform operations like create, read, write and append on zip archives.

> What Can You Learn From This Article?

As a part of this tutorial, we have explained how to use Python module zipfile to handle zip archives with simple examples. We have explained important usage of zipfile module like how to read contents of zip archives, create zip archives, append files to an existing zip archive, compress content with different compression algorithms, test zip file for corruption, etc.

Below, we have listed essential sections of Tutorial to give an overview of the material covered.

Important Sections Of Tutorial

  1. Reading the Contents of Existing Zip File
  2. Create Zip Archive
  3. Append Files to Existing Zip File
  4. ZipFile as a Context Manager
  5. Creating Zip Archive with Different File Types
  6. Trying Different Compression Algorithms
  7. Extracting Files from Zip Archive
  8. Important Properties of Zipped Files
  9. Testing Zip File for Corruption
  10. Special Class for Zipping ".py" Files

We have created one simple text file that we'll be using when explaining various examples. Below we have printed the contents of the file. We have also created one archive with only this file using our Linux tools which we'll try to read in our first example.

!cat zen_of_python.txt
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
    

1. Reading the Contents of Existing Zip File

As a part of our first example, we'll explain how we can open an existing zip archive and read the contents of the file stored in it. We'll be using class ZipFile and its methods for this purpose.

Our code for this example starts by creating an instance of ZipFile with zip archive and then opens text file inside of it using open() method of ZipFile instance. It then reads the contents of the file, prints it, and closes the archive. We have decoded the contents of the file as it’s returned as bytes.

Below, we have listed class and methods used in example. All sections will have information about class and methods listed which can be referred to check for important parameters.


ZipFile(file,mode='r',compression=ZIP_STORED,allowZip64=True,compresslevel=None) - This class lets us work with the zip file given as the first parameter. It lets us open, read, write, and append to zip files.

  • file - It accepts a string or path-like object specifying the zip file name, or a file-like object.
  • mode - It accepts one of the below-mentioned characters as input.
    • 'r' - For reading zip archive. Default.
    • 'w' - For writing to zip archive. It'll create a zip archive if it does not exist and overwrite the contents of the existing zip archive.
    • 'a' - For appending new files to an existing zip archive.
    • 'x' - For exclusively creating a new archive and writing files to it. It fails with FileExistsError if the archive already exists.
  • compression - This parameter accepts one of the below constants specifying the compression algorithm.
    • zipfile.ZIP_STORED - No compression. Default.
    • zipfile.ZIP_DEFLATED - It requires the Python module zlib to work.
    • zipfile.ZIP_BZIP2 - It requires the Python module bz2 to work.
    • zipfile.ZIP_LZMA - It requires the Python module lzma to work.
  • compresslevel - This parameter accepts a value between 0-9 for ZIP_DEFLATED and 1-9 for ZIP_BZIP2. Lower values compress faster but less, whereas higher values take more time but compress more. It has no effect for ZIP_STORED and ZIP_LZMA.
  • allowZip64 - This parameter accepts a boolean value. The value True (default) indicates that zip files larger than 4 GB will be created with the ZIP64 extensions.

> Important Methods of ZipFile Object

  • open(name, mode='r') - It lets us work with an individual member of the archive. It returns a file-like object which we can utilize as per our need. The name parameter accepts the file name to open. The mode parameter accepts below-mentioned values.
    • 'r' - It lets us read the contents of the file.
    • 'w' - It lets us write contents to the file. Please make a note that if the zip archive is opened in reading mode and we try to write to a file then it'll fail with ValueError.
  • close() - This method closes archive after writing important metadata.
  • read(name) - This method accepts a string filename and returns the contents of file as bytes.

import zipfile

zipped_file = zipfile.ZipFile("zen_of_python.zip")

fp = zipped_file.open("zen_of_python.txt") ## Open file from zip archive

file_content = fp.read().decode() ## Read contents

print(file_content)

zipped_file.close() ## Close Archive
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
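
The modes and the open() method described above can also be tried without touching the disk. Below is a minimal sketch, assuming an in-memory io.BytesIO buffer as a stand-in for a real archive file; the member names notes.txt, more.txt and third.txt are made up for illustration.

```python
import io
import zipfile

buffer = io.BytesIO()  # in-memory stand-in for an archive file on disk

# 'w' creates the archive; open(name, mode='w') writes a single member
with zipfile.ZipFile(buffer, mode="w") as archive:
    with archive.open("notes.txt", mode="w") as member:
        member.write(b"first member")

# 'a' keeps the existing members and adds new ones
with zipfile.ZipFile(buffer, mode="a") as archive:
    archive.writestr("more.txt", "second member")

# 'r' is read-only: asking for a writable member raises ValueError
with zipfile.ZipFile(buffer, mode="r") as archive:
    names = archive.namelist()
    first_text = archive.read("notes.txt").decode()
    try:
        archive.open("third.txt", mode="w")
        write_error = None
    except ValueError as exc:
        write_error = exc

print(names)
print(first_text)
print(type(write_error).__name__)
```

The same calls should work with a file name in place of the buffer; the buffer only keeps the sketch self-contained.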

Our second code example for this part explains the usage of read() method of ZipFile object. It has almost the same code as the previous example with the only change that it uses read() method. We also check whether the file is a valid zip file or not using the module-level is_zipfile() function.


> Important Method of "zipfile" Module

is_zipfile(filename) - This function takes a file name or file-like object as input and returns True if it's a valid zip archive, based on the magic number of the file, else returns False.


import zipfile

zipped_file = zipfile.ZipFile("zen_of_python.zip")

file_content = zipped_file.read("zen_of_python.txt")

print(file_content.decode())

zipped_file.close()

print("Is zen_of_python.zip a zip file ? ",zipfile.is_zipfile("zen_of_python.zip")) ## Checking for zip file
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Is zen_of_python.zip a zip file ?  True
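
Since is_zipfile() also accepts file-like objects, we can check data before it is ever saved to disk. The sketch below builds both inputs in memory; the member name and the missing file name are hypothetical.

```python
import io
import zipfile

# Build a tiny archive in memory (hypothetical member name and content)
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, mode="w") as archive:
    archive.writestr("hello.txt", "hello")

# Plain bytes that do not contain the zip end-of-archive record
plain_buffer = io.BytesIO(b"just some plain text, not an archive")

zip_check = zipfile.is_zipfile(zip_buffer)
plain_check = zipfile.is_zipfile(plain_buffer)
# A path that does not exist is reported as False instead of raising
missing_check = zipfile.is_zipfile("no_such_file_hopefully.zip")

print(zip_check, plain_check, missing_check)
```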

2. Create Zip Archive

As a part of our second example, we'll explain how we can create zip archives. We'll be using write() method of ZipFile instance to write files to zip archive.


> Important Methods of ZipFile Object

write(filename) - It accepts a filename and adds that file to the archive.


Our code for this example creates a new archive by using 'w' mode of ZipFile instance. It then writes our text file to the archive and closes the archive.

The next part of our code opens the same archive again in reading mode and reads the contents of the text file written to verify it.

import zipfile

### Create archive and write file to it.
zipped_file = zipfile.ZipFile("zen_of_python2.zip", mode="w", ) ## Opening in write mode

zipped_file.write("zen_of_python.txt")

zipped_file.close()

## Read archive to check contents
zipped_file = zipfile.ZipFile("zen_of_python2.zip", mode="r")

file_content = zipped_file.read("zen_of_python.txt")

print(file_content.decode())

zipped_file.close()
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Our code for this example is exactly the same as our code for the previous example with the only change that it explains the usage of writestr() method.


> Important Methods of ZipFile Object

  • writestr(filename or zipinfo object, data) - This method accepts a name that will be given to the file in the archive as the first parameter and contents of the file as string or bytes as the second parameter. It then writes the contents to the archive.
    • It can also accept ZipInfo object as the first parameter instead of the file name. The ZipInfo instance will have details about the file. We'll explain about ZipInfo object in our upcoming example.

import zipfile

### Create archive and write file to it.
zipped_file = zipfile.ZipFile("zen_of_python2.zip", mode="w", )

with open("zen_of_python.txt", "r") as text_file: ## Source file is closed once it has been read
    zipped_file.writestr("zen_of_python.txt", data=text_file.read())

zipped_file.close()

## Read archive to check contents
zipped_file = zipfile.ZipFile("zen_of_python2.zip", mode="r")

file_content = zipped_file.read("zen_of_python.txt")

print(file_content.decode())

zipped_file.close()
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
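
As mentioned above, writestr() also accepts a ZipInfo object in place of a file name. Below is a small sketch of that form, assuming an in-memory buffer; the member names, the timestamp and the choice of ZIP_DEFLATED are illustrative only.

```python
import io
import zipfile

buffer = io.BytesIO()

# ZipInfo lets us choose the member name and timestamp ourselves
info = zipfile.ZipInfo("generated/report.txt", date_time=(2021, 5, 7, 10, 15, 10))
info.compress_type = zipfile.ZIP_DEFLATED  # per-member compression choice

with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr(info, "report body " * 100)   # str data is encoded as UTF-8
    archive.writestr("raw.bin", b"\x00\x01\x02")   # bytes are written unchanged

with zipfile.ZipFile(buffer) as archive:
    stored_info = archive.getinfo("generated/report.txt")
    report_text = archive.read("generated/report.txt").decode()
    raw_bytes = archive.read("raw.bin")

print(stored_info.date_time, stored_info.compress_type, raw_bytes)
```

This form is handy when the content is generated in memory and there is no file on disk to take the name and modification time from.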

3. Append Files to Existing Zip File

As a part of our third example, we'll explain how we can append new files to an existing archive.

Our code for this example opens an archive that we created in the previous example in append mode. It then writes a new JPEG file to the archive and closes it. The next part of our code again opens the same archive in reading mode and prints the contents of the archive using printdir() method. The printdir() method lists archive contents which contain information about each file of an archive, its last modification date, and size.

import zipfile

### Open existing archive in append mode and add a new file to it.
zipped_file = zipfile.ZipFile("zen_of_python2.zip", mode="a", )

zipped_file.write("dr_apj_kalam.jpeg")

zipped_file.close()

## Read archive to check contents
zipped_file = zipfile.ZipFile("zen_of_python2.zip", mode="r")

zipped_file.printdir()

zipped_file.close()
File Name                                             Modified             Size
zen_of_python.txt                              2021-05-07 10:15:10          862
dr_apj_kalam.jpeg                              2021-05-07 10:02:14         6471
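
One thing to watch in append mode: to our knowledge, writing a name that is already present adds a second entry rather than replacing the first one. The sketch below guards against that with a hypothetical helper named append_if_missing(), which is not part of zipfile, and uses an in-memory buffer in place of a real file.

```python
import io
import zipfile

def append_if_missing(target, name, data):
    """Append `data` under `name` unless the archive already has that member."""
    with zipfile.ZipFile(target, mode="a") as archive:
        if name in archive.namelist():
            return False
        archive.writestr(name, data)
        return True

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("zen.txt", "Beautiful is better than ugly.")

first_try = append_if_missing(buffer, "extra.txt", "new member")
second_try = append_if_missing(buffer, "extra.txt", "new member again")

with zipfile.ZipFile(buffer) as archive:
    final_names = archive.namelist()

print(first_try, second_try, final_names)
```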

4. ZipFile as a Context Manager

We can also use ZipFile instance as a context manager (with statement). We don't need to call close() method on archive if we have opened it as a context manager as it'll close the archive by itself once we exit from context manager.

Our code for this example opens an existing archive as a context manager, reads the contents of the text file inside of it, and prints them.

It's recommended to use ZipFile as a context manager because it's a safe and easy way to work with archives. We'll be using it as a context manager in our examples going forward.

Python has a library named contextlib that lets us create context managers with ease.

import zipfile

with zipfile.ZipFile("zen_of_python.zip") as zipped_file:

    file_content = zipped_file.read("zen_of_python.txt")

    print(file_content.decode())
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
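
To see what the context manager does for us, the sketch below touches the archive object after the with block has ended. It assumes an in-memory archive with a made-up member name.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("zen.txt", "Readability counts.")

with zipfile.ZipFile(buffer) as archive:
    inside_text = archive.read("zen.txt").decode()

# The with block has closed the archive. The member list is still cached,
# but member data can no longer be read.
names_after_close = archive.namelist()
try:
    archive.read("zen.txt")
    read_after_close = "worked"
except ValueError:
    read_after_close = "ValueError"

print(inside_text, names_after_close, read_after_close)
```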

5. Creating Zip Archive with Different File Types

As a part of our fifth example, we are demonstrating how we can write multiple files to a zip archive.

Our code for this example starts by creating an instance of ZipFile with a new zip archive name and mode as 'w'. It then loops through four file names and writes them one by one to the archive. Finally, it prints the list of files present in the archive using namelist() method.

import zipfile

with zipfile.ZipFile("multiple_files.zip", "w") as zipped_file:

    for file in ["zen_of_python.txt", "dr_apj_kalam.jpeg", "intro_ball.gif", "Deploying a Django Application to Google App Engine.pdf"]:
        zipped_file.write(file)


    print("List of files in archive : ", zipped_file.namelist())
List of files in archive :  ['zen_of_python.txt', 'dr_apj_kalam.jpeg', 'intro_ball.gif', 'Deploying a Django Application to Google App Engine.pdf']
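
A common variation is zipping a whole directory. The write() method has an arcname parameter (not used in the example above) that sets the name stored inside the archive. Below is a minimal sketch, assuming a throwaway directory layout created with tempfile; all file and directory names are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical directory tree to archive
    root = Path(tmp) / "project"
    (root / "docs").mkdir(parents=True)
    (root / "readme.txt").write_text("top-level file")
    (root / "docs" / "guide.txt").write_text("nested file")

    archive_path = Path(tmp) / "project.zip"
    with zipfile.ZipFile(archive_path, mode="w") as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # arcname controls the name stored inside the archive
                archive.write(path, arcname=path.relative_to(root).as_posix())

    with zipfile.ZipFile(archive_path) as archive:
        stored_names = archive.namelist()

print(stored_names)
```

Storing paths relative to the directory keeps the temporary or absolute prefix out of the archive.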

6. Trying Different Compression Algorithms

As a part of our sixth example, we'll try different compression algorithms available for creating zip archives.

Our code for this example creates a different archive for each compression type and writes the same four files of different types to each archive. We then list the archives to compare the file size produced by each compression type.

import zipfile

### Zip STORED
with zipfile.ZipFile("multiple_files_stored.zip", mode="w", compression=zipfile.ZIP_STORED) as zipped_file:

    for file in ["zen_of_python.txt", "dr_apj_kalam.jpeg", "intro_ball.gif", "Deploying a Django Application to Google App Engine.pdf"]:
        zipped_file.write(file)

### ZIP DEFLATED
with zipfile.ZipFile("multiple_files_deflated.zip", mode="w", compression=zipfile.ZIP_DEFLATED) as zipped_file:

    for file in ["zen_of_python.txt", "dr_apj_kalam.jpeg", "intro_ball.gif", "Deploying a Django Application to Google App Engine.pdf"]:
        zipped_file.write(file)

### ZIP BZIP2
with zipfile.ZipFile("multiple_files_bzip2.zip", mode="w", compression=zipfile.ZIP_BZIP2) as zipped_file:

    for file in ["zen_of_python.txt", "dr_apj_kalam.jpeg", "intro_ball.gif", "Deploying a Django Application to Google App Engine.pdf"]:
        zipped_file.write(file)

### ZIP LZMA
with zipfile.ZipFile("multiple_files_lzma.zip", mode="w", compression=zipfile.ZIP_LZMA) as zipped_file:

    for file in ["zen_of_python.txt", "dr_apj_kalam.jpeg", "intro_ball.gif", "Deploying a Django Application to Google App Engine.pdf"]:
        zipped_file.write(file)

We can notice from our results below that LZMA has produced the smallest archive. The gain over ZIP_STORED is modest for all algorithms, most likely because JPEG and GIF data is already compressed.

!ls -l multiple_files_*
-rw-r--r-- 1 sunny sunny 526634 May  7 10:29 multiple_files_bzip2.zip
-rw-r--r-- 1 sunny sunny 519869 May  7 10:29 multiple_files_deflated.zip
-rw-r--r-- 1 sunny sunny 519444 May  7 10:29 multiple_files_lzma.zip
-rw-r--r-- 1 sunny sunny 560230 May  7 10:29 multiple_files_stored.zip
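
The differences are much easier to see on plain, repetitive text. The sketch below compresses the same made-up text with each algorithm in memory and reads back compress_size. It assumes the zlib, bz2 and lzma modules are available in the Python installation.

```python
import io
import zipfile

# Repetitive text compresses well, unlike already-compressed image data
data = b"Simple is better than complex.\n" * 500

methods = {
    "stored": zipfile.ZIP_STORED,
    "deflated": zipfile.ZIP_DEFLATED,
    "bzip2": zipfile.ZIP_BZIP2,
    "lzma": zipfile.ZIP_LZMA,
}

compressed_sizes = {}
for label, method in methods.items():
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=method) as archive:
        archive.writestr("zen.txt", data)
        # compress_size is the size of the member's data inside the archive
        compressed_sizes[label] = archive.getinfo("zen.txt").compress_size

for label, size in compressed_sizes.items():
    print(f"{label:<9} {size:>6} bytes (original {len(data)})")
```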

Below we have created another example that is exactly the same as our above example with the only change that we have provided compresslevel for the DEFLATED and BZIP2 compressions. As we had explained earlier, lower compression levels finish faster but compress less, hence file sizes will be larger than with higher levels.

When we don't provide a compression level, each algorithm uses its own default: zlib's default level for DEFLATED (a compromise between speed and size, currently equivalent to level 6) and level 9 for BZIP2. For this example, we have provided the lowest level for both. We can compare file sizes generated by this example with the previous ones and notice that the DEFLATED and BZIP2 archives are larger. Level 0 for DEFLATED performs no compression at all, which is why that archive is even slightly larger than the STORED one.

import zipfile

### Zip STORED
with zipfile.ZipFile("multiple_files_stored.zip", mode="w", compression=zipfile.ZIP_STORED) as zipped_file:

    for file in ["zen_of_python.txt", "dr_apj_kalam.jpeg", "intro_ball.gif", "Deploying a Django Application to Google App Engine.pdf"]:
        zipped_file.write(file)

### ZIP DEFLATED with compresslevel=0 (least compression)
with zipfile.ZipFile("multiple_files_deflated.zip", mode="w", compression=zipfile.ZIP_DEFLATED, compresslevel=0) as zipped_file:

    for file in ["zen_of_python.txt", "dr_apj_kalam.jpeg", "intro_ball.gif", "Deploying a Django Application to Google App Engine.pdf"]:
        zipped_file.write(file)

### ZIP BZIP2 with compresslevel=1 (least compression)
with zipfile.ZipFile("multiple_files_bzip2.zip", mode="w", compression=zipfile.ZIP_BZIP2, compresslevel=1) as zipped_file:

    for file in ["zen_of_python.txt", "dr_apj_kalam.jpeg", "intro_ball.gif", "Deploying a Django Application to Google App Engine.pdf"]:
        zipped_file.write(file)

### ZIP LZMA
with zipfile.ZipFile("multiple_files_lzma.zip", mode="w", compression=zipfile.ZIP_LZMA) as zipped_file:

    for file in ["zen_of_python.txt", "dr_apj_kalam.jpeg", "intro_ball.gif", "Deploying a Django Application to Google App Engine.pdf"]:
        zipped_file.write(file)
!ls -l multiple_files_*
-rw-r--r-- 1 sunny sunny 527705 May  7 10:32 multiple_files_bzip2.zip
-rw-r--r-- 1 sunny sunny 560330 May  7 10:32 multiple_files_deflated.zip
-rw-r--r-- 1 sunny sunny 519444 May  7 10:32 multiple_files_lzma.zip
-rw-r--r-- 1 sunny sunny 560230 May  7 10:32 multiple_files_stored.zip
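
The effect of compresslevel can also be checked on a single in-memory member. Below is a small sketch with made-up text; deflated_size() is a hypothetical helper written for this illustration.

```python
import io
import zipfile

data = b"Flat is better than nested.\n" * 500

def deflated_size(level):
    """Return the compressed size of `data` at a given DEFLATED level."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED,
                         compresslevel=level) as archive:
        archive.writestr("zen.txt", data)
        return archive.getinfo("zen.txt").compress_size

# Level 0 stores the data without compressing it; 1 is fastest, 9 is slowest
size_by_level = {level: deflated_size(level) for level in (0, 1, 9)}

print(len(data), size_by_level)
```

At level 0 the member ends up slightly larger than the original data because the DEFLATE container adds a few bytes of framing.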

7. Extracting Files from Zip Archive

As a part of our seventh example, we'll explain how we can extract the contents of an archive. We'll be using extract() and extractall() methods of ZipFile instance for our purpose.


> Important Methods of ZipFile Instance

  • extract(member_name,path=None) - This method accepts member file name that we want to extract. It then extracts the member to the current directory. If we want to extract a file to some different location then we can give that location as a string to path parameter of the method. The method returns the path where the file was extracted.

  • extractall(path=None, members=None) - This method extracts all members of the zip archive to the current directory if the path is not provided, else it'll extract members to the given path. We can also give a list of member names to members parameter and it'll only extract that subset of members. Unlike extract(), this method does not return a path; it returns None.


Our code for this example starts by opening a previously created zip file in reading mode. It then extracts one single JPEG to the current path. We then extract the same file to a different location by giving the location to path parameter. We have then also extracted all files using extractall() method.

import zipfile

with zipfile.ZipFile("multiple_files.zip") as zipped_file:
    extraction_path = zipped_file.extract("dr_apj_kalam.jpeg")

    print("Path where files are extracted : ", extraction_path)

    extraction_path = zipped_file.extract("dr_apj_kalam.jpeg", path="/home/sunny/multiple_files/")

    print("Path where files are extracted : ", extraction_path)

    zipped_file.extractall(path="/home/sunny/multiple_files/") ## extractall() returns None
Path where files are extracted :  /home/sunny/dr_apj_kalam.jpeg
Path where files are extracted :  /home/sunny/multiple_files/dr_apj_kalam.jpeg
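
The return values and the members parameter can be checked with a throwaway archive. The sketch below assumes a temporary directory created with tempfile and three made-up members.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    archive_path = os.path.join(tmp, "sample.zip")
    with zipfile.ZipFile(archive_path, mode="w") as archive:
        archive.writestr("a.txt", "first")
        archive.writestr("b.txt", "second")
        archive.writestr("c.txt", "third")

    target_dir = os.path.join(tmp, "out")
    with zipfile.ZipFile(archive_path) as archive:
        # extract() returns the path of the extracted file
        single_path = archive.extract("a.txt", path=target_dir)
        # extractall() returns None; members limits what gets extracted
        extractall_result = archive.extractall(path=target_dir, members=["b.txt"])

    extracted_files = sorted(os.listdir(target_dir))
    single_name = os.path.basename(single_path)

print(single_name, extractall_result, extracted_files)
```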

8. Important Properties of Zipped Files

As a part of our eighth example, we'll explain how we can retrieve details of individual members of an archive.

The zipfile library has a class named ZipInfo whose object stores information about one file.

The information like modification date, compression type, compression size, file size, CRC, and few other important attributes are available through ZipInfo instance.

We can get ZipInfo instances using getinfo() or infolist() methods of ZipFile instance.


> Important Methods of ZipFile Object

  • getinfo(member_name) - This method takes as input member name of the archive and returns ZipInfo instance for that member.
  • infolist() - This method returns a list of ZipInfo instances where each instance represents one member of an archive.

> Important Methods/Attributes of ZipInfo Object

  • date_time - Last modification date and time.
  • compress_type - It returns integer specifying compression type. (zipfile.ZIP_STORED - 0, zipfile.ZIP_DEFLATED - 8, zipfile.ZIP_BZIP2 - 12, zipfile.ZIP_LZMA - 14)
  • compress_size - Size of compressed data in bytes.
  • file_size - Size of uncompressed file.
  • comment - Comment about file.
  • create_system - System that created zip archive.
  • create_version - PKZIP version used to create archive.
  • extract_version - PKZIP version required to extract archive.
  • flag_bits - ZIP flag bits.
  • volume - Volume no of file header.
  • internal_attr - Internal file attributes.
  • external_attr - External file attributes.
  • header_offset - offset in bytes to file header.
  • CRC - CRC-32 of uncompressed member.
  • is_dir() - This method returns True if member is directory else False.

Our code for this example starts by creating a function that takes as input a ZipInfo instance and prints important attributes of the member represented by that instance.

We have then opened the existing zip archive in reading mode. We have then retrieved ZipInfo instance for the text file using getinfo() method and printed the attributes of the file.

We have then retrieved all ZipInfo instances for all members of the archive and printed details of each member.

import zipfile

def print_zipinfo_details(zipinfo):
    print("\n=============== {} ==================".format(zipinfo.filename))
    print("\nLast Modification Datetime of file   : ", zipinfo.date_time)
    print("Compression Type                     : ", zipinfo.compress_type)
    print("Compressed Data Size                 : ", zipinfo.compress_size)
    print("Uncompressed Data Size               : ", zipinfo.file_size)
    print("Comment for file                     : ", zipinfo.comment)
    print("System that created zip file         : ", zipinfo.create_system)
    print("PKZIP Version                        : ", zipinfo.create_version)
    print("PKZIP Version needed for extraction  : ", zipinfo.extract_version)
    print("ZIP Flags                            : ", zipinfo.flag_bits)
    print("Volume Number of File Header         : ", zipinfo.volume)
    print("Internal Attributes                  : ", zipinfo.internal_attr)
    print("External Attributes                  : ", zipinfo.external_attr)
    print("Byte offset to File Header           : ", zipinfo.header_offset)
    print("CRC-32 of compressed data            : ", zipinfo.CRC)
    print("Is zip member directory ?            : ", zipinfo.is_dir())

with zipfile.ZipFile("multiple_files.zip") as zipped_file:
    print("List of files in archive : ", zipped_file.namelist())

    zipinfo = zipped_file.getinfo("zen_of_python.txt")
    print("\nZipInfo Object : ", zipinfo)

    print_zipinfo_details(zipinfo)

    for zipinfo in zipped_file.infolist():
        print_zipinfo_details(zipinfo)
List of files in archive :  ['zen_of_python.txt', 'dr_apj_kalam.jpeg', 'intro_ball.gif', 'Deploying a Django Application to Google App Engine.pdf']

ZipInfo Object :  <ZipInfo filename='zen_of_python.txt' filemode='-rw-r--r--' file_size=862>

=============== zen_of_python.txt ==================

Last Modification Datetime of file   :  (2021, 5, 7, 10, 0, 46)
Compression Type                     :  0
Compressed Data Size                 :  862
Uncompressed Data Size               :  862
Comment for file                     :  b''
System that created zip file         :  3
PKZIP Version                        :  20
PKZIP Version needed for extraction  :  20
ZIP Flags                            :  0
Volume Number of File Header         :  0
Internal Attributes                  :  0
External Attributes                  :  2175008768
Byte offset to File Header           :  0
CRC-32 of compressed data            :  1558622507
Is zip member directory ?            :  False

=============== zen_of_python.txt ==================

Last Modification Datetime of file   :  (2021, 5, 7, 10, 0, 46)
Compression Type                     :  0
Compressed Data Size                 :  862
Uncompressed Data Size               :  862
Comment for file                     :  b''
System that created zip file         :  3
PKZIP Version                        :  20
PKZIP Version needed for extraction  :  20
ZIP Flags                            :  0
Volume Number of File Header         :  0
Internal Attributes                  :  0
External Attributes                  :  2175008768
Byte offset to File Header           :  0
CRC-32 of compressed data            :  1558622507
Is zip member directory ?            :  False

=============== dr_apj_kalam.jpeg ==================

Last Modification Datetime of file   :  (2021, 5, 7, 10, 16, 40)
Compression Type                     :  0
Compressed Data Size                 :  6471
Uncompressed Data Size               :  6471
Comment for file                     :  b''
System that created zip file         :  3
PKZIP Version                        :  20
PKZIP Version needed for extraction  :  20
ZIP Flags                            :  0
Volume Number of File Header         :  0
Internal Attributes                  :  0
External Attributes                  :  2176057344
Byte offset to File Header           :  909
CRC-32 of compressed data            :  2457736190
Is zip member directory ?            :  False

=============== intro_ball.gif ==================

Last Modification Datetime of file   :  (2021, 5, 7, 10, 0, 46)
Compression Type                     :  0
Compressed Data Size                 :  5015
Uncompressed Data Size               :  5015
Comment for file                     :  b''
System that created zip file         :  3
PKZIP Version                        :  20
PKZIP Version needed for extraction  :  20
ZIP Flags                            :  0
Volume Number of File Header         :  0
Internal Attributes                  :  0
External Attributes                  :  2176057344
Byte offset to File Header           :  7427
CRC-32 of compressed data            :  944415501
Is zip member directory ?            :  False

=============== Deploying a Django Application to Google App Engine.pdf ==================

Last Modification Datetime of file   :  (2021, 5, 7, 10, 0, 46)
Compression Type                     :  0
Compressed Data Size                 :  547350
Uncompressed Data Size               :  547350
Comment for file                     :  b''
System that created zip file         :  3
PKZIP Version                        :  20
PKZIP Version needed for extraction  :  20
ZIP Flags                            :  0
Volume Number of File Header         :  0
Internal Attributes                  :  0
External Attributes                  :  2180972544
Byte offset to File Header           :  12486
CRC-32 of compressed data            :  3310412975
Is zip member directory ?            :  False
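
The raw attributes are often more useful after a little processing. Below is a minimal sketch that turns date_time into a datetime object and computes the fraction of space saved per member; it uses a made-up in-memory archive rather than the files above.

```python
import io
import zipfile
from datetime import datetime

text = "Now is better than never.\n" * 200

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    info = zipfile.ZipInfo("zen.txt", date_time=(2021, 5, 7, 10, 0, 46))
    info.compress_type = zipfile.ZIP_DEFLATED
    archive.writestr(info, text)

summary = []
with zipfile.ZipFile(buffer) as archive:
    for member in archive.infolist():
        modified = datetime(*member.date_time)  # 6-item tuple -> datetime
        # Fraction of space saved; guard against empty members
        saved = 1 - member.compress_size / member.file_size if member.file_size else 0.0
        summary.append((member.filename, modified, member.file_size, round(saved, 2)))

for row in summary:
    print(row)
```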

9. Testing Zip File for Corruption

As a part of our ninth example, we are demonstrating how we can check for corrupt archives using testzip() method of ZipFile instance.


> Important Methods of ZipFile Object

  • testzip() - This method loops through all members of the archive and checks their CRC & headers. It returns None if all files are good else returns the name of the first file which is corrupted.

Our code for this example opens two different archives and checks them for corruption. It then prints whether archives are corrupt or not.

import zipfile

with zipfile.ZipFile("multiple_files.zip") as zipped_file:
    test_result = zipped_file.testzip()

    print("Is zip file (multiple_files.zip) valid ? ",test_result if test_result else "Yes")

with zipfile.ZipFile("zen_of_python.zip") as zipped_file:
    test_result = zipped_file.testzip()

    print("Is zip file (zen_of_python.zip)  valid ? ",test_result if test_result else "Yes")
Is zip file (multiple_files.zip) valid ?  Yes
Is zip file (zen_of_python.zip)  valid ?  Yes
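
Both archives above are healthy, so testzip() returns None for each. To see the other outcome, the sketch below damages a copy of an in-memory archive on purpose by flipping one byte of the member data. It relies on ZIP_STORED keeping the bytes verbatim so they can be located; the member name and text are made up.

```python
import io
import zipfile

payload = b"Errors should never pass silently. " * 40

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_STORED) as archive:
    archive.writestr("zen.txt", payload)

raw = bytearray(buffer.getvalue())
offset = raw.find(payload)   # ZIP_STORED keeps the member bytes verbatim
raw[offset] ^= 0xFF          # flip one byte of the member data

with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    clean_result = archive.testzip()      # None: every CRC matches

with zipfile.ZipFile(io.BytesIO(bytes(raw))) as archive:
    damaged_result = archive.testzip()    # name of the first bad member

print(clean_result, damaged_result)
```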

10. Special Class for Zipping ".py" Files

As a part of our tenth and last example, we'll explain how we can create an archive of compiled Python files (.pyc) from Python source files. The zipfile module provides a special class named PyZipFile for this purpose.


PyZipFile(file, mode='r', compression=ZIP_STORED, allowZip64=True, optimize=-1) - The PyZipFile constructor has same parameters as that of ZipFile constructor.

  • It has one extra parameter named optimize which takes integer values. The default value is -1, which adds the regular .pyc file for each module, compiling it if necessary. The other accepted values for this parameter are 0, 1 and 2, which represent the optimization level the Python files are compiled with. If we give one of these values then only files with that optimization level will be added to the archive, compiling them if necessary.

> Important Methods of PyZipFile Object

It has the same methods as ZipFile with one extra method described below.

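  • writepy(pathname, basename='', filterfunc=None) - This method takes a path, searches it for .py files, and adds the corresponding compiled files to the archive as per the optimization level, compiling them if necessary. If the path is a directory that is not a package, the compiled files are added at the top level of the archive. If it is a package directory, they are added under the package name, and subdirectories that are themselves packages are added recursively. The basename parameter is intended for internal use only.
    • filterfunc - This parameter takes a callable that accepts a single string argument and returns True/False. It is called with each path before that path is added. If it returns a false value, the path is skipped, and if the path is a directory, its contents are ignored.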

Our code for this example starts by creating an instance of PyZipFile for the new archive. It then adds files to the zip archive from the path given as input. We then list all files added to the archive using printdir() method.

import zipfile

with zipfile.PyZipFile("python_files.zip", mode="w") as pyzipped_file:
    pyzipped_file.writepy(pathname="/home/sunny/multiprocessing_synchronization_primitive_examples")

    pyzipped_file.printdir()
File Name                                             Modified             Size
multiprocessing_synch_primitive_ex_1_1.pyc     2021-05-07 10:20:16         1159
multiprocessing_synch_primitive_ex_1_2.pyc     2021-05-07 10:20:16         1321
multiprocessing_synch_primitive_ex_1_3.pyc     2021-05-07 10:20:16         1286
multiprocessing_synch_primitive_ex_1_4.pyc     2021-05-07 10:20:16         1427
multiprocessing_synch_primitive_ex_1_5.pyc     2021-05-07 10:20:16         1440
multiprocessing_synch_primitive_ex_2_1.pyc     2021-05-07 10:20:16         1347
multiprocessing_synch_primitive_ex_2_2.pyc     2021-05-07 10:20:16         1461
multiprocessing_synch_primitive_ex_2_3.pyc     2021-05-07 10:20:16         1474
multiprocessing_synch_primitive_ex_2_4.pyc     2021-05-07 10:20:16         1355
multiprocessing_synch_primitive_ex_3_1.pyc     2021-05-07 10:20:16         1903
multiprocessing_synch_primitive_ex_3_2.pyc     2021-05-07 10:20:16         1912
multiprocessing_synch_primitive_ex_3_3.pyc     2021-05-07 10:20:16         1911
multiprocessing_synch_primitive_ex_3_4.pyc     2021-05-07 10:20:16         1931
multiprocessing_synch_primitive_ex_3_5.pyc     2021-05-07 10:20:16         1899
multiprocessing_synch_primitive_ex_4_1.pyc     2021-05-07 10:20:16         1767
multiprocessing_synch_primitive_ex_4_2.pyc     2021-05-07 10:20:16         1963
multiprocessing_synch_primitive_ex_4_3.pyc     2021-05-07 10:20:16         1964
multiprocessing_synch_primitive_ex_4_4.pyc     2021-05-07 10:20:16         1747
multiprocessing_synch_primitive_ex_5_1.pyc     2021-05-07 10:20:16         1860
multiprocessing_synch_primitive_ex_5_2.pyc     2021-05-07 10:20:16         1891
multiprocessing_synch_primitive_ex_6_1.pyc     2021-05-07 10:20:16         1255
multiprocessing_synch_primitive_ex_6_2.pyc     2021-05-07 10:20:16         1452
multiprocessing_synch_primitive_ex_6_3.pyc     2021-05-07 10:20:16         1696
multiprocessing_synch_primitive_ex_6_4.pyc     2021-05-07 10:20:16         1702
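
The example above does not use filterfunc. The sketch below is a minimal, hedged illustration of it. The file names (helper.py, main.py, test_helper.py) and the functions skip_tests() and archive_sources() are hypothetical and exist only for this sketch. It writes three small source files into a temporary directory and archives them into an in-memory PyZipFile while filtering out files whose names start with test_.

```python
import io
import os
import tempfile
import zipfile


def skip_tests(path: str) -> bool:
    """Hypothetical filter: reject paths whose last component starts with 'test_'."""
    return not os.path.basename(path).startswith("test_")


def archive_sources(directory: str) -> list:
    """Archive compiled versions of the .py files in directory; return member names."""
    buffer = io.BytesIO()
    with zipfile.PyZipFile(buffer, mode="w") as pyzipped_file:
        # writepy() compiles each accepted .py file if needed and adds the .pyc
        pyzipped_file.writepy(directory, filterfunc=skip_tests)
        return pyzipped_file.namelist()


with tempfile.TemporaryDirectory() as workdir:
    # Create three tiny source files; one of them looks like a test module.
    for name in ("helper.py", "main.py", "test_helper.py"):
        with open(os.path.join(workdir, name), "w") as source:
            source.write("VALUE = 1\n")

    members = archive_sources(workdir)

print(members)
```

With this layout, the archive should contain helper.pyc and main.pyc at the top level, because the temporary directory is not a package. Note that filterfunc also receives the directory path itself, so a filter that rejects the directory would leave the archive empty.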

This ends our small tutorial explaining the usage of the zipfile module of Python.

Sunny Solanki