python json epub seek python-zipfile

'ZipFile' object has no attribute 'seek'


I am trying to get a script working that can make an ePub file. ePub files are zip archives whose contents are stored (i.e. without compression) and have to be added in a particular order. The current script creates a .zip, but it is unusable and produces errors both in the Python shell and in the Terminal app when running the zip -t command.

The error in question is as follows on the Python shell:

Traceback (most recent call last):
  File "/Users/Hal/Documents/GitHub/Damore-essay-ebook/GenEpub-old.py", line 29, in <module>
    if zipfile.is_zipfile(zf) is True:
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/zipfile.py", line 183, in is_zipfile
    result = _check_zipfile(fp=filename)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/zipfile.py", line 169, in _check_zipfile
    if _EndRecData(fp):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/zipfile.py", line 241, in _EndRecData
    fpin.seek(0, 2)
AttributeError: 'ZipFile' object has no attribute 'seek'

The error in question on the Mac Terminal (though I am sure the output would be the same wherever I ran zip -t):

Archive:  IdealogicalEcho.epub
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of IdealogicalEcho.epub or
        IdealogicalEcho.epub.zip, and cannot find IdealogicalEcho.epub.ZIP, period.

Python source code:

#!/usr/bin/env python

#GenEpub.py - Generates an .epub file from the data provided.
#Ideally with no errors or warnings from epubcheck (needs to be implemented, maybe with the Python wrapper).

import os
import json
import zipfile
    
with open('metadata.json') as json_file:
        data = json.load(json_file)

#The ePub standard requires stored (uncompressed) entries and a specific file order.
zf = zipfile.ZipFile(data["fileName"] + '.epub', mode='w', compression=zipfile.ZIP_STORED)

zf.write(data["fileName"] + '/mimetype')

for dirname, subdirs, files in os.walk(data["fileName"] + '/META-INF'):
    zf.write(dirname)
    for filename in files:
        zf.write(os.path.join(dirname, filename))

for dirname, subdirs, files in os.walk(data["fileName"] + '/EBOOK'):
    zf.write(dirname)
    for filename in files:
        zf.write(os.path.join(dirname, filename))

#zipfile has a built-in validator for debugging
if zipfile.is_zipfile(zf) is True:
    print("ZIP file is valid.")

#Extra debugging information
#print(getinfo.compress_type(zf))
#print(getinfo.compress_size(zf))
#print(getinfo.file_size(zf))

zf.close()

JSON file I used:

{
        "comment1": "Metadata.json - Insert the e-book's metadata here. WIP",

        "comment2": "Technical metadata - This is where the cover image is specified. Recommended to use ePub V2.0.1 over 3.0 for epubVersion and Reflowable rather than Fixed for textPresentation (unless doing a project that requires a specific layout). mobiCover and generateKindle are currently unused but added for futureproofing.",
        "epubCover": "cover.jpg",
        "mobiCover": "cover.jpg",
        "fileName": "IdealogicalEcho",
        "epubVersion": "2.0.1",
        "textPresentation": "Reflowable",
        "generateKindle": "no",

        "comment3": "Book metadata - Information about the e-book itself. Language is specified with ISO 639-1. Rights can be worldwide, country specific or under a permissible license such as Creative-Commons SA",
        "title": "Google's Idealogical Echochamber",
        "creator": "James Damore",
        "subject": "Academic",
        "publisher": "Hal Motley",
        "ISBN": "-",
        "language": "en",
        "rights": "Creative-Commons SA",

        "comment4": "This is the page order that the e-book has. The first number before the colon is the page order, the second is the indentation, third is the page name and fourth is file itself.",
            "pages": [
                    {
                        "1": [0, "Cover", "bookcover.xhtml"],
                        "2": [0, "Title", "title.xhtml"],
                        "3": [0, "Indicia", "indicia.xhtml"],
                        "4": [0, "License", "license.xhtml"],
                        "5": [0, "Contents", "toc.xhtml"],
                        "6": [0, "Foreword", "foreword.xhtml"],
                        "7": [0, "Article", "article.xhtml"]
                    }
                            ]
}

Solution

  • The problem is the argument passed to is_zipfile. The documentation states that "filename may be a file or file-like object" (13.5.1. ZipFile Objects: zipfile.is_zipfile), but a ZipFile instance is neither of those: it does not provide seek, hence the AttributeError.

    A possible solution is to close the archive first and then reopen the file, in binary mode since a zip archive is binary data, just to check it:

    zf.close()
    
    with open(data["fileName"] + '.epub', 'rb') as f:
        if zipfile.is_zipfile(f):
            print("ZIP file is valid.")
    

    I also found that this check is extremely basic: is_zipfile only decides based on the zip magic number, so it still returns True even if you manually damage some bytes. It takes some effort to actually make it fail.
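
    To illustrate how shallow the check is, here is a minimal sketch that builds a small archive in memory, flips one byte of the stored file data and asks is_zipfile again. The entry name a.txt, the payload and the two helper functions are made up for this illustration and have nothing to do with the ePub script.

```python
import io
import zipfile


def build_archive() -> bytes:
    """Return the raw bytes of a tiny archive with one stored entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, mode="w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("a.txt", b"hello world")  # hypothetical entry
    return buf.getvalue()


def damage_payload(raw: bytes) -> bytes:
    """Flip the first byte of the stored file data, leaving headers intact."""
    pos = raw.find(b"hello world")  # stored entries keep their data verbatim
    damaged = bytearray(raw)
    damaged[pos] ^= 0xFF
    return bytes(damaged)


raw = build_archive()
damaged = damage_payload(raw)

# Both calls report a zip file, because the archive's trailing record
# is untouched in the damaged copy.
print(zipfile.is_zipfile(io.BytesIO(raw)))
print(zipfile.is_zipfile(io.BytesIO(damaged)))
```

    So is_zipfile is useful as a quick "is this a zip at all" test, but not as an integrity check.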

    Interestingly, the apparently more thorough zipfile.ZipFile.testzip function needs that zf again, but it also fails if called before zf.close(). And there is no zf.flush() ...
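
    One way around this, as a sketch rather than a definitive recipe: finish writing first (close() is what writes the final archive records), then reopen the same path in read mode and call testzip() on the new object. The function name close_then_test and the demo file name are assumptions made up for this example.

```python
import os
import tempfile
import zipfile


def close_then_test(path):
    """Write a tiny archive, close it, then reopen it read-only and test it.

    Returns the name of the first bad entry, or None if every entry passes.
    """
    # Leaving the with-block closes the archive, which finalizes it on disk.
    with zipfile.ZipFile(path, mode="w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("mimetype", "application/epub+zip")

    # A fresh read-mode ZipFile can check the CRC of every entry.
    with zipfile.ZipFile(path, mode="r") as zf:
        return zf.testzip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bad_entry = close_then_test(os.path.join(tmp, "demo.epub"))
        print("first bad entry:", bad_entry)
```

    In the script from the question the same idea would mean calling zf.close() first and then running testzip() on a second ZipFile opened with mode='r'.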

    Luckily, checking the created ePub file with zip after running the script reveals it contains no errors:

    ~/Documents $ zip -T IdealogicalEcho.epub 
    test of IdealogicalEcho.epub OK
    

    (This does not tell you, by the way, that it is a valid ePub. It is not.)
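
    One likely reason, assuming the folder layout from the question: zf.write(path) stores each file under the path it was given, so the mimetype file ends up as IdealogicalEcho/mimetype, while the ePub container format expects an entry named exactly mimetype, uncompressed, as the first item in the archive. The sketch below passes arcname to strip the folder prefix. build_epub is a hypothetical helper, the META-INF and EBOOK folder names are taken from the question, and compressing the remaining files with ZIP_DEFLATED is my own choice. I have not run the result through epubcheck.

```python
import os
import zipfile


def build_epub(source_dir, epub_path):
    """Pack source_dir into epub_path with 'mimetype' first and uncompressed."""
    with zipfile.ZipFile(epub_path, mode="w",
                         compression=zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry goes in first, stored, at the archive root.
        zf.write(os.path.join(source_dir, "mimetype"),
                 arcname="mimetype",
                 compress_type=zipfile.ZIP_STORED)

        # Add the remaining folders with names relative to source_dir.
        for top in ("META-INF", "EBOOK"):
            for dirname, subdirs, files in os.walk(os.path.join(source_dir, top)):
                subdirs.sort()  # walk subfolders in a predictable order
                for filename in sorted(files):
                    full_path = os.path.join(dirname, filename)
                    zf.write(full_path,
                             arcname=os.path.relpath(full_path, source_dir))


if __name__ == "__main__":
    # Assumes a folder named IdealogicalEcho next to the script, as in the question.
    build_epub("IdealogicalEcho", "IdealogicalEcho.epub")
```

    Using a with-block also guarantees that the archive is closed, and therefore finalized, even if one of the write calls raises.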