Search code examples
pythonpython-3.xfilecharacter-encodingzip

Python special characters not preserved on file write in 'wb'


I am having the problem in the following function:

def download_zip(key: str):
    response = requests.get('some url')

    if response.status_code == 200:
        header = response.headers.get('content-disposition')

        if type(header) == str:
            file_name = re.findall("filename=(.+)", header)[0][1:-1]
        else:
            file_name = f"{key}.zip"

        with open(file_name, mode='wb') as f:
            f.write(response.content)

        print(f"Written: {file_name}")

    else:
        print(f"Failed: {key} -> {response.status_code}")


The URL 'some url' points to a zip file.

When downloading the file with a browser, its name contains japanese characters which are preserved: 音のないレプリカ. With my code, they are not and instead produce something like this: é³ã®ãªãã¬ããªã«. How can I make it preserve those characters?


Solution

  • Case-specific solution: fix the encoding to what (you think) it should be:

    file_name.encode('iso-8859-1').decode('utf-8')
    

    More general solution: yell at the owner of the webserver to fix their headers, then properly set the filename* field of Content-Disposition. Then you could have some confidence as to what the correct encoding was.