Dealing with the wrong content encoding provided by the repository #517

Hi,

Some repositories return packages (`*.tar.gz`) with an incorrect `Content-Encoding: gzip` header, and `requests` automatically decompresses these files (see http://docs.python-requests.org/en/master/user/quickstart/#raw-response-content). As a result, the saved file is no longer gzipped, only tarred, and the following exception is raised:

```
poetry/repositories/pypi_repository.py in _get_info_from_sdist() at line 467
   tar = tarfile.TarFile(str(filepath), fileobj=gz)
 /usr/lib/python3.6/tarfile.py in __init__() at line 1480
   self.firstmember = self.next()
 /usr/lib/python3.6/tarfile.py in next() at line 2295
   tarinfo = self.tarinfo.fromtarfile(self)
 /usr/lib/python3.6/tarfile.py in fromtarfile() at line 1090
   buf = tarfile.fileobj.read(BLOCKSIZE)
 /usr/lib/python3.6/gzip.py in read() at line 276
   return self._buffer.read(size)
 /usr/lib/python3.6/_compression.py in readinto() at line 68
   data = self.read(len(byte_view))
 /usr/lib/python3.6/gzip.py in read() at line 463
   if not self._read_gzip_header():
 /usr/lib/python3.6/gzip.py in _read_gzip_header() at line 411
   raise OSError('Not a gzipped file (%r)' % magic)
```

There are two options:
1) Use `tarfile.open`, which can handle this case, instead of calling `tarfile.TarFile(str(filepath), fileobj=gz)` directly.
2) Use the raw `requests` stream instead of `iter_content`: replace `r.iter_content(chunk_size=1024)` with `r.raw.stream(1024)` in `poetry.repositories.pypi_repository.PyPiRepository#_download` and `poetry.repositories.legacy_repository.LegacyRepository#_download`. By the way, these two methods duplicate each other, which violates the DRY principle.
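
For illustration, option 2 could look roughly like the sketch below. The helper name `save_undecoded_body` and its parameters are hypothetical and are not part of Poetry. The sketch assumes `response` is a `requests.Response` obtained with `stream=True`, so that `response.raw` is the underlying urllib3 response.

```python
def save_undecoded_body(response, dest, chunk_size=1024):
    """Write the response body to `dest` exactly as the server sent it.

    `response` is assumed to be a requests.Response created with stream=True.
    """
    with open(dest, "wb") as f:
        # decode_content=False asks urllib3 not to apply the Content-Encoding
        # header, so a real .tar.gz stays gzipped on disk.
        for chunk in response.raw.stream(chunk_size, decode_content=False):
            f.write(chunk)
```

With `requests`, this would be called as `save_undecoded_body(requests.get(url, stream=True), dest)`. The trade-off is that a body which really was gzip-encoded in transit would also be stored without decoding.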

I suggest option 1, because it is more robust and does not depend on the download method.
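
Here is a minimal, standard-library-only sketch of why option 1 works. It writes a plain tar archive under a `.tar.gz` name (simulating the file after the unwanted decompression), then reads it both ways. The file name `demo-0.1.tar.gz` and all helper names are made up for the demonstration. `member_names_strict` only approximates the current code path, based on the traceback above.

```python
import gzip
import io
import os
import tarfile
import tempfile


def make_mislabeled_sdist(directory):
    """Create a plain (uncompressed) tar archive with a .tar.gz name."""
    path = os.path.join(directory, "demo-0.1.tar.gz")
    payload = b"Metadata-Version: 2.1\nName: demo\n"
    with tarfile.open(path, "w") as tar:  # mode "w" writes without compression
        info = tarfile.TarInfo("demo-0.1/PKG-INFO")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return path


def member_names_strict(path):
    """Assume the file is gzipped; this fails on a plain tar archive."""
    with gzip.GzipFile(path) as gz:
        with tarfile.TarFile(path, fileobj=gz) as tar:
            return tar.getnames()


def member_names_tolerant(path):
    """Option 1: tarfile.open() in its default mode detects the compression."""
    with tarfile.open(path) as tar:
        return tar.getnames()


with tempfile.TemporaryDirectory() as tmp:
    sdist = make_mislabeled_sdist(tmp)
    try:
        member_names_strict(sdist)
        strict_failed = False
    except OSError:  # "Not a gzipped file", as in the traceback above
        strict_failed = True
    names = member_names_tolerant(sdist)
```

Because `tarfile.open` in its default read mode tries the supported compression formats as well as plain tar, the same call also keeps working for correctly gzipped archives.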

I can prepare a pull request, although the tests will be tricky, because the `_download` methods are currently not tested at all ;)

Best regards
PS: `pip` can deal with it.

