Hash packages without unpacking
PyPIRepository._get_file_hash used to call unpack_url when generating
the hash.  It only needed the side effect of leaving the downloaded
package in the download directory; the unpacking itself was
unnecessary.  Change it to just open the (local or remote) package as a
file object and hash its contents without unpacking.
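
A minimal sketch of the approach described above, using only the standard library. The package is read from any binary file-like object in fixed-size chunks and fed to `hashlib`, so nothing is extracted to disk. `hash_file_object` is a hypothetical helper, not part of pip-tools, and `sha256` stands in for pip's `FAVORITE_HASH`.

```python
import hashlib
import io


def hash_file_object(fp, algorithm="sha256", chunk_size=8192):
    """Hash a binary file-like object without extracting it (hypothetical helper).

    Mirrors the read loop in _get_file_hash and returns the
    "<algorithm>:<hexdigest>" string format used for requirement hashes.
    """
    h = hashlib.new(algorithm)
    # iter() with a sentinel calls fp.read() until it returns b"" (end of file)
    for chunk in iter(lambda: fp.read(chunk_size), b""):
        h.update(chunk)
    return ":".join([algorithm, h.hexdigest()])


# Any object with a binary read() method works: an open local file, an HTTP
# response stream, or an in-memory buffer as used here.
print(hash_file_object(io.BytesIO(b"example package bytes")))
```

Only one chunk is held in memory at a time, so the cost stays bounded regardless of package size, and the chunk size does not affect the resulting digest.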

This makes it faster and lighter, since unpacking consumes CPU cycles
and disk space.  More importantly, it avoids a conflict that arises when
one distribution contains a file with the same name as a directory in
another: unpacking both packages into the same directory then fails.
For example, matplotlib-2.0.2.tar.gz has a directory named LICENSE,
while many other packages have a file named LICENSE.
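
The name clash can be reproduced with plain filesystem calls. This hypothetical demo does not use pip or real archives: it creates a file named LICENSE (as many packages ship) and then tries to create a directory at the same path (as matplotlib-2.0.2.tar.gz would need), which the operating system refuses.

```python
import os
import tempfile


def license_clash_demo():
    """Simulate two packages unpacked into one directory (hypothetical demo).

    Package A ships a regular file named LICENSE; package B ships a
    directory named LICENSE. Returns the name of the error raised when
    B's directory is created on top of A's file, or None if no clash.
    """
    with tempfile.TemporaryDirectory() as target:
        license_path = os.path.join(target, "LICENSE")

        # Package A: a plain LICENSE file
        with open(license_path, "w") as f:
            f.write("license text\n")

        # Package B: needs LICENSE to be a directory at the same path
        try:
            os.mkdir(license_path)
        except OSError as exc:
            return type(exc).__name__
    return None


print(license_clash_demo())
```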

Fixes #512, #544
suutari committed Sep 2, 2017
1 parent 877241d commit 627dbaf
Showing 2 changed files with 37 additions and 15 deletions.
CHANGELOG.md: 1 change (1 addition, 0 deletions)
@@ -12,6 +12,7 @@ when `--allow-unsafe` was not set. ([#517](https://github.com/jazzband/pip-tools
 (thus losing their VCS directory) and `python setup.py egg_info` fails. ([#385](https://github.com/jazzband/pip-tools/pull/385#) and [#538](https://github.com/jazzband/pip-tools/pull/538)). Thanks @blueyed and @dfee
 - Fixed bug where some primary dependencies were annotated with "via" info comments. ([#542](https://github.com/jazzband/pip-tools/pull/542)). Thanks @quantus
 - Fixed bug where pkg-resources would be removed by pip-sync in Ubuntu. ([#555](https://github.com/jazzband/pip-tools/pull/555)). Thanks @cemsbr
+- Fixed package hashing doing unnecessary unpacking
 
 # 1.9.0 (2017-04-12)
 
piptools/repositories/pypi.py: 51 changes (36 additions, 15 deletions)
@@ -4,9 +4,10 @@
 
 import hashlib
 import os
+from contextlib import contextmanager
 from shutil import rmtree
 
-from pip.download import unpack_url
+from pip.download import is_file_url, url_to_path
 from pip.index import PackageFinder
 from pip.req.req_set import RequirementSet
 from pip.wheel import Wheel
@@ -194,18 +195,38 @@ def get_hashes(self, ireq):
         }
 
     def _get_file_hash(self, location):
-        with TemporaryDirectory() as tmpdir:
-            unpack_url(
-                location, self.build_dir,
-                download_dir=tmpdir, only_download=True, session=self.session
-            )
-            files = os.listdir(tmpdir)
-            assert len(files) == 1
-            filename = os.path.abspath(os.path.join(tmpdir, files[0]))
-
-            h = hashlib.new(FAVORITE_HASH)
-            with open(filename, "rb") as fp:
-                for chunk in iter(lambda: fp.read(8096), b""):
-                    h.update(chunk)
-
+        h = hashlib.new(FAVORITE_HASH)
+        with open_local_or_remote_file(location, self.session) as fp:
+            for chunk in iter(lambda: fp.read(8096), b""):
+                h.update(chunk)
         return ":".join([FAVORITE_HASH, h.hexdigest()])
+
+
+@contextmanager
+def open_local_or_remote_file(link, session):
+    """
+    Open local or remote file for reading.
+    :type link: pip.index.Link
+    :type session: requests.Session
+    :raises ValueError: If link points to a local directory.
+    :return: a context manager to the opened file-like object
+    """
+    url = link.url_without_fragment
+
+    if is_file_url(link):
+        # Local URL
+        local_path = url_to_path(url)
+        if os.path.isdir(local_path):
+            raise ValueError("Cannot open directory for read: {}".format(url))
+        else:
+            with open(local_path, 'rb') as local_file:
+                yield local_file
+    else:
+        # Remote URL
+        headers = {"Accept-Encoding": "identity"}
+        response = session.get(url, headers=headers, stream=True)
+        try:
+            yield response.raw
+        finally:
+            response.close()
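
For readers without pip 9 at hand, the same open-then-hash pattern can be approximated with the standard library alone. This is a sketch under stated assumptions, not the pip-tools implementation: `open_url_for_hashing` and `hash_url` are hypothetical names, `urllib.request.urlopen` replaces the `requests` session that pip-tools passes in, and the fragment is stripped with `urllib.parse` to mimic `Link.url_without_fragment`.

```python
import hashlib
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, url2pathname, urlopen


@contextmanager
def open_url_for_hashing(url):
    """Open a file: or remote URL for binary reading (hypothetical helper).

    Stdlib-only analogue of open_local_or_remote_file; raises ValueError
    if a file: URL points to a directory.
    """
    parts = urlsplit(url)
    # Drop any "#..." fragment, mimicking pip's Link.url_without_fragment
    url = urlunsplit(parts._replace(fragment=""))

    if parts.scheme == "file":
        # Local URL: convert the URL path to a native filesystem path
        local_path = url2pathname(parts.path)
        if os.path.isdir(local_path):
            raise ValueError("Cannot open directory for read: {}".format(url))
        with open(local_path, "rb") as local_file:
            yield local_file
    else:
        # Remote URL: ask for an uncompressed body so the hashed bytes
        # match the archive itself
        request = Request(url, headers={"Accept-Encoding": "identity"})
        with urlopen(request) as response:
            yield response


def hash_url(url, algorithm="sha256", chunk_size=8192):
    """Return "<algorithm>:<hexdigest>" for the resource at url."""
    h = hashlib.new(algorithm)
    with open_url_for_hashing(url) as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            h.update(chunk)
    return ":".join([algorithm, h.hexdigest()])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "example-1.0.tar.gz"  # hypothetical package file
        pkg.write_bytes(b"not really a tarball")
        print(hash_url(pkg.as_uri() + "#egg=example"))
```

As in the commit, a `file:` URL that points to a directory is rejected with `ValueError` rather than hashed, and `Accept-Encoding: identity` asks the server to send the archive uncompressed, so the bytes that are read (and hashed) match the file itself.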
