[crcmod] Failing to validate files with the C implementation (release 5.24) #1726

Open
bebenlebricolo opened this issue Jul 5, 2023 · 0 comments


Hi team,

I'm having an issue with gsutil when trying to download a file (~290 MB) from a Google Cloud Storage bucket.
The issue happens both on my Linux machine (Arch Linux) and in a Docker container that I'm trying to build.

Note that the archive I'm trying to download was split into parts (exactly 2 components) by the upload step, and the archive itself is valid: I can download it from my web browser and extract it, and the contents are 100% fine.

Arch Linux repro steps

Base setup :

  • Using the package provided on the Google Cloud CLI install page
    • Archive rev: 437.0.1-linux-x86_64
    • Extracted the package, then installed it using the install.sh script.
      At this point no compiled crcmod was available (gsutil version -l showed that the compiled crcmod was not installed)
  • gsutil -d cp gs://... /tmp/... -> download succeeded
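For anyone reproducing this, here is a minimal sketch of how to check from Python which crcmod implementation an interpreter picks up. It relies on the private `_usingExtension` flag in `crcmod.crcmod`, which may change between crcmod releases. The helper name `crcmod_uses_c_extension` is made up for illustration. Run it with the same Python interpreter that gsutil uses, otherwise the result may not match what `gsutil version -l` reports.

```python
import importlib


def crcmod_uses_c_extension():
    """Report which crcmod implementation this interpreter would load.

    Returns True for the compiled C extension, False for the pure-Python
    fallback, and None when crcmod is not installed at all.
    """
    try:
        core = importlib.import_module("crcmod.crcmod")
    except ImportError:
        return None
    # _usingExtension is a private flag that crcmod sets at import time;
    # treat a missing attribute as "no extension" (assumption).
    return bool(getattr(core, "_usingExtension", False))


if __name__ == "__main__":
    print("compiled crcmod:", crcmod_uses_c_extension())
```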

Adding crcmod via pacman package manager

Installed crcmod: pacman -S python-crcmod
Tried the download again: gsutil -d cp gs://... /tmp/...
-> The download failed with an AssertionError while writing to the file stream:

 return http_callable(uri, method=method, body=body, headers=headers,
187.3   File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/httplib2/python3/httplib2/__init__.py", line 1701, in request
187.3     (response, content) = self._request(
187.3   File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/gcs_json_media.py", line 452, in OverrideRequest
187.3     (response, content) = self._conn_request(conn, request_uri, method, body,
187.3   File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/gcs_json_media.py", line 690, in _conn_request
187.3     text_util.write_to_fd(self.stream, new_data)
187.3   File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/utils/text_util.py", line 378, in write_to_fd
187.3     fd.write(data)
187.3   File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/utils/copy_helper.py", line 2606, in write
187.3     assert (self._start_byte <= current_file_pos and
187.3 AssertionError

Removing crcmod again

pacman -R python-crcmod
Then gsutil -d cp gs://... /tmp/... -> succeeded again!

Docker build setup

Using a debian-11 based image:

Note: I tried two installation methods, the Debian package manager (apt-get) and the regular Linux tarball install, with the same results. In both cases Python is bundled with the installation, and so is crcmod.

# From https://cloud.google.com/sdk/docs/install?hl=fr#deb
#RUN echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] http://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list \
#    && curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key --keyring /usr/share/keyrings/cloud.google.gpg  add - \
#    && apt-get update -y && apt-get install google-cloud-cli -y


RUN cd /tmp && curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-cli-437.0.1-linux-x86_64.tar.gz \
    && tar xf google-cloud-cli-437.0.1-linux-x86_64.tar.gz -C /opt \
    && /opt/google-cloud-sdk/install.sh -q
RUN ln -s /opt/google-cloud-sdk/bin/gcloud /usr/bin/gcloud      \
    && ln -s /opt/google-cloud-sdk/bin/gsutil /usr/bin/gsutil

# Authentication part
RUN gcloud auth activate-service-account ${SERVICE_ACCOUNT} --key-file=/tmp/sa_keyfile.json
RUN gsutil -d cp gs://...  /tmp/...

This fails in the same way as shown above (while validating the CRC32C of the object).
Adding this extra step:
RUN /opt/google-cloud-sdk/platform/bundledpythonunix/bin/pip3.9 uninstall --quiet --yes crcmod
gets rid of the issue.
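To rule out a corrupt object independently of crcmod, the checksum can be recomputed on a known-good copy (for example the one downloaded through the browser). Below is a minimal pure-Python CRC32C sketch that does not use crcmod at all. It assumes the bucket reports the checksum as base64 of the big-endian 32-bit value, which is how `gsutil ls -L` displays the crc32c hash. The function names are made up for illustration, and pure Python will be slow on a ~290 MB file.

```python
import base64
import struct

_POLY = 0x82F63B78  # reflected Castagnoli polynomial used by CRC32C


def _build_table():
    # Standard byte-wise lookup table for a reflected CRC.
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ _POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _build_table()


def crc32c(data, crc=0):
    """Return the CRC32C of data, optionally continuing from a previous value."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def file_crc32c_b64(path, chunk_size=1 << 20):
    """Return the CRC32C of a file as base64 of the big-endian 4-byte value."""
    crc = 0
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            crc = crc32c(chunk, crc)
    return base64.b64encode(struct.pack(">I", crc)).decode("ascii")
```

If the value printed by `file_crc32c_b64` for the browser-downloaded copy matches the crc32c shown for the object in the bucket, the stored object is fine and the failure is on the download side.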

So I believe something might be off with the version of the crcmod library being used (...)
Also, the error is not very informative, as gsutil only reports:
CommandException: Some components of /tmp/<file> were not downloaded successfully. Please retry this download.

PS: I found some related issues here and there (mostly very old ones), but no recent ones. I hope this is not yet another duplicate.
