Cannot stream-write a zipfile #631
For files of any real size, the current workaround is not sufficient; I get errors around the ~50 MB range:

... it's not clear to me why. In any case, the archived files up to that point have been written to the gcloud storage location, and the archive can be downloaded and unpacked, and those files read.
Hello, thank you for your report. Some context here: the reason flush() triggers a file close is that file-like I/O uses a resumable-upload handler under the hood, and resumable uploads require chunks of exactly the same size every time until the final chunk, which may be of arbitrary size. flush() sends a chunk of arbitrary size, which is the signal to the remote server that the upload is complete. I don't think I yet fully understand your use case or the connection between the zipfile closing and flush() being triggered. Is there relevant code missing from your example? If the …
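To make that chunking constraint concrete, here is a self-contained toy model of the behavior described above (this is not the real google-cloud-storage API; `FakeResumableUpload` and `ChunkedWriter` are invented names for illustration): the writer may only send fixed-size chunks, and any short chunk finalizes the session, which is why a mid-stream flush() necessarily ends the upload.

```python
import io

CHUNK_SIZE = 8  # real resumable uploads use multiples of 256 KiB; tiny here for illustration


class FakeResumableUpload:
    """Toy stand-in for a resumable upload session."""

    def __init__(self):
        self.received = bytearray()
        self.finished = False

    def send(self, chunk: bytes, final: bool):
        if self.finished:
            raise ValueError("upload already finalized")
        self.received.extend(chunk)
        # A chunk shorter than CHUNK_SIZE (or flagged final) ends the session.
        if final or len(chunk) < CHUNK_SIZE:
            self.finished = True


class ChunkedWriter(io.RawIOBase):
    """Buffers bytes and only ever sends full CHUNK_SIZE chunks.
    flush() sends the short remainder, which finalizes the upload --
    mirroring why flush() closes the blob file."""

    def __init__(self, upload):
        self.upload = upload
        self.buffer = bytearray()

    def write(self, data):
        self.buffer.extend(data)
        while len(self.buffer) >= CHUNK_SIZE:
            self.upload.send(bytes(self.buffer[:CHUNK_SIZE]), final=False)
            del self.buffer[:CHUNK_SIZE]
        return len(data)

    def flush(self):
        self.upload.send(bytes(self.buffer), final=True)
        self.buffer.clear()


upload = FakeResumableUpload()
writer = ChunkedWriter(upload)
writer.write(b"0123456789")  # sends one full 8-byte chunk, buffers 2 bytes
writer.flush()               # sends the 2-byte remainder, finalizing the upload
print(upload.finished)       # True

try:
    writer.write(b"abcdefgh")  # another full chunk: the session is already closed
except ValueError as exc:
    print(exc)
```

In this model there is no way to flush buffered bytes without also ending the session, which is the restriction the maintainer describes.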
The same error is generated regardless of whether there is anything replacing it. This repro generates the same error as in my real code:

```python
import zipfile

from google.cloud.storage import Blob, Client

client = Client("client_account")
blob = Blob.from_string("gs://some_bucket_path/something.zip", client)

with blob.open(mode="wb") as blob_file, \
        zipfile.ZipFile(blob_file, mode="w") as zip_file, \
        open("large_image.png", mode="rb") as large_image_file, \
        open("small_image.jpg", mode="rb") as small_image_file:
    large_image_bytes = large_image_file.read()
    small_image_bytes = small_image_file.read()
    for i in range(10):
        print(f"writing image {i}")
        with zip_file.open(f"{i}.png", mode="w") as zip_entry:
            zip_entry.write(large_image_bytes)
        with zip_file.open(f"{i}.jpg", mode="w") as zip_entry:
            zip_entry.write(small_image_bytes)
        print(f"Succeeded writing image {i}")
```

... it turns out this actually succeeds in writing all of the images to the zip on gcloud, but then throws an error on finish:
Oh, I see: zipfile is issuing the flush() to the blob file automatically on close. With the resumable-upload behavior, where a partial upload necessarily closes the file, there's no way we can simply make flush() work the way people expect it to. Given that restriction, what kind of behavior would make more sense in this case?
I don't know; I just don't want it to throw errors when I close the file.
Okay. I'll work on a new behavior here. In the meantime, this is perhaps a silly workaround, but inside your …
I'm attempting to stream-write an uncompressed zip file (200 MB+ now, 1 GB+ eventually; mostly ~3 MB images) to avoid writing to disk. Unfortunately, when the zip file is closed it attempts to at least partially flush the stream, and the storage client seems to assume that a flush will only occur to close the streamed file. That isn't the case here, and it seems wrong in the general sense, since buffers are often flushed when they reach saturation.
Environment details

google-cloud-storage version: 1.42.3

Code example
Stack trace
Workaround
Insert an `io.BufferedWriter`:

This results in the file being written to cloud storage (and, at least for a simple case, with correctly written contents), but it prints a new error:
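A sketch of why the `io.BufferedWriter` workaround can help, using an invented stand-in stream (`FlushCountingRaw`) in place of the real blob writer so the behavior is observable locally: `BufferedWriter.flush()` writes its pending bytes down to the underlying stream but does not call the underlying stream's own flush(), so zipfile's flush never reaches the blob writer and cannot finalize the resumable upload early.

```python
import io
import zipfile


class FlushCountingRaw(io.RawIOBase):
    """Stand-in for the cloud blob writer (invented for this demo):
    records everything written and counts flush() calls."""

    def __init__(self):
        self.data = bytearray()
        self.flushes = 0

    def writable(self):
        return True

    def write(self, b):
        self.data.extend(b)
        return len(b)

    def flush(self):
        self.flushes += 1


raw = FlushCountingRaw()
# A large buffer absorbs zipfile's intermediate flush() calls:
# BufferedWriter.flush() pushes pending bytes down via raw.write(),
# but never calls raw.flush() itself.
buffered = io.BufferedWriter(raw, buffer_size=1024 * 1024)
with zipfile.ZipFile(buffered, mode="w") as zip_file:
    with zip_file.open("0.txt", mode="w") as entry:
        entry.write(b"payload")
buffered.flush()  # push any remaining buffered bytes down to `raw`

print(raw.flushes)  # 0: the raw stream's flush() was never called
archive = zipfile.ZipFile(io.BytesIO(bytes(raw.data)))
print(archive.read("0.txt"))  # b'payload'
```

With the real client, the same shape would be `io.BufferedWriter(blob_file, buffer_size=...)` wrapped around the object returned by `blob.open(mode="wb")`; note the final close of the blob file still has to flush, which is consistent with the new error reported above.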