Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Storage: Blob.compose doesn't use default content_type #5834

Closed
rasmi opened this issue Aug 22, 2018 · 12 comments
Closed

Storage: Blob.compose doesn't use default content_type #5834

rasmi opened this issue Aug 22, 2018 · 12 comments
Assignees
Labels
api: storage Issues related to the Cloud Storage API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@rasmi
Copy link

rasmi commented Aug 22, 2018

Blob.compose does not use the self._get_content_type method, which means Blob.compose raises an error if the content_type is not explicitly specified. This is documented behavior, but is it intended behavior? Should compose use self._get_content_type to use the default type, or perhaps infer the type from the blobs being composed?

@tseaver tseaver added type: question Request for information or clarification. Not an issue. api: storage Issues related to the Cloud Storage API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed type: question Request for information or clarification. Not an issue. labels Aug 22, 2018
@tseaver
Copy link
Contributor

tseaver commented Aug 28, 2018

@frankyn I'm not sure how the GCS team would like to shape this feature.

@frankyn
Copy link
Member

frankyn commented Sep 6, 2018

Acking. Adding to GCS client library weekly.

@rasmi
Copy link
Author

rasmi commented Sep 19, 2018

@frankyn -- any updates?

@frankyn
Copy link
Member

frankyn commented Sep 19, 2018

Apologies on the wait.

content-type is optional and defaults to application/octet-stream based on Storage.Objects Ref docs. I don't think we should fail the method if it's not set.

Additionally, I'm looking at upload_* and these methods provide a parameter for content_type. Including this parameter would follow the pattern that exists for these methods and would be easier to understand.

@tseaver do you have objections to surfacing content_type as an optional parameter for compose()?

@rasmi
Copy link
Author

rasmi commented Sep 19, 2018

Great! content-type as an optional parameter (but falling back to the default through self._get_content_type) seems ideal.

Could this lead to unexpected behavior? For example, composing blobs of some non-default type but not specifying that type in compose leads them to be composed as application/octet-stream. I suppose the type can also be inferred from the types of the objects being composed, but this seems more involved.

@frankyn
Copy link
Member

frankyn commented Sep 19, 2018

@rasmi that sounds like a fair condition.

if content_type is not None:
  # use this
if self._get_content_type is not `None`
  # use this
else:
  # it's okay to not provide content_type as it will default to application/octet-stream.

I think inferring the content-type might be not be as helpful as simply surfacing a parameter for the content type. I think it would feel magical which is not a bad thing, but it may or may not be the desired result for everyone.

@rasmi
Copy link
Author

rasmi commented Sep 19, 2018

Okay, agreed.

I can submit a PR. However, I have an implementation question. Content type is not explicitly used in self.compose, but it does pass self when making a request, so presumably the blob object's content type must be set before the request is made (which is why the error is raised if it's not set).

Additionally, self._get_content_type() handles arguments/None values correctly. So I think we can simply add the content_type argument to the compose function and do:

self.content_type = self._get_content_type(content_type)

at the top of compose to ensure that the content type is set. If the argument is None, it will use the blob's content type, and if the blob has no content type, it will use the default. Is that a reasonable implementation?

@tseaver
Copy link
Contributor

tseaver commented Sep 19, 2018

@frankyn as long as the objects.compose API honors the destination.content_type field, it should work just to set content_type on the Blob instance through which we are invoking it (we already require that, in fact, because we copy the destination blob's properties into the request body).

sources = [bucket.blob(name) for name in ['source1.txt', 'source2.txt']]
destination = bucket.blob('destination.txt)
destination.content_type = 'text/plain'
# set other destination properties here
destination.compose(sources)

I don't really see the value in adding Another Way to Do It(TM).

This issue is a request that, if the destination blob does not have content_type set, we deduce the correct value by looking at the content types of the sources. Given that we may not have loaded them (e.g., in the example I just showed), I don't know how feasible it would be. I guess we could do something like:

    def _deduce_content_type_for_compose(self, sources):
        if self.content_type is None:
            source_types = set(
                src.content_type for src in sources if src.content_type is not None)
            if len(source_types) == 1:
                return source_types[0]
            return self.content_type

@frankyn
Copy link
Member

frankyn commented Sep 19, 2018

Thanks @tseaver, your comment helps.
I don't think we should infer content types from a list of objects and instead depend on the destination blob content type.

How about not failing the compose if a content_type is not found?

@tseaver
Copy link
Contributor

tseaver commented Sep 19, 2018

How about not failing the compose if a content_type is not found?

That would be fine. It has been a while, but I wonder if I added that check because the objects.compose API used to fail if it wasn't set.

@frankyn
Copy link
Member

frankyn commented Sep 19, 2018

Ah good question. I do recall this came up for Node.js and the GCS API has since changed this requirement. The code used to be correct but now the restriction has been removed.

@tseaver tseaver self-assigned this Sep 19, 2018
@tseaver tseaver added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p2 Moderately-important priority. Fix may not be included in next release. and removed type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. labels Sep 19, 2018
@tseaver
Copy link
Contributor

tseaver commented Sep 19, 2018

How about not failing the compose if a content_type is not found?

PR #6031 makes that change, including a system test. Note that the server does not set application/octet-stream as the composed object's contentType: it remains unset.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the Cloud Storage API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

No branches or pull requests

3 participants