Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding MD5 from UploadFileOptions #20356

Closed
wants to merge 2 commits into from

Conversation

siminsavani-msft
Copy link
Contributor

@siminsavani-msft siminsavani-msft commented Mar 7, 2023

Fixes #20092

Adds MD5 from UploadFileOptions to UploadOptions and CommitBlockListOptions.

  • The purpose of this PR is explained in this or a referenced issue.
  • The PR does not update generated files.
  • Tests are included and/or updated for code changes.
  • Updates to CHANGELOG.md are included.
  • MIT license headers are included in each file.

@ghost ghost added the Storage Storage Service (Queues, Blobs, Files) label Mar 7, 2023
Copy link
Member

@jhendrixMSFT jhendrixMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth doing a scrub to see if there are other places where we missed propagating option values.


md5sum, resp, err := performUploadWithMD5Check(s.T(), _require, s.T().Name(), fileSize)

_require.ErrorContains(err, "400 The MD5 value specified in the request did not match with the MD5 value calculated by the server.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we getting this error for large file uploads?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For larger files, the service won't be able to calculate the MD5 and this error is expected as the user provided MD5 and the response MD5 won't match. We don't get this error as of now because the user provided MD5 doesn't get passed at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a transactional md5 will always be calculated by the service. If the service is failing the transactional md5 check it's because the check is actually failing. I imagine this is because you've configured the transactional md5 on put blocklist. A transactional md5 for that API is literally checking the content integrity of the blocklist itself (it's one of the only places .NET SDK currently doesn't calculate one).

Transactional md5 validates individual requests and nothing more, meaning it would need to validate individual blocks. If one large md5 is being passed in for a partitioned upload, the client should fail fast since the checksum cannot be split.

Tags: o.Tags,
Metadata: o.Metadata,
Tier: o.AccessTier,
TransactionalContentMD5: o.TransactionalContentMD5,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two features, one is to just set the MD5 sum in the blob and other one is transactional MD5. For transactional options crc64 is also supported. Transactional fields are only for validation and will not set any properties in the blob. Just double check that we are good with all these combinations as I do not see transactional CRC here, just had doubts on we are still missing some things.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to be careful about the way we chose to expose transactional MD-5 and CRC-64 to the customer. We will be implement end-to-end content validation in the Go SDK this fall, after the FE has implement trailing CRC-64 header.

This work will look something like this - Azure/azure-sdk-for-net#29778

We do not need to calculate CRC-64 or MD5 within the Go SDK at this time, but we need to design our public interfaces in a way that will support this option in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how UploadOptions looks in .NET - https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/storage/Azure.Storage.Blobs/src/Models/BlobUploadOptions.cs

It has UploadTransferValidationOptions as a property - https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/storage/Azure.Storage.Common/src/UploadTransferValidationOptions.cs

I recommend we do something similar in Go to future proof out public interfaces.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with Sean. In .NET we had to retire our byte[] contentHash arguments and replace them with a whole configuration setup. It would be best to avoid writing "instant legacy" APIs and this probably merits further offline discussion of how to balance immediate needs with medium-term plans for transactional content integrity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In .NET, if the customer populates TransferValidationOptions.PrecalculatedChecksum and TransferValidationOptions.ChecksumAlgorithm, the SDK uses their pre-calculated checksum. If the customer only populates TransferValidationOptions.ChecksumAlgorithm, the SDK calculates the checksum for the user.

We don't want to implement the SDK calculating the checksum for the user until ~6 months from now when the service supports trailing checksum footers, but we need to design our public interface to be compatible with this functionality.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaschrep-msft, what should we do in in the case where the customer provides a precalculated checksum for a multi-part upload, before we implement calculating checksums in the SDK?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should fail fast. If a customer provides a transactional checksum that the SDK cannot make use of, we shouldn't pretend an integrity check has occurred when it actually hasn't.

In .NET, before we had SDK calculation on caller behalf, we didn't accept the parameter on any API that had the possibility of splitting an upload, as we didn't want customers to write code that suddenly broke once they passed a file size threshold they were not aware of. Customers had to use direct to REST apis if they wanted this feature.

@vibhansa-msft vibhansa-msft added this to the azblob v1.0.1 milestone Mar 8, 2023
@seanmcc-msft seanmcc-msft mentioned this pull request Mar 21, 2023
5 tasks
@siminsavani-msft siminsavani-msft deleted the siminsavani/UploadContentMD5 branch April 4, 2023 02:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AzBlob Storage Storage Service (Queues, Blobs, Files)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[azblob] file upload validation
6 participants