Move implicit directory repair from list to delete operations #156

Merged (7 commits) on Apr 2, 2019

@yzhou2001 yzhou2001 commented Mar 15, 2019

For details please refer to issue #155.

Automated tests for Hadoop 2 and 3, as well as the integration tests, have passed cleanly.

@medb medb self-requested a review March 27, 2019 00:19
@medb medb changed the title move implicit directory repair from list ops to the delete Move implicit directory repair from list to delete operations Apr 2, 2019
medb commented Apr 2, 2019

Great contribution, thank you!

@medb medb merged commit 03ec27b into GoogleCloudDataproc:master Apr 2, 2019
medb added a commit that referenced this pull request May 14, 2019
…Status` methods

Note: this is essentially the same change as in [] that triggered omg/12873 in the past, but it has a feature flag that turns it off by default and tests that assert the number of GCS requests when parallelism is enabled.

In the worst case, the `getFileStatus` method can make up to 3 sequential requests to GCS to get implicit directory status.

After moving implicit directory repair from list to delete/rename operations, this worst case could occur more frequently than before, because there is a higher chance of encountering an implicit, non-repaired directory:
#156

This CL adds an option to execute these GCS requests in parallel, which could reduce latency by up to a factor of 3.

	Change on 2019/05/13 by idv <idv@google.com>

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=248044068
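
For readers unfamiliar with the idea in the commit message above, here is a minimal sketch of turning up to 3 sequential lookups into parallel ones. It is not the connector's code: the class `ParallelStatusLookup`, the methods `sequential` and `parallel`, and the three stand-in lookups (exact object, directory placeholder, prefix listing) are all hypothetical names and assumptions made for illustration, and each lookup is simulated locally rather than sent to GCS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical illustration only; none of these names come from the connector.
public class ParallelStatusLookup {

  /** Baseline: run the lookups one after another and stop at the first hit. */
  static Optional<String> sequential(List<Supplier<Optional<String>>> lookups) {
    for (Supplier<Optional<String>> lookup : lookups) {
      Optional<String> result = lookup.get();
      if (result.isPresent()) {
        return result;
      }
    }
    return Optional.empty();
  }

  /**
   * Starts every lookup at once, then inspects the results in the same
   * priority order as the sequential version, so both return the same answer.
   */
  static Optional<String> parallel(
      List<Supplier<Optional<String>>> lookups, ExecutorService pool) {
    List<CompletableFuture<Optional<String>>> futures = new ArrayList<>();
    for (Supplier<Optional<String>> lookup : lookups) {
      futures.add(CompletableFuture.supplyAsync(lookup, pool));
    }
    for (CompletableFuture<Optional<String>> future : futures) {
      Optional<String> result = future.join();
      if (result.isPresent()) {
        return result;
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Assumed stand-ins for the three requests; the real ones would call GCS.
    Supplier<Optional<String>> exactObject = Optional::empty;
    Supplier<Optional<String>> directoryPlaceholder = Optional::empty;
    Supplier<Optional<String>> prefixListing = () -> Optional.of("implicit directory");
    List<Supplier<Optional<String>>> lookups =
        Arrays.asList(exactObject, directoryPlaceholder, prefixListing);

    ExecutorService pool = Executors.newFixedThreadPool(3);
    try {
      System.out.println(sequential(lookups));
      System.out.println(parallel(lookups, pool));
    } finally {
      pool.shutdown();
    }
  }
}
```

Note the trade-off in this sketch: the parallel version always issues all three lookups, even when the first one would have succeeded, so it trades extra requests for lower worst-case latency. That is consistent with the commit message keeping the option off by default and asserting request counts in tests.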
medb added a commit that referenced this pull request May 15, 2019
mayanks pushed a commit to mayanks/hadoop-connectors that referenced this pull request Aug 3, 2022