-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move VEP to Requester Pays #8268
Conversation
Adding "WIP" tag since I had a test failure from |
Ah, the mix of reference genomes is coming from me running all the "cluster_tests" for GRCh38 cluster as well. One of the tests hardcodes 37 |
Ok, all tests passing now. Removing WIP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use forum link
conf.extend_flag('initialization-actions', [deploy_metadata[f'vep-{args.vep}.sh']]) | ||
sys.stderr.write("WARNING: Running VEP outside of the US is VERY expensive. Please contact us for support.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Please contact the Hail team on https://discuss.hail.is for support."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(and in the message above)
This appers to have been lost in hail-is#8268. There is no explicit mention of it and it was inconsistently removed (e.g. loftee in GRCh38 still has the `&`). I assume it was a mistake.
This appers to have been lost in #8268. There is no explicit mention of it and it was inconsistently removed (e.g. loftee in GRCh38 still has the `&`). I assume it was a mistake.
This PR changes our vep init scripts to pull from the new
hail-us-vep
google bucket, which is requester pays.I realized while doing this that my previous PR (#8253) really had nothing to do with supporting this, since that really just sets the configuration for the Google Cloud / Hadoop connector, which is not how we get vep data. I tried to enforce the rules from those command line arguments anyway by rejecting
--vep
flag if they don't specify they're ok with requester pays and manually checking thathail-us-vep
was in their approved bucket list. But if a user was to specify the init scripts usinggcloud dataproc
and didn't go throughhailctl
, there'd be no catch to check if they were ok with requester pays. Perhaps there is some way we could set environment variables on the dataproc machines based on the--requester-pays-allow....
flags and use those in the init scripts.I also expanded
make test-dataproc
to test with a GRCh38 cluster as well, as we use separate scripts to make them and I'm uncomfortable with only testing one.