ValueError: call_variants_outputs did not pass sanity check. #517

Closed
NagaComBio opened this issue Feb 18, 2022 · 2 comments


@NagaComBio

Have you checked the FAQ? https://github.com/google/deepvariant/blob/r1.3/docs/FAQ.md: Yes

Describe the issue:
The error arises during the postprocess_variants step. The quick-start test and a run on chr22 from the same sample completed without any issue. I tried --group_variants=false as suggested here, but a similar crash occurs at a different variant/location. A similar problem was reported here, but no final fix was provided.

Setup

  • Operating system: CentOS 7
  • DeepVariant version: 1.3.0
  • Installation method (Docker, built from source, etc.): Singularity image built from docker image
  • Type of data: (sequencing instrument, reference genome, anything special that is unlike the case studies?) WGS, Illumina x10

Steps to reproduce:

  • Command:
# Modified script
singularity run -B /usr/lib/locale/:/usr/lib/locale/ \
        -B ${INPUT_PATH}:/input \
        compute_envs/deepvariant_latest.sif \
        /opt/deepvariant/bin/run_deepvariant \
        --model_type=WGS \
        --ref=hs37d5_PhiX.fa \
        --reads=/input/${pid}/alignment/${prefix}_${pid}_merged.mdup.bam \
        --intermediate_results_dir=/input/${pid}/deepvariant_calling/tmp/${prefix}/ \
        --output_vcf=/input/${pid}/deepvariant_calling/${prefix}_${pid}_deepvariant.vcf.gz \
        --output_gvcf=/input/${pid}/deepvariant_calling/${prefix}_${pid}_deepvariant.g.vcf.gz \
        --num_shards=15

I have also tried running postprocess_variants directly with --group_variants=false, which produces a similar error.

singularity run -B /usr/lib/locale/:/usr/lib/locale/ \
        -B ${INPUT_PATH}:/input \
        compute_envs/deepvariant_latest.sif \
        /opt/deepvariant/bin/postprocess_variants \
        --group_variants=false \
        --ref=hs37d5_PhiX.fa \
        --infile=/input/${pid}/deepvariant_calling/tmp/${prefix}/call_variants_output.tfrecord.gz \
        --outfile=/input/${pid}/deepvariant_calling/${prefix}_${pid}_deepvariant.vcf.gz
  • Error trace: (if applicable)
I0217 17:00:21.108631 47945364948800 postprocess_variants.py:1115] Using sample name from call_variants output. Sample name: sample_
2022-02-17 17:00:21.116319: I deepvariant/postprocess_variants.cc:88] Read from: "..."/call_variants_output.tfrecord.gz
2022-02-17 17:00:22.403255: I deepvariant/postprocess_variants.cc:103] Total #entries in single_site_calls = 228285
I0217 17:00:24.204934 47945364948800 postprocess_variants.py:1180] CVO sorting took 0.051486388842264814 minutes
I0217 17:00:24.205343 47945364948800 postprocess_variants.py:1183] Transforming call_variants_output to variants.
I0217 17:00:24.205814 47945364948800 postprocess_variants.py:1204] Writing variants to VCF.
I0217 17:00:24.205858 47945364948800 postprocess_variants.py:774] Writing output to VCF file: "..."/_deepvariant.vcf.gz
I0217 17:00:24.230250 47945364948800 genomics_writer.py:175] Writing "..."/_deepvariant.vcf.gz with NativeVcfWriter
I0217 17:00:24.234843 47945364948800 postprocess_variants.py:783] 1 variants written.
I0217 17:00:38.475637 47945364948800 postprocess_variants.py:783] 100001 variants written.
W0217 17:00:50.128011 47945364948800 postprocess_variants.py:403] Alt allele indices found from call_variants_outputs for variant reference_bases: "GTTTT"
alternate_bases: "G"
alternate_bases: "GT"
alternate_bases: "GTT"
calls {
  info {
    key: "AD"
    value {
      values {
        int_value: 18
      }
      values {
        int_value: 33
      }
      values {
        int_value: 10
      }
      values {
        int_value: 6
      }
    }
  }
  info {
    key: "DP"
    value {
      values {
        int_value: 79
      }
    }
  }
  info {
    key: "VAF"
    value {
      values {
        number_value: 0.4177215189873418
      }
      values {
        number_value: 0.12658227848101267
      }
      values {
        number_value: 0.0759493670886076
      }
    }
  }
  genotype: -1
  genotype: -1
  call_set_name: "sample"
}
end: 160351258
reference_name: "1"
start: 160351253
 is [[0], [1], [2]], which is invalid.
Traceback (most recent call last):
  File "/tmp/Bazel.runfiles_ohe4bkg1/runfiles/com_google_deepvariant/deepvariant/postprocess_variants.py", line 1249, in <module>
    tf.compat.v1.app.run()
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/tmp/Bazel.runfiles_ohe4bkg1/runfiles/absl_py/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/tmp/Bazel.runfiles_ohe4bkg1/runfiles/absl_py/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/tmp/Bazel.runfiles_ohe4bkg1/runfiles/com_google_deepvariant/deepvariant/postprocess_variants.py", line 1205, in main
    write_variants_to_vcf(
  File "/tmp/Bazel.runfiles_ohe4bkg1/runfiles/com_google_deepvariant/deepvariant/postprocess_variants.py", line 778, in write_variants_to_vcf
    for variant in variant_iterable:
  File "/tmp/Bazel.runfiles_ohe4bkg1/runfiles/com_google_deepvariant/deepvariant/haplotypes.py", line 87, in maybe_resolve_conflicting_variants
    for overlapping_candidates in _group_overlapping_variants(sorted_variants):
  File "/tmp/Bazel.runfiles_ohe4bkg1/runfiles/com_google_deepvariant/deepvariant/haplotypes.py", line 106, in _group_overlapping_variants
    for variant in sorted_variants:
  File "/tmp/Bazel.runfiles_ohe4bkg1/runfiles/com_google_deepvariant/deepvariant/postprocess_variants.py", line 853, in _transform_call_variants_output_to_variants
    canonical_variant, predictions = merge_predictions(
  File "/tmp/Bazel.runfiles_ohe4bkg1/runfiles/com_google_deepvariant/deepvariant/postprocess_variants.py", line 717, in merge_predictions
    raise ValueError('`call_variants_outputs` did not pass sanity check.')
ValueError: `call_variants_outputs` did not pass sanity check.
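
One way to read the warning in the trace: the variant has three alternate alleles (G, GT, GTT), but the only alt allele index sets found are [[0], [1], [2]]. The sketch below assumes the check expects one call_variants output for every single alt allele and every pair of alt alleles; that expectation is an assumption made for illustration, not a copy of the DeepVariant implementation, and the helper names are hypothetical.

```python
from itertools import combinations


def expected_alt_allele_indices(num_alts, max_size=2):
    """Return all alt-allele index subsets of size 1..max_size, sorted.

    Assumption: one call_variants output is expected per subset.
    """
    subsets = []
    for size in range(1, max_size + 1):
        subsets.extend(list(c) for c in combinations(range(num_alts), size))
    return sorted(subsets)


def missing_alt_allele_indices(observed, num_alts):
    """Return the expected subsets that do not appear in `observed`."""
    expected = expected_alt_allele_indices(num_alts)
    return [subset for subset in expected if subset not in observed]


# Index sets reported in the warning for the GTTTT -> G / GT / GTT variant.
observed = [[0], [1], [2]]
missing = missing_alt_allele_indices(observed, num_alts=3)
print(missing)  # [[0, 1], [0, 2], [1, 2]]
```

Under that assumption, the pairwise outputs for this site are absent from call_variants_output.tfrecord.gz, which would be consistent with the check rejecting the site.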

Does the quick start test work on your system?
Please test with https://github.com/google/deepvariant/blob/r0.10/docs/deepvariant-quick-start.md.
Is there any way to reproduce the issue by using the quick start?
No, the quick start and also chr22 from the same sample ran through.

Any additional context:

@MariaNattestad
Collaborator

Hi @NagaComBio

Sorry for the delay! I don't have a clear solution to this problem just from looking at the error message, but if you can share the data, e.g. with just a small slice of the bam, then I can try to reproduce the issue. If that's possible, you can email me at marianattestad@google.com.

For now I can tell you that --group_variants=false is only applicable when using vcf_candidate_importer. That is the most common way this error occurs, since the input VCF for vcf_candidate_importer can have multiple candidate variants at the same position, which isn't supposed to be possible when the candidates are generated by make_examples without vcf_candidate_importer.
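
For the vcf_candidate_importer case, a quick way to see whether an input VCF has several candidate records at one position is to count records per (chromosome, position). This is a minimal sketch in plain Python; the function name and the example records are made up for illustration and are not part of DeepVariant.

```python
from collections import defaultdict


def positions_with_multiple_records(vcf_lines):
    """Return {(chrom, pos): count} for positions that appear more than once."""
    counts = defaultdict(int)
    for line in vcf_lines:
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos = line.split()[:2]
        counts[(chrom, int(pos))] += 1
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical candidate records, not taken from the reported data.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t.\tA\tG\t.\t.\t.",
    "1\t1000\t.\tA\tT\t.\t.\t.",
    "1\t2000\t.\tC\tT\t.\t.\t.",
]
print(positions_with_multiple_records(example))  # {('1', 1000): 2}
```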

Thanks,
Maria

@NagaComBio
Author

Hi @MariaNattestad

Thanks for the offer, but it would be difficult to share the data without a DTA. So, I went back and reran the workflow (--num_shards=5) for a short region around the above coordinates and then again for the complete chr1; both tests ran through without any errors. Some of the candidate variants called earlier are not present here.

1       160351251       .       T       <*>     0       .       END=160351253   GT:GQ:MIN_DP:PL 0/0:50:80:0,261,2609
1       160351254       .       GTTTT   G,<*>   9.1     PASS    .       GT:GQ:DP:AD:VAF:PL      0/1:9:79:18,33,0:0.417722,0:8,0,26,990,990,990
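
The second record can be compared with the earlier warning, where the same site carried the alternates G, GT and GTT with AD 18, 33, 10, 6. A small sketch, assuming a single-sample record with whitespace-separated columns (the helper name is hypothetical):

```python
def parse_sample_fields(vcf_record):
    """Return (alts, {FORMAT key: value}) for a single-sample VCF record."""
    columns = vcf_record.split()
    alts = columns[4].split(",")    # ALT column
    keys = columns[8].split(":")    # FORMAT column
    values = columns[9].split(":")  # the one sample column
    return alts, dict(zip(keys, values))


record = (
    "1 160351254 . GTTTT G,<*> 9.1 PASS . "
    "GT:GQ:DP:AD:VAF:PL 0/1:9:79:18,33,0:0.417722,0:8,0,26,990,990,990"
)
alts, sample = parse_sample_fields(record)
print(alts)          # ['G', '<*>']
print(sample["AD"])  # 18,33,0
print(sample["GT"])  # 0/1
```

In the rerun only the G alternate remains, with the same reference and G read counts (18 and 33) as in the failing run.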

I'm not sure how it was resolved, but I will close this issue for now and reopen it if a similar error pops up during the rerun of all the chromosomes.

Thank you for looking into the issue,
Naga
