Error when running gridss_somatic_filter #635

Open
kcleal opened this issue Jul 24, 2023 · 8 comments

@kcleal

kcleal commented Jul 24, 2023

Hi,

I've run into an error when running the somatic filter:

Rscript ./GRIDSS/gridss_somatic_filter --input ERR2752450.gridss.vcf --output gridss_hq_somatic.vcf.gz --scriptdir ./GRIDSS/
No reference genome supplied using --ref. Not performing variant equivalence checks.
2023-07-24 13:50:28 Reading ERR2752450.gridss.vcf
Tumour samples: ERR2752450.cram
Matched normals: ERR2752449.cram
Error in `str_detect()`:
! `string` must be a vector, not a <CompressedCharacterList> object.
Backtrace:
    ▆
 1. ├─global align_breakpoints(full_vcf)
 2. │ └─stringr::str_detect(VariantAnnotation::fixed(vcf)$ALT, "[\\]\\[]")
 3. │   └─stringr:::check_lengths(string, pattern)
 4. │     └─vctrs::vec_size_common(...)
 5. └─vctrs:::stop_scalar_type(`<fn>`(`<CmprssCL>`), "string", `<env>`)
 6.   └─vctrs:::stop_vctrs(...)
 7.     └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = call)
Execution halted

Any ideas on how to fix this? Thanks!

@d-cameron
Member

d-cameron commented Jul 25, 2023 via email

@kcleal
Author

kcleal commented Jul 25, 2023

Thanks @d-cameron for the quick reply. The VCF was generated by GRIDSS. I will create a new environment and try reinstalling.

@warthmann

Hello, I produced tumor/normal VCFs with GRIDSS and would now like to post-process them with 'gridss_somatic_filter'. I ran into exactly the same error as above and would appreciate advice on what to try next. Any help is greatly appreciated!

------>8---------------------
Test passed 😸
Test passed 🥇
Loading required package: BSgenome
2023-11-01 17:25:03.646492 Reading tumor_vs_normal_all_calls.vcf
Tumour samples: tumor
Matched normals: normal
Error in `str_detect()`:
! `string` must be a vector, not a <CompressedCharacterList> object.
Backtrace:
    ▆
 1. ├─global align_breakpoints(full_vcf)
 2. │ └─stringr::str_detect(VariantAnnotation::fixed(vcf)$ALT, "[\\]\\[]")
 3. │   └─stringr:::check_lengths(string, pattern)
 4. │     └─vctrs::vec_size_common(...)
 5. └─vctrs:::stop_scalar_type(`<fn>`(`<CmprssCL>`), "string", `<env>`)
 6.   └─vctrs:::stop_vctrs(...)
 7.     └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = call)

Execution halted
------------------>8-----------------------

Details:
It is a brand new gridss conda environment, installed with 'mamba create -n gridss gridss'
This is my command:
'gridss_somatic_filter --input tumor_vs_normal_all_calls.vcf --output test -n 1 --pondir pondir --ref BSgenome.xxx.yyy.zzz -f test-full'
I produced the necessary files (gridss_pon_breakpoint.bedpe, gridss_pon_single_breakend.bed) as instructed and provided them in 'pondir'. I am working on a plant species and had to build the BSgenome package myself. I tried to build it with the R library BSgenome version 1.68 in the gridss conda environment, but the build fails with this error:

... Error in .TwoBits_export(mapply(.DNAString_to_twoBit, object, seqnames), :
UCSC library operation failed
(A very similar error occurs with 'ondisk_seq_format: fa'.)

It builds fine with Bioconductor BSgenome version 1.70 on my system R 4.3, and I am using this BSgenome package (BSgenome.xxx.yyy.zzz).

@warthmann

Update: The Bioconductor R library BSgenome version 1.68 from the gridss conda install fails to produce a BSgenome package. It was apparently built (R CMD build) without the --keep-empty-dirs flag, so the necessary /inst/extdata/ directories were missing. Creating them solved the issue. See https://support.bioconductor.org/p/124169/

@warthmann

I can also confirm that my GRIDSS-produced VCF has only one REF and one ALT allele per locus. Example entries are below; some do contain ".", though.

bcftools query -f '%CHROM %POS %REF %ALT\n' xxx.vcf
------>8----------
chr01 20422694 T T[chr01:20422705[
chr01 20422705 C ]chr01:20422694]C
chr01 20509080 A A.
chr01 20597157 T .TGAAAAAACAACATCCAGCTATCAGTTCTCAAGAAAAGATAT
chr01 20778566 A ]chr23:23317025]A
chr01 21198059 G G]chr01:21198094]
------>8----------
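In VCF breakend notation, a "." at the start or end of ALT (e.g. `A.` or `.TGAA...`) marks a single breakend, while paired breakpoints carry `[` or `]` brackets. The `isbp` line in libgridss.R (`str_detect(..., "[\\]\\[]")`) separates the two by looking for either bracket. The following is a minimal shell sketch of the same classification applied to the `bcftools query` output above, assuming the space-separated `CHROM POS REF ALT` layout; `classify_alts` is a hypothetical helper, not part of GRIDSS:

```shell
# Hypothetical helper: reads "CHROM POS REF ALT" lines (as printed by
# bcftools query -f '%CHROM %POS %REF %ALT\n') on stdin and labels each
# record using the same bracket test as isbp in libgridss.R.
classify_alts() {
  awk '{
    alt = $4
    if (index(alt, "[") || index(alt, "]")) {
      kind = "breakpoint"        # paired breakend: ALT contains [ or ]
    } else if (substr(alt, 1, 1) == "." || substr(alt, length(alt), 1) == ".") {
      kind = "single_breakend"   # ALT begins or ends with "."
    } else {
      kind = "other"
    }
    print $1, $2, kind
  }'
}
```

For example, `bcftools query -f '%CHROM %POS %REF %ALT\n' xxx.vcf | classify_alts` would label the records at 20509080 and 20597157 as `single_breakend` and the other four sample rows as `breakpoint`.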

@wphillips13

Hello,

I have been having the same error as warthmann above. Has there been any solution to this?

@hberger

hberger commented Nov 17, 2023

A quick fix that worked for me:

  • Locate the file libgridss.R in your conda environment folder (e.g. ~/.conda/envs/<my_env>/share/gridss-2.13.2-2)
  • Replace line 780 in this file

Original:

  isbp = str_detect(VariantAnnotation::fixed(vcf)$ALT, "[\\]\\[]")

New:

  isbp = str_detect(as.character(VariantAnnotation::fixed(vcf)$ALT), "[\\]\\[]")  
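If line 780 does not match in your installed copy, the same substitution can be made by matching the text rather than the line number. The following is a minimal sketch, assuming a sed that accepts `-i.bak` (GNU and BSD sed both do); `patch_libgridss` is a hypothetical helper name, not part of GRIDSS:

```shell
# Hypothetical helper: applies the one-line fix above by pattern instead of
# by line number. Argument: path to libgridss.R. A .bak copy is kept.
patch_libgridss() {
  f="$1"
  # Wrap the ALT column in as.character() so str_detect() gets a plain vector.
  sed -i.bak \
    's/str_detect(VariantAnnotation::fixed(vcf)\$ALT,/str_detect(as.character(VariantAnnotation::fixed(vcf)$ALT),/' \
    "$f"
  # Print the patched line(s) with line numbers so the change can be checked.
  grep -n 'as.character(VariantAnnotation::fixed(vcf)\$ALT)' "$f"
}
```

For example: `patch_libgridss ~/.conda/envs/<my_env>/share/gridss-2.13.2-2/libgridss.R`, assuming the file sits directly in that folder. A second run leaves the line unchanged, because the patched text no longer matches the pattern, but it does overwrite the .bak copy.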

Then rerun gridss_somatic_filter.

Note: this assumes that the ALT fields contain a single allele per line, which seems to be the case in my GRIDSS output VCF files.
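Because the fix relies on one ALT allele per record, it may be worth confirming this before patching. In a VCF, multiple ALT alleles are comma-separated in column 5, so any comma there indicates a multi-allelic record. The following is a minimal sketch, assuming an uncompressed, tab-delimited VCF; `count_multiallelic` is a hypothetical helper name:

```shell
# Hypothetical helper: prints the number of data records whose ALT column
# (5th field) lists more than one allele. Header lines (#...) are skipped.
count_multiallelic() {
  awk -F'\t' '!/^#/ && index($5, ",") { n++ } END { print n + 0 }' "$1"
}
```

A result of 0 (e.g. from `count_multiallelic tumor_vs_normal_all_calls.vcf`) means the `as.character()` conversion maps each record to exactly one string. For bgzipped files, `bcftools view -H -m3 file.vcf.gz | wc -l` gives an equivalent count, since `-m3` keeps only sites with at least three alleles (REF plus two ALTs).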

@warthmann

Great, thanks @hberger! Your fix worked for me as well; the script now runs to completion.


5 participants