From 1ace0f0e15655b16c12d79c53b8fbaf2f6af7c17 Mon Sep 17 00:00:00 2001
From: mikelove
Date: Mon, 9 Oct 2023 14:39:56 -0400
Subject: [PATCH] clarify piscem infer details

---
 DESCRIPTION           |  6 +++---
 R/tximeta.R           |  6 ++----
 README.md             | 13 ++++++++++---
 man/tximeta.Rd        |  6 ++----
 vignettes/tximeta.Rmd | 27 ++++++++++++++-------------
 5 files changed, 31 insertions(+), 27 deletions(-)

diff --git a/DESCRIPTION b/DESCRIPTION
index 61e2dea..a1f2794 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,9 +1,9 @@
 Package: tximeta
-Version: 1.19.9
+Version: 1.19.10
 Title: Transcript Quantification Import with Automatic Metadata
 Description: Transcript quantification import from Salmon and
- alevin with automatic attachment of transcript ranges and
- release information, and other associated metadata. De novo
+ other quantifiers with automatic attachment of transcript ranges
+ and release information, and other associated metadata. De novo
  transcriptomes can be linked to the appropriate sources with
  linkedTxomes and shared for computational reproducibility.
 Authors@R: c(
diff --git a/R/tximeta.R b/R/tximeta.R
index dee34fe..317d4f0 100644
--- a/R/tximeta.R
+++ b/R/tximeta.R
@@ -55,13 +55,11 @@ NULL
 
 #' Import transcript quantification with metadata
 #'
-#' \code{tximeta} leverages the hashed checksum of the Salmon index,
+#' \code{tximeta} leverages the hashed checksum of the Salmon or piscem index,
 #' in addition to a number of core Bioconductor packages (GenomicFeatures,
 #' ensembldb, AnnotationHub, GenomeInfoDb, BiocFileCache) to automatically
 #' populate metadata for the user, without additional effort from the user.
-#' Note that \code{tximeta} requires that the entire output directory of Salmon
-#' or alevin is present and unmodified in order to identify the provenance of the
-#' reference transcripts.
+#' For other quantifiers see the \code{customMetaInfo} argument below.
 #'
 #' Most of the code in \code{tximeta} works to add metadata and transcript ranges
 #' when the quantification was performed with Salmon. However,
diff --git a/README.md b/README.md
index e3c799d..6999bdb 100644
--- a/README.md
+++ b/README.md
@@ -9,7 +9,12 @@ metadata for transcript quantification data in Bioconductor.
 The `tximeta()` function imports quantification data from *Salmon* or
 other quantifiers, and returns a
 [SummarizedExperiment](https://bioconductor.org/packages/release/bioc/vignettes/SummarizedExperiment/inst/doc/SummarizedExperiment.html#anatomy-of-a-summarizedexperiment)
-object.
+object. *tximeta* works natively with
+[Salmon](https://salmon.readthedocs.io/en/latest/),
+[alevin](https://salmon.readthedocs.io/en/latest/alevin.html),
+or [piscem-infer](https://piscem-infer.readthedocs.io/en/latest/),
+but can easily be configured to work with any transcript
+quantification tool.
 
 If `tximeta()` recognizes the reference transcripts used for
 quantification, it will automatically download relevant
@@ -31,12 +36,14 @@ quantification data).
 
 # How it works
 
-The key idea behind *tximeta* is that *Salmon* propagates a hash value
+The key idea behind *tximeta* is that *Salmon*, *alevin*, and
+*piscem-infer* propagate a hash value
 summarizing the reference transcripts into each quantification
 directory it outputs. *tximeta* can be used with other tools as long
 as the
 [hash of the transcripts](https://github.com/COMBINE-lab/FastaDigest)
-is also included in the output directories.
+is also included in the output directories. See `customMetaInfo`
+argument of `tximeta()` for more details.
 
 ![](man/figures/diagram.png)
 
diff --git a/man/tximeta.Rd b/man/tximeta.Rd
index 90e2cea..ee25aef 100644
--- a/man/tximeta.Rd
+++ b/man/tximeta.Rd
@@ -80,13 +80,11 @@ any known transcriptomes, or any locally saved \code{linkedTxome},
 \code{tximeta} will just return a non-ranged SummarizedExperiment)
 }
 \description{
-\code{tximeta} leverages the hashed checksum of the Salmon index,
+\code{tximeta} leverages the hashed checksum of the Salmon or piscem index,
 in addition to a number of core Bioconductor packages (GenomicFeatures,
 ensembldb, AnnotationHub, GenomeInfoDb, BiocFileCache) to automatically
 populate metadata for the user, without additional effort from the user.
-Note that \code{tximeta} requires that the entire output directory of Salmon
-or alevin is present and unmodified in order to identify the provenance of the
-reference transcripts.
+For other quantifiers see the \code{customMetaInfo} argument below.
 }
 \details{
 Most of the code in \code{tximeta} works to add metadata and transcript ranges
diff --git a/vignettes/tximeta.Rmd b/vignettes/tximeta.Rmd
index 2a7df3e..d979055 100644
--- a/vignettes/tximeta.Rmd
+++ b/vignettes/tximeta.Rmd
@@ -10,9 +10,9 @@ output:
 abstract: >
  Tximeta performs numerous annotation and metadata gathering tasks on
  behalf of users during the import of transcript quantifications from
- *Salmon* or *alevin* into R/Bioconductor. Metadata and transcript
- ranges are added automatically, facilitating genomic analyses and
- assisting in computational reproducibility.
+ *Salmon*, *alevin*, or *piscem-infer* into R/Bioconductor. Metadata
+ and transcript ranges are added automatically, facilitating genomic
+ analyses and assisting in computational reproducibility.
 bibliography: library.bib
 vignette: |
  %\VignetteIndexEntry{Transcript quantification import with automatic metadata}
@@ -25,7 +25,8 @@ vignette: |
 The `tximeta` package [@Love2020] extends the `tximport` package
 [@Soneson2015] for import of transcript-level quantification data into
 R/Bioconductor. It automatically adds annotation metadata when the
-RNA-seq data has been quantified with *Salmon* [@Patro2017] or for
+RNA-seq data has been quantified with *Salmon* [@Patro2017] or
+[piscem-infer](https://piscem-infer.readthedocs.io/en/latest/), or the
 scRNA-seq data quantified with *alevin* [@Srivastava2019]. To our
 knowledge, `tximeta` is the only package for RNA-seq data import that
 can automatically identify and attach transcriptome metadata based on
@@ -34,15 +35,15 @@ For more details on these packages -- including the motivation for
 `tximeta` and description of similar work -- consult the
 **References** below.
 
-**Note:** `tximeta` requires that the **entire output directory** of
-*Salmon* / *alevin* is present and unmodified in order to identify the
-provenance of the reference transcripts. In general, it's a good idea
-to not modify or re-arrange the output directory of bioinformatic
-software as other downstream software rely on and assume a consistent
-directory structure. For sharing multiple samples, one can use, for
-example, `tar -czf` to bundle up a set of Salmon output directories,
-or to bundle one alevin output directory. For tips on using `tximeta`
-with other quantifiers see the
+**Note:** `tximeta` requires that the **entire output** of
+*Salmon* / *piscem-infer* / *alevin* is present and unmodified in
+order to identify the provenance of the reference transcripts. In
+general, it's a good idea to not modify or re-arrange the output
+directory of bioinformatic software as other downstream software rely
+on and assume a consistent directory structure. For sharing multiple
+samples, one can use, for example, `tar -czf` to bundle up a set of
+Salmon output directories, or to bundle one alevin output
+directory. For tips on using `tximeta` with other quantifiers see the
 [other quantifiers](#other_quantifiers) section below.
 
 ```{r echo=FALSE}