Update pgxRpi.md
hangjiaz authored Aug 5, 2024
1 parent 33215f2 commit 8dcadd8
Showing 1 changed file with 37 additions and 19 deletions.
56 changes: 37 additions & 19 deletions docs/pgxRpi.md
@@ -1,53 +1,71 @@
# pgxRpi, an R Library to Access Progenetix Data

`pgxRpi` is an API wrapper package to access data from the Progenetix database. More details about this package are in the [vignettes](https://github.com/progenetix/pgxRpi). There are several functions in this R package.
`pgxRpi` is an API wrapper package to access data from the Progenetix database. For more detailed documentation, please visit the [GitHub repository](https://github.com/progenetix/pgxRpi).

## Retrieve available filters

Filters are rules to select records based on their field values, allowing for precise queries in Progenetix. More details about filters can be found [here](https://docs.progenetix.org/common/classifications-and-ontologies/).

The following code retrieves filters with the `NCIT` prefix:

```
ncit_filters <- pgxFilter(prefix="NCIT")
```

## Retrieve biosample information

You can select biosamples from specific groups of interests, chosen by a filter. The description about _filters_ is [here](https://docs.progenetix.org/common/classifications-and-ontologies/).
You can retrieve biosample information from specific groups of interest, selected using a filter.

```
biosamples <- pgxLoader(type="biosample", filters = "NCIT:C3512",codematches = TRUE)
biosamples <- pgxLoader(type="biosamples", filters = "NCIT:C3512")
```
The returned biosample information includes biosample id, various codes for tumor types, tumor stage, survival data, associated literature or research project, etc.

## Query CNV coverage data of biosamples from specific cohorts
The returned biosample information includes details such as biosample ID, tumor types, tumor stage, and associated literature or research projects.

The coverage is calculated across 1MB genomic bins, chromosomal arms, whole chromosomes, or whole genome.
## Retrieve individual information

The CNV coverage across genomic bins can be accessed by setting `output` = "pgxmatrix". More details about the data format "pgxmatrix" see the [documentation](https://docs.progenetix.org/services/#cnv-status-matrix).
You can retrieve information about individuals from whom samples are derived, including survival data.

```
cnv.status <- pgxLoader(type="variant", filters = "NCIT:C3058", output="pgxmatrix", codematches = T)
individuals <- pgxLoader(type="individuals", filters = "NCIT:C3512")
```
## Visualize survival data

The CNV coverage across chromosomal arms, chromosomes, or whole genome can be accessed by setting `output` = "coverage".
You can visualize the survival differences between younger and older patients based on the queried individual information.

```
cnv.status <- pgxLoader(type="variant", filters = "NCIT:C4443", output="coverage", codematches = F)
pgxMetaplot(individuals,group_id="age_iso", condition="P65Y", pval=TRUE)
```

<img src="../img/pgxRpi-survival-plot.png" style="margin-left: auto; margin-right:auto" />
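
The `condition` value `"P65Y"` reads as an ISO 8601 duration of 65 years. As a rough illustration of what a split at that threshold means, the base R sketch below converts a few made-up `age_iso` strings to years and labels each one. The helper `iso_to_years`, the sample ages, and the choice of a strict `>` comparison are assumptions for illustration only; how `pgxMetaplot` groups individuals internally is not shown here.

```r
# Convert ISO 8601 durations such as "P65Y" or "P70Y6M" to years.
# Simplification: only year (Y) and month (M) components are handled.
iso_to_years <- function(x) {
  get_part <- function(unit) {
    pattern <- paste0("^P.*?([0-9]+)", unit, ".*$")
    value <- ifelse(grepl(pattern, x, perl = TRUE),
                    sub(pattern, "\\1", x, perl = TRUE),
                    "0")
    as.numeric(value)
  }
  get_part("Y") + get_part("M") / 12
}

# Hypothetical ages, not taken from Progenetix
ages <- c("P45Y", "P65Y", "P70Y6M", "P12Y")
years <- iso_to_years(ages)

# Assumed rule: strictly older than the threshold counts as "older"
threshold <- iso_to_years("P65Y")
group <- ifelse(years > threshold, "older", "younger")

print(data.frame(age_iso = ages, years = years, group = group))
```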

## Query and export segment copy number variant data

You can download the copy number variant data of individual biosamples. The biosample id can be queried by pgxRpi or by Progenetix [website](http://progenetix.org/biosamples/).
The variant data exportation supports different output formats, more information see vignettes.
You can download the copy number variant data of individual biosamples. The biosample ID can be obtained via pgxRpi or the [Progenetix website](http://progenetix.org/biosamples/).

The variant data export supports different output formats. For more information, refer to the package vignettes.

```
variants <- pgxLoader(type="variant", biosample_id = c("pgxbs-kftva6du","pgxbs-kftva6dv","pgxbs-kftva6dx"),output = "pgxseg")
pgxLoader(type="g_variants", biosample_id = c("pgxbs-kftva6du","pgxbs-kftva6dx"),output = "pgxseg", save_file=TRUE)
```

## Query and visualize CNV frequencies
## Query CNV fraction data of biosamples from specific cohorts

You can query the CNV frequency of specific filters, namely specific cohorts. There are two available data formats. One is [`.pgxseg`](https://docs.progenetix.org/services/#pgxseg-segment-cnv-frequencies), good for visualization. Another is [`.pgxmatrix`](https://docs.progenetix.org/services/#cnv-frequency-matrix), good for analysis.
CNV fractions are calculated based on segment data across various genomic scales, such as 1MB genomic bins, chromosomal arms, whole chromosomes, or the entire genome (GRCh38).

```
frequency <- pgxLoader(type="frequency", output ='pgxseg',
filters=c("NCIT:C4038","pgx:icdom-85003"),
codematches = TRUE)
cnv_fraction_across_chro_genome <- pgxLoader(type="cnv_fraction", filters = "NCIT:C2948")
cnv_fraction_across_bin <- pgxLoader(type="cnv_fraction", filters = "NCIT:C2948", output="pgxmatrix")
```
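
To make the idea of a CNV fraction concrete, here is a minimal base R sketch that computes the share of a region covered by segments of one type, first for a whole chromosome and then per 1 Mb bin. The segment table, the chromosome length, and the helper `cnv_fraction` are hypothetical, and the sketch assumes segments of the same type do not overlap; it does not claim to reproduce the exact Progenetix calculation.

```r
# Hypothetical segments on a single chromosome (not Progenetix output)
segments <- data.frame(
  start = c(0, 30e6, 80e6),
  end   = c(10e6, 50e6, 90e6),
  type  = c("DUP", "DEL", "DUP")
)
chrom_length <- 100e6  # assumed chromosome length in bases

# Fraction of [region_start, region_end) covered by segments of one type.
# Assumes segments of the same type do not overlap each other.
cnv_fraction <- function(segs, cnv_type, region_start, region_end) {
  s <- segs[segs$type == cnv_type, ]
  overlap <- pmin(s$end, region_end) - pmax(s$start, region_start)
  sum(pmax(overlap, 0)) / (region_end - region_start)
}

# Whole-chromosome fractions
dup_fraction <- cnv_fraction(segments, "DUP", 0, chrom_length)
del_fraction <- cnv_fraction(segments, "DEL", 0, chrom_length)

# The same calculation for each 1 Mb bin
bin_size <- 1e6
bin_starts <- seq(0, chrom_length - bin_size, by = bin_size)
dup_by_bin <- vapply(
  bin_starts,
  function(b) cnv_fraction(segments, "DUP", b, b + bin_size),
  numeric(1)
)
```

The per-bin vector has one value per bin, which is roughly the shape of information that a bin-level matrix carries for each sample.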

The data visualization requires the input data with `.pgxseg` format. You can plot the frequency by genome, by chromosomes, or plot like circos.
## Query and visualize CNV frequencies

You can query the CNV frequency of specific filters. Two data formats are available: [`.pgxfreq`](https://docs.progenetix.org/file-formats/#pgxfreq-segment-cnv-frequencies) and [`.pgxmatrix`](https://docs.progenetix.org/services/#cnv-frequency-matrix).

```
frequency <- pgxLoader(type="cnv_frequency", output ='pgxfreq',
filters=c("NCIT:C4038","pgx:icdom-85003"))
```
```
pgxFreqplot(frequency, filters='pgx:icdom-85003')
```