R Toolkit for Metagenomic Studies

This repository contains four R scripts for processing and visualizing metagenomic data. Each script includes its own data import and formatting commands, so the scripts can be used independently of one another.

To use any of the scripts, you will need:

An abundance (count) table of OTUs or of any other taxonomic or functional annotation.
A table indicating the site/treatment (or any other higher-level category) to which each sample belongs.

Required format of input files

Abundance table:

samples	OTU1	OTU2	etc
sample1	50	6	...
sample2	77	99	...
sample3	103	5	...
sample4	10	53	...
...	...	...	...

Sites/treatments table (note that this table has no header row):

sample1	earwax
sample2	earwax
sample3	armpit_sweat
sample4	armpit_sweat
...	...
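A minimal sketch of reading both inputs in R and lining up each sample with its site, assuming tab-separated text laid out as above. The inline strings stand in for real files (read.csv() would replace read.delim() for comma-separated files), and the object names are illustrative, not the scripts' own.

```r
# Sketch: read the two input tables in the layout shown above.
# Assumption: tab-separated text; inline strings stand in for real files.
abund_txt <- "samples\tOTU1\tOTU2
sample1\t50\t6
sample2\t77\t99
sample3\t103\t5
sample4\t10\t53"

sites_txt <- "sample1\tearwax
sample2\tearwax
sample3\tarmpit_sweat
sample4\tarmpit_sweat"

# Abundance table: the first column holds sample names, used as row names
abund <- read.delim(text = abund_txt, row.names = 1, check.names = FALSE)

# Sites table has no header row, so column names are supplied explicitly
sites <- read.delim(text = sites_txt, header = FALSE,
                    col.names = c("sample", "site"))

# Order the site labels to match the rows of the abundance table
site <- factor(sites$site[match(rownames(abund), sites$sample)])
stopifnot(!anyNA(site))  # every sample must have a site/treatment

print(abund)
print(table(site))
```

Matching by sample name, rather than relying on row order, keeps the labels correct even if the two files list samples in different orders.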

diversity.R

This script computes richness as well as the Shannon and Simpson alpha diversity indices, as calculated by the vegan 2.5-6 package (Oksanen et al., 2019).

Dependencies

ggplot2
reshape2
vegan
RColorBrewer

Steps

Import data
Richness calculation
Shannon's diversity index calculation
Simpson's diversity index calculation

Each of the three calculation steps runs pairwise Wilcoxon tests and a Kruskal-Wallis test (for comparisons among multiple groups), and creates boxplots of the sites/treatments with the means marked.
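The three metrics and the tests can be sketched in base R as below. This is not the script's code: alpha_diversity() is a hypothetical helper following the formulas vegan documents for specnumber() and diversity() (Shannon H = -sum(p ln p), Simpson = 1 - sum(p^2)), the data are simulated, the Benjamini-Hochberg adjustment on the pairwise tests is an assumption, and the ggplot2 boxplots are omitted.

```r
# Hypothetical helper: richness, Shannon and Simpson per sample (rows).
# Shannon uses the natural log; Simpson is reported as 1 - sum(p^2),
# matching the definitions in vegan's diversity() documentation.
alpha_diversity <- function(counts) {
  counts <- as.matrix(counts)
  p <- counts / rowSums(counts)                # relative abundances
  plogp <- ifelse(p > 0, p * log(p), 0)        # treat 0 * log(0) as 0
  data.frame(richness = rowSums(counts > 0),   # number of taxa present
             shannon  = -rowSums(plogp),
             simpson  = 1 - rowSums(p^2),
             row.names = rownames(counts))
}

# Simulated example: six samples, five OTUs, two sites
set.seed(1)
counts <- matrix(rpois(30, lambda = 20), nrow = 6,
                 dimnames = list(paste0("sample", 1:6), paste0("OTU", 1:5)))
site <- factor(rep(c("earwax", "armpit_sweat"), each = 3))
div <- alpha_diversity(counts)
print(div)

# Kruskal-Wallis across all sites, then pairwise Wilcoxon rank-sum tests
print(kruskal.test(shannon ~ site, data = cbind(div, site)))
print(pairwise.wilcox.test(div$shannon, site, p.adjust.method = "BH"))
```

The same two test calls apply unchanged to div$richness and div$simpson.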

Example outputs

non-metric_multidimensional_scaling.R

Non-Metric Multidimensional Scaling (NMDS) is an ordination method widely (but not exclusively) used in microbial ecology. It uses a dissimilarity matrix to arrange samples in a 2D plane according to their taxonomic or functional composition, so that samples with similar composition plot close together. Although many dissimilarity measures exist for different purposes (see the vegan documentation), the Bray-Curtis dissimilarity is the most commonly used in microbial ecology studies.
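For two samples, the Bray-Curtis dissimilarity is the sum of absolute differences in abundance divided by the sum of all their abundances, so for non-negative data it runs from 0 (identical counts) to 1 (no shared taxa). A base-R sketch, where bray_curtis() is a hypothetical helper applied to the four example samples from the input section; vegan::vegdist(x, method = "bray") computes the same quantity.

```r
# Bray-Curtis dissimilarity between all pairs of samples (rows):
#   BC(i, j) = sum_k |x_ik - x_jk| / sum_k (x_ik + x_jk)
bray_curtis <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
    }
  }
  as.dist(d)  # returned as a "dist" object (lower triangle only)
}

# The four example samples from the abundance table above
abund <- matrix(c(50, 6, 77, 99, 103, 5, 10, 53), ncol = 2, byrow = TRUE,
                dimnames = list(paste0("sample", 1:4), c("OTU1", "OTU2")))
print(round(as.matrix(bray_curtis(abund)), 3))
```

For sample1 and sample2 this gives (27 + 93) / 232, about 0.517.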

Dependencies

ggplot2
reshape2
vegan
RColorBrewer
data.table

Steps

Import data
Data checking (normalization, Bray-Curtis dissimilarity matrix calculation, ANOVA of beta dispersion)
Statistical comparisons of sites/treatments (PERMANOVA, ANOSIM)
NMDS (stress score calculation and plotting, NMDS plot)
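The normalization, dissimilarity, NMDS and PERMANOVA steps can be sketched with base R plus MASS (a recommended package distributed with R). This is an illustration under assumptions, not the script's code: the data are simulated, MASS::isoMDS() stands in for whichever NMDS routine the script calls (vegan provides metaMDS(), adonis2(), anosim() and betadisper() for these steps), and permanova_sketch() is a hypothetical helper implementing the pseudo-F permutation test. ANOSIM and the beta-dispersion ANOVA are not reproduced.

```r
# Simulated data: 8 samples, 5 OTUs, two sites with different profiles
set.seed(42)
site_a <- c(40, 30, 5, 5, 20)
site_b <- c(5, 10, 40, 30, 20)
counts <- rbind(t(replicate(4, rpois(5, site_a))),
                t(replicate(4, rpois(5, site_b))))
dimnames(counts) <- list(paste0("sample", 1:8), paste0("OTU", 1:5))
site <- factor(rep(c("earwax", "armpit_sweat"), each = 4))

# Normalization: relative abundances, so every row sums to 1
rel <- counts / rowSums(counts)

# For rows summing to 1, Bray-Curtis equals half the Manhattan distance
bc <- as.dist(as.matrix(dist(rel, method = "manhattan")) / 2)

# Two-dimensional NMDS; isoMDS reports stress as a percentage
nmds <- MASS::isoMDS(bc, k = 2, trace = FALSE)
cat("Stress (%):", round(nmds$stress, 2), "\n")
scores <- data.frame(NMDS1 = nmds$points[, 1], NMDS2 = nmds$points[, 2],
                     site = site)  # ready for plotting with ggplot2

# Hypothetical helper: one-way PERMANOVA pseudo-F with a permutation p-value
permanova_sketch <- function(d, group, nperm = 999) {
  d2 <- as.matrix(d)^2
  n <- nrow(d2)
  pseudo_f <- function(g) {
    ss_total <- sum(d2[upper.tri(d2)]) / n
    ss_within <- sum(sapply(split(seq_len(n), g), function(idx) {
      sub <- d2[idx, idx, drop = FALSE]
      sum(sub[upper.tri(sub)]) / length(idx)
    }))
    a <- nlevels(g)
    ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))
  }
  f_obs <- pseudo_f(group)
  f_perm <- replicate(nperm, pseudo_f(sample(group)))  # shuffle site labels
  list(F = f_obs, p = (sum(f_perm >= f_obs) + 1) / (nperm + 1))
}

perm <- permanova_sketch(bc, site)
cat("pseudo-F:", round(perm$F, 2), " p:", perm$p, "\n")
```

The half-Manhattan shortcut only holds after normalization to proportions; on raw counts, the full Bray-Curtis formula is needed.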

Example outputs

stacked_barplots.R

This script visualizes the composition of each sample in terms of relative abundance. The minimum relative abundance threshold can be adjusted as needed.

Dependencies

ggplot2
reshape2
RColorBrewer

Steps

Import data
Data normalization
Threshold of minimum relative abundance (default: 0.005)
Plotting
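A minimal sketch of the normalization and threshold steps, producing a long table suitable for a ggplot2 stacked bar chart. How the script handles taxa below the threshold is not stated here; pooling taxa that never reach the threshold in any sample into an "Other" category is an assumption, and collapse_rare() is a hypothetical helper.

```r
# Hypothetical helper: convert counts to relative abundances and pool taxa
# that never reach `threshold` in any sample into an "Other" column.
collapse_rare <- function(counts, threshold = 0.005) {
  rel <- counts / rowSums(counts)              # each row sums to 1
  keep <- apply(rel, 2, max) >= threshold
  out <- rel[, keep, drop = FALSE]
  if (any(!keep)) {
    out <- cbind(out, Other = rowSums(rel[, !keep, drop = FALSE]))
  }
  out
}

counts <- matrix(c(500, 400, 2, 98,
                   300, 650, 1, 49),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("sample1", "sample2"),
                                 c("OTU1", "OTU2", "OTU3", "OTU4")))
rel <- collapse_rare(counts, threshold = 0.005)  # OTU3 goes to "Other"

# Long format (one row per sample x taxon), as used for stacked bars
long <- as.data.frame(as.table(rel))
names(long) <- c("sample", "taxon", "rel_abundance")
print(long)
# With ggplot2 loaded:
# ggplot(long, aes(sample, rel_abundance, fill = taxon)) + geom_col()
```

Pooling rather than dropping the rare taxa keeps every bar summing to 1.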

Example outputs

indicators.R

This script uses the indicspecies package by De Cáceres and Legendre (2009), which assesses the strength and statistical significance of the relationship between the occurrence/abundance of taxonomic categories and sites, identifying the taxa associated with each site. I have also applied this method to functional annotations. P-value correction is included.

Dependencies

indicspecies
ggplot2
vegan
data.table

Steps

Import data
Data normalization
Indicator analysis
P-value correction (Benjamini-Hochberg)
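A simplified, hypothetical sketch of the indicator analysis and correction steps. For each taxon and site it computes specificity A (mean abundance in the site divided by the sum of the per-site means) and fidelity B (fraction of the site's samples in which the taxon occurs), takes sqrt(A × B) as the indicator value, tests each taxon's best site by permuting site labels, and applies Benjamini-Hochberg correction with p.adjust(). Unlike indicspecies::multipatt(), which can also test combinations of sites, it considers single sites only; indval_sketch() is an illustrative name and the data are made up.

```r
# Hypothetical sketch of an indicator value analysis (single sites only).
# Assumes at least two sites and that every taxon occurs at least once.
indval_sketch <- function(counts, group, nperm = 999) {
  group <- factor(group)
  stat_fn <- function(g) {
    # Per-site mean abundance of each taxon (rows = sites, cols = taxa)
    means <- apply(counts, 2, function(x) tapply(x, g, mean))
    a_spec <- sweep(means, 2, colSums(means), "/")                 # A
    b_fid <- apply(counts > 0, 2, function(x) tapply(x, g, mean))  # B
    sqrt(a_spec * b_fid)
  }
  obs <- stat_fn(group)
  best <- apply(obs, 2, max)                   # strongest site per taxon
  hits <- numeric(ncol(counts))
  for (i in seq_len(nperm)) {
    hits <- hits + (apply(stat_fn(sample(group)), 2, max) >= best)
  }
  p <- (hits + 1) / (nperm + 1)
  data.frame(taxon = colnames(counts),
             site = rownames(obs)[apply(obs, 2, which.max)],
             indval = best, p = p,
             p_bh = p.adjust(p, method = "BH"))  # Benjamini-Hochberg
}

set.seed(7)
counts <- cbind(OTU1 = c(10, 12, 9, 11, 0, 0, 0, 0),
                OTU2 = rep(5, 8),
                OTU3 = c(0, 0, 0, 0, 7, 8, 6, 9))
site <- rep(c("earwax", "armpit_sweat"), each = 4)
res <- indval_sketch(counts, site)
print(res, row.names = FALSE)
```

Here OTU1 and OTU3 reach the maximum value of 1 for their sites, while OTU2, spread evenly across both sites, gets sqrt(0.5) and a p-value of 1.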

Credits

Manuel II García-Ulloa https://github.com/manuelgug
Mariette Viladomat Jasso https://github.com/MarietteViladomat
Some parts of the diversity.R, non-metric_multidimensional_scaling.R and indicators.R scripts were based on jkzorz's tutorials (https://github.com/jkzorz/jkzorz.github.io)
The sample data come from the publicly available Eastern Mediterranean 16S Survey project on MG-RAST: https://www.mg-rast.org/mgmain.html?mgpage=project&project=mgp10029

References

Oksanen, J., Blanchet, F. G., Friendly, M., Kindt, R., Legendre, P., McGlinn, D., ... & Wagner, H. (2019). vegan: Community Ecology Package. R package version 2.5-6.
De Cáceres, M., & Legendre, P. (2009). Associations between species and groups of sites: indices and statistical inference. Ecology, 90(12), 3566-3574.
