CloVR Script
Next-generation sequencing has been successfully used to characterize microbial communities based on the amplification and sequencing of phylogenetic marker genes, e.g. the 16S rRNA gene. In comparison to 16S rRNA amplicon sequencing-based procedures for bacterial and archaeal microbiota analysis, few comparable protocols have been made available to study fungal organisms. Here we describe the CloVR-ITS protocol for fungal microbiota analysis using internal transcribed spacer (ITS) amplicon sequencing. CloVR-ITS includes well-known bioinformatic tools for alpha and beta diversity analyses, suitable for processing even large sequence datasets:
The QIIME script pick_otus.py is used to cluster all non-chimeric reads from all samples into genus-level operational taxonomic units (OTUs) based on a nucleotide sequence identity threshold. The clustering program for this step is UCLUST [2] and the nucleotide sequence identity threshold for all reads within an OTU is 85%. UCLUST is set to examine both the forward and reverse complement sequences during clustering.
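The idea behind threshold-based OTU picking can be illustrated with a minimal Python sketch. It is a toy greedy centroid clustering, not UCLUST or pick_otus.py: identity is computed position by position without alignment, and the function names (reverse_complement, identity, greedy_cluster) are hypothetical. Only the 85% threshold and the check of both strands are taken from the description above.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def identity(a, b):
    """Fraction of matching positions, relative to the longer sequence.

    Simplification: no alignment is performed, so indels are not handled.
    """
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_cluster(reads, threshold=0.85):
    """Assign each read to the first centroid it matches on either strand.

    reads: list of (name, sequence) tuples.
    A read that matches no existing centroid seeds a new OTU.
    Returns a list of OTUs, each a list of read names.
    """
    centroids = []
    otus = []
    for name, seq in reads:
        rc = reverse_complement(seq)
        for i, centroid in enumerate(centroids):
            if max(identity(seq, centroid), identity(rc, centroid)) >= threshold:
                otus[i].append(name)
                break
        else:
            centroids.append(seq)
            otus.append([name])
    return otus
```

In this sketch the result depends on read order, because the first read of each OTU becomes its centroid; real clustering tools typically sort the input first, which is left out here.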
Genus-level OTUs created by the QIIME commands above are reorganized and passed to Mothur, which uses the commands read.otu, rarefaction.single, and summary.single to generate rarefaction curves and estimators of species diversity for each sample. Finally, a custom program called Leech is used to plot all rarefaction curves together, with color schemes derived from the input mapping file.
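To make the notion of a rarefaction curve concrete, the following Python sketch computes the expected number of OTUs observed in a random subsample of n reads, using the closed-form (hypergeometric) expectation. It does not reproduce Mothur's rarefaction.single; the function name rarefaction_curve and the step parameter are assumptions made for illustration.

```python
from math import comb


def rarefaction_curve(otu_counts, step=1):
    """Expected OTU richness at increasing subsample sizes.

    otu_counts: list of read counts per OTU for one sample.
    Returns a list of (n, expected_otus) for n = step, 2*step, ...,
    always ending with the full sample size.
    """
    total = sum(otu_counts)
    sizes = list(range(step, total, step)) + [total]
    curve = []
    for n in sizes:
        # An OTU with c reads is missed with probability C(total-c, n) / C(total, n).
        expected = sum(1 - comb(total - c, n) / comb(total, n) for c in otu_counts)
        curve.append((n, expected))
    return curve


# Example: a sample of 10 reads spread over three OTUs.
for n, expected in rarefaction_curve([5, 3, 2], step=3):
    print(n, round(expected, 3))
```

A curve that flattens before reaching the full sample size suggests that additional sequencing would reveal few new OTUs.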
The output from the taxonomic classification of each sequence from all samples by the BLAST-based classification step is further analyzed and graphically represented using the Metastats program [6] and customized scripts in the R programming language.
Custom R scripts are used to normalize taxonomic group counts to relative abundances. Stacked histograms of the relative abundances are generated in .pdf format if there are at most 50 samples and at most 25 taxonomic groups. Beyond these limits, no histogram is generated.
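The normalization and the plotting limits can be sketched in a few lines of Python. This is not the protocol's R code: the data layout (a dictionary of per-sample taxon counts) and the names relative_abundances and should_plot_histogram are assumptions; only the limits of 50 samples and 25 taxonomic groups come from the text.

```python
MAX_SAMPLES = 50  # limit stated in the text
MAX_TAXA = 25     # limit stated in the text


def relative_abundances(counts):
    """Convert {sample: {taxon: count}} to {sample: {taxon: fraction}}.

    Fractions within each sample sum to 1; an empty sample yields zeros.
    """
    result = {}
    for sample, taxa in counts.items():
        total = sum(taxa.values())
        result[sample] = {t: (c / total if total else 0.0) for t, c in taxa.items()}
    return result


def should_plot_histogram(counts):
    """Return True only if the dataset is within both plotting limits."""
    all_taxa = {t for taxa in counts.values() for t in taxa}
    return len(counts) <= MAX_SAMPLES and len(all_taxa) <= MAX_TAXA


counts = {
    "sample1": {"Candida": 30, "Malassezia": 10},
    "sample2": {"Candida": 5, "Aspergillus": 15},
}
print(relative_abundances(counts)["sample1"])  # {'Candida': 0.75, 'Malassezia': 0.25}
print(should_plot_histogram(counts))           # True
```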
A custom R script called skiff is used to normalize taxon counts and to calculate distance matrices for samples and taxonomic groups, using a Euclidean distance metric. Complete-linkage (furthest neighbor) clustering is employed to create dendrograms of samples and taxa in .pdf format. The R packages RColorBrewer and gplots are included in this task.
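As an illustration of the two ingredients named above, the Python sketch below computes Euclidean distances between normalized sample profiles and merges clusters by complete linkage. It is not the skiff script and draws no dendrogram; the function complete_linkage and its return format are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def complete_linkage(profiles):
    """Agglomerative clustering with complete (furthest neighbor) linkage.

    profiles: {name: [relative abundances]} with vectors of equal length.
    Returns the merge history as (cluster_a, cluster_b, linkage_distance).
    """
    names = list(profiles)
    d = {(a, b): dist(profiles[a], profiles[b]) for a in names for b in names}
    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                link = max(d[a, b] for a in clusters[i] for b in clusters[j])
                if best is None or link < best[0]:
                    best = (link, i, j)
        link, i, j = best
        merges.append((clusters[i], clusters[j], link))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


profiles = {"s1": [0.0, 1.0], "s2": [0.1, 0.9], "s3": [1.0, 0.0]}
for a, b, link in complete_linkage(profiles):
    print(a, b, round(link, 3))
```

The merge history contains the same information a dendrogram displays: which groups join, and at what distance.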
Custom R scripts are used to form pie charts displaying proportions of sequences assigned to specific functional and taxonomic levels for up to 12 samples. Outputs are in .pdf format. For more than 12 samples this function is not performed, as the visual comparison for the user would be cumbersome.
Many bio/medical cloud systems have emerged, and some have begun to meet increased biological and computational needs. Most bio/medical cloud systems provide simple command-line execution of scripts or pipelines to the user. Although they are elementary systems built for a single task, they are well suited to the IaaS concept of cloud computing, as users can configure systems with the capacity needed to analyze the data. Some bio/medical cloud systems are more tightly coupled with cloud resources. Existing non-cloud applications have been redesigned to launch services on the cloud by utilizing features of cloud computing such as flexible on-demand resource allocation, automatic system configuration, cost efficiency, and unlimited resources.
FX [10] is another recently released cloud system. It provides a user-friendly web-based interface that makes biological tools accessible to users who are not familiar with the software. Utilizing the cloud computing infrastructure, FX provides analysis tasks such as estimating gene expression levels and calling genomic variants from RNA-seq reads using transcriptome-based references. Based on Amazon Web Services (AWS), it provides a web-based working environment where the user can upload data and configure analysis settings based on options that the system provides. The analysis steps cannot be arranged as flexibly as in workflow composing systems, but each analysis task may be set with specific parameters (e.g., hit count for SNP, INDEL or alignment options). Since it is designed for specific tasks, it does not require manual arrangement of function pipelines. Parameter settings and execution of the analysis are done through the web interface.
Authors' contributions: HC and SK defined the scope of the paper and drafted the manuscript. IJ participated in drafting the bio health cloud feature part. HL participated in drafting the BioVLab part. SM participated in drafting the cloud computing part. SWL participated in drafting the cloud systems' trend. All authors read and approved the final manuscript.