Use of Nextflow tool for toxicogenomics-based prediction and mechanism identification in OpenRiskNet e-infrastructure
Monday, 27 May 2019, 16:00 CEST
Presenter: Evan Floden (Fundacio Centre De Regulacio Genomica, Spain)
Predictive toxicology and risk assessment increasingly rely on Big Data analyses for more informed decision making. Toxicogenomics uses transcriptomic readouts to predict the characteristics of compounds based on the gene-expression profiling of cells in response to exposure. These large genomic analyses place new computational demands on researchers: handling datasets, combining tools and deploying an analysis reproducibly all present significant challenges.
This webinar highlights how these challenges can be overcome using the open source workflow management tool Nextflow within the OpenRiskNet e-infrastructure. Using the toxicogenomics example, we will show how a workflow can be created, deployed and shared across Kubernetes-based environments.
To illustrate the use of external resources we developed a toxicogenomics use case:
- The workflow, termed “nf-toxomix”, is a pipeline for toxicology predictions based on transcriptomic profiles. It is built using Nextflow with an accompanying Docker/Singularity container.
- The workflow was adapted from Magkoufopoulou et al., 2012, "A transcriptomics-based in vitro assay for predicting chemical genotoxicity in vivo".
- The method trains the genotoxicity model on gene expression data from treated and non-treated samples and then assesses the genotoxicity predictions using a training/test validation approach.
- We expanded the workflow to include preprocessing steps in which raw transcriptomic data is searched for and downloaded from the Sequence Read Archive, then mapped against the human reference genome to generate read counts.
- This preprocessing step is computationally demanding and was therefore configured to be deployed on public cloud resources (AWS Batch) from the OpenRiskNet VRE.
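
The training/test validation idea described in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the expression profiles are synthetic, the nearest-centroid classifier is a stand-in rather than the model used in nf-toxomix or in the original publication, and every function name (`make_synthetic`, `split_train_test`, `train`, `predict`, `accuracy`) is hypothetical.

```python
import random


def make_synthetic(n_per_class=20, n_genes=5, seed=1):
    """Generate synthetic (profile, label) pairs; NOT real expression data."""
    rng = random.Random(seed)
    data = []
    for label, shift in (("genotoxic", 2.0), ("non-genotoxic", -2.0)):
        for _ in range(n_per_class):
            profile = [rng.gauss(shift, 1.0) for _ in range(n_genes)]
            data.append((profile, label))
    return data


def split_train_test(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the samples and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def train(train_samples):
    """Compute one mean profile (centroid) per class label."""
    by_label = {}
    for profile, label in train_samples:
        by_label.setdefault(label, []).append(profile)
    return {
        label: [sum(col) / len(profiles) for col in zip(*profiles)]
        for label, profiles in by_label.items()
    }


def predict(model, profile):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: sq_dist(model[label], profile))


def accuracy(model, test_samples):
    """Fraction of held-out samples whose predicted label matches the true label."""
    hits = sum(predict(model, p) == label for p, label in test_samples)
    return hits / len(test_samples)


if __name__ == "__main__":
    train_set, test_set = split_train_test(make_synthetic())
    model = train(train_set)
    print("held-out accuracy:", accuracy(model, test_set))
```

The point of the sketch is the separation of concerns: the model only ever sees the training portion, and its predictions are scored on samples it was not fitted to. In the real workflow the input would be the read counts produced by the preprocessing steps rather than synthetic values.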