Case Study
TGX – Toxicogenomics-based prediction and mechanism identification

Summary

In this case study, a transcriptomics-based hazard prediction model for identifying specific molecular initiating events (MIEs) will be applied using (A) top-down and (B) bottom-up approaches.

The MIEs can include, but are not limited to: (1) Genotoxicity (p53 activation), (2) Oxidative stress (Nrf2 activation), (3) Endoplasmic Reticulum Stress (unfolded protein response), (4) Dioxin-like activity (AhR activation), (5) HIF1 alpha activation and (6) Nuclear receptor activation (e.g. for endocrine disruption).

Objectives

  • Creation of prediction models based on differentially regulated genes (top-down approach);
  • Using knowledge of stress response pathways to integrate data sets for their activation or inhibition (bottom-up approach).
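The top-down objective can be illustrated with a minimal Python sketch. It is not the method used in the case study: it assumes gene selection by the absolute difference in mean log2 ratio between classes and a nearest-centroid classifier, both chosen only for brevity. The function names, the expression values and the class labels are hypothetical.

```python
from statistics import mean

def select_genes(profiles, labels, k):
    """Return the k genes with the largest absolute difference in class means."""
    def class_difference(gene):
        pos = [p[gene] for p, y in zip(profiles, labels) if y == 1]
        neg = [p[gene] for p, y in zip(profiles, labels) if y == 0]
        return abs(mean(pos) - mean(neg))
    return sorted(profiles[0].keys(), key=class_difference, reverse=True)[:k]

def class_centroids(profiles, labels, genes):
    """Compute the mean expression per class over the selected genes."""
    centroids = {}
    for cls in set(labels):
        rows = [p for p, y in zip(profiles, labels) if y == cls]
        centroids[cls] = {g: mean(r[g] for r in rows) for g in genes}
    return centroids

def predict(profile, centroids, genes):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def distance(cls):
        return sum((profile[g] - centroids[cls][g]) ** 2 for g in genes)
    return min(centroids, key=distance)

# Hypothetical log2 ratios (treated vs. control); 1 = genotoxic, 0 = non-genotoxic.
train = [
    {"CDKN1A": 2.1, "MDM2": 1.4, "ACTB": 0.1, "GAPDH": -0.1},
    {"CDKN1A": 1.8, "MDM2": 1.1, "ACTB": -0.2, "GAPDH": 0.0},
    {"CDKN1A": 0.2, "MDM2": 0.1, "ACTB": 0.0, "GAPDH": 0.1},
    {"CDKN1A": -0.1, "MDM2": 0.3, "ACTB": 0.1, "GAPDH": -0.2},
]
labels = [1, 1, 0, 0]

genes = select_genes(train, labels, k=2)
centroids = class_centroids(train, labels, genes)
new_profile = {"CDKN1A": 1.6, "MDM2": 0.9, "ACTB": 0.0, "GAPDH": 0.1}
print(genes, predict(new_profile, centroids, genes))
```

In a real analysis the gene selection would be repeated inside each cross-validation fold so that the test data do not influence which genes are chosen.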

Risk assessment framework

This case study is associated with all 3 tiers of the selected framework and in particular the following steps:

  • Collection of support data;
  • Identification of analogues / suitability assessment and existing data;
  • Mode of Action hypothesis generation.

Use Cases Associated

This case study is associated with UC1 - Merge existing data by a common structure identifier and UC2 - Building and using a prediction model.

These two use cases are relevant for the top-down approaches:

  • Reproducing the prediction models published by Herwig et al., 2016 using data from the EU-project carcinoGENOMICs;
  • Advanced predictions using as much data as possible from the diXa data warehouse and other repositories giving free access to the data.
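As a rough illustration of UC1, the sketch below joins two record sets on a shared structure identifier. The field name `structure_id`, the placeholder identifiers and the record contents are hypothetical; the actual identifier scheme and data model used by the databases are not specified here.

```python
def merge_by_identifier(expression_records, toxicity_records, key="structure_id"):
    """Join two lists of records on a shared identifier.

    Returns the merged records and the identifiers that had no match.
    """
    toxicity_by_id = {rec[key]: rec for rec in toxicity_records}
    merged, unmatched = [], []
    for rec in expression_records:
        tox = toxicity_by_id.get(rec[key])
        if tox is None:
            unmatched.append(rec[key])
        else:
            # Fields from the expression record win if both records share a name.
            merged.append({**tox, **rec})
    return merged, unmatched

# Hypothetical records; "ID-A" etc. stand in for real structure identifiers.
expression = [
    {"structure_id": "ID-A", "experiment": "exp-1"},
    {"structure_id": "ID-B", "experiment": "exp-2"},
    {"structure_id": "ID-C", "experiment": "exp-3"},
]
toxicity = [
    {"structure_id": "ID-A", "genotoxic": True},
    {"structure_id": "ID-B", "genotoxic": False},
]

merged, unmatched = merge_by_identifier(expression, toxicity)
print(merged)
print(unmatched)
```

Reporting the unmatched identifiers makes it visible how much data is lost when sources use different or incomplete identifiers.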

Databases and tools

Databases:

  • diXa (carcinoGENOMICs, Predict-IV), TG-GATEs, EU-ToxRisk (nascent), HeCaToS (nascent), ArrayExpress/GEO BioStudies.

Tools:

  • top-down: Data normalisation tools, prediction tools such as Caret;
  • bottom-up: ToxPi.
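To give an idea of how a bottom-up integration might work, the following sketch combines per-pathway activation scores into one score per compound. It is a simplified weighted scheme written for illustration, not the ToxPi software or its API: each pathway is min-max scaled across compounds and the scaled values are combined by a weighted average. Compound names, scores and weights are hypothetical.

```python
def minmax_scale(values):
    """Scale values to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def combined_scores(activation, weights):
    """Weighted average of scaled pathway scores for each compound."""
    compounds = list(activation)
    total_weight = sum(weights.values())
    scores = {c: 0.0 for c in compounds}
    for pathway, weight in weights.items():
        scaled = minmax_scale([activation[c][pathway] for c in compounds])
        for compound, value in zip(compounds, scaled):
            scores[compound] += weight * value / total_weight
    return scores

# Hypothetical pathway activation scores per compound.
activation = {
    "compound_A": {"p53": 2.0, "Nrf2": 0.5, "UPR": 0.0},
    "compound_B": {"p53": 0.0, "Nrf2": 1.5, "UPR": 0.0},
    "compound_C": {"p53": 1.0, "Nrf2": 0.5, "UPR": 0.0},
}
# Hypothetical weights: p53 counts twice as much as the other pathways.
weights = {"p53": 2.0, "Nrf2": 1.0, "UPR": 1.0}

scores = combined_scores(activation, weights)
print(scores)
```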

Service integration

Service integration will be needed for omics databases, for knowledge bases and data mining, and for processing and analysis.

Currently available services:

  • A database for curated toxicogenomic datasets
    Service type: Database / data source, Application, Visualisation tool, Software
  • Discover your variants of interest in human omics datasets
    Service type: Application, Software, Service
  • Programmatically retrieve metadata from the European Genome-phenome Archive
    Service type: Application, Service
  • Service to run Nextflow pipelines
    Service type: Workflow, Software, Service
  • Interactive computing and workflows sharing
    Service type: Workflow, Visualisation tool, Helper tool, Software, Analysis tool, Processing tool
  • Computation research made simple and reproducible
    Service type: Workflow, Database / data source, Service

Related resources

Poster
OpenRiskNet Part II: Predictive Toxicology based on Adverse Outcome Pathways and Biological Pathway Analysis
Marvin Martens, Thomas Exner, Nofisat Oki, Danyel Jennen, Jumamurat Bayjanov, Chris Evelo, Tim Dudgeon, Egon Willighagen
28 Aug 2019
Abstract:
The OpenRiskNet project (https://openrisknet.org/) is funded by the H2020-EINFRA-22-2016 Programme. Here we present how the concept of Adverse Outcome Pathways (AOPs), which captures mechanistic knowledge from a chemical exposure causing a Molecular Initiating Event (MIE), through Key Events (KEs) towards an Adverse Outcome (AO), can be extended with additional knowledge by using tools and data available through the OpenRiskNet e-Infrastructure. This poster describes how the case study of AOPLink, together with DataCure, TGX, and SysGroup, can utilize the AOP framework for knowledge and data integration to support risk assessments. AOPLink involves the integration of knowledge captured in AOPs with additional data sources and experimental data from DataCure. TGX feeds this integration with prediction models of the MIE of such AOPs using either gene expression data or knowledge about stress response pathways. This is complemented by SysGroup, which is about the grouping of chemical compounds based on structural similarity and mode of action based on omics data. Therefore, the combination of these case studies extends the AOP knowledge and allows biological pathway analysis in the context of AOPs, by combining experimental data and the molecular knowledge that is captured in KEs of AOPs.

Target audience: Risk assessors, Researchers, Students, Nanosafety community, Regulators, Bioinformaticians
Open access: yes
Licence: Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Organisations involved: EwC, UM, IM
Report
Compute and data federation (Deliverable 2.5)
Evan Floden, Audald Lloret-Villas, Paolo Di Tommaso (CRG), Ola Spjuth (UU), Lucian Farcal (EwC), Tim Dudgeon (IM), Danyel Jennen (UM)
25 Jun 2019
Abstract:
This report details the work involved in the federation of compute and data resources between the OpenRiskNet e-infrastructure and external resources. The reference environment has been designed to handle the majority of requirements of users who wish to deploy and run services. However, specific situations demand solutions where the computation, the data or both reside outside the OpenRiskNet e-infrastructure. This deliverable is related to Task 2.7 (Interconnecting virtual environment with external infrastructures) and Task 2.8 (Federation between virtual environments). Resource-intensive analyses, such as those performed in toxicogenomics, can have CPU, memory or disk requirements that cannot be assumed to be available across all deployment scenarios. Human sequencing data may have restrictions on where it can be processed, and the vast quantity of this data often means that it is more efficient to “bring the computation to the data”. In achieving Tasks 2.7 and 2.8, we can demonstrate how the virtual environment can utilise external infrastructure including commercial cloud providers and data stores.
Related services:
EGA Beacon
EGA Metadata API

Target audience: Researchers, Data managers, Data owners, Data modellers, Bioinformaticians, Data providers
Open access: yes
Licence: Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Organisations involved: EwC, CRG, UM, UU, IM
Webinar recording
Use Nextflow for toxicogenomics-based prediction
Evan Floden (Centre for Genomic Regulation)
3 Jun 2019

Target audience: Researchers, Developers, Data modellers, Bioinformaticians, Software developers
Open access: yes
Licence: Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Organisations involved: CRG
Presentation
Use Nextflow for toxicogenomics-based prediction
Evan Floden (Centre for Genomic Regulation)
27 May 2019
Additional materials:
Slides

Target audience: Researchers, Students, Developers, Data modellers, Bioinformaticians
Open access: yes
Licence: Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Organisations involved: CRG
Poster
Meta-analysis for genotoxicity prediction using data from multiple human in vitro cell models
Jumamurat R. Bayjanov, Jos Kleinjans, Danyel Jennen
12 Sep 2018
Abstract:
Whole genome transcriptional profiling allows global measurement of gene expression changes induced by particular experimental conditions. Toxic treatments of biological systems, such as cell models, may perturb interactions among genes and, in toxicogenomics, such perturbations assessed by transcriptional profiling are used to predict the impact of toxic compounds. From the early days on, this toxicogenomics-based approach for predicting apical toxicities has been dedicated to improving predictions of genotoxicity and carcinogenicity in vivo. Over the past decade, large amounts of transcriptional profiling data have been generated for in vitro study models using various chemical compounds, across different doses and time points as well as different organisms. As part of the H2020 EU project OpenRiskNet, we propose a large-scale integrative analysis approach using these data sets for predicting genotoxicity and carcinogenicity in vivo. From the diXa Data Warehouse, NCBI GEO, and EBI ArrayExpress we collected gene expression data for human in vitro liver cell models exposed to 125 compounds with known genotoxicity information at different time points and dosages, resulting in 822 experiments. We analyzed these data sets using ten different classification algorithms, using 80% of the data for training and 20% for testing. The Support Vector Machine algorithm had the best accuracy for predicting genotoxicity in vivo, at 92.5% with 95% specificity and 87% sensitivity. Upon identifying deregulated gene-gene interaction networks by applying ConsensusPathDB, the top 5 affected pathways are p53-centered. The results from our meta-analysis demonstrate both high accuracy and robustness of transcriptomic profiling of genotoxicity hazards across a large set of genotoxicants and across multiple human liver cell models.
We propose that these assays can be used for regulatory purposes, certainly when applied in combination with the traditional genotoxicity in vitro test battery. Next, we want to perform similar analyses on rat and mouse data and identify core orthologous genes among the three different species that are potential predictive targets for assessing genotoxicity and carcinogenicity across different biological systems.
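For readers unfamiliar with the reported metrics, the sketch below shows how accuracy, sensitivity and specificity follow from confusion-matrix counts. The counts are hypothetical: the abstract does not report a confusion matrix, and these values were chosen only so that the arithmetic matches the rounded percentages quoted above.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        # Fraction of genotoxic samples correctly flagged.
        "sensitivity": tp / (tp + fn),
        # Fraction of non-genotoxic samples correctly cleared.
        "specificity": tn / (tn + fp),
        # Fraction of all samples classified correctly.
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Hypothetical counts, not taken from the poster.
metrics = classification_metrics(tp=87, fn=13, tn=209, fp=11)
print(metrics)
```

Accuracy is a weighted average of sensitivity and specificity, with weights given by the share of positive and negative samples, so it always lies between the two.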

Target audience: Risk assessors, Researchers, Regulators
Open access: yes
Organisations involved: UM
Presentation
Big Data in Toxicogenomics: Towards FAIR predictions
Danyel Jennen
26 Jul 2018
Additional materials:
Slides

Target audience: Risk assessors, Researchers, Bioinformaticians
Organisations involved: UM