Resources & Training
This page contains resources and training materials to support OpenRiskNet users in getting familiar with the services and tools available in the e-infrastructure. In addition to tutorials and video demonstrations, you will also find information on our publications (e.g. peer-reviewed articles, presentations, posters) that may help you learn more about OpenRiskNet concepts and implementations.
Metabolites may well play an important role in adverse effects of parent drug or other xenobiotic compounds. In this case study VU (CS leader), HITeC/HHU (associate partner and implementation challenge winner), JGU, and UU have worked together on making methods and tools available for metabolite and site-of-metabolism (SOM) prediction. For that purpose we integrated and used ligand-based metabolism predictors (e.g. MetPred, enviPath, FAME, SMARTCyp) and we incorporated protein-structure and -dynamics based approaches to predict SOMs by Cytochrome P450 enzymes (P450s). P450s metabolise ~75% of the currently marketed drugs and their active-site shape and plasticity often play an important role in determining the substrate’s SOM. It is expected that this work will be continued after the end of the project to make services available for the prediction of microbial biotransformation pathways by integrating the enviPath data and software developed in part by JGU. During method development, model calibration and validation we used databases such as XMetDB and other open-access databases for drugs, xenobiotics and their respective metabolites. To facilitate the combined use of the metabolite prediction approaches and their outcomes, we benefited from ongoing developments in workflow management systems and made Jupyter notebooks available to facilitate the collection and visualization of results from the different available services. We illustrated the added value of having multiple predictors and our Jupyter notebooks available in a pilot study on retrospective consensus predictions of known SOMs for drug compounds for which possible metabolite-associated toxicity had previously been reported.
In this case study a transcriptomics-based hazard prediction model for identification of specific molecular initiating events (MIE) was foreseen based on (A) top-down and (B) bottom-up approaches. The MIEs can include, but are not limited to: (1) Genotoxicity (p53 activation), (2) Oxidative stress (Nrf2 activation), (3) Endoplasmic Reticulum Stress (unfolded protein response), (4) Dioxin-like activity (AhR receptor activation), (5) HIF1 alpha activation and (6) Nuclear receptor activation (e.g. for endocrine disruption). This case study focussed on two top-down approaches for genotoxicity prediction. The first approach resulted in the creation of a Nextflow-based workflow from the publication “A transcriptomics-based in vitro assay for predicting chemical genotoxicity in vivo” by Magkoufopoulou et al. (2012), thereby reproducing their work as proof of principle. The Nextflow-based workflow has been translated into a more generic approach, especially for step 1, forming the basis of the second top-down approach. In this approach transcriptomics data together with toxicological compound information were collected from multiple toxicogenomics studies and used for building a metadata genotoxicity prediction model.
Transcriptomics data from human, mouse, rat in vitro liver models
This case study will use the approach of the diXa / DECO2 (Cefic-LRI AIMT4) projects to reproduce and extend the results obtained on the identification of hepatotoxicant groups based on similarity in mechanisms of action (omics-based) and chemical structure using services from OpenRiskNet.
The Adverse Outcome Pathway (AOP) concept has been introduced to support risk assessment (Ankley et al., 2010). An AOP is initiated upon exposure to a stressor that causes a Molecular Initiating Event (MIE), followed by a series of Key Events (KEs) on increasing levels of biological organization. Eventually, the chain of KEs ends with the Adverse Outcome (AO), which describes the phenotypic outcome, disease, or the effect on the population. In general, an AOP captures mechanistic knowledge of a sequence of toxicological responses after exposure to a stressor. While starting with molecular information, for example the initial interaction of a chemical with a cell, AOPs contain information on downstream responses at the tissue, organ, individual and population levels. Currently, AOPs are stored in the AOP-Wiki, a collaborative platform to exchange mechanistic toxicological knowledge as part of the AOP-KB, an initiative by the OECD. Normally, AOP development starts with a thorough literature search for existing knowledge, describing the sequence of KEs that form the AOP. However, the use of AOPs for regulatory purposes also requires detailed validation and linking to existing knowledge (Knapen et al., 2015; Burgdorf et al., 2017). Part of the development of AOPs is the search for data that supports the occurrence and biological plausibility of KEs and their relationships (KERs). This type of data can be found in literature, and increasingly in public databases. The main goal of this case study is to establish the links between AOPs of the AOP-Wiki and experimental data to support a particular AOP. This will allow finding AOPs related to experimental data, and finding data related to a particular AOP.
This case-study demonstrates and documents the use of a web interface to physiologically-based pharmacokinetic models for forward and reverse dosimetry calculations. Forward calculations compute internal concentrations from given exposure doses. Reverse calculations compute exposure doses from internal concentrations or measured biomarker levels (e.g., urine concentration data). The results of those calculations can be used in risk assessments to help with in vitro to in vivo extrapolations or interspecies extrapolations. Three tools have been developed for this case-study at NTUA and have been integrated into the OpenRiskNet infrastructure through the Jaqpot web-based computational platform. More specifically, the popular high-throughput toxicokinetic (httk) R package and the PKSim software tool for whole-body physiologically based pharmacokinetic modeling were integrated, and we also developed infrastructure for developing and deploying user-defined models. For each of these three web tools, simulations are performed and results are presented for reference chemicals or drugs, namely Imazalil for the httk model, Diazepam and Chlorpyrifos for showcasing the in-house R PBPK workflow, and Theophylline for the PKSim model. The exposure scenarios chosen are in the range of corresponding environmental or therapeutic levels.
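The forward/reverse relationship described above can be sketched with a deliberately minimal one-compartment toxicokinetic model. This is only an illustration of the principle, not the PBPK models served through Jaqpot; all parameter names and values are invented for the example.

```python
import math

def forward_dose_to_conc(dose, vd, ke, t):
    """Forward dosimetry: plasma concentration (mg/L) at time t (h) after a
    single bolus dose (mg/kg), one compartment, first-order elimination."""
    return (dose / vd) * math.exp(-ke * t)

def reverse_conc_to_dose(conc, vd, ke, t):
    """Reverse dosimetry: invert the same model to recover the external
    dose consistent with an observed internal concentration."""
    return conc * vd * math.exp(ke * t)

# Illustrative parameters (not measured values): volume of distribution
# 5 L/kg, elimination rate constant 0.1 /h, sampling 4 h post-dose.
c_internal = forward_dose_to_conc(1.0, 5.0, 0.1, 4.0)
d_external = reverse_conc_to_dose(c_internal, 5.0, 0.1, 4.0)
```

Because the toy model is analytically invertible, the round trip recovers the original dose exactly; real PBPK models are inverted numerically or via Monte Carlo sampling.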
DataCure establishes a process for data curation and annotation that makes use of APIs (eliminating the need for manual file sharing) and semantic annotations for a more systematic and reproducible data curation workflow. In this case study, users are provided with capabilities to access different OpenRiskNet data sources and target specific entries in an automated fashion, in order to identify data and metadata associated with a chemical, either in general to identify possible areas of concern or for a specific endpoint of interest (Figure 1B). The datasets can be curated using OpenRiskNet workflows developed for this case study and, in this way, cleansed e.g. for their use in model development (Figure 1A). Text mining facilities and workflows are also included for the purposes of data searching, extraction and annotation (Figure 1C). A first step in this process was to define APIs and provide the semantic annotation for selected databases (e.g. FDA datasets, ToxCast/Tox21 and ChEMBL). During the preparation for these use cases, it became clear that the existing ontologies do not cover all requirements of the semantic interoperability layer. Ontology development and improvement cannot be fully covered by OpenRiskNet and is instead organised as a collaborative activity of the complete chemical and nano risk assessment community; nevertheless, the design of the annotation process as an online or an offline/preprocessing step forms an ancillary part of this case study.
The ModelRX case study was designed to cover the important area of generating and applying predictive models, and more specifically QSAR models in hazard assessment endorsed by different regulations, as completely in silico alternatives to animal testing that are also useful in early research when no data are available for a compound. The QSAR development process schematically presented in Figure 1 begins by obtaining a training data set from an OpenRiskNet data source. A model can then be trained with OpenRiskNet modelling tools, and the resulting models are packaged into a container, documented and ontologically annotated. To assure the quality of the models, they are validated using OECD guidelines (Jennings et al. 2018). Predictions for new compounds can be obtained using a specific model or a consensus of predictions of all models. This case study presents this workflow with the example of blood-brain-barrier (BBB) penetration, for which multiple models were generated using tools from the OpenRiskNet consortium and associated partners, used individually as well as in a consensus approach using Dempster-Shafer theory (Park et al. 2014; Rathman et al. 2018).
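The consensus step can be illustrated with a minimal sketch of Dempster's rule of combination for two hypothetical BBB models over the frame {penetrant, non-penetrant}. The mass assignments below are invented for illustration (in practice they would be derived from each model's validation performance), and this sketch is not the implementation used in the cited work.

```python
def dempster_combine(m1, m2):
    """Combine two basic belief assignments over the frame {'P','N'}
    ('P' = BBB penetrant, 'N' = non-penetrant, 'PN' = ignorance)
    using Dempster's rule of combination."""
    conflict = m1['P'] * m2['N'] + m1['N'] * m2['P']
    norm = 1.0 - conflict  # renormalise after discarding conflicting mass
    return {
        'P': (m1['P'] * m2['P'] + m1['P'] * m2['PN'] + m1['PN'] * m2['P']) / norm,
        'N': (m1['N'] * m2['N'] + m1['N'] * m2['PN'] + m1['PN'] * m2['N']) / norm,
        'PN': (m1['PN'] * m2['PN']) / norm,
    }

# Two hypothetical models, each assigning some belief mass to 'P', 'N'
# and to ignorance 'PN' (e.g. reflecting its applicability domain).
model_a = {'P': 0.7, 'N': 0.1, 'PN': 0.2}
model_b = {'P': 0.6, 'N': 0.2, 'PN': 0.2}
consensus = dempster_combine(model_a, model_b)
```

When two models agree, the combined belief in 'P' exceeds either individual belief, which is the behaviour that makes the rule attractive for consensus QSAR predictions.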
Nano-QSAR to predict cytotoxicity of metal and metal oxide nanoparticles
Lazar Toxicity Predictions
JGU WEKA REST Service
In this exercise, the deployment of an OpenRiskNet application (Lazar) was demonstrated using the OpenShift command line (OpenRiskNet development Workshop - Exercise C: https://github.com/OpenRiskNet/workshop/blob/master/wp2-deployment-workshop-2019/exercise-c/README.md).
The Final OpenRiskNet Workshop was organised on 23-24 October 2019 in Amsterdam, The Netherlands. The topic of the event was "Creating powerful workflows combining data and software services demonstrated on risk assessment case studies". The workshop was attended by 53 participants, representing all OpenRiskNet stakeholders (scientific, industrial and regulatory communities). This ensured that all relevant and targeted groups that need to be aware of the project achievements have access to this information, are enabled to give feedback, and can also be trained on the provided solutions.
Metabolites can play an important role in adverse effects of parent drug (or other xenobiotic) compounds. During the EU-H2020 OpenRiskNet project, several partners (VU Amsterdam, HHU/HITeC Hamburg, Uppsala University, JGU Mainz) have worked together on making methods and tools available within the OpenRiskNet platform for metabolite and site-of-metabolism (SOM) prediction. For that purpose we have integrated ligand-based metabolite predictors (e.g., MetPred, FAME 3, SMARTCyp) and protein-structure and -dynamics based models to predict SOMs of Cytochrome P450 (CYP450) substrates. CYP450s metabolize ~75% of the currently marketed drugs and their active-site shape and plasticity often play an important role in determining the substrate's SOM. To facilitate the combined use of the metabolite prediction approaches and their outcomes, we made Jupyter notebooks available that gather and visualize results from the integrated services. Here we illustrate the possible added value of their combined use in the context of a pilot study on SOM prediction for compounds with known metabolite-associated toxicity. Finally, we briefly discuss related work from our laboratory on predicting Cytochrome P450 binding affinity.
The OpenRiskNet project (https://openrisknet.org/) is funded by the H2020-EINFRA-22-2016 Programme and its main objective is the development of an open e-infrastructure providing data and software resources and services to a variety of industries requiring risk assessment (e.g. chemicals, cosmetic ingredients, pharma or nanotechnologies). We will present the WEKA machine learning services within the infrastructure and how they can be used to solve complex prediction tasks: the prediction of (i) the half-life of chemicals under given environmental conditions and (ii) nanoparticle transport behavior from physicochemical properties. For that purpose, we will reconstruct previous efforts using complex workflows and architectures and simplify the models while maintaining their prediction performance. In both cases, the overall problem (predicting the fate of a compound depending on its properties and external conditions) is modeled as a cascade, in which the prediction of one model enters another model as input, with particular attention to validity and performance. The approach performs well on the half-life data, while the nanoparticle data are too noisy and incomplete to warrant more than the most basic models. Overall, the reconstruction of the two applications within OpenRiskNet provides more evidence for the power and versatility of the framework.
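The cascaded set-up described above (one model's prediction feeding the next model as input) can be sketched as follows. Both stage functions are hypothetical stand-ins for trained WEKA models, invented purely to show the wiring; they are not the actual predictors.

```python
def stage1_intrinsic(descriptors):
    """Hypothetical first-stage model: maps molecular descriptors to an
    intermediate property (stand-in for a trained WEKA regressor)."""
    return sum(descriptors) / len(descriptors)

def stage2_fate(intermediate, temperature_c):
    """Hypothetical second-stage model: combines the intermediate
    prediction with an environmental condition."""
    return intermediate * (1.0 + 0.01 * (temperature_c - 25.0))

def cascaded_predict(descriptors, temperature_c):
    """Cascade: the first model's output enters the second as input.
    In practice each stage is validated separately, since errors in
    stage 1 propagate into stage 2."""
    return stage2_fate(stage1_intrinsic(descriptors), temperature_c)

prediction = cascaded_predict([0.4, 0.8, 1.2], 30.0)
```

The key design point is that the intermediate value is an explicit, inspectable quantity, so the validity and performance of each stage can be assessed independently before the cascade is trusted as a whole.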
The OpenRiskNet project (https://openrisknet.org/) is funded by the H2020-EINFRA-22-2016 Programme. Here we present how the concept of Adverse Outcome Pathways (AOPs), which captures mechanistic knowledge from a chemical exposure causing a Molecular Initiating Event (MIE), through Key Events (KEs) towards an Adverse Outcome (AO), can be extended with additional knowledge by using tools and data available through the OpenRiskNet e-Infrastructure. This poster describes how the case study of AOPLink, together with DataCure, TGX, and SysGroup, can utilize the AOP framework for knowledge and data integration to support risk assessments. AOPLink involves the integration of knowledge captured in AOPs with additional data sources and experimental data from DataCure. TGX feeds this integration with prediction models of the MIE of such AOPs using either gene expression data or knowledge about stress response pathways. This is complemented by SysGroup, which is about the grouping of chemical compounds based on structural similarity and mode of action based on omics data. Therefore, the combination of these case studies extends the AOP knowledge and allows biological pathway analysis in the context of AOPs, by combining experimental data and the molecular knowledge that is captured in KEs of AOPs.
The OpenRiskNet project (https://openrisknet.org/) is funded by the H2020-EINFRA-22-2016 Programme and its main objective is the development of an open e-infrastructure providing data and software resources and services to a variety of industries requiring risk assessment (e.g. chemicals, cosmetic ingredients, pharma or nanotechnologies). The concept of case studies was followed in order to test and evaluate proposed solutions and is described in https://openrisknet.org/e-infrastructure/development/case-studies/. Two case studies, namely ModelRX and RevK, focus on modelling within risk assessment. The ModelRX – Modelling for Prediction or Read Across case study provides computational methods for predictive modelling and supports the assessment of existing data suitability. It supports final risk assessment by providing calculations of theoretical descriptors, gap filling of incomplete datasets, computational modelling (QSAR) and predictions of adverse effects. Services are offered through Jaqpot (UI/API), JGU WEKA (API), Lazar (UI) and Jupyter & Squonk Notebooks. In the RevK – Reverse dosimetry and PBPK prediction case study, physiologically based pharmacokinetic (PBPK) models are made accessible for risk assessment-relevant scenarios. The PKSim software, the httk R package and custom-made PBPK models have been integrated. RevK offers services through Jaqpot (UI/API).
OpenRiskNet (https://openrisknet.org/) is a 3-year project funded by the EU within Horizon 2020 EINFRA-22-2016 Programme, with the main objective to develop an open e-infrastructure providing data and software resources and services to a variety of industries requiring risk assessment (e.g. chemicals, cosmetic ingredients, pharma or nanotechnologies). The infrastructure is built on virtual research environments (VREs), which can be deployed to workstations as well as public and in-house cloud infrastructures. Services providing data, data analysis, modelling and simulation tools for risk assessment are integrated into the e-infrastructure and can be combined into workflows using harmonised and interoperable application programming interfaces (APIs) (https://openrisknet.org/e-infrastructure/services/). For complete risk assessment and safe-by-design studies, OpenRiskNet e-infrastructure functionality is combined via a variety of incorporated services demonstrated within a set of case studies (see figure 1). The case studies present real-world settings such as data curation, systems biology approaches for grouping compounds, read-across applications using chemical and biological similarity, and identification of areas of concern based only on alternative methods (non-animal testing) approaches. OpenRiskNet is working with a network of partners, organised within an Associated Partners Programme, aiming to strengthen the working ties to other organisations developing relevant solutions or tools.
BridgeDb identifier mapping service
AOP-Wiki SPARQL Endpoint
The Adverse Outcome Pathway Database (AOP-DB)
This report details the work involved in the federation of compute and data resources between the OpenRiskNet e-infrastructure and external resources. The reference environment has been designed to handle the majority of requirements of users wishing to deploy and run services. However, specific situations demand solutions where the computation, the data or both reside outside the OpenRiskNet e-infrastructure. This deliverable is related to Task 2.7 (Interconnecting virtual environment with external infrastructures) and Task 2.8 (Federation between virtual environments). Resource-intensive analyses, such as those performed in toxicogenomics, can have CPU, memory or disk requirements that cannot be assumed to be available across all deployment scenarios. Human sequencing data may have restrictions on where it can be processed, and the vast quantity of this data often means it is more efficient to “bring the computation to the data”. In achieving Tasks 2.7 and 2.8, we demonstrate how the virtual environment can utilise external infrastructure, including commercial cloud providers and data stores.
OpenRiskNet’s work on case studies is entering the final phase, with the focus on demonstrating infrastructure capabilities and on testing different risk assessment scenarios. The case studies provide examples and prototypes for solutions provided to the predictive toxicology and risk assessment community and demonstrate the usage of the developed APIs and the interoperability features to build integrated workflows.
• DataCure - Data curation and creation of pre-reasoned datasets and searching;
• ModelRX - Modelling for Prediction or Read Across;
• SysGroup - A systems biology approach for grouping compounds;
• MetaP - Metabolism Prediction;
• AOPLink - Identification and Linking of Data related to AOPWiki;
• TGX - Toxicogenomics-based prediction and mechanism identification;
• RevK - Reverse dosimetry and PBPK prediction.
These cases also demonstrate how OpenRiskNet offers customised approaches for the different stakeholder groups (e.g., researchers, risk assessors and regulators) and provides fit-for-purpose services and solutions to real-world applications (e.g., systems biology approaches for grouping compounds; read-across applications using chemical and biological similarity).
OpenRiskNet (https://openrisknet.org/) is a 3-year project funded by the EU within Horizon 2020 EINFRA-22-2016 Programme, with the main objective to develop an open e-infrastructure providing resources and services to a variety of industries requiring risk assessment (e.g. chemicals, cosmetic ingredients, drugs or nanotechnologies). OpenRiskNet is working with a network of partners, organised within an Associated Partners Programme, aiming to strengthen the working ties between the OpenRiskNet members and other organisations developing relevant solutions or tools within the scientific community. The infrastructure is built on virtual research environments (VREs), which can be deployed to workstations as well as public and in-house cloud infrastructures. Services providing data, data analysis, modelling and simulation tools for risk assessment are integrated into the e-infrastructure and can be combined into workflows using harmonised and interoperable application programming interfaces (APIs) (https://openrisknet.org/e-infrastructure/services/). For complete risk assessment and safe-by-design studies, data and tools from different areas have to be available, thus the OpenRiskNet e-infrastructure functionality is defined by a variety of incorporated services demonstrated within a set of case studies. The case studies present real-world settings such as data curation, systems biology approaches for grouping compounds, read-across applications using chemical and biological similarity, and identification of areas of concern based only on alternative methods (non-animal testing) approaches. OpenRiskNet resources (training materials, publications, reports, webinar recordings, etc.) are publicly available in the project's library (https://openrisknet.org/library/). 
Also, OpenRiskNet is listed in EOSC and eInfraCentral catalogues, and is sharing its resources in OpenAIRE (via Zenodo), TeSS (ELIXIR's Training Portal), EU NanoSafety Cluster, e-IRG Knowledge Base and other scientific communities.
The aim of this study is to benchmark two Bayesian software tools, namely Stan and GNU MCSim, that use different Markov chain Monte Carlo (MCMC) methods for the estimation of physiologically based pharmacokinetic (PBPK) model parameters. The software tools were applied and compared on the problem of updating the parameters of a Diazepam PBPK model, using time-concentration human data. Both tools produced very good fits at the individual and population levels, despite the fact that GNU MCSim is not able to consider multivariate distributions. Stan outperformed GNU MCSim in sampling efficiency, due to its almost uncorrelated sampling. However, GNU MCSim exhibited much faster convergence and performed better in terms of effective samples produced per unit of time.
BridgeDb identifier mapping service
AOP-DB SPARQL Endpoint
AOP-Wiki SPARQL Endpoint
EdelweissData serving ToxCast, ToxRefDB and TG-GATEs data
This report describes the status of the service integration, including the numbers of active services provided by the consortium, associated partners and other third parties. The work described in this report addresses all areas and tasks of WP4 (i.e. Toxicology, Chemical Properties and Bioassay Databases, Omics Databases, Knowledge Bases and Data Mining, Ontology Services, Processing and Analysis, Predictive Toxicology, Workflows, Visualisation and Reporting). Due to its importance for service integration, we also reference work performed in WP1 on case studies and in WP2 on e-infrastructure interoperability and deployment.
This document reports the work on the final API specification for semantic interoperability that was developed as part of the OpenRiskNet e-infrastructure. It briefly outlines the challenges encountered and the solution that has been implemented and is now in use in OpenRiskNet. This deliverable is related to Task 2.2 (API specification and semantic interoperability) and is a continuation of the work performed within Deliverable 2.2 (Initial API version provided to providers of services); the resulting APIs are now in the process of being incorporated into OpenRiskNet as part of the service catalogue. Deliverable 2.2 gives in-depth information on the evaluation of the various technologies for describing APIs that we considered, whereas this report focuses on the solution that was finally chosen and put in place, and the justification of this choice, to support other databases / e-infrastructures in their deliberations on API solutions.
1. Introduction
With the ever-growing number of chemicals that require toxicological risk assessment, there is a need for faster, more efficient use of existing data to assemble effective assessment strategies. Therefore, the concept of Adverse Outcome Pathways (AOPs) was introduced: a framework to organize existing mechanistic information about toxicological processes into a chain of smaller pieces of knowledge, called Key Events (KEs). These allow the structuring of toxicological knowledge and reduce the effort needed to capture all information before performing risk assessment [2, 3]. In order to facilitate a community effort in gathering toxicological knowledge, the AOP-Wiki was created by the European Commission JRC and the US EPA. To integrate this knowledge base more easily with other resources, we explored the use of semantic web technologies to link the AOP-Wiki with other chemical and biological databases.
2. Approach
The AOP-Wiki provides quarterly permanent downloads of the full database XML (https://aopwiki.org/downloads/). We parsed the AOP-Wiki knowledge with Python 3.5 and the ElementTree XML API and converted it into a semantic web RDF format, which allows for accurate description with ontological annotations, including the AOPO, CHEMINF, and Dublin Core. Chemical compounds are identified in the AOP-Wiki with CAS numbers, and biological processes with a variety of ontologies, e.g. GO, the Mammalian Phenotype Ontology, and the Molecular Interactions ontology. These annotations are used to create Internationalized Resource Identifiers. To integrate and test the RDF, a variety of federated SPARQL queries were written and executed in Blazegraph (build version 2.1.4).
3. Results
We created an AOP-Wiki RDF scheme and converted the XML into Turtle syntax. The RDF was tested with a variety of SPARQL queries to answer biological questions relevant to risk assessment, such as:
- What measurement / test-method information is available for a given AOP?
- Which of the stressor chemicals in the AOP-Wiki can be linked to molecular pathways on WikiPathways?
4. Discussion
The RDF transformation of AOP-Wiki content can assist in the accessibility and expansion of toxicological knowledge by allowing semantic interoperability. The created RDF of the AOP-Wiki allows querying and providing additional information for stressor chemicals, genes, and proteins involved in KEs, and the underlying molecular pathways, but also for the applicability of AOPs by cell type or species. This semantic approach allows novel ways to explore the rapidly growing AOP knowledge with every new publication related to toxicological studies. Work is in progress on a Virtuoso SPARQL endpoint Docker image to simplify the use of the data and integrate the database in the OpenRiskNet e-infrastructure, providing AOP knowledge useful for automated risk assessment workflows.
Funding
This project has received funding from the European Union’s Horizon 2020 (EU 2020) research and innovation programme under grant agreement no. 681002 (EU-ToxRisk) and the EINFRA-22-2016 programme under grant agreement no. 731075 (OpenRiskNet).
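The XML-to-Turtle conversion step can be sketched as follows. The XML fragment, element names, and property IRIs below are illustrative assumptions, not the actual AOP-Wiki download schema or the project's converter; only the general pattern (ElementTree parsing, AOPO typing, Dublin Core titles) follows the description above.

```python
import xml.etree.ElementTree as ET

# Minimal, hypothetical fragment mimicking the shape of the AOP-Wiki XML
xml = """<data>
  <aop id="37">
    <title>Example AOP title</title>
    <key-event ref="227"/>
    <key-event ref="716"/>
  </aop>
</data>"""

def aop_to_turtle(xml_text):
    """Convert AOP entries to Turtle triples, typed with the AOP Ontology
    and annotated with Dublin Core titles (prefixes are illustrative)."""
    root = ET.fromstring(xml_text)
    lines = [
        "@prefix aopo: <http://aopkb.org/aop_ontology#> .",
        "@prefix dc:   <http://purl.org/dc/elements/1.1/> .",
        "@prefix aop:  <https://identifiers.org/aop/> .",
        "@prefix ke:   <https://identifiers.org/aop.events/> .",
    ]
    for aop in root.iter("aop"):
        lines.append(f"aop:{aop.get('id')} a aopo:AdverseOutcomePathway ;")
        lines.append(f'    dc:title "{aop.findtext("title")}" ;')
        # Link each AOP to its Key Events (property name assumed here)
        kes = " , ".join(f"ke:{ke.get('ref')}" for ke in aop.iter("key-event"))
        lines.append(f"    aopo:has_key_event {kes} .")
    return "\n".join(lines)

turtle = aop_to_turtle(xml)
```

The resulting Turtle can then be loaded into a triple store such as Blazegraph or Virtuoso and queried with SPARQL, as described above.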
To ensure the usability of the infrastructure, alignment with community needs and complete coverage of important tools incorporated into the e-infrastructure and available to users, OpenRiskNet will work with a continuously expanding network of partners. To make interaction with the OpenRiskNet consortium (giving feedback and contributing to the developments and case study work) as simple as possible, while also giving the opportunity to share confidential information, fostering collaborative developments between consortium and associated partners and even allowing integration of tools as in-kind work, three different ways to associate with the project as a third party are available: 1) testing the infrastructure as an early adopter, giving feedback and stating requirements via the online surveys (see also updated deliverable report D1.1); 2) becoming an official associated partner, allowing the exchange of confidential information such as source code with examples of service integration; and 3) applying to the implementation challenge, which provides financial support for service integration in addition to technical support. This updated deliverable report describes these options, organised in the Associated Partner Programme.
This report describes the first updated version of the data management plan (DMP) for the OpenRiskNet e-infrastructure project. The current DMP covers the general aspects of OpenRiskNet data management based on the FAIR (findable, accessible, interoperable and reusable) guidelines, ethics considerations for re-sharing of public datasets, and the first examples of shared data sources including diXa, BridgeDb, WikiPathways, AOP-Wiki and ToxCast/Tox21. More specific data sources and clearly defined measures will be added in parallel to their integration into the infrastructure, following the time plan enforced by the case study requirements on data availability.
OpenRiskNet is developing an e-infrastructure for predictive toxicology and safety assessment (of chemicals, pharmaceuticals, nanomaterials etc.) in the form of virtual research environments (VREs) to which data, analysis and modelling services coming from the consortium but also from third parties can be deployed. This deliverable describes the support functions set up and how they were customised for specific user groups. Due to its specific purpose of being an infrastructure where public and commercial services are offered to the community, OpenRiskNet serves different types of users and roles, which can be grouped into: end users (e.g. members of academia, industry, and regulatory agencies), defined as users who log into an OpenRiskNet VRE and consume one or more services or applications; developers, who are involved in developing or setting up parts of the OpenRiskNet VRE, such as middleware, frameworks, data, or tools packaged as services or applications (in the latter case also referred to as tool providers); and system administrators, who are in charge of creating and managing the OpenShift environment for the VRE and deploying the basic services.
This report describes the results obtained from the survey, interviews and interactions with associated partners and project stakeholders as part of the requirements analysis. The requirements analysis included surveys sent out to a large number of experts and designed to address issues relevant to: end users (e.g. members of academia, industry and regulatory agencies), and developers (tool developers, infrastructure providers and data managers).
Prioritization of variants in personal genomic data is a major challenge. Recently, computational methods that rely on comparing phenotype similarity have been shown to be useful for identifying causative variants. In these methods, pathogenicity prediction is combined with a semantic similarity measure to prioritize not only variants that are likely to be dysfunctional but also those that are likely involved in the pathogenesis of a patient’s phenotype. We have developed DeepPVP, a variant prioritization method that combines automated inference with deep neural networks to identify the likely causative variants in whole exome or whole genome sequence data. We demonstrate that DeepPVP performs significantly better than existing methods, including phenotype-based methods that use similar features. DeepPVP is freely available at https://github.com/bio-ontology-research-group/phenomenet-vp. DeepPVP further improves on existing variant prioritization methods both in terms of speed and accuracy.
A paradigm shift is taking place in risk assessment to replace animal models, reduce the economic resources required, and refine the methodologies used to test the growing number of chemicals and nanomaterials. Approaches such as transcriptomics, proteomics, and metabolomics have therefore become valuable tools in toxicological research and are finding their way into regulatory toxicology. One promising framework to bridge the gap between molecular-level measurements and risk assessment is the concept of Adverse Outcome Pathways (AOPs). These pathways comprise mechanistic knowledge and connect biological events from the molecular level to an adverse outcome after exposure to a chemical. However, the implementation of omics-based approaches in AOPs and their acceptance by the risk assessment community is still a challenge.
This report describes the status of the selection of high-priority services for the OpenRiskNet infrastructure and their integration, including active services provided by the consortium, associated partners and other third parties.
This report covers the possible risks with respect to security, privacy and re-identification of personal data, and presents the privacy-by-design risk management concept.
This report summarises the management process adopted within the OpenRiskNet project. This process envisaged the implementation of best project management practices to ensure the effective execution of the work plan, tracking and documentation of task progress, effective communication between partners on technical and administrative matters, as well as communication with the EC office and external stakeholders.
This report describes the first documentation of the core OpenRiskNet e-infrastructure with examples of an initial development-status implementation. The report forms part of this documentation, along with other parts located in the OpenRiskNet GitHub repository. The documentation describes the creation of the e-infrastructure and the deployment of the first partner application to it.
This document reports the work towards the first version of the OpenRiskNet application programming interfaces (APIs) to be released to all consortium and associated partners for feedback and usage. Given the diversity of requirements foreseeable when developing the case studies to validate the infrastructure with real-world applications across all areas of predictive toxicology and risk assessment, a bottom-up approach was adopted: starting with existing APIs, harmonizing them, and bringing them collectively to higher levels by integrating richer scientific annotation (the semantic interoperability layer). This contrasts with a top-down approach, in which the API specification is first defined by the consortium and all services then have to be changed to comply with it.
This report describes the dissemination and training activities undertaken by the OpenRiskNet partners in the first half of the project. These activities were gathered formally within WP3 (Training, Support and Dissemination) but cover aspects related to all WPs. Details and links to the various activities developed in the first half of the project are included, e.g. organisation of workshops, training events and hackathons, participation at conferences and workshops, peer-reviewed publications, tutorials, public communication activities, the project website and visual identity development, as well as the interactions initiated with other EU initiatives. The main dissemination activities related to OpenRiskNet are summarised on the project website at https://openrisknet.org/library/. The report follows the agreed Plan for the Exploitation and Dissemination of Results (PEDR).
This report documents the Demonstrator for Deliverable 2.3, describing the deployment of the virtual infrastructure and applications making up the OpenRiskNet Virtual Research Environment (VRE). It outlines the system analysis, deployment fundamentals, service discovery, and a list of the currently available services. The production reference instance is deployed on the Swedish Science Cloud (SSC), and end user access is available at https://home.prod.openrisknet.org.
This Deliverable describes the computational infrastructure, frameworks and systems for the development and testing of the Virtual Research Environments, APIs and data and software integration of the OpenRiskNet project. Development tools selected for source code control, issue tracking, continuous integration and deployment, and containerization and container orchestration are discussed, and guidelines for service development are outlined. Finally, the current development environment is described and its operation is demonstrated with a simple example.
OpenRiskNet case studies are used to test and evaluate solutions provided by the project to the predictive toxicology and risk assessment community, especially regarding the usability of the developed Application Programming Interfaces (APIs) and the interoperability layer. These case studies will demonstrate the capabilities to satisfy the requirements of the different stakeholder groups, including researchers, risk assessors and regulators, and present real-world applications such as systems biology approaches for grouping compounds, read-across applications using chemical and biological similarity, and identifying areas of concern based on in vitro and in silico approaches for compounds lacking any previous knowledge from animal experiments (ab initio case).
Ligand-based models can be used in drug discovery to obtain an early indication of potential off-target interactions that could be linked to adverse effects. Another application is to combine such models into a panel, allowing users to compare and search for compounds with similar profiles. Most contemporary methods and implementations, however, lack valid measures of confidence in their predictions and provide only point predictions. We here describe the use of conformal prediction for predicting off-target interactions with models trained on data from 31 targets in the ExCAPE dataset, selected for their utility in broad early hazard assessment. Chemicals were represented by the signature molecular descriptor, and support vector machines were used as the underlying machine learning method. With conformal prediction, the results come in the form of confidence p-values for each class. The full pre-processing and model training process is openly available as scientific workflows on GitHub, rendering it fully reproducible. We illustrate the usefulness of the methodology on a set of compounds extracted from DrugBank. The resulting models are published online and are available via a graphical web interface and an OpenAPI interface for programmatic access.
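The per-class p-value computation at the heart of conformal prediction can be sketched as follows. This is a generic illustration, not the published models' code, and all nonconformity scores and class names below are hypothetical:

```python
# Illustrative inductive conformal prediction for binary classification.
# A nonconformity score measures how "unusual" an example looks for a
# candidate label (e.g. 1 minus the predicted probability of that label).

def conformal_p_value(cal_scores, test_score):
    """p-value: fraction of calibration scores at least as large as the
    test score, with the +1 smoothing of the standard definition."""
    n = len(cal_scores)
    greater = sum(1 for s in cal_scores if s >= test_score)
    return (greater + 1) / (n + 1)

# Hypothetical calibration nonconformity scores for each class
cal_active = [0.10, 0.25, 0.30, 0.55, 0.70, 0.80, 0.90]
cal_inactive = [0.05, 0.15, 0.20, 0.40, 0.60, 0.75, 0.85]

# A new compound: nonconformity 0.20 vs "active", 0.65 vs "inactive"
p_active = conformal_p_value(cal_active, 0.20)
p_inactive = conformal_p_value(cal_inactive, 0.65)

# At significance level 0.2, a label enters the prediction set
# whenever its p-value exceeds 0.2; sets may contain both labels.
prediction_set = [label for label, p in
                  [("active", p_active), ("inactive", p_inactive)]
                  if p > 0.2]
print(p_active, p_inactive, prediction_set)
```

A prediction set containing both labels signals that the model cannot confidently assign either class at the chosen significance level, which is exactly the kind of validity information point predictions lack.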
Case studies are used to test and evaluate the solutions provided by OpenRiskNet to the predictive toxicology and risk assessment community, especially regarding the usability of the developed APIs and the interoperability layer. Associated services are demonstrated. Within the implementation challenge, external tools, especially in areas of risk assessment not completely covered by the OpenRiskNet consortium, were selected for prioritized integration, partially financed and strongly technically supported by researchers of the OpenRiskNet partners.
Case studies are used to test and evaluate the solutions provided by OpenRiskNet to the predictive toxicology and risk assessment community, especially regarding the usability of the developed APIs and the interoperability layer.
The central concept of the OpenRiskNet infrastructure is virtual research environments (VREs) integrating data, analysis, modelling and simulation services for all areas of risk assessment, which can be deployed to workstations as well as to public and in-house cloud infrastructures.
OpenRiskNet is a 3-year project funded by the European Commission within the Horizon 2020 EINFRA-22-2016 Programme (Grant Agreement 731075). Full project name: Open e-Infrastructure to Support Data Sharing, Knowledge Integration and in silico Analysis and Modelling in Predictive Toxicology and Risk Assessment. The main objective is to develop an open e-Infrastructure providing resources and services to a variety of communities requiring risk assessment, including chemicals, cosmetic ingredients, therapeutic agents and nanomaterials. OpenRiskNet is working with a network of partners, organized within an Associated Partners Programme.
This document is addressed to OpenRiskNet and associated partners. Version 1 contains the following topics: How do I access the admin interface of the OpenRiskNet website?, How do I add and describe a new publication, training material or other resources?, How do I add and describe my services? and How do I add and describe a new event?
Nano-QSAR to predict cytotoxicity of metal and metal oxide nanoparticles
This workshop aims to provide users with hands-on experience of using the OpenRiskNet infrastructure for risk assessment. Users investigate a problem from nanosafety, studying a panel of nanoparticles (including dry powders of titanium, zinc, cerium and silicon oxides, dry powders of silver, suspensions of polystyrene latex beads, and dry particles of carbon black, nanotubes and fullerene, as well as diesel exhaust particles). The OpenRiskNet infrastructure is employed to generate models of the toxicity of the studied nanoparticles, more specifically the Jaqpot platform for modelling, available at http://www.jaqpot.org. Event pages are available at http://www.opentox.net/events/opentox-euro-2018/s6.
OpenRiskNet (https://openrisknet.org/) is an EU funded infrastructure project with the main objective to develop an open e-infrastructure providing resources and services to a variety of industries requiring risk assessment, including chemicals, cosmetic ingredients, drugs and nanomaterials. The OpenRiskNet approach is to work on different case studies to test and evaluate requirements to overcome the fragmentation of data and tools and to provide solutions improving the harmonization of data, usability and interoperability of application programming interfaces (APIs) and the deployment into virtual infrastructure. The cases present real-world settings such as systems biology approaches for grouping compounds, read-across applications using chemical and biological similarity, and identifying areas of concern based only on alternative methods approaches. We discuss our progress on the OpenRiskNet goal of harmonizing data and metadata within APIs that can be added to already existing analysis and modeling services and data warehouses. We also demonstrate how these APIs can easily be used towards the generation of full risk assessment workflows either using scripting languages or workflow managers. Finally, we show the first approaches to make these APIs semantically rich by annotating data with human- and computer-readable data schemata. OpenRiskNet has initiated the Associated Partners Programme strengthening the working ties between the OpenRiskNet members and other organizations within the scientific community.
OpenRiskNet, the e-infrastructure project developing a platform that provides data and modelling tools for predictive toxicology and risk assessment, is entering its second stage, in which the platform is made accessible to everyone. In the first phase, we developed advanced concepts and implemented them in the first version of the platform, including the building and deployment of virtual research environments (VREs) and the integration of the first services for different tasks in risk assessment, accessible to everyone for testing. The platform includes harmonised and partly semantically annotated data and modelling services and corresponding training material, as well as seven risk assessment case studies, which are used to evaluate and optimize the infrastructure.
Aims: understanding the use, form, inputs and outputs of physiologically based pharmacokinetic (PBPK) models; presentation of software applications for developing PBPK models; customising PBPK models to individual time-drug-concentration data; and creating optimal drug dosage regimens.
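For orientation, the simplest kinetic building block underlying such models is the one-compartment model. The sketch below is far simpler than a full PBPK model and uses entirely hypothetical parameter values, but it shows the kind of time-concentration output these tools produce:

```python
# One-compartment IV bolus kinetics: C(t) = (Dose / Vd) * exp(-ke * t).
# Dose, volume of distribution (Vd) and elimination rate constant (ke)
# are hypothetical example values, not data for any real drug.
import math

def concentration(t_h, dose_mg=100.0, vd_l=50.0, ke_per_h=0.2):
    """Plasma concentration (mg/L) at time t_h hours after an IV bolus."""
    return (dose_mg / vd_l) * math.exp(-ke_per_h * t_h)

# Simulated time-concentration curve over 12 hours, sampled every 4 h
curve = [(t, round(concentration(t), 3)) for t in range(0, 13, 4)]
print(curve)
```

Fitting such a model to observed time-concentration data (the "customising" step above) amounts to estimating parameters like `vd_l` and `ke_per_h` from measurements; full PBPK models replace the single compartment with physiologically meaningful organs and flows.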
An increasing number of disorders have been identified for which two or more distinct alleles in two or more genes are required to either cause the disease or to significantly modify its onset, severity or phenotype. It is difficult to discover such interactions using existing approaches. The purpose of our work is to develop and evaluate a system that can identify combinations of alleles underlying digenic and oligogenic diseases in individual whole exome or whole genome sequences. Information that links patient phenotypes to databases of gene-phenotype associations observed in clinical or non-human model organism research can provide useful information and improve variant prioritization for genetic diseases. Additional background knowledge about interactions between genes can be utilized to identify sets of variants in different genes in the same individual which may then contribute to the overall disease phenotype. We have developed OligoPVP, an algorithm that can be used to prioritize causative combinations of variants in digenic and oligogenic diseases, using whole exome or whole genome sequences together with patient phenotypes as input. We demonstrate that OligoPVP has significantly improved performance when compared to state of the art pathogenicity detection methods in the case of digenic diseases. Our results show that OligoPVP can efficiently prioritize sets of variants in digenic diseases using a phenotype-driven approach and identify etiologically important variants in whole genomes. OligoPVP naturally extends to oligogenic disease involving interactions between variants in two or more genes. It can be applied to the identification of multiple interacting candidate variants contributing to phenotype, where the action of modifier genes is suspected from pedigree analysis or failure of traditional causative variant identification.
Example workflow based on OpenRiskNet tools: a pathway identification workflow related to the DataCure and AOPlink case studies. This notebook downloads TG-GATEs data for four compounds and selects genes overexpressed in all samples. The Affymetrix probe sets are then translated into Ensembl gene identifiers using the BridgeDb service, and pathways associated with the genes are identified using the WikiPathways service.
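The identifier-mapping and pathway-lookup steps of such a notebook can be sketched as REST calls. The URL patterns below follow the public BridgeDb and WikiPathways webservice conventions as we understand them, and the probe-set and gene identifiers are examples only; check the live services before relying on these paths:

```python
# Sketch of the notebook's mapping steps: build the request URLs for
# (1) mapping an Affymetrix probe set to cross-references via BridgeDb,
# (2) finding pathways annotated with an identifier via WikiPathways.
# No network call is made here; an actual notebook would fetch these URLs.

BRIDGEDB = "https://webservice.bridgedb.org"
WIKIPATHWAYS = "https://webservice.wikipathways.org"

def bridgedb_xrefs_url(organism, system_code, identifier):
    """URL mapping one identifier (e.g. an Affymetrix probe set, assumed
    system code 'X') to all cross-references, including Ensembl ('En')."""
    return f"{BRIDGEDB}/{organism}/xrefs/{system_code}/{identifier}"

def wikipathways_by_xref_url(identifier, system_code):
    """URL finding pathways annotated with the given identifier."""
    return (f"{WIKIPATHWAYS}/findPathwaysByXref"
            f"?ids={identifier}&codes={system_code}&format=json")

# Hypothetical probe set, then a hypothetical Ensembl gene it maps to
url = bridgedb_xrefs_url("Human", "X", "211546_x_at")
pathway_url = wikipathways_by_xref_url("ENSG00000141510", "En")
print(url)
print(pathway_url)
```

In the actual notebook these URLs would be fetched (e.g. with `requests`), the BridgeDb response filtered to Ensembl identifiers, and those identifiers passed on to the WikiPathways query.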
BridgeDb identifier mapping service
EdelweissData serving ToxCast, ToxRefDB and TG-GATEs data
Whole-genome transcriptional profiling allows global measurement of gene expression changes induced by particular experimental conditions. Toxic treatments of biological systems, such as cell models, may perturb interactions among genes, and in toxicogenomics such perturbations, assessed by transcriptional profiling, are used to predict the impact of toxic compounds. From its early days, this toxicogenomics-based approach for predicting apical toxicities has been dedicated to improving predictions of genotoxicity and carcinogenicity in vivo. Over the past decade, large amounts of transcriptional profiling data have been generated for in vitro study models using various chemical compounds, across different doses and time points as well as different organisms. As part of the H2020 EU project OpenRiskNet, we propose a large-scale integrative analysis approach using these data sets for predicting genotoxicity and carcinogenicity in vivo. From the diXa Data Warehouse, NCBI GEO, and EBI ArrayExpress we collected gene expression data for human in vitro liver cell models exposed to 125 compounds with known genotoxicity information at different time points and dosages, resulting in 822 experiments. We analyzed these data sets using ten different classification algorithms, using 80% of the data for training and 20% for testing. The Support Vector Machines algorithm had the best accuracy for predicting genotoxicity in vivo at 92.5%, with 95% specificity and 87% sensitivity. Upon identifying deregulated gene-gene interaction networks with ConsensusPathDB, the top five affected pathways are related to p53-centered pathways. The results from our meta-analysis demonstrate both high accuracy and robustness of transcriptomic profiling of genotoxicity hazards across a large set of genotoxicants and across multiple human liver cell models.
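The evaluation metrics reported above can be derived from a held-out test set's confusion matrix as sketched below. The labels here are hypothetical (1 = genotoxic in vivo, 0 = non-genotoxic) and serve only to show how accuracy, sensitivity and specificity are computed, not to reproduce the study's results:

```python
# Compute accuracy, sensitivity and specificity from binary predictions.

def confusion_counts(y_true, y_pred):
    """True/false positives and negatives for binary labels (1/0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # true-positive rate: genotoxicants caught
    specificity = tn / (tn + fp)   # true-negative rate: non-genotoxicants cleared
    return accuracy, sensitivity, specificity

# Hypothetical 20% test split: 4 genotoxic and 6 non-genotoxic compounds
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(metrics(y_true, y_pred))
```

Reporting specificity and sensitivity alongside accuracy matters here because the compound classes are rarely balanced; a classifier can reach high accuracy while missing most genotoxicants.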
We propose that these assays can be used for regulatory purposes, certainly when applied in combination with the traditional genotoxicity in vitro test battery. Next, we want to perform similar analyses on rat and mouse data and identify core orthologous genes among the three different species that are potential predictive targets for assessing genotoxicity and carcinogenicity across different biological systems.
Get familiar with the eNanoMapper solutions for data management and data access.
In the last decade, omics-based approaches such as transcriptomics, proteomics and metabolomics have become valuable tools in toxicological research and are finding their way into regulatory toxicology. A promising framework to bridge the gap between molecular-level measurements and risk assessment is the concept of Adverse Outcome Pathways (AOPs). These pathways comprise mechanistic knowledge and connect biological events from the molecular level to an adverse effect after exposure to a chemical or nanomaterial. However, the implementation of omics-based approaches in AOPs and their acceptance by the risk assessment community is still a challenge. Therefore, tools are required for omics-based data analysis and visualization, and to link the data to the traditional AOPs. Here we show how WikiPathways, an open science pathway database, can serve as a viable tool for this purpose. To this end, an AOP Portal (aop.wikipathways.org) has been created with a rapidly growing collection of molecular-level AOPs on which omics datasets can be mapped and analyzed, currently consisting of 15 pathways by 14 authors that are structured in various ways. In addition, we are making WikiPathways more interoperable with aopwiki.org, the main knowledge base that collects and stores AOPs. Its open and collaborative nature makes WikiPathways a fast-growing platform that is applicable in a wide range of biomedical research fields in which omics-based approaches are used. Its use of ontologies, OpenAPI documentation and FAIR (Findable, Accessible, Interoperable, Reusable) approaches also makes WikiPathways interoperable with many other data sources. By introducing AOPs in WikiPathways and linking them with the AOPs in aopwiki.org, we aim to make WikiPathways a useful tool for the regulatory toxicology community and for toxicological research in general.
Eventually this could lead to implementation of WikiPathways as a data-source for decision-making in REACH (Registration, Evaluation, Authorization, and restriction of Chemicals) dossiers for risk assessment of chemicals. This project has received funding from the European Union’s Horizon 2020 research and innovation programme project EU-ToxRisk under grant agreement No. 681002 and EINFRA-22-2016 programme project OpenRiskNet under grant agreement No. 731075.
Containers are gaining popularity in life science research as they encompass all dependencies of the provisioned tools, simplify software installation for end users, and offer a form of isolation between processes. Scientific workflows are ideal for chaining containers into data analysis pipelines to sustain reproducible science. In this manuscript we review the different approaches to using containers within the workflow tools Nextflow, Galaxy, Pachyderm, Luigi, and SciPipe when deployed in cloud environments. A particular focus is placed on each workflow tool’s interaction with the Kubernetes container orchestration framework.
Transcriptomics data from human, mouse, rat in vitro liver models
The eNM ontology can be accessed in three different ways: online via BioPortal or AberOWL, or locally using the open-source Protégé software. This tutorial focuses on browsing the eNM ontology to find a Uniform Resource Identifier (URI) for mapping a term originating from, for example, a database schema. Using URIs in database schemas will facilitate the harmonization of data originating from different sources and make them more comparable.
Lipophilicity is a major determinant of ADMET properties and the overall suitability of drug candidates. We have developed large-scale models to predict the water-octanol distribution coefficient (logD) of chemical compounds, aiding drug discovery projects. Using ACD/logD data for 1.6 million compounds from the ChEMBL database, models were created and evaluated using a support vector machine with a linear kernel and the conformal prediction methodology, outputting prediction intervals at a specified confidence level. The resulting model shows a predictive ability of Q2 = 0.973, with the best-performing nonconformity measure having a median prediction interval of ±0.39 log units at 80% confidence and ±0.60 log units at 90% confidence. The model is available as an online service via an OpenAPI interface and a web page with a molecular editor, and we also publish predicted values at the 90% confidence level for 91 M PubChem structures in RDF format, for download and via a URI resolver service.
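How a conformal predictor turns point predictions into intervals at a chosen confidence level can be sketched with a minimal split-conformal procedure. The residuals below are hypothetical illustrative numbers, not the cpLogD calibration data, and the real model uses a normalized nonconformity measure rather than this fixed-width variant:

```python
# Minimal split-conformal regression: the interval half-width is the
# calibration residual whose rank bounds the requested coverage.
import math

def interval_half_width(cal_abs_errors, confidence):
    """Half-width of the prediction interval at the given confidence."""
    n = len(cal_abs_errors)
    # Rank of the bounding residual; the small epsilon guards against
    # floating-point rounding in (n + 1) * confidence.
    k = math.ceil((n + 1) * confidence - 1e-9)
    k = min(max(k, 1), n)
    return sorted(cal_abs_errors)[k - 1]

# Hypothetical absolute residuals on a held-out calibration set
errs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

point_logd = 2.10  # hypothetical point prediction for one compound
hw = interval_half_width(errs, 0.80)
print(point_logd - hw, point_logd + hw)  # interval at 80% confidence
```

Raising the confidence level widens the interval (here, 90% confidence would use a larger residual than 80%), mirroring the ±0.39 vs ±0.60 log-unit behavior reported above.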
cpLogD confidence predictor for logD
This tutorial describes how terms can be added to the eNanoMapper ontology.
OpenRiskNet (https://openrisknet.org/) is an EU funded infrastructure project with the main objective to develop an open e-infrastructure providing resources and services to a variety of industries requiring risk assessment, including chemicals, cosmetic ingredients, drugs and nanomaterials. The OpenRiskNet approach is to work on different case studies to test and evaluate requirements to overcome the fragmentation of data and tools and to provide solutions improving the harmonization of data, usability and interoperability of application programming interfaces (APIs) and the deployment into virtual infrastructure. The cases present real-world settings such as systems biology approaches for grouping compounds, read-across applications using chemical and biological similarity, and identifying areas of concern based only on alternative methods approaches.
The “OpenRiskNet: Open e-Infrastructure to Support Data Sharing, Knowledge Integration and in silico Analysis and Modelling in Risk Assessment” project, funded by the European Commission within the Horizon 2020 Programme, intends to improve the harmonization and interoperability (i.e. the ability to work together, be combined into workflows and transfer data between each other without manual intervention) of data and software tools used in predictive toxicology and risk assessment. This includes (1) in vivo, in vitro and in silico data and derived knowledge sources and (2) the processing, analysis, modeling and visualization services, in order to facilitate their easy discovery, sharing and usage. The vision is that knowledge management, including better documentation and reproducibility, will ultimately lead to validated risk assessment approaches for safe-by-design studies and regulatory settings, supporting the goals of replacement, reduction and refinement (3R). The approach taken by OpenRiskNet to tackle these challenges is based on the development of a semantic interoperability layer responsible for the communication between the user and the services, or between two services. This semantic layer will provide detailed annotations on (1) the scientific background, meaningful usage and limitations of the services, (2) the required input, and (3) the obtainable results, including possible output formats and the standards used. In order to generate the technical solutions for this layer, some domain-specific standards first have to be developed to describe the data sources and computer services in this metadata-rich fashion. This first part of a multi-part post is intended to provide background information on the current status of data sharing efforts and the guidelines enforced by funding agencies and publishers, and describes the problems we are additionally facing with respect to the interoperability layer.
Additional posts will follow whenever we have at least partly solved a problem.
OpenRiskNet, a pan-European consortium funded by Horizon 2020 to develop an open e-infrastructure for predictive toxicology and risk assessment, demonstrated its results on approaches for harmonised application programming interfaces and semantic interoperability during training sessions and scientific presentations at the OpenTox Euro 2017 Conference in Basel, 21-23 November 2017. Prior to these activities, the annual consortium meeting was held on 20 November to discuss the progress after twelve months of the project and plan the next steps. Moreover, the 11 European partners that make up OpenRiskNet are now launching an Associated Partners Programme to build global reach into their work.
Besides new and improved in vitro methods, in silico approaches play a major role in the endeavor to reach the 3R goals of replacement, reduction and refinement in both toxicodynamics and biokinetics, and especially in the interplay with the aforementioned methods to form integrated approaches to testing and assessment (IATA) and integrated testing strategies (ITS). However, for the most efficient usage of these methods and to make them available to all stakeholders, these tools need to be available and accessible in an open (not necessarily in the sense of free of charge, but of open, interpretable and reproducible science), harmonized and interoperable way. This session will present international approaches and platforms providing discovery services and repositories for predictive toxicology and risk assessment software solutions and related disciplines, activities to harmonize their description, access and usage, and ways to combine them into workflows for more complex analysis and modeling tasks.
The OpenRiskNet e-infrastructure aims to provide resources and services to a variety of communities requiring risk assessment, with the NanoSafety Cluster (NSC) as a primary target community. Specific needs identified by the nanosafety community will be addressed and defined based on the requirements of NSC projects, thereby identifying the key areas where the OpenRiskNet infrastructure can be deployed and tested. The possibilities of incorporating data and tools developed by other projects and of combining them with other types of resources will be evaluated. Alignment and interoperability with the nano-specific ontology, protocols and templates, as initially developed under eNanoMapper, will also be pursued.
Our world is awash with chemicals, many of which still need risk and safety testing. The Horizon 2020 project “OpenRiskNet” (Open e-Infrastructure to Support Data Sharing, Knowledge Integration and in silico Analysis and Modelling in Risk Assessment) aims to provide a global infrastructure and network to integrate and harmonise data from experiments and computer models of toxicology.
Discriminating the causative disease variant(s) for individuals with inherited or de novo mutations presents one of the main challenges faced by the clinical genetics community today. Computational approaches for variant prioritization include machine learning methods utilizing a large number of features, including molecular information, interaction networks, or phenotypes. Here, we demonstrate the PhenomeNET Variant Predictor (PVP) system that exploits semantic technologies and automated reasoning over genotype-phenotype relations to filter and prioritize variants in whole exome and whole genome sequencing datasets. We demonstrate the performance of PVP in identifying causative variants on a large number of synthetic whole exome and whole genome sequences, covering a wide range of diseases and syndromes. In a retrospective study, we further illustrate the application of PVP for the interpretation of whole exome sequencing data in patients suffering from congenital hypothyroidism. We find that PVP accurately identifies causative variants in whole exome and whole genome sequencing datasets and provides a powerful resource for the discovery of causal variants.
Open e-Infrastructure to Support Data Sharing, Knowledge Integration and in silico Analysis and Modelling in Risk Assessment. OpenRiskNet is a 3-year project funded under the Horizon 2020 EINFRA-22-2016 Programme (Project Number 731075; start date 1 December 2016). The main objective is to develop an open e-Infrastructure providing resources and services to a variety of communities requiring risk assessment, including chemicals, cosmetic ingredients, therapeutic agents and nanomaterials. OpenRiskNet will work with a network of partners, organized within an Associated Partners Programme. One of the OpenRiskNet case studies will address specific needs identified by the nanosafety community. The case study will be defined based on project partners’ experience in NanoEHS projects and activities within NanoSafety Cluster (NSC) working groups and task forces. Interactions with nanosafety projects have already been established in order to identify the key questions to be addressed and where the OpenRiskNet infrastructure could be deployed and tested.
OpenRiskNet kicked off at the Technology Park in Basel, Switzerland, on 15 and 16 December 2016. All partners were present at this two-day event.