Case Study
ModelRX – Modelling for Prediction or Read Across

Summary

A training data set is obtained from an OpenRiskNet data source. The model is then trained with OpenRiskNet modelling tools, and the resulting model is packaged into a container, documented and ontologically annotated. The model is validated following OECD guidelines. Finally, a prediction can be run (Jennings et al. 2018).

ModelRX

Objectives

The objectives of this case study are to support similarity identification in the DataCure case study (by providing tools for calculating theoretical descriptors of substances), to fill gaps in incomplete datasets, and to apply in-silico predictive modelling approaches (read-across, QSAR) in support of final risk assessment.

Risk assessment framework

The ModelRX case study contributes in two tiers (Berggren et al. 2017):

  • On the one hand, it provides computational methods to support the suitability assessment of existing data and the identification of analogues (Tier 0);
  • On the other hand, it provides predictive modelling functionalities, which are essential for final risk assessment (Tier 2).

Use Cases Associated

The ModelRX case study is associated with UC2 - Building and using a prediction model, including the pseudocode.

Technical implementation

From the developer’s perspective, the services used for this case study are deployed following the general steps agreed for developing the OpenRiskNet infrastructure. Each application is deployed and delivered within containers. Docker is used as the basic engine for containerising the applications. On top of that, the container orchestration tool OpenShift is used to manage the containers and services. OpenShift provides many different options for deploying applications; recipes and examples have been documented on the OpenRiskNet GitHub page.

When an application is deployed, a service discovery mechanism is responsible for discovering the most suitable services for each application. Each API should be deployed with a Swagger (OpenAPI) definition. This Swagger file should then be enriched with JSON-LD annotations as dictated by the JSON-LD specification. The discovery mechanism parses the resulting JSON-LD and resolves the annotations into RDF triples, which can then be queried with SPARQL. The query results let the user know which services are responsible for building models or making predictions. The documentation of each application can be found via its Swagger definition. In this way, services are registered in the OpenRiskNet deployment and can be used and consumed by end-user applications and other services.
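
The discovery step can be sketched in plain Python, with a list of subject-predicate-object tuples standing in for an RDF store and a list comprehension standing in for SPARQL. The vocabulary and the service metadata below are illustrative assumptions, not the actual OpenRiskNet annotations.

```python
import json

# A JSON-LD-style service annotation, as might accompany a Swagger file.
# The field names and URIs here are hypothetical examples.
service_jsonld = """
{
  "@id": "https://example.org/services/jaqpot",
  "@type": "WebAPI",
  "title": "Model training service",
  "serviceCategory": "ModelGeneration"
}
"""

doc = json.loads(service_jsonld)
subject = doc["@id"]

# Flatten the annotation into (subject, predicate, object) triples,
# skipping JSON-LD keywords such as @id and @type.
triples = [(subject, key, value) for key, value in doc.items()
           if not key.startswith("@")]

# "Query": which services advertise model-generation capability?
model_builders = [s for (s, p, o) in triples
                  if p == "serviceCategory" and o == "ModelGeneration"]
print(model_builders)  # ['https://example.org/services/jaqpot']
```

In the real mechanism the triples live in an RDF store and the filter is a SPARQL query, but the flow is the same: annotation in, capability lookup out.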

Specific to this case study, the technical implementation is based on rigorously defined modelling APIs built on the OpenTox standards. Modelling APIs need a high level of integration into the OpenRiskNet ecosystem, and integration with the DataCure case study is vital. On the semantic interoperability layer, training datasets should be compatible with an algorithm and prediction datasets should be compatible with a prediction model. Additionally, the generated models and datasets need to be accompanied by semantic metadata on their life cycle, thus enforcing semantic enrichment of the dynamically created entities. Algorithms, models and predicted datasets are built as services, discoverable by the OpenRiskNet discovery service. This step should occur whenever an entity (algorithm, model, predicted dataset) is created.
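
As a sketch, the enrich-and-register step that should accompany every newly created entity might look as follows; the metadata field names and the in-memory registry are illustrative assumptions, not the actual discovery-service API.

```python
from datetime import datetime, timezone

REGISTRY = []  # stands in for the OpenRiskNet discovery service


def register_entity(uri, entity_type, derived_from=None):
    """Attach life-cycle metadata to a new entity and make it discoverable.

    Called whenever an algorithm, model or predicted dataset is created.
    """
    entry = {
        "@id": uri,
        "@type": entity_type,
        "derivedFrom": derived_from,  # e.g. the training dataset of a model
        "createdAt": datetime.now(timezone.utc).isoformat(),
    }
    REGISTRY.append(entry)
    return entry


# A freshly trained model is enriched and registered in one step.
model_entry = register_entity(
    "https://example.org/models/m42", "Model",
    derived_from="https://example.org/datasets/d7",
)
```

Keeping the provenance link (`derivedFrom`) on every entity is what makes the life cycle traceable from a predicted dataset back to the model and training data that produced it.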

From the user’s point of view, the steps needed to produce a model and use it for predictions are the following:

  1. Selecting a training data set

    The user will be able to choose from a variety of OpenRiskNet-compliant data sets already accessible through the discovery service. A dataset should include, at a minimum:

    • a dataset URI
    • substances: substance URIs (each substance URI will be associated with a term from the ontology)
    • features:
      • feature URIs (each feature URI will be associated with a term from the ontology)
      • values in numerical format
      • category (experimental/computed)
      • if computed, the URI of the model used to generate the values
      • units
  2. Selecting a (suitable) modelling algorithm

    The user will be able to choose from a list of suitable algorithms. Algorithms should include, at a minimum:

    • algorithm URI
    • title
    • description
    • algorithm type (regression/classification)
    • default values for its parameters (where applicable)
  3. Specifying parameters and generating the predictive model

    Once an algorithm has been selected, the user should define the endpoint, select the tuning parameters (only if values different from the defaults are desired) and run the algorithm. The generated model should contain, at a minimum:

    • model URI
    • title
    • description
    • the URI of the dataset that was used to create it
    • the URIs of the input features
    • the URI of the predicted feature
    • values of tuning parameters

    Possible extensions:

    • Include services/APIs for validation of the generated model
    • Provide mechanisms to pick out the best algorithm for a specific dataset (e.g. RRegrs)
    • Include algorithms to calculate domain of applicability

  4. Selecting a prediction data set

    After the creation of a model, the user will be able to select a prediction dataset, which should meet all the requirements specified in (Chomenidis et al. 2017). This dataset will be tested for compatibility against the required features of the model in terms of feature URIs, i.e. it should contain all of the features used to produce the model. Additional features are allowed; however, they will be ignored.

  5. Running predictive model on the prediction data set

    The predictive model is applied to the prediction dataset to generate the predicted dataset, which should be compatible with the requirements specified in (Chomenidis et al. 2017). The predicted dataset augments the prediction dataset with all the necessary information about the predicted feature:

    • prediction feature URIs (each feature URI will be associated with a term from the ontology)
    • values in numerical format
    • category (computed)
    • the URI of the model used to generate the values
    • units
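
The five steps above can be sketched end to end in plain Python, with dictionaries standing in for the OpenRiskNet services. The URIs, the single `logP` feature and the toy one-feature least-squares "algorithm" are illustrative assumptions, not part of the actual APIs.

```python
# 1. Training dataset: the minimal fields from step 1.
training = {
    "uri": "https://example.org/datasets/train",
    "features": {"logP": "https://example.org/features/logP"},
    "rows": [  # (substance URI, {feature name: value}, endpoint value)
        ("https://example.org/s/1", {"logP": 1.0}, 2.1),
        ("https://example.org/s/2", {"logP": 2.0}, 3.9),
        ("https://example.org/s/3", {"logP": 3.0}, 6.1),
    ],
}


# 2-3. A stand-in "algorithm": one-feature least-squares regression whose
# output model records its training dataset and input feature URIs.
def train(dataset, feature):
    xs = [row[1][feature] for row in dataset["rows"]]
    ys = [row[2] for row in dataset["rows"]]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return {
        "uri": "https://example.org/models/m1",
        "dataset": dataset["uri"],
        "inputFeatures": [dataset["features"][feature]],
        "slope": slope,
        "intercept": my - slope * mx,
    }


# 4. Compatibility check: the prediction dataset must contain all input
# features of the model (by feature URI); extra features are ignored.
def compatible(model, dataset):
    return set(model["inputFeatures"]) <= set(dataset["features"].values())


# 5. Apply the model: augment each substance with the computed value,
# its category and the URI of the model that produced it.
def predict(model, dataset):
    return [
        {"substance": substance,
         "value": model["slope"] * values["logP"] + model["intercept"],
         "category": "computed",
         "model": model["uri"]}
        for substance, values in dataset["rows"]
    ]


model = train(training, "logP")
query = {
    "uri": "https://example.org/datasets/query",
    "features": {"logP": "https://example.org/features/logP",
                 "MW": "https://example.org/features/MW"},  # extra, ignored
    "rows": [("https://example.org/s/9", {"logP": 4.0, "MW": 180.0})],
}
assert compatible(model, query)
predicted = predict(model, query)
```

The predicted entries carry the `category` and model URI required of a predicted dataset, so the provenance of every computed value remains inspectable.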

Examples of the implementation are annexed below under Related resources.

Outcomes

This case study suggests a workflow that produces semantically annotated predictive models that can be shared, tested, validated and eventually applied for predicting adverse effects of substances in a safe-by-design and/or risk assessment regulatory framework. OpenRiskNet provides the necessary functionalities that allow researchers and practitioners to easily produce and expose their models as ready-to-use web applications. The OpenRiskNet e-infrastructure can serve as a central model repository in the area of predictive toxicology. For example, when a research group publishes a predictive model in a scientific journal, they can additionally provide the implementation of the model as a web service using the OpenRiskNet implementation. The produced models contain all the necessary metadata and ontological information to make them easily searchable by users and to define their domain of applicability systematically and rigorously. Most importantly, the produced resources are not just static representations of the models, but actual web applications where users can supply the necessary information for query substances and receive predictions for their adverse effects.

Currently available services:

  • Generate, store and share predictive statistical and machine learning models
    Service type: Workflow, Application, Visualisation tool, Analysis tool, Processing tool, Trained model, Model generation tool, Model, Data mining tool
  • Generate, store and share predictive statistical and machine learning models
    Service type: Analysis tool, Processing tool, Trained model, Model generation tool, Model, Data mining tool, Service
  • Toxicity predictions
    Service type: Application, Trained model, Service
  • Interactive computing and workflows sharing
    Service type: Workflow, Visualisation tool, Helper tool, Software, Analysis tool, Processing tool
  • Computation research made simple and reproducible
    Service type: Workflow, Database / data source, Service

Related resources

Report
Case Study description - Modelling for Prediction or Read Across [ModelRX]
28 Jun 2019
Abstract:
A training data set is obtained from an OpenRiskNet data source. The model is then trained with OpenRiskNet modelling tools, and the resulting model is packaged into a container, documented and ontologically annotated. The model is validated following OECD guidelines. Finally, a prediction can be run.
Additional materials:
Case Study report

Target audience: OpenRiskNet stakeholders
Open access: yes
Licence: Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Organisations involved: NTUA
Webinar recording
Demonstration on OpenRiskNet approach on modelling for prediction or read across (ModelRX case study)
Philip Doganis and Haralambos Sarimveis (National Technical University of Athens, Greece)
25 Jun 2019

Target audience: Risk assessors, Researchers, Data modellers, Bioinformaticians
Open access: yes
Licence: Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Organisations involved: NTUA
Presentation
Demonstration on OpenRiskNet approach on modelling for prediction or read across (ModelRX case study)
Philip Doganis and Haralambos Sarimveis (National Technical University of Athens, Greece)
24 Jun 2019
Additional materials:
Slides

Publisher: OpenRiskNet
Target audience: Risk assessors, Researchers, Students, Data modellers, Bioinformaticians
Open access: yes
Licence: Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Organisations involved: NTUA
Tutorial
Model RX OpenRiskNet - Case study using Jaqpot web modelling platform
Philip Doganis
15 Oct 2018
Related services:
Jaqpot GUI
Jaqpot API

Target audience: Risk assessors, Researchers, OpenRiskNet stakeholders, Data modellers
Open access: yes
Licence: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Organisations involved: NTUA