ModelRX – Modelling for Prediction or Read Across
A training data set will be obtained from an OpenRiskNet data source. The model will then be trained with OpenRiskNet modelling tools, and the resulting model will be packaged into a container, documented and ontologically annotated. The model will be validated using OECD guidelines. Finally, a prediction can be run (Jennings et al. 2018).
The objectives of this case study are to support similarity identification in the DataCure case study (by providing tools for calculating theoretical descriptors of substances), to fill gaps in incomplete datasets, and to use in-silico predictive modelling approaches (read-across, QSAR) to support final risk assessment.
Risk assessment framework
The ModelRX case study contributes to two tiers (Berggren et al. 2017):
- First, it provides computational methods to support suitability assessment of existing data and identification of analogues (Tier 0);
- Second, it provides predictive modelling functionalities, which are essential in the field of final risk assessment (Tier 2).
Use Cases Associated
From the developer’s perspective, the services used for this case study are deployed following the general steps that have been agreed for developing the OpenRiskNet infrastructure. Each application is deployed and delivered within containers. Docker is used as the basic engine for the containerisation of the applications. On top of that, a container orchestration tool, OpenShift, is used for the management of the containers and the services. OpenShift provides many different options for deploying applications. Some recipes and examples have been documented on the OpenRiskNet GitHub page.
When an application is deployed, a service discovery mechanism is responsible for discovering the most suitable services for each application. In line with the OpenAPI specification, each API should be deployed with its Swagger definition. This Swagger file should then be extended with JSON-LD annotations as dictated by the JSON-LD specification. The discovery service parses the resulting JSON-LD and resolves the annotations into RDF triples. These triples can then be queried with SPARQL. The result of the query lets the user know which services are responsible for making models or predictions. The documentation can be found via the Swagger definition of each application. In this way the services are integrated into the OpenRiskNet deployment and can be consumed by end-user applications and other services.
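The discovery flow described above can be illustrated with a minimal Python sketch. It is not the OpenRiskNet discovery service: a real deployment would use a full JSON-LD processor and a SPARQL endpoint. The annotated definition, the `ex:` vocabulary, the service URL and the helper names `expand`, `to_triples` and `services_providing` are all hypothetical, chosen only to show how annotations become triples that can be filtered by capability.

```python
import json

# Hypothetical, heavily simplified annotated API definition. The keys, the
# vocabulary URIs and the service URL are illustrative assumptions only.
ANNOTATED_DEFINITION = json.loads("""
{
  "@context": {"ex": "http://example.org/vocab#"},
  "@id": "http://example.org/services/demo-modelling",
  "@type": "ex:ModellingService",
  "ex:provides": ["ex:ModelTraining", "ex:Prediction"]
}
""")


def expand(term, context):
    """Expand a compact term such as 'ex:Prediction' using the @context prefixes."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + local
    return term


def to_triples(doc):
    """Flatten one flat JSON-LD-like node into (subject, predicate, object) triples."""
    context = doc.get("@context", {})
    subject = doc["@id"]
    triples = []
    for key, value in doc.items():
        if key in ("@context", "@id"):
            continue
        predicate = "rdf:type" if key == "@type" else expand(key, context)
        objects = value if isinstance(value, list) else [value]
        for obj in objects:
            triples.append((subject, predicate, expand(obj, context)))
    return triples


def services_providing(triples, capability):
    """Stand-in for a SPARQL SELECT: subjects that 'provide' the given capability."""
    provides = "http://example.org/vocab#provides"
    return sorted({s for s, p, o in triples if p == provides and o == capability})


triples = to_triples(ANNOTATED_DEFINITION)
prediction_services = services_providing(
    triples, "http://example.org/vocab#Prediction")
```

Here `prediction_services` holds the URLs of the services annotated as offering predictions, which is the kind of answer a SPARQL query against the discovery service would return.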
Specific to this case study, the technical implementation is based on rigorously defined modelling APIs that follow the OpenTox standards. Modelling APIs need a high level of integration into the OpenRiskNet ecosystem. Integration with the DataCure case study is vital. On the semantic interoperability layer, training datasets should be compatible with an algorithm and prediction datasets should be compatible with a prediction model. Additionally, the generated models and datasets need to be accompanied by semantic metadata on their life cycle, thus enforcing semantic enrichment of the dynamically created entities. Algorithms, models and predicted datasets are built as services, discoverable by the OpenRiskNet discovery service. This is a step that should occur whenever an entity (algorithm, model, predicted dataset) is created.
From the user’s point of view, the steps needed to produce a model and use it for predictions are the following:
- Selecting a training data set
The user will be able to choose from a variety of different OpenRiskNet compliant data sets already accessible through the discovery service. A Dataset should include, at a minimum:
- a dataset URI
- substances: substance URIs (each substance URI will be associated with a term from the ontology)
- feature URIs (each feature URI will be associated with a term from the ontology)
- values in numerical format
- category (experimental/computed)
- if computed, the URI of the model used to generate the values
- Selecting a (suitable) modelling algorithm
The user will be able to choose from a list of suitable algorithms. Algorithms should include at a minimum:
- algorithm URI
- algorithm type (regression/classification)
- default values for its parameters (where applicable)
- Specifying parameters and Generating Predictive model
Once an algorithm has been selected, the user should define the endpoint, select the tuning parameters (only if values different from the defaults are desired) and run the algorithm. The generated Model should contain, at a minimum:
- model URI
- the URI of the dataset that was used to create it
- the URIs of the input features
- the URI of the predicted feature
- values of tuning parameters
- Include services/APIs for validation of the generated model
- Provide mechanisms to pick out the best algorithm for a specific dataset (e.g. RRegrs)
- Include algorithms to calculate domain of applicability
- Selecting a prediction data set
After the creation of a model, the user will be able to select a prediction dataset, which should meet all the requirements specified in (Chomenidis et al. 2017). This dataset will be tested for compatibility against the required features of the model in terms of feature URIs, i.e. the dataset should contain all the features used to produce the model. Additional features are allowed; however, they will be ignored.
- Running predictive model on the prediction data set
The predictive model is applied to the prediction dataset to generate the predicted dataset, which should be compatible with the requirements specified in (Chomenidis et al. 2017). The predicted dataset augments the prediction dataset with all necessary information about the predicted feature:
- prediction feature URIs (each feature URI will be associated with a term from the ontology)
- values in numerical format
- category (computed)
- the URI of the model used to generate the values
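The user-facing steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the OpenRiskNet or OpenTox API: the class and function names (`Feature`, `Dataset`, `Algorithm`, `Model`, `train`, `predict`) are hypothetical, ontology terms are omitted, and a simple k-nearest-neighbour average stands in for whatever modelling algorithm a real service would offer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Feature:
    uri: str
    category: str = "experimental"    # "experimental" or "computed"
    model_uri: Optional[str] = None   # set when category == "computed"


@dataclass
class Dataset:
    uri: str
    features: Dict[str, Feature]          # feature URI -> metadata
    values: Dict[str, Dict[str, float]]   # substance URI -> {feature URI -> value}


@dataclass
class Algorithm:
    uri: str
    type: str                             # "regression" or "classification"
    default_parameters: Dict[str, float] = field(default_factory=dict)


@dataclass
class Model:
    uri: str
    dataset_uri: str
    input_feature_uris: List[str]
    predicted_feature_uri: str
    parameters: Dict[str, float]
    rows: List[Tuple[List[float], float]]  # stored training examples


def train(algorithm, dataset, endpoint_uri, model_uri, parameters=None):
    """Build a model; user-supplied parameters override the algorithm defaults."""
    params = {**algorithm.default_parameters, **(parameters or {})}
    inputs = sorted(uri for uri in dataset.features if uri != endpoint_uri)
    rows = [([row[f] for f in inputs], row[endpoint_uri])
            for row in dataset.values.values()]
    return Model(model_uri, dataset.uri, inputs, endpoint_uri, params, rows)


def predict(model, dataset, predicted_dataset_uri):
    """Check feature compatibility, then average the k nearest training values."""
    missing = sorted(set(model.input_feature_uris) - set(dataset.features))
    if missing:
        raise ValueError(f"prediction dataset lacks required features: {missing}")
    k = max(1, int(model.parameters.get("k", 1)))
    values = {}
    for substance, row in dataset.values.items():
        # Only the model's input features are read; extra features are ignored.
        query = [row[f] for f in model.input_feature_uris]
        nearest = sorted(
            model.rows,
            key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], query)),
        )[:k]
        prediction = sum(y for _, y in nearest) / len(nearest)
        values[substance] = {**row, model.predicted_feature_uri: prediction}
    # The predicted feature is recorded as computed, with the model's URI.
    features = dict(dataset.features)
    features[model.predicted_feature_uri] = Feature(
        model.predicted_feature_uri, "computed", model.uri)
    return Dataset(predicted_dataset_uri, features, values)
```

In this sketch the predicted dataset keeps every value of the prediction dataset and adds the predicted feature, marked as computed and linked to the model URI, mirroring the minimum metadata listed above.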
Examples of implementation are annexed below under related resources.
This case study suggests a workflow that produces semantically annotated predictive models that can be shared, tested, validated and eventually applied for predicting adverse effects of substances in a safe-by-design and/or risk assessment regulatory framework. OpenRiskNet provides the necessary functionalities that allow researchers and practitioners to easily produce and expose their models as ready-to-use web applications. The OpenRiskNet e-infrastructure can serve as a central model repository in the area of predictive toxicology. For example, when a research group publishes a predictive model in a scientific journal, they can additionally provide the implementation of the model as a web service using the OpenRiskNet implementation. The produced models contain all the necessary metadata and ontological information to make them easily searchable by users and to define their domain of applicability systematically and rigorously. Most importantly, the produced resources are not just static representations of the models, but actual web applications where users can supply the necessary information for query substances and receive predictions of their adverse effects.
Currently available services:
Service type: Trained model
Generate, store and share predictive statistical and machine learning models
Service type: Workflow, Application, Visualisation tool, Analysis tool, Processing tool, Trained model, Model generation tool, Model, Data mining tool
Generate, store and share predictive statistical and machine learning models
Service type: Analysis tool, Processing tool, Trained model, Model generation tool, Model, Data mining tool, Service
Toxicity predictions
Service type: Application, Trained model, Service
Service type: Trained model
Service type: Trained model
Service type: Trained model, Model generation tool, Service
Interactive computing and workflow sharing
Service type: Workflow, Visualisation tool, Helper tool, Software, Analysis tool, Processing tool
Computational research made simple and reproducible
Service type: Workflow, Database / data source, Service
Case Study report