cpLogD: confidence predictor for logD

The model predicts logD using a support vector machine trained on data for approximately 1.6 million compounds from ChEMBL version 23 (calculated ACD/logD values). A prediction interval is computed with the conformal prediction approach at the confidence level selected with the slider.
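To make the slider's role concrete, here is a minimal sketch of inductive conformal regression, assuming the simplest nonconformity measure (the absolute residual on a held-out calibration set). It is not the cpLogD implementation: the service uses a linear-kernel SVM as the underlying model, and its authors compared several nonconformity measures. The function name `conformal_interval` and the toy calibration numbers are hypothetical. The idea is that the interval half-width is the calibration score at rank ceil((n + 1) × confidence), so a higher confidence selects a larger score and yields a wider interval.

```python
import math
from typing import Sequence, Tuple


def conformal_interval(
    y_pred: float,
    calib_true: Sequence[float],
    calib_pred: Sequence[float],
    confidence: float,
) -> Tuple[float, float]:
    """Inductive conformal regression interval for one new prediction.

    Nonconformity score: absolute residual |y - y_hat| on a held-out
    calibration set (an assumption; cpLogD compared several measures).
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    scores = sorted(abs(t - p) for t, p in zip(calib_true, calib_pred))
    n = len(scores)
    if n == 0:
        raise ValueError("calibration set is empty")
    # Rank of the score used as interval half-width: ceil((n + 1) * confidence).
    # The small epsilon guards against float error pushing an exact integer up.
    k = math.ceil((n + 1) * confidence - 1e-9)
    if k > n:
        # Too few calibration points for this confidence: interval is unbounded.
        return (-math.inf, math.inf)
    half_width = scores[k - 1]
    return (y_pred - half_width, y_pred + half_width)


if __name__ == "__main__":
    # Toy calibration set: residuals 0.1 .. 0.9 log units (made-up numbers).
    calib_true = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
    calib_pred = [0.0] * 9
    print(conformal_interval(2.5, calib_true, calib_pred, 0.8))
    print(conformal_interval(2.5, calib_true, calib_pred, 0.95))
```

With only nine calibration points, a 95% request already exceeds what the calibration set can support and the sketch returns an unbounded interval; a calibration set drawn from a training corpus of the size described above would not run into this limit.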

Type: Trained model
Categories: API Definitions for OpenRiskNet applications and data
Applicability domain: Computational modelling
Topic: Structure-activity relationship (SAR / QSAR), Predictive modelling, Chemical properties
Targeted industry: Drugs, Chemicals
Targeted users: Risk assessors, Researchers, Students, Software Developers, Data managers, Regulators, Informed public
Relevant OpenRiskNet case study: ModelRX - Modelling for Prediction or Read Across
References and training materials:

A confidence predictor for logD using conformal regression and a support-vector machine
Maris Lapins, Staffan Arvidsson, Samuel Lampa, Arvid Berg, Wesley Schaal, Jonathan Alvarsson and Ola Spjuth. Journal of Cheminformatics 10(1) (2018): 17.
https://link.springer.com/article/10.1186/s13321-018-0271-1


Provided by: Uppsala University
Login required: No
Implementation status: Available as web service, Application programming interface available
Integration status: Integrated application
Service integration operations completed:
Utilises the OpenRiskNet APIs to ensure that each service is accessible to our proposed interoperability layer.
Is annotated according to the semantic interoperability layer concept using defined ontologies.
Is containerised for easy deployment in virtual environments of OpenRiskNet instances.
Has documented scientific and technical background.
Is deployed into the OpenRiskNet reference environment.
Is listed in the OpenRiskNet discovery services.
Is listed in other central repositories like eInfraCentral, bio.tools and TeSS (ELIXIR).
Provides legal and ethical statements on how the service can be used.

Resources & Training

Peer-reviewed publication
A confidence predictor for logD using conformal regression and a support-vector machine
Lapins M, Arvidsson S, Lampa S, Berg A, Schaal W, Alvarsson J, Spjuth O
3 Apr 2018
Abstract:
Lipophilicity is a major determinant of ADMET properties and overall suitability of drug candidates. We have developed large-scale models to predict water–octanol distribution coefficient (logD) for chemical compounds, aiding drug discovery projects. Using ACD/logD data for 1.6 million compounds from the ChEMBL database, models are created and evaluated by a support-vector machine with a linear kernel using conformal prediction methodology, outputting prediction intervals at a specified confidence level. The resulting model shows a predictive ability of Q² = 0.973, with the best performing nonconformity measure having a median prediction interval of ± 0.39 log units at 80% confidence and ± 0.60 log units at 90% confidence. The model is available as an online service via an OpenAPI interface, a web page with a molecular editor, and we also publish predictive values at 90% confidence level for 91 M PubChem structures in RDF format for download and as a URI resolver service.
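The two figures of merit quoted in the abstract (Q² and the median "± half-width" of the prediction intervals) can be computed as sketched below. This assumes the common definition Q² = 1 − Σ(y − ŷ)² / Σ(y − ȳ)² on held-out data; the abstract does not spell out the exact formula used, and the function names are hypothetical.

```python
import statistics
from typing import Iterable, Sequence, Tuple


def q_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Q^2 = 1 - (sum of squared prediction errors) / (total sum of squares).

    Assumed definition, evaluated on data not used for training.
    """
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("y_true has zero variance; Q^2 is undefined")
    return 1.0 - ss_res / ss_tot


def median_half_width(intervals: Iterable[Tuple[float, float]]) -> float:
    """Median of (upper - lower) / 2, i.e. the '± x log units' summary."""
    return statistics.median((hi - lo) / 2 for lo, hi in intervals)


if __name__ == "__main__":
    # Made-up values for illustration only, not data from the paper.
    print(q_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
    print(median_half_width([(0.0, 0.8), (1.0, 2.2), (2.0, 3.0)]))
```

Reporting the median rather than the mean half-width keeps the summary robust to a few very wide intervals for unusual structures.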

Published in: J. Cheminformatics
Publisher: BMC (Springer Nature)
Target audience: Risk assessors, Researchers, Students, Developers
Open access: yes
Organisations involved: Uppsala University (UU)