Transformer-decoder GPT models for generating virtual screening libraries of HMG-Coenzyme A reductase inhibitors: effects of temperature, prompt-length and transfer-learning strategies

Text (Open Access) - Published Version. Available under license: Creative Commons Attribution. Please see our End User Agreement before downloading.
Text - Accepted Version (R2_Statin_GPT_Cafiero_2024.pdf). Restricted to repository staff only.

It is advisable to refer to the publisher's version if you intend to cite from this work. See Guidance on citing.


Cafiero, M. (ORCID: https://orcid.org/0000-0002-4895-1783) (2024) Transformer-decoder GPT models for generating virtual screening libraries of HMG-Coenzyme A reductase inhibitors: effects of temperature, prompt-length and transfer-learning strategies. Journal of Chemical Information and Modeling, 64 (22). pp. 8464-8480. ISSN 1549-960X doi: 10.1021/acs.jcim.4c01309

Abstract/Summary

Attention-based decoder models were used to generate libraries of novel inhibitors of the HMG-Coenzyme A reductase (HMGCR) enzyme. These deep neural network models were pre-trained on previously synthesized drug-like molecules from the ZINC15 database to learn the syntax of SMILES strings, and then fine-tuned on a set of ~1,000 molecules that inhibit HMGCR. The number of layers used for pre-training versus fine-tuning was varied to find the optimal balance for robust library generation. Virtual screening libraries were also generated with different sampling temperatures and numbers of input tokens (prompt-length) to identify the settings that yield the most desirable molecular properties. The resulting libraries were screened against several criteria: IC50 values predicted by a Dense Neural Network (DNN) trained on experimental HMGCR IC50 values, docking scores from AutoDock Vina (via Dockstring), the calculated Quantitative Estimate of Druglikeness (QED), and Tanimoto similarity to known HMGCR inhibitors. Models with a 50/50 or 25/75% pre-trained/fine-tuned layer split, a non-zero temperature and shorter prompt-lengths produced the most robust libraries, and the DNN-predicted IC50 values correlated well with docking scores and statin-similarity. Forty-two percent of the generated molecules were classified as statin-like by k-means clustering, with the rosuvastatin-like group having the lowest IC50 values and the lowest docking scores.
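Two of the more mechanical ingredients named in the abstract, temperature-controlled sampling of the next SMILES token and Tanimoto similarity between molecular fingerprints, can be illustrated with a short Python sketch. This is not the author's code. The function names are hypothetical, treating a temperature of 0 as greedy (argmax) decoding is an assumption about what "zero temperature" means, and fingerprints are represented simply as sets of "on" bit indices instead of being computed from SMILES with a cheminformatics toolkit such as RDKit.

```python
import math
import random


def temperature_probs(logits, temperature):
    """Hypothetical helper: softmax over next-token logits divided by temperature.

    Assumption: temperature == 0 is treated as greedy decoding, i.e. all
    probability mass goes to the highest-scoring token.
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature, rng=random):
    """Hypothetical helper: draw one token index from the temperature-scaled distribution."""
    probs = temperature_probs(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity |A & B| / |A | B| between two sets of 'on' fingerprint bits.

    Returning 0.0 when both fingerprints are empty is an arbitrary convention here.
    """
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
    for t in (0, 0.5, 1.0, 2.0):
        print(t, [round(p, 3) for p in temperature_probs(logits, t)])
    print(tanimoto({1, 2, 3}, {2, 3, 4}))  # 2 shared bits / 4 distinct bits
```

Dividing the logits by a larger temperature flattens the distribution, so sampling picks less likely tokens more often and produces more varied SMILES strings; as the temperature approaches zero, sampling collapses toward always choosing the single most likely token, which tends to reproduce the same molecules.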

Item Type: Article
URI: https://reading-clone.eprints-hosting.org/id/eprint/119218
Identification Number/DOI: 10.1021/acs.jcim.4c01309
Refereed: Yes
Divisions: Life Sciences > School of Chemistry, Food and Pharmacy > Department of Chemistry
Publisher: American Chemical Society