Statistical study design for analyzing multiple gene loci correlation in DNA sequences

Text (Open Access) - Published Version
· Available under License Creative Commons Attribution.

It is advisable to refer to the publisher's version if you intend to cite from this work.

Kamoljitprapa, P. (ORCID: https://orcid.org/0000-0002-5547-7354), Baksh, F. M. (ORCID: https://orcid.org/0000-0003-3107-8815), De Gaetano, A. (ORCID: https://orcid.org/0000-0001-7712-056X), Polsen, O. and Leelasilapasart, P. (ORCID: https://orcid.org/0000-0002-0198-9944) (2023) Statistical study design for analyzing multiple gene loci correlation in DNA sequences. Mathematics, 11 (23). 4710. ISSN 2227-7390. doi: 10.3390/math11234710

Abstract/Summary

This study presents a novel statistical and computational approach using nonparametric regression, which capitalizes on correlation structure to deal with the high-dimensional data often found in pharmacogenomics, for instance in studies of Crohn’s disease, an inflammatory bowel disease. The empirical correlation between the test statistics, investigated via simulation, can be used as an estimate of noise. The theoretical distribution of −log10(p-value) is used to support the estimation of the optimal bandwidth for the model, which adequately controls type I error rates while maintaining reasonable power. Two proposed approaches, involving normal and Laplace-LD kernels, were evaluated by conducting a case-control study using real data from a genome-wide association study on Crohn’s disease. The study successfully identified single nucleotide polymorphisms on the NOD2 gene associated with the disease. The proposed method reduces the computational burden by approximately 33% with reasonable power, allowing for a more efficient and accurate analysis of genetic variants influencing drug responses. The study contributes to the advancement of statistical methodology for analyzing complex genetic data and is of practical advantage for the development of personalized medicine.
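
The kernel smoothing idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a plain Nadaraya-Watson smoother of −log10(p-value) over marker position, and the function names (`normal_kernel`, `laplace_kernel`, `kernel_smooth`, `simulate_null_scores`) are hypothetical. The Laplace kernel below is the ordinary double-exponential density and does not include the linkage-disequilibrium weighting implied by "Laplace-LD", which this record does not specify. The null simulation relies on a standard fact that may or may not be the exact form used in the paper: if p is uniform on (0, 1), then −log10(p) is exponentially distributed with rate ln 10, so its mean is 1/ln 10.

```python
import math
import random


def normal_kernel(u):
    """Standard normal density, used as a smoothing weight."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def laplace_kernel(u):
    """Plain Laplace density with unit scale (no LD weighting; an assumption)."""
    return 0.5 * math.exp(-abs(u))


def kernel_smooth(positions, p_values, bandwidth, kernel):
    """Nadaraya-Watson estimate of -log10(p) at each marker position."""
    scores = [-math.log10(p) for p in p_values]
    smoothed = []
    for x0 in positions:
        # Weight every marker by its scaled distance from the target marker.
        weights = [kernel((x - x0) / bandwidth) for x in positions]
        total = sum(weights)
        smoothed.append(sum(w * s for w, s in zip(weights, scores)) / total)
    return smoothed


def simulate_null_scores(n, seed=0):
    """Draw -log10(p) for n independent uniform p-values (null hypothesis)."""
    rng = random.Random(seed)
    # 1 - random() lies in (0, 1], so the logarithm is always defined.
    return [-math.log10(1.0 - rng.random()) for _ in range(n)]
```

In a sketch like this, the bandwidth controls how far a strong signal at one marker spreads to its neighbours. A candidate bandwidth could be judged by comparing smoothed scores from real data against smoothed scores from `simulate_null_scores`, although the calibration procedure actually used in the paper is not described in this record.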

Item Type: Article
URI: https://reading-clone.eprints-hosting.org/id/eprint/114194
Identification Number/DOI: 10.3390/math11234710
Refereed: Yes
Divisions: Science > School of Mathematical, Physical and Computational Sciences > Department of Mathematics and Statistics
Uncontrolled Keywords: General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)
Publisher: MDPI AG