Evaluation of the lasso and the elastic net in genome-wide association studies
The number of publications performing genome-wide association studies (GWAS) has increased dramatically. Penalized regression approaches have been developed to overcome the challenges caused by the high dimensional data, but these methods are relatively new in the GWAS field. In this study we have c...
Main Authors: | Waldmann, Patrik; Mészáros, Gábor; Gredler, Birgit; Fuerst, Christian; Sölkner, Johann |
---|---|
Format: | Online |
Language: | English |
Published: | Frontiers Media S.A., 2013 |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3850240/ |
id |
pubmed-3850240 |
---|---|
record date |
2013-12-20 |
subject |
Genetics |
published |
2013-12-04 |
pubmed |
24363662 |
doi |
http://dx.doi.org/10.3389/fgene.2013.00270 |
rights |
Copyright © 2013 Waldmann, Mészáros, Gredler, Fuerst and Sölkner. http://creativecommons.org/licenses/by/3.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
repository_type |
Open Access Journal |
institution_category |
Foreign Institution |
institution |
US National Center for Biotechnology Information |
building |
NCBI PubMed |
collection |
Online Access |
author |
Waldmann, Patrik; Mészáros, Gábor; Gredler, Birgit; Fuerst, Christian; Sölkner, Johann |
description |
The number of publications performing genome-wide association studies (GWAS) has increased dramatically. Penalized regression approaches have been developed to overcome the challenges caused by high-dimensional data, but these methods are relatively new in the GWAS field. In this study we compared the statistical performance of two methods, the least absolute shrinkage and selection operator (lasso) and the elastic net, on two simulated data sets and one real data set from a 50 K genome-wide single nucleotide polymorphism (SNP) panel of 5570 Fleckvieh bulls. The first simulated data set displays moderate to high linkage disequilibrium between SNPs, whereas the second simulated data set, from the QTLMAS 2010 workshop, is biologically more complex. We used cross-validation to find the optimal value of the regularization parameter λ under two criteria: the minimum mean squared error (minMSE) and the minimum MSE plus one standard error of the minimum MSE (minMSE + 1SE). The optimal λ values were used for variable selection. Based on the first simulated data set, we found that minMSE in general picked up too many SNPs. At minMSE + 1SE, the lasso did not produce any false positives, but selected too few correct SNPs. The elastic net provided the best compromise between few false positives and many correct selections when the penalty weight α was around 0.1. However, in our simulation setting, this α value did not result in the lowest minMSE + 1SE. After correction for population structure, the lasso and the elastic net selected 82 and 161 SNPs, respectively, from the QTLMAS 2010 data. In the Fleckvieh data set, after population structure correction, the lasso and the elastic net identified between 1291 and 1966 important SNPs for milk fat content, with major peaks on chromosomes 5, 14, 15, and 20. Hence, we conclude that it is important to analyze GWAS data with both the lasso and the elastic net, and that an alternative tuning criterion to minimum MSE is needed for variable selection. |
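The two tuning criteria in the abstract can be illustrated with a short, hedged sketch. It assumes the glmnet-style elastic-net parameterization, in which the penalty is λ[α‖β‖₁ + (1 − α)/2 ‖β‖₂²], so α = 1 is the lasso and small α (such as the 0.1 reported above) puts most weight on the ridge term; the record itself does not spell out this formula. It also reads "minimum MSE + 1SE" as the common one-standard-error rule: take the largest λ whose mean cross-validation error lies within one standard error of the minimum. The function names `elastic_net_penalty` and `select_lambdas` and the toy fold errors are hypothetical, and model fitting itself is not shown.

```python
import math
import statistics


def elastic_net_penalty(beta, lam, alpha):
    """Elastic-net penalty under the ASSUMED glmnet-style parameterization:
    lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2).
    alpha = 1 gives the lasso; alpha near 0 approaches ridge regression."""
    l1 = sum(abs(b) for b in beta)
    l2_sq = sum(b * b for b in beta)
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_sq)


def select_lambdas(lambdas, fold_mse):
    """Return (lambda_min, lambda_1se) from per-fold cross-validation errors.

    lambdas:  candidate values of the regularization parameter.
    fold_mse: fold_mse[k][f] is the held-out MSE of lambdas[k] on fold f
              (at least two folds are needed for a standard error).
    """
    means = [statistics.mean(row) for row in fold_mse]
    # Standard error of each CV mean: sample SD across folds / sqrt(#folds).
    ses = [statistics.stdev(row) / math.sqrt(len(row)) for row in fold_mse]

    # minMSE criterion; on ties, keep the largest (most penalized) lambda.
    min_mse = min(means)
    k_min = max((k for k, m in enumerate(means) if m == min_mse),
                key=lambda k: lambdas[k])
    lambda_min = lambdas[k_min]

    # minMSE + 1SE criterion: the largest lambda whose mean CV error is
    # within one standard error of the minimum. A larger lambda typically
    # gives a sparser model, i.e. fewer selected SNPs.
    threshold = means[k_min] + ses[k_min]
    lambda_1se = max(lam for lam, m in zip(lambdas, means) if m <= threshold)
    return lambda_min, lambda_1se


if __name__ == "__main__":
    # Hypothetical toy CV errors (3 folds), not data from the study.
    lambdas = [1.0, 0.5, 0.25, 0.1, 0.05]
    fold_mse = [
        [2.00, 2.20, 2.10],
        [1.50, 1.70, 1.60],
        [1.30, 1.35, 1.40],
        [1.20, 1.40, 1.30],
        [1.30, 1.50, 1.40],
    ]
    print(select_lambdas(lambdas, fold_mse))  # (0.1, 0.25)
```

In the toy run, minMSE picks λ = 0.1 while the one-SE rule moves to the larger λ = 0.25, trading a slightly higher CV error for stronger shrinkage. This matches the direction of the abstract's observation that minMSE tends to select too many SNPs.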