Genetic algorithm in the wavelet domain for large p small n regression

Eylem Deniz Howe, Orietta Nicolis

Research output: Article

4 Citations (Scopus)

Abstract

Many areas of statistical modeling are plagued by the "curse of dimensionality," in which there are more variables than observations. This is especially true when developing functional regression models where the independent dataset is some type of spectral decomposition, such as data from near-infrared spectroscopy. While we could develop a very complex model by simply taking enough samples (such that n > p), this could prove impossible or prohibitively expensive. In addition, a regression model developed like this could turn out to be highly inefficient, as spectral data usually exhibit high multicollinearity. In this article, we propose a two-part algorithm for selecting an effective and efficient functional regression model. Our algorithm begins by evaluating a subset of discrete wavelet transformations, allowing for variation in both wavelet and filter number. Next, we perform an intermediate processing step to remove variables with low correlation to the response data. Finally, we use the genetic algorithm to perform a stochastic search through the subset regression model space, driven by an information-theoretic objective function. We allow our algorithm to develop the regression model for each response variable independently, so as to optimally model each variable. We demonstrate our method on the familiar biscuit dough dataset, which has been used in a similar context by several researchers. Our results demonstrate both the flexibility and the power of our algorithm. For each response variable, a different subset model is selected, and different wavelet transformations are used. The models developed by our algorithm show an improvement, as measured by lower mean error, over results in the published literature.
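
The pipeline described in the abstract has three concrete stages: a discrete wavelet transform of the spectra, a correlation screen that discards coefficients weakly related to the response, and a genetic-algorithm search over subset regression models scored by an information-theoretic criterion. The following is a minimal, self-contained sketch of such a pipeline in Python using NumPy and PyWavelets. It is illustrative only: the db4 wavelet, the screen size, AIC as the objective, and every function and parameter name below are assumptions for the sketch, not details taken from the paper.

# Sketch of the wavelet / screening / GA-subset pipeline described above.
# All names and tuning choices are hypothetical; this is not the authors' code.
import numpy as np
import pywt

def wavelet_features(X, wavelet="db4", level=4):
    """Row-wise DWT of an n x p spectral matrix; coefficient vectors are concatenated."""
    return np.vstack([np.concatenate(pywt.wavedec(row, wavelet, level=level)) for row in X])

def correlation_screen(W, y, keep=60):
    """Keep the `keep` coefficients most correlated (in absolute value) with y."""
    r = np.array([abs(np.corrcoef(W[:, j], y)[0, 1]) for j in range(W.shape[1])])
    r = np.nan_to_num(r)                      # constant columns yield NaN correlations
    return np.argsort(r)[::-1][:keep]

def aic(y, X):
    """AIC (up to a constant) of an OLS fit of y on the columns of X, intercept included."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    rss = np.sum((y - Xd @ beta) ** 2)
    return n * np.log(rss / n) + 2 * Xd.shape[1]

def ga_subset_search(W, y, pop=50, gens=100, p_mut=0.02, rng=None):
    """Binary-chromosome GA: each bit switches one screened coefficient on or off."""
    rng = rng or np.random.default_rng(0)
    m = W.shape[1]
    popn = rng.random((pop, m)) < 0.2          # sparse random initial population
    def fitness(chrom):
        idx = np.flatnonzero(chrom)
        if idx.size == 0 or idx.size >= len(y):   # penalize empty or saturated models
            return np.inf
        return aic(y, W[:, idx])
    for _ in range(gens):
        scores = np.array([fitness(c) for c in popn])
        elite = popn[np.argsort(scores)[: pop // 2]]        # truncation selection
        # uniform crossover between randomly paired elite parents
        parents = rng.integers(0, len(elite), size=(pop - len(elite), 2))
        mask = rng.random((pop - len(elite), m)) < 0.5
        children = np.where(mask, elite[parents[:, 0]], elite[parents[:, 1]])
        children ^= rng.random(children.shape) < p_mut      # bit-flip mutation
        popn = np.vstack([elite, children])
    scores = np.array([fitness(c) for c in popn])
    return popn[np.argmin(scores)]

# Hypothetical usage on simulated "large p, small n" spectra:
# n, p = 40, 700
# rng = np.random.default_rng(1)
# X, y = rng.normal(size=(n, p)), rng.normal(size=n)
# W = wavelet_features(X)
# cols = correlation_screen(W, y)
# best_subset = ga_subset_search(W[:, cols], y)

As in the article, the search could be run once per response variable, and the wavelet family and filter number treated as additional choices to evaluate; the sketch fixes them only to keep the example short.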

Original language: English
Pages (from-to): 1144-1157
Number of pages: 14
Journal: Communications in Statistics: Simulation and Computation
Volume: 44
Issue: 5
DOI: 10.1080/03610918.2013.809101
Status: Published - 1 Jan 2015


ASJC Scopus subject areas

  • Statistics and Probability
  • Modelling and Simulation

Cite this

@article{36efde6f93834692b961e94d43b13a45,
title = "Genetic algorithm in the wavelet domain for large p small n regression",
abstract = "Many areas of statistical modeling are plagued by the {"}curse of dimensionality,{"} in which there are more variables than observations. This is especially true when developing functional regression models where the independent dataset is some type of spectral decomposition, such as data from near-infrared spectroscopy. While we could develop a very complex model by simply taking enough samples (such that n > p), this could prove impossible or prohibitively expensive. In addition, a regression model developed like this could turn out to be highly inefficient, as spectral data usually exhibit high multicollinearity. In this article, we propose a two-part algorithm for selecting an effective and efficient functional regression model. Our algorithm begins by evaluating a subset of discrete wavelet transformations, allowing for variation in both wavelet and filter number. Next, we perform an intermediate processing step to remove variables with low correlation to the response data. Finally, we use the genetic algorithm to perform a stochastic search through the subset regression model space, driven by an information-theoretic objective function. We allow our algorithm to develop the regression model for each response variable independently, so as to optimally model each variable. We demonstrate our method on the familiar biscuit dough dataset, which has been used in a similar context by several researchers. Our results demonstrate both the flexibility and the power of our algorithm. For each response variable, a different subset model is selected, and different wavelet transformations are used. The models developed by our algorithm show an improvement, as measured by lower mean error, over results in the published literature.",
keywords = "Functional regression, Genetic algorithm, Wavelet domain.",
author = "Howe, {Eylem Deniz} and Orietta Nicolis",
year = "2015",
month = "1",
day = "1",
doi = "10.1080/03610918.2013.809101",
language = "English",
volume = "44",
pages = "1144--1157",
journal = "Communications in Statistics Part B: Simulation and Computation",
issn = "0361-0918",
publisher = "Taylor and Francis Ltd.",
number = "5",

}

Genetic algorithm in the wavelet domain for large p small n regression. / Howe, Eylem Deniz; Nicolis, Orietta.

In: Communications in Statistics: Simulation and Computation, Vol. 44, No. 5, 01.01.2015, p. 1144-1157.

Research output: Article

TY - JOUR

T1 - Genetic algorithm in the wavelet domain for large p small n regression

AU - Howe, Eylem Deniz

AU - Nicolis, Orietta

PY - 2015/1/1

Y1 - 2015/1/1

N2 - Many areas of statistical modeling are plagued by the "curse of dimensionality," in which there are more variables than observations. This is especially true when developing functional regression models where the independent dataset is some type of spectral decomposition, such as data from near-infrared spectroscopy. While we could develop a very complex model by simply taking enough samples (such that n > p), this could prove impossible or prohibitively expensive. In addition, a regression model developed like this could turn out to be highly inefficient, as spectral data usually exhibit high multicollinearity. In this article, we propose a two-part algorithm for selecting an effective and efficient functional regression model. Our algorithm begins by evaluating a subset of discrete wavelet transformations, allowing for variation in both wavelet and filter number. Next, we perform an intermediate processing step to remove variables with low correlation to the response data. Finally, we use the genetic algorithm to perform a stochastic search through the subset regression model space, driven by an information-theoretic objective function. We allow our algorithm to develop the regression model for each response variable independently, so as to optimally model each variable. We demonstrate our method on the familiar biscuit dough dataset, which has been used in a similar context by several researchers. Our results demonstrate both the flexibility and the power of our algorithm. For each response variable, a different subset model is selected, and different wavelet transformations are used. The models developed by our algorithm show an improvement, as measured by lower mean error, over results in the published literature.

AB - Many areas of statistical modeling are plagued by the "curse of dimensionality," in which there are more variables than observations. This is especially true when developing functional regression models where the independent dataset is some type of spectral decomposition, such as data from near-infrared spectroscopy. While we could develop a very complex model by simply taking enough samples (such that n > p), this could prove impossible or prohibitively expensive. In addition, a regression model developed like this could turn out to be highly inefficient, as spectral data usually exhibit high multicollinearity. In this article, we propose a two-part algorithm for selecting an effective and efficient functional regression model. Our algorithm begins by evaluating a subset of discrete wavelet transformations, allowing for variation in both wavelet and filter number. Next, we perform an intermediate processing step to remove variables with low correlation to the response data. Finally, we use the genetic algorithm to perform a stochastic search through the subset regression model space, driven by an information-theoretic objective function. We allow our algorithm to develop the regression model for each response variable independently, so as to optimally model each variable. We demonstrate our method on the familiar biscuit dough dataset, which has been used in a similar context by several researchers. Our results demonstrate both the flexibility and the power of our algorithm. For each response variable, a different subset model is selected, and different wavelet transformations are used. The models developed by our algorithm show an improvement, as measured by lower mean error, over results in the published literature.

KW - Functional regression

KW - Genetic algorithm

KW - Wavelet domain.

UR - http://www.scopus.com/inward/record.url?scp=84908611433&partnerID=8YFLogxK

U2 - 10.1080/03610918.2013.809101

DO - 10.1080/03610918.2013.809101

M3 - Article

AN - SCOPUS:84908611433

VL - 44

SP - 1144

EP - 1157

JO - Communications in Statistics Part B: Simulation and Computation

JF - Communications in Statistics Part B: Simulation and Computation

SN - 0361-0918

IS - 5

ER -