TY - GEN
T1 - A Machine Learning Suite to Halo-Galaxy Connection
AU - de Santi, Natalí S.M.
AU - Rodrigues, Natália V.N.
AU - Montero-Dorta, Antonio D.
AU - Abramo, L. Raul
AU - Tucci, Beatriz
AU - Artale, M. Celeste
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
AB - As far as we know, galaxies form inside dark matter halos, and elucidating this connection is a key element in theories of galaxy formation and evolution. In this work we propose a suite of machine learning tools to predict baryonic properties from halo properties in the IllustrisTNG300 magnetohydrodynamical simulation. We apply four methods: extremely randomized trees, K-nearest neighbors, light gradient boosting machine, and neural networks. Moreover, we combine their results in a stacked model. In addition, we apply all these methods to an augmented dataset built with the synthetic minority over-sampling technique for regression with Gaussian noise, to deal with the problem of imbalanced data sets. Altogether, the ML algorithms consistently predict central galaxy properties from a set of input halo properties such as halo mass, concentration, spin, and halo overdensity. For stellar mass, the Pearson correlation coefficient is 0.98, while for specific star formation rate, color, and size it is between 0.7 and 0.8. Lastly, the presented analysis adds evidence to previous works indicating that certain galaxy properties cannot be reproduced using halo features alone.
UR - http://www.scopus.com/inward/record.url?scp=85175996004&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-34167-0_7
DO - 10.1007/978-3-031-34167-0_7
M3 - Conference contribution
AN - SCOPUS:85175996004
SN - 9783031341663
T3 - Astrophysics and Space Science Proceedings
SP - 31
EP - 34
BT - Machine Learning for Astrophysics - Proceedings of the ML4Astro International Conference
A2 - Bufano, Filomena
A2 - Riggi, Simone
A2 - Sciacca, Eva
A2 - Schilliro, Francesco
PB - Springer Science and Business Media B.V.
T2 - 1st International Conference on Machine Learning for Astrophysics, ML4ASTRO 2022
Y2 - 30 May 2022 through 1 June 2022
ER -