TY - JOUR
T1 - Wide Area VISTA Extra-galactic Survey (WAVES)
T2 - unsupervised star-galaxy separation on the WAVES-Wide photometric input catalogue using UMAP and HDBSCAN
AU - Cook, Todd L.
AU - Bandi, Behnood
AU - Philipsborn, Sam
AU - Loveday, Jon
AU - Bellstedt, Sabine
AU - Driver, Simon P.
AU - Robotham, Aaron S.G.
AU - Bilicki, Maciej
AU - Kaur, Gursharanjit
AU - Tempel, Elmo
AU - Baldry, Ivan
AU - Gruen, Daniel
AU - Longhetti, Marcella
AU - Iovino, Angela
AU - Holwerda, Benne W.
AU - Demarco, Ricardo
N1 - Publisher Copyright: © 2024 The Author(s). Published by Oxford University Press on behalf of Royal Astronomical Society.
PY - 2024/12/1
Y1 - 2024/12/1
N2 - Star-galaxy separation is a crucial step in creating target catalogues for extragalactic spectroscopic surveys. A classifier biased towards inclusivity risks including high numbers of stars, wasting fibre hours, while a more conservative classifier might overlook galaxies, compromising completeness and hence survey objectives. To avoid bias introduced by a training set in supervised methods, we employ an unsupervised machine learning approach. Using photometry from the Wide Area VISTA Extragalactic Survey (WAVES)-Wide catalogue comprising nine-band data, we create a feature space with colours, fluxes, and apparent size information extracted by ProFound. We apply the non-linear dimensionality reduction method UMAP (Uniform Manifold Approximation and Projection) combined with the classifier hdbscan (Hierarchical Density-Based Spatial Clustering of Applications with Noise) to classify stars and galaxies. Our method is verified against a baseline colour and morphological method using a truth catalogue from Gaia, SDSS (Sloan Digital Sky Survey), GAMA (Galaxy And Mass Assembly), and DESI (Dark Energy Spectroscopic Instrument). We correctly identify 99.75 per cent of galaxies within the AB magnitude limit of, with an F1 score of across the entire ground truth sample, compared to from the baseline method. Our method's higher purity () compared to the baseline () increases efficiency, identifying 11 per cent fewer galaxy or ambiguous sources, saving approximately 70 000 fibre hours on the 4MOST (4-m Multi-Object Spectroscopic Telescope) instrument. We achieve reliable classification statistics for challenging sources including quasars, compact galaxies, and low surface brightness galaxies, retrieving 92.7 per cent, 84.6 per cent, and 99.5 per cent of them, respectively. Angular clustering analysis validates our classifications, showing consistency with expected galaxy clustering, regardless of the baseline classification.
AB - Star-galaxy separation is a crucial step in creating target catalogues for extragalactic spectroscopic surveys. A classifier biased towards inclusivity risks including high numbers of stars, wasting fibre hours, while a more conservative classifier might overlook galaxies, compromising completeness and hence survey objectives. To avoid bias introduced by a training set in supervised methods, we employ an unsupervised machine learning approach. Using photometry from the Wide Area VISTA Extragalactic Survey (WAVES)-Wide catalogue comprising nine-band data, we create a feature space with colours, fluxes, and apparent size information extracted by ProFound. We apply the non-linear dimensionality reduction method UMAP (Uniform Manifold Approximation and Projection) combined with the classifier hdbscan (Hierarchical Density-Based Spatial Clustering of Applications with Noise) to classify stars and galaxies. Our method is verified against a baseline colour and morphological method using a truth catalogue from Gaia, SDSS (Sloan Digital Sky Survey), GAMA (Galaxy And Mass Assembly), and DESI (Dark Energy Spectroscopic Instrument). We correctly identify 99.75 per cent of galaxies within the AB magnitude limit of, with an F1 score of across the entire ground truth sample, compared to from the baseline method. Our method's higher purity () compared to the baseline () increases efficiency, identifying 11 per cent fewer galaxy or ambiguous sources, saving approximately 70 000 fibre hours on the 4MOST (4-m Multi-Object Spectroscopic Telescope) instrument. We achieve reliable classification statistics for challenging sources including quasars, compact galaxies, and low surface brightness galaxies, retrieving 92.7 per cent, 84.6 per cent, and 99.5 per cent of them, respectively. Angular clustering analysis validates our classifications, showing consistency with expected galaxy clustering, regardless of the baseline classification.
KW - catalogues
KW - galaxies: photometry
KW - large-scale structure of Universe
KW - methods: data analysis
KW - surveys
UR - http://www.scopus.com/inward/record.url?scp=85209654746&partnerID=8YFLogxK
U2 - 10.1093/mnras/stae2389
DO - 10.1093/mnras/stae2389
M3 - Article
AN - SCOPUS:85209654746
SN - 0035-8711
VL - 535
SP - 2129
EP - 2148
JO - Monthly Notices of the Royal Astronomical Society
JF - Monthly Notices of the Royal Astronomical Society
IS - 3
ER -