Acoustic signals, especially those of biological source, remain unexplored in the Belgian part of the North Sea (BPNS). The BPNS, although dominated by anthrophony (sounds from human activities), is expected to be acoustically diverse given the presence of biodiverse sandbanks, gravel beds and artificial hard structures. Under the framework of the LifeWatch Broadband Acoustic Network, sound data have been collected since the spring of 2020. These recordings, encompassing both biophony, geophony and anthrophony, have been listened to and annotated for unknown, acoustically salient sounds. To obtain the acoustic features of these annotations, we used two existing automatic feature extractions: the Animal Vocalization Encoder based on Self-Supervision (AVES) and a convolutional autoencoder network (CAE) retrained on the data from this study. An unsupervised density-based clustering algorithm (HDBSCAN) was applied to predict clusters. We coded a grid search function to reduce the dimensionality of the feature sets and to adjust the hyperparameters of HDBSCAN. We searched the hyperparameter space for the most optimized combination of parameter values based on two selected clustering evaluation measures: the homogeneity and the density-based clustering validation (DBCV) scores. Although both feature sets produced meaningful clusters, AVES feature sets resulted in more solid, homogeneous clusters with relatively lower intra-cluster distances, appearing to be more advantageous for the purpose and dataset of this study. The 26 final clusters we obtained were revised by a bioacoustics expert. We were able to name and describe 10 unique sounds, but only clusters named as ‘Jackhammer’ and ‘Tick’ can be interpreted as biological with certainty. Although unsupervised clustering is conventional in ecological research, we highlight its practical use in revising clusters of annotated unknown sounds. The revised clusters we detailed in this study already define a few groups of distinct and recurring sounds that could serve as a preliminary component of a valid annotated training dataset potentially feeding supervised machine learning and classifier models. |