Standard

The role of individual variability on the predictive performance of machine learning applied to large bio-logging datasets. / Chimienti, Marianna; Kato, Akiko; Hicks, Olivia et al.
In: Scientific Reports, Vol. 12, No. 1, 17.11.2022, p. 19737.

Research output: Contribution to journal › Article › peer-review

Harvard

Chimienti, M, Kato, A, Hicks, O, Angelier, F, Beaulieu, M, Ouled-Cheikh, J, Marciau, C, Raclot, T, Tucker, M, Wisniewska, DM, Chiaradia, A & Ropert-Coudert, Y 2022, 'The role of individual variability on the predictive performance of machine learning applied to large bio-logging datasets', Scientific Reports, vol. 12, no. 1, p. 19737. https://doi.org/10.1038/s41598-022-22258-1

APA

Chimienti, M., Kato, A., Hicks, O., Angelier, F., Beaulieu, M., Ouled-Cheikh, J., Marciau, C., Raclot, T., Tucker, M., Wisniewska, D. M., Chiaradia, A., & Ropert-Coudert, Y. (2022). The role of individual variability on the predictive performance of machine learning applied to large bio-logging datasets. Scientific Reports, 12(1), 19737. https://doi.org/10.1038/s41598-022-22258-1

CBE

Chimienti M, Kato A, Hicks O, Angelier F, Beaulieu M, Ouled-Cheikh J, Marciau C, Raclot T, Tucker M, Wisniewska DM, et al. 2022. The role of individual variability on the predictive performance of machine learning applied to large bio-logging datasets. Scientific Reports. 12(1):19737. https://doi.org/10.1038/s41598-022-22258-1

Vancouver

Chimienti M, Kato A, Hicks O, Angelier F, Beaulieu M, Ouled-Cheikh J et al. The role of individual variability on the predictive performance of machine learning applied to large bio-logging datasets. Scientific Reports. 2022 Nov 17;12(1):19737. doi: 10.1038/s41598-022-22258-1

RIS

TY - JOUR

T1 - The role of individual variability on the predictive performance of machine learning applied to large bio-logging datasets

AU - Chimienti, Marianna

AU - Kato, Akiko

AU - Hicks, Olivia

AU - Angelier, Frédéric

AU - Beaulieu, Michaël

AU - Ouled-Cheikh, Jazel

AU - Marciau, Coline

AU - Raclot, Thierry

AU - Tucker, Meagan

AU - Wisniewska, Danuta Maria

AU - Chiaradia, André

AU - Ropert-Coudert, Yan

N1 - © 2022. The Author(s).

PY - 2022/11/17

Y1 - 2022/11/17

N2 - Animal-borne tagging (bio-logging) generates large and complex datasets. In particular, accelerometer tags, which provide information on behaviour and energy expenditure of wild animals, produce high-resolution multi-dimensional data, and can be challenging to analyse. We tested the performance of commonly used artificial intelligence tools on datasets of increasing volume and dimensionality. By collecting bio-logging data across several sampling seasons, datasets are inherently characterized by inter-individual variability. Such information should be considered when predicting behaviour. We integrated both unsupervised and supervised machine learning approaches to predict behaviours in two penguin species. The classified behaviours obtained from the unsupervised approach Expectation Maximisation were used to train the supervised approach Random Forest. We assessed agreement between the approaches, the performance of Random Forest on unknown data and the implications for the calculation of energy expenditure. Consideration of behavioural variability resulted in high agreement (> 80%) in behavioural classifications and minimal differences in energy expenditure estimates. However, some outliers with < 70% of agreement, highlighted how behaviours characterized by signal similarity are confused. We advise the broad bio-logging community, approaching these large datasets, to be cautious when upscaling predictions, as this might lead to less accurate estimates of behaviour and energy expenditure.

AB - Animal-borne tagging (bio-logging) generates large and complex datasets. In particular, accelerometer tags, which provide information on behaviour and energy expenditure of wild animals, produce high-resolution multi-dimensional data, and can be challenging to analyse. We tested the performance of commonly used artificial intelligence tools on datasets of increasing volume and dimensionality. By collecting bio-logging data across several sampling seasons, datasets are inherently characterized by inter-individual variability. Such information should be considered when predicting behaviour. We integrated both unsupervised and supervised machine learning approaches to predict behaviours in two penguin species. The classified behaviours obtained from the unsupervised approach Expectation Maximisation were used to train the supervised approach Random Forest. We assessed agreement between the approaches, the performance of Random Forest on unknown data and the implications for the calculation of energy expenditure. Consideration of behavioural variability resulted in high agreement (> 80%) in behavioural classifications and minimal differences in energy expenditure estimates. However, some outliers with < 70% of agreement, highlighted how behaviours characterized by signal similarity are confused. We advise the broad bio-logging community, approaching these large datasets, to be cautious when upscaling predictions, as this might lead to less accurate estimates of behaviour and energy expenditure.

KW - Animals

KW - Artificial Intelligence

KW - Machine Learning

KW - Supervised Machine Learning

KW - Energy Metabolism

U2 - 10.1038/s41598-022-22258-1

DO - 10.1038/s41598-022-22258-1

M3 - Article

C2 - 36396680

VL - 12

SP - 19737

JO - Scientific Reports

JF - Scientific Reports

SN - 2045-2322

IS - 1

ER -