Comparing Keyframe Summaries of Egocentric Videos: Closest-to-Centroid Baseline
Research output: Contribution to journal › Conference article › peer-review
In: International Conference on Image Processing Theory, Tools and Applications (IPTA), 12.03.2018.
RIS
TY - JOUR
T1 - Comparing Keyframe Summaries of Egocentric Videos: Closest-to-Centroid Baseline
AU - Kuncheva, Ludmila
AU - Yousefi, Paria
AU - Almeida, Jurandy
N1 - 978-1-5386-1842-4/17/
PY - 2018/3/12
Y1 - 2018/3/12
N2 - Evaluation of keyframe video summaries is a notoriously difficult problem. So far, there is no consensus on guidelines, protocols, benchmarks and baseline models. This study contributes in three ways: (1) We propose a new baseline model for creating a keyframe summary, called Closest-to-Centroid (CC), and show that it is a stronger contender than the two most popular baselines: uniform sampling and choosing the mid-event frame. (2) We also propose a method for matching the visual appearance of keyframes, suitable for comparing summaries of egocentric videos and lifelogging photostreams. (3) We examine 24 image feature spaces (different descriptors) including colour, texture, shape, motion and a feature space extracted by a pretrained convolutional neural network (CNN). Our results using the four egocentric videos in the UTE database favour low-level shape and colour feature spaces for use with CC.
M3 - Conference article
JO - International Conference on Image Processing Theory, Tools and Applications (IPTA)
JF - International Conference on Image Processing Theory, Tools and Applications (IPTA)
SN - 2154-512X
T2 - Seventh International Conference on Image Processing Theory, Tools and Applications
Y2 - 28 November 2017 through 1 December 2017
ER -
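
For readers unfamiliar with the baseline, below is a minimal sketch of the Closest-to-Centroid (CC) selection rule summarised in the abstract: within each event, the keyframe is the frame whose feature vector lies closest to the event's mean feature vector. It assumes the video has already been segmented into events and each frame is represented by a feature vector (e.g. one of the 24 descriptors examined in the paper); the function name and data layout are illustrative assumptions, not the authors' code.

```python
# Sketch of the Closest-to-Centroid (CC) keyframe baseline.
# Assumed inputs: event segmentation and per-frame feature vectors are given.
import numpy as np

def closest_to_centroid(event_features):
    """Return one within-event keyframe index per event.

    event_features: list of 2-D arrays, one per event,
    each of shape (n_frames_in_event, n_features).
    """
    keyframes = []
    for feats in event_features:
        centroid = feats.mean(axis=0)                     # mean feature vector of the event
        dists = np.linalg.norm(feats - centroid, axis=1)  # Euclidean distance of each frame to the centroid
        keyframes.append(int(np.argmin(dists)))           # frame closest to the centroid
    return keyframes

# Example with two events and random 10-dimensional frame features.
rng = np.random.default_rng(0)
events = [rng.random((30, 10)), rng.random((45, 10))]
print(closest_to_centroid(events))  # one keyframe index per event
```

The two baselines CC is compared against in the paper can be expressed in the same setting: uniform sampling picks frames at regular intervals across the whole video, and the mid-event baseline picks the temporally middle frame of each event.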