Keyframe Summarisation of Egocentric Video
Keywords
- summarisation, keyframe summary, edited nearest neighbour, prototype selection, egocentric, lifelog, time tagged summary, online summary, real time summary, query based summary, personalised summary, baseline summary, greedy tabu selector, online video summarisation, Doctor of Philosophy (PhD), deep learning, Machine Learning, computer vision, PhD, School of Computer Science and Electronic Engineering
Abstract
Egocentric data refers to collections of images captured by a user wearing a camera
over a period of time. The pictures taken provide considerable potential for
knowledge mining related to the user’s life, and consequently open up a wide
range of opportunities for new applications in healthcare, protection and
security, law enforcement and training, leisure, and self-monitoring. As a
result, large volumes of egocentric data are being continually collected every
day, which highlights the importance of developing video analysis techniques
to facilitate browsing the recorded video data. Generating a condensed yet
informative version of the original unstructured egocentric frame stream
makes it easier to comprehend the content and browse the narrative.
Given the great interest in creating keyframe summaries from video, it
is surprising how little has been done to formalise their evaluation and
comparison. The thesis first carries out a series of investigations into
the automatic evaluation and comparison of video summaries. A
discrimination capacity measure is proposed as a formal way to quantify
the improvement over the uniform baseline, assuming that one or more
ground truth summaries are available. Subsequently, a formal protocol for
comparing summaries when ground truth is available is proposed.
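To make the setting concrete, here is a minimal, hypothetical sketch of scoring a candidate keyframe summary against a ground-truth summary and comparing it with a uniform-sampling baseline. The descriptor construction, the matching threshold, and the F-measure-style score are illustrative assumptions, not the measure or protocol defined in the thesis.

```python
# Hypothetical sketch: score a candidate keyframe summary against a ground-truth
# summary and compare with a uniform-sampling baseline. The descriptors,
# threshold and F-measure-style score are assumptions for illustration only.
import numpy as np

def match_score(summary, ground_truth, features, threshold=0.5):
    """F-measure between two keyframe sets; frames match if their descriptors
    are closer than `threshold` in Euclidean distance."""
    if not summary or not ground_truth:
        return 0.0
    d = np.linalg.norm(features[summary][:, None, :] -
                       features[ground_truth][None, :, :], axis=2)
    precision = (d.min(axis=1) < threshold).sum() / len(summary)
    recall = (d.min(axis=0) < threshold).sum() / len(ground_truth)
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

def uniform_summary(n_frames, k):
    """Evenly spaced baseline of k keyframes."""
    return list(np.linspace(0, n_frames - 1, k, dtype=int))

# Toy descriptors that vary smoothly over time, so nearby frames look alike
t = np.linspace(0, 1, 200)
features = np.stack([np.sin(2 * np.pi * k * t) for k in range(1, 5)], axis=1)

ground_truth = [10, 60, 120, 180]
candidate = [12, 58, 119, 181]
baseline = uniform_summary(200, 4)
improvement = (match_score(candidate, ground_truth, features) -
               match_score(baseline, ground_truth, features))
print(f"improvement over uniform baseline: {improvement:+.3f}")
```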
We noticed that the most commonly used benchmark summarisation methods,
namely random, uniform, and mid-event selection, are weak competitors.
Therefore, we propose a new benchmark method for creating a keyframe
summary, called “closest-to-centroid”. We examined the proposed baseline
with 20 different image descriptors to demonstrate its performance against
the typical choices of baseline method.
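The following is a minimal sketch of the closest-to-centroid idea, assuming the video is already segmented into events and that a frame descriptor is available; within each segment, the frame nearest the segment centroid is kept. The segmentation and the descriptors are assumed inputs, not part of the baseline itself.

```python
# Minimal sketch of a "closest-to-centroid" baseline: for each event segment,
# keep the frame whose descriptor lies nearest the segment's mean descriptor.
import numpy as np

def closest_to_centroid(features, segments):
    """features: (n_frames, d) array; segments: list of (start, end) index pairs.
    Returns one keyframe index per segment."""
    keyframes = []
    for start, end in segments:
        seg = features[start:end]
        centroid = seg.mean(axis=0)                      # mean descriptor of the event
        dists = np.linalg.norm(seg - centroid, axis=1)   # distance of each frame to it
        keyframes.append(start + int(dists.argmin()))    # keep the nearest frame
    return keyframes

# Toy usage: a 300-frame video with random descriptors, split into 3 events
rng = np.random.default_rng(1)
feats = rng.normal(size=(300, 32))
print(closest_to_centroid(feats, [(0, 100), (100, 200), (200, 300)]))
```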
Thereafter, the problem of selecting a keyframe summary is addressed
as a problem of prototype (instance) selection for the nearest neighbour
classifier (1-nn). Assuming that the video is already segmented into events of
interest (classes), and represented as a data set in some feature space, we
propose a Greedy Tabu Selector algorithm which picks one frame to represent
each class. Summaries generated by the algorithm are evaluated on a
widely-used egocentric video database, and compared against the proposed
baseline (closest-to-centroid). The Greedy Tabu Selector algorithm leads to an
improved match to the user ground truth, compared to the closest-to-centroid
baseline summarisation method.
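For illustration, the sketch below treats the task as prototype selection for 1-nn: keep one prototype per event (class) and greedily swap prototypes to improve 1-nn resubstitution accuracy, with a small tabu list that blocks recently replaced frames from returning. This is a generic simplification under those assumptions, not the exact Greedy Tabu Selector proposed in the thesis.

```python
# Illustrative prototype selection for the 1-nn classifier with a tabu list.
# One prototype per class; swaps are accepted if they do not hurt accuracy.
import numpy as np

def nn_accuracy(features, labels, prototypes):
    """Classify every frame by its nearest prototype; return resubstitution accuracy."""
    d = np.linalg.norm(features[:, None, :] - features[prototypes][None, :, :], axis=2)
    pred = labels[np.asarray(prototypes)][d.argmin(axis=1)]
    return (pred == labels).mean()

def greedy_tabu_select(features, labels, iterations=50, tabu_size=10, seed=0):
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    # start from one random frame per class
    protos = {c: int(rng.choice(np.where(labels == c)[0])) for c in classes}
    best = nn_accuracy(features, labels, list(protos.values()))
    tabu = []
    for _ in range(iterations):
        c = rng.choice(classes)                          # class whose prototype we try to swap
        candidates = [i for i in np.where(labels == c)[0]
                      if i not in tabu and i != protos[c]]
        if not candidates:
            continue
        cand = int(rng.choice(candidates))
        trial = {**protos, c: cand}
        acc = nn_accuracy(features, labels, list(trial.values()))
        tabu = (tabu + [protos[c]])[-tabu_size:]         # forbid going straight back
        if acc >= best:                                  # accept non-worsening swaps
            protos, best = trial, acc
    return list(protos.values()), best

# Toy usage: 3 events of 40 frames each in a 2-D feature space
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=m, size=(40, 2)) for m in (0, 3, 6)])
y = np.repeat([0, 1, 2], 40)
protos, acc = greedy_tabu_select(X, y)
print(protos, round(acc, 3))
```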
Next, a method for selective video summarisation of egocentric video is
introduced. It extracts multiple summaries from the same stream based upon
different user queries. The result is a time-tagged summary of keyframes
related to the query concept. The method is evaluated on two commonly
used egocentric and lifelog databases.
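A hypothetical sketch of such query-driven selection follows. It assumes per-frame relevance scores for the query concept (produced by some visual-concept detector, which is not shown) and returns one time-tagged keyframe per contiguous relevant run; the thresholding scheme is an assumption for illustration, not the method of the thesis.

```python
# Hypothetical query-driven selection: keep the peak frame of each contiguous
# run of frames relevant to the query, tagged with its capture time.
import numpy as np

def query_summary(scores, timestamps, threshold=0.6):
    """scores: (n_frames,) relevance to the query; timestamps: (n_frames,) seconds.
    Returns a list of (frame_index, timestamp) pairs, one per relevant run."""
    relevant = scores >= threshold
    summary, start = [], None
    for i, flag in enumerate(relevant):
        if flag and start is None:
            start = i                                      # a relevant run begins
        elif not flag and start is not None:
            best = start + int(scores[start:i].argmax())   # peak frame of the run
            summary.append((best, float(timestamps[best])))
            start = None
    if start is not None:                                  # run reaching the end of the stream
        best = start + int(scores[start:].argmax())
        summary.append((best, float(timestamps[best])))
    return summary

# Toy usage: 10 frames, one per second, with a burst of relevance around t = 4-6 s
scores = np.array([0.1, 0.2, 0.1, 0.3, 0.8, 0.9, 0.7, 0.2, 0.1, 0.1])
times = np.arange(10, dtype=float)
print(query_summary(scores, times))   # expected: [(5, 5.0)]
```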
Further to this, it is noted that despite the existence of a large number of
approaches for generating summaries from egocentric video, on-line video
summarisation has not been fully explored yet. This type of summary can
be useful where memory constraints mean it is not practical to wait for
the full video to be available for processing. We propose a classification
(taxonomy) for on-line video summarisation methods based upon their
descriptive and distinguishing properties. Afterwards, we develop an on-line
video summarisation algorithm to generate keyframe summaries during video
capture. Results are evaluated on an egocentric database. The summaries
generated by the proposed method outperform those generated by two
competitor methods.
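As a rough illustration of summarising during capture under a memory budget, the sketch below keeps an arriving frame only if its descriptor differs sufficiently from the most recently kept keyframe, and caps the number of stored keyframes. This is a generic threshold-based on-line selector under those assumptions, not the algorithm developed in the thesis.

```python
# Generic on-line keyframe selection sketch: one pass over the incoming stream,
# bounded memory, no access to future frames.
import numpy as np

class OnlineKeyframeSelector:
    def __init__(self, threshold=5.0, budget=50):
        self.threshold = threshold    # minimum distance to the last kept keyframe
        self.budget = budget          # maximum number of stored keyframes
        self.keyframes = []           # list of (frame_index, descriptor)

    def update(self, index, descriptor):
        """Process one incoming frame; return True if it was kept."""
        if not self.keyframes:
            self.keyframes.append((index, descriptor))
            return True
        last = self.keyframes[-1][1]
        if np.linalg.norm(descriptor - last) >= self.threshold:
            self.keyframes.append((index, descriptor))
            if len(self.keyframes) > self.budget:        # enforce the memory budget
                self.keyframes.pop(0)                     # drop the oldest keyframe
            return True
        return False

# Toy usage: a slowly drifting descriptor stands in for a changing scene
rng = np.random.default_rng(3)
selector = OnlineKeyframeSelector(threshold=5.0, budget=10)
descriptor, kept = np.zeros(32), []
for i in range(200):
    descriptor = descriptor + rng.normal(scale=0.3, size=32)
    if selector.update(i, descriptor):
        kept.append(i)
print(kept)
```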
Details
| Original language | English |
| --- | --- |
| Awarding Institution | |
| Supervisors/Advisors | |
| Thesis sponsors | |
| Award date | 4 Nov 2019 |