Mark Everingham, Josef Sivic and Andrew Zisserman

Overview

The objective of this work is to label television or movie footage with the names of the people present in each frame of the video. To the right are a few example frames with automatically named characters from an episode of the TV series "Buffy the Vampire Slayer".

Why is it hard?

TV and movie material is visually very challenging: characters vary significantly in their imaged appearance because of changes in scale, pose, lighting, expression, hairstyle, etc. Poor image quality and motion blur add further problems.

So how do we do it?

The novelty we bring is to employ visual recognition together with readily available textual annotation for TV and movie footage, in the form of subtitles and transcripts, to automatically assign the correct name to each face image.

Alone, neither the script nor the subtitles contain the required information to label the identity of the people in the video -- the subtitles record what is said, but not by whom, whereas the script records who says what, but not when. However, by automatic alignment of the two sources, it is possible to extract who says what and when. Knowledge that a character is speaking then gives a very weak cue that the person may be visible in the video. A key to the success of our method is to leverage this cue by visually detecting which (if any) character in the video corresponds to the speaker. This gives us sufficient annotated data from which to learn to recognize the other instances of the character.
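
The alignment idea above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses Python's standard `difflib.SequenceMatcher` as a stand-in for whatever alignment method the publications use, and the function name `align`, the tuple layouts, and the sample dialogue are all hypothetical.

```python
from difflib import SequenceMatcher

PUNCT = ".,!?\"'"


def words(text):
    """Lower-case a line of dialogue and strip surrounding punctuation."""
    stripped = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in stripped if w]


def align(subtitles, script):
    """Attach a speaker name to each timed subtitle.

    subtitles: list of (start, end, text)  -- what is said and when
    script:    list of (speaker, text)     -- who says what
    Returns a list of (start, end, speaker, text); speaker is None
    when no script word could be matched to the subtitle.
    """
    # Flatten both sources into word sequences, remembering where each
    # word came from (subtitle index or speaker name).
    sub_tokens = [(w, i) for i, (_, _, text) in enumerate(subtitles)
                  for w in words(text)]
    scr_tokens = [(w, speaker) for speaker, text in script
                  for w in words(text)]

    matcher = SequenceMatcher(
        None,
        [w for w, _ in sub_tokens],
        [w for w, _ in scr_tokens],
        autojunk=False,
    )

    # Every matched word casts one vote for "this subtitle is spoken by
    # that speaker"; the majority speaker wins.
    votes = [{} for _ in subtitles]
    for a, b, size in matcher.get_matching_blocks():
        for k in range(size):
            sub_idx = sub_tokens[a + k][1]
            speaker = scr_tokens[b + k][1]
            votes[sub_idx][speaker] = votes[sub_idx].get(speaker, 0) + 1

    aligned = []
    for (start, end, text), v in zip(subtitles, votes):
        speaker = max(v, key=v.get) if v else None
        aligned.append((start, end, speaker, text))
    return aligned


# Hypothetical sample data, invented for illustration only.
subtitles = [
    ("00:01:02", "00:01:04", "Have you seen my keys?"),
    ("00:01:05", "00:01:07", "They are on the kitchen table."),
]
script = [
    ("ALICE", "Have you seen my keys?"),
    ("BOB", "They're on the kitchen table."),
]

for start, end, speaker, text in align(subtitles, script):
    print(start, end, speaker, text)
```

Voting per matched word tolerates small wording differences between the two sources (here "They are" versus "They're"), which is why the sketch aligns words rather than whole lines.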

The visual recognition involves automatically detecting facial feature points, as shown in the images lower right for frontal and profile examples, and building face descriptors based on these points.
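
As a rough illustration of a descriptor built from feature points, the sketch below concatenates normalized pixel patches taken around each point. This page does not say how the descriptors are actually computed, so the patch-based design, the normalization, and the names `patch_descriptor` and `face_descriptor` are assumptions made for illustration; the image is a plain list of rows of grey values.

```python
import math


def patch_descriptor(image, point, radius):
    """Return the pixels of a square patch centred on point (x, y),
    normalized to zero mean and unit variance."""
    x0, y0 = point
    height, width = len(image), len(image[0])
    values = []
    for y in range(y0 - radius, y0 + radius + 1):
        for x in range(x0 - radius, x0 + radius + 1):
            # Clamp coordinates so patches near the border stay valid.
            yy = min(max(y, 0), height - 1)
            xx = min(max(x, 0), width - 1)
            values.append(float(image[yy][xx]))
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0.0:
        # A perfectly flat patch carries no appearance information.
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]


def face_descriptor(image, feature_points, radius=1):
    """Concatenate the patch descriptors of all feature points."""
    descriptor = []
    for point in feature_points:
        descriptor.extend(patch_descriptor(image, point, radius))
    return descriptor


# Hypothetical 5x5 grey-level image and two feature points (x, y).
image = [[x + 5 * y for x in range(5)] for y in range(5)]
points = [(1, 1), (3, 3)]
descriptor = face_descriptor(image, points)
print(len(descriptor))
```

Normalizing each patch separately makes the descriptor insensitive to a uniform brightness or contrast change, one of the sources of appearance variation listed earlier on this page.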

Video example results

Videos showing naming results on part of the episode "Real Me" from the TV series "Buffy the Vampire Slayer" can be downloaded:

  1. For [1], frontal and profile faces. The video lasts 1 min 27 secs. Correctly assigned names are shown in white. Wrongly assigned names are shown in red.
  2. For [3], frontal faces only. The video lasts 1 min 17 secs. Correctly assigned names are shown in green. Wrongly assigned names are shown in red.

Data

The dataset section of these pages has downloadable data (frontal and profile face detections tracked, facial feature locations, and face descriptors) for an entire episode of the TV series "Buffy the Vampire Slayer" as used in [3], and for two episodes as used in [1].

Code

A Matlab implementation of the face processing pipeline is available.

Publications


[1] Sivic, J., Everingham, M. and Zisserman, A.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2009)

[2] Everingham, M., Sivic, J. and Zisserman, A.
Image and Vision Computing (2009)

[3] Everingham, M., Sivic, J. and Zisserman, A.
Proceedings of the 17th British Machine Vision Conference (BMVC 2006)

Acknowledgements

This work is funded by European Project CLASS and ERC Grant VisRec.
