Segmentation is a prerequisite for many tasks in image and video understanding. Since pixels associated with the same object tend to move coherently, motion can be a strong cue for segmentation. Recently, there has been interest in layered representations for video, where the video sequence is decomposed into a set of layers akin to those used by animators to produce cartoons and animated movies. Each layer integrates the information required to represent an object throughout the entire sequence. To produce each frame of the video, the layers are warped according to the motion parameters of the associated object, and the frame is composed according to a segmentation mask which assigns each pixel to one of the objects. In this work, we tackle the segmentation problem.
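The frame synthesis step described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation used in this project: the function names `warp_affine` and `compose_frame`, the six-parameter affine convention, and the nearest-neighbour sampling with border clamping are all assumptions made for the example.

```python
import numpy as np

def warp_affine(layer, params):
    """Warp a 2-D layer with six affine motion parameters (assumed convention).

    Output pixel (x, y) is sampled from the layer at
    (x + a1 + a2*x + a3*y, y + a4 + a5*x + a6*y), rounded to the nearest
    pixel and clamped to the image border.
    """
    h, w = layer.shape
    ys, xs = np.mgrid[0:h, 0:w]
    a1, a2, a3, a4, a5, a6 = params
    src_x = np.rint(xs + a1 + a2 * xs + a3 * ys).astype(int)
    src_y = np.rint(ys + a4 + a5 * xs + a6 * ys).astype(int)
    src_x = np.clip(src_x, 0, w - 1)
    src_y = np.clip(src_y, 0, h - 1)
    return layer[src_y, src_x]

def compose_frame(layers, motions, mask):
    """Build one frame: warp every layer, then pick one layer per pixel.

    layers  : list of K arrays of shape (H, W)
    motions : list of K affine parameter tuples, one per layer
    mask    : integer array of shape (H, W) with values in 0..K-1
    """
    warped = np.stack([warp_affine(l, m) for l, m in zip(layers, motions)])
    # For each pixel, select the warped layer named by the segmentation mask.
    return np.take_along_axis(warped, mask[None, ...], axis=0)[0]
```

Given the layers and their motion parameters, the only remaining unknown in this picture is `mask`, which is the quantity the segmentation problem asks for.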
In layered representations, the overall motion is usually modeled as a mixture distribution, where each component is a Gaussian centered (in motion parameter space) on the affine parameters of one object. The motion parameters are then learned by an expectation-maximization algorithm that alternates between segmenting the video and computing motion parameters for the segmented regions. One of the major limitations of these models is that they fail to enforce spatial consistency of the segmentation masks, i.e. the fact that one pixel is associated with a given object does not make it more likely that the same will hold for its neighbors. As a result, the segmentation masks can be quite noisy. One solution to this problem, which is adopted in this work, is to include in the motion model a prior that enforces spatial consistency. It turns out, however, that the parameters of such priors can have a significant impact on the resulting segmentations, and there are few principled ways to determine good parameter values.
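To make the alternation concrete, here is a minimal sketch of expectation-maximization for a mixture of affine motion models. It is a simplified variant under stated assumptions, not the model of this project: the Gaussian is placed on per-pixel flow residuals with a fixed isotropic `sigma` rather than in motion parameter space, the input is a precomputed flow field, and the name `em_affine_motion` and its arguments are hypothetical. Note that each pixel is assigned independently of its neighbors, which is exactly the missing spatial consistency discussed above.

```python
import numpy as np

def em_affine_motion(coords, flow, init_params, sigma=0.5, n_iter=20):
    """EM for a mixture of K affine motion models (illustrative sketch).

    coords      : (N, 2) pixel positions (x, y)
    flow        : (N, 2) observed motion vectors at those pixels
    init_params : (K, 3, 2) initial affine parameters; model k predicts
                  flow = [1, x, y] @ params[k]
    Returns the fitted parameters, hard labels, and soft responsibilities.
    """
    n = coords.shape[0]
    X = np.hstack([np.ones((n, 1)), coords])            # (N, 3) design matrix
    params = np.array(init_params, dtype=float)         # (K, 3, 2)
    k = params.shape[0]
    weights = np.full(k, 1.0 / k)                       # mixing proportions

    for _ in range(n_iter):
        # E-step (segmentation): posterior of each model for each pixel.
        pred = np.einsum('ni,kij->knj', X, params)      # (K, N, 2)
        sq_err = ((flow[None] - pred) ** 2).sum(axis=2) # (K, N)
        log_r = np.log(weights)[:, None] - sq_err / (2 * sigma ** 2)
        log_r -= log_r.max(axis=0, keepdims=True)       # numerical stability
        resp = np.exp(log_r)
        resp /= resp.sum(axis=0, keepdims=True)

        # M-step (motion estimation): weighted least squares per model.
        for j in range(k):
            Xw = X * resp[j][:, None]
            gram = X.T @ Xw + 1e-8 * np.eye(3)          # small ridge term
            params[j] = np.linalg.solve(gram, Xw.T @ flow)
        weights = np.clip(resp.mean(axis=1), 1e-12, None)
        weights /= weights.sum()

    return params, resp.argmax(axis=0), resp
```

Reshaping the returned labels to the image grid gives a segmentation mask. Because nothing in the E-step couples neighboring pixels, noisy flow tends to produce a speckled mask, which is the motivation for adding a spatial prior.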
In this project, we explore the use of an empirical Bayesian estimation framework to address this problem. This framework, as well as parameter estimation algorithms, is described in the PAMI 01 paper [ps, pdf]. Below, we illustrate the segmentation masks obtained on various challenging sequences, as well as a comparison with the traditional solution.
| Frame from the sequence | Segmentation mask |
| Traditional prior | Empirical Bayesian prior |