Layered Image Representation

This demo was created by John Y. A. Wang.

Abstract

We describe a system for representing moving images with sets of overlapping layers. Each layer contains an intensity map that defines the additive values of each pixel, along with an alpha map that serves as a mask indicating the transparency. The layers are ordered in depth and they occlude each other in accord with the rules of compositing. Velocity maps define how the layers are to be warped over time. The layered representation is more flexible than standard image transforms and can capture many important properties of natural image sequences. We describe some methods for decomposing image sequences into layers using motion analysis, and we discuss how the representation may be used for image coding and other applications.

Keywords: Image coding, motion analysis, image segmentation, image representation, robust estimation.

Description

We describe the decomposition of a sequence into its layered image representation and demonstrate the flexibility of the layered decomposition with applications in image compression and video special effects.

A general block diagram of our algorithm is shown below. The algorithm consists of motion estimation, motion segmentation, and temporal integration.

Block diagram of layer image decomposition

A detailed block diagram of our layered image decomposition algorithm is shown below. The algorithm consists of a robust iterative framework that identifies the likely affine motion models present in the image sequence. Physical constraints are imposed within the loop to improve stability and to produce sensible solutions.

Block diagram of segmentation algorithm

Thirty frames (1 sec) of the MPEG Flower Garden sequence are processed by an optic flow estimator to obtain a motion vector for each pixel. The model estimator then estimates affine motion parameters from the optic flow data for each subregion. We initialize these subregions with an array of 20x20 pixel blocks. The model merger merges similar models to improve stability. Thus the model estimator and the model merger cooperatively determine the set of likely affine motions present in the optic flow maps.
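The per-block model estimation step can be illustrated with a minimal sketch. It assumes the six-parameter affine form vx = a0 + a1*x + a2*y, vy = b0 + b1*x + b2*y and uses ordinary least squares in place of whatever robust estimator the actual system employs. The function names, the (H, W, 2) flow layout, and the parameter ordering are hypothetical choices made for illustration only.

```python
import numpy as np

def fit_affine_motion(xs, ys, vx, vy):
    """Least-squares fit of vx = a0 + a1*x + a2*y and vy = b0 + b1*x + b2*y.

    Returns the six parameters as [a0, a1, a2, b0, b1, b2].
    Plain least squares is an assumption; it is not a robust estimator.
    """
    A = np.column_stack([np.ones_like(xs, dtype=float), xs, ys])
    a, *_ = np.linalg.lstsq(A, vx, rcond=None)
    b, *_ = np.linalg.lstsq(A, vy, rcond=None)
    return np.concatenate([a, b])

def block_affine_models(flow, block=20):
    """Fit one affine model per non-overlapping block of a dense flow field.

    flow has shape (H, W, 2) with flow[..., 0] = vx and flow[..., 1] = vy.
    Returns an array of shape (num_blocks, 6).
    """
    H, W, _ = flow.shape
    models = []
    for y0 in range(0, H - block + 1, block):
        for x0 in range(0, W - block + 1, block):
            ys, xs = np.mgrid[y0:y0 + block, x0:x0 + block]
            patch = flow[y0:y0 + block, x0:x0 + block]
            models.append(fit_affine_motion(
                xs.ravel(), ys.ravel(),
                patch[..., 0].ravel(), patch[..., 1].ravel()))
    return np.array(models)
```

On a flow field generated by a single affine motion, every block recovers the same six parameters. Blocks that straddle a motion boundary produce inconsistent parameters, which is one reason a subsequent merging step is useful.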

Affine motion segmentation results from applying these motion models in a classification framework on the motion map. Additional constraints on physical connectivity and region size are enforced by the region splitter and region filter. The segmentation results are refined iteratively.
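One simple way to picture the classification step is to assign each pixel to the affine model whose predicted flow is closest to the measured flow. The sketch below does this under stated assumptions: the squared-error criterion, the `max_residual` threshold, and the label -1 for unassigned pixels are hypothetical choices, and the connectivity and region-size constraints mentioned above are not implemented.

```python
import numpy as np

def affine_flow(params, H, W):
    """Dense flow (H, W, 2) predicted by params = [a0, a1, a2, b0, b1, b2]."""
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    a0, a1, a2, b0, b1, b2 = params
    return np.stack([a0 + a1 * xs + a2 * ys,
                     b0 + b1 * xs + b2 * ys], axis=-1)

def classify_pixels(flow, models, max_residual=1.0):
    """Label each pixel with the index of the best-fitting affine model.

    Pixels whose best residual exceeds max_residual (in pixels per frame)
    are left unassigned and labeled -1. The threshold is an assumption.
    """
    H, W, _ = flow.shape
    # Squared flow error of every model at every pixel: shape (K, H, W).
    residuals = np.stack([
        np.sum((flow - affine_flow(m, H, W)) ** 2, axis=-1) for m in models])
    labels = np.argmin(residuals, axis=0)
    labels[np.min(residuals, axis=0) > max_residual ** 2] = -1
    return labels
```

The resulting label map is only a starting point. Small or disconnected regions would still need to be split or filtered, and the models re-estimated from the new regions, before the segmentation is stable.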

Once the affine motions and the corresponding regions are identified, data are collected from all the frames in the sequence and the layer components are obtained. For example, when the estimated affine models accurately describe the motions of the coherent regions, these regions can be "tracked" by motion compensation. The stability of the various regions after motion compensation verifies that the affine motion parameters have been correctly estimated for these regions.
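Motion compensation with an affine model can be sketched as a backward warp: each pixel of the reference grid is filled from the location in the current frame that the model predicts it has moved to. The version below is a simplified illustration that assumes a grayscale frame, nearest-neighbor sampling, and a parameter vector [a0, a1, a2, b0, b1, b2] giving the displacement from the reference frame to the current frame. None of these choices come from the original system.

```python
import numpy as np

def motion_compensate(frame, params):
    """Warp frame back onto the reference grid using an affine motion model.

    params = [a0, a1, a2, b0, b1, b2] gives the displacement
    (a0 + a1*x + a2*y, b0 + b1*x + b2*y) of reference pixel (x, y).
    Returns the stabilized image and a mask of pixels that could be filled.
    """
    H, W = frame.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    a0, a1, a2, b0, b1, b2 = params
    # Where each reference pixel is found in the current frame.
    src_x = np.rint(xs + a0 + a1 * xs + a2 * ys).astype(int)
    src_y = np.rint(ys + b0 + b1 * xs + b2 * ys).astype(int)
    valid = (src_x >= 0) & (src_x < W) & (src_y >= 0) & (src_y < H)
    out = np.zeros_like(frame)
    out[valid] = frame[src_y[valid], src_x[valid]]
    return out, valid
```

If the model is correct for a region, that region appears stationary across the compensated frames while regions with other motions continue to move.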

The motion-compensated sequences help us determine the intensity and color textures of the "tracked" regions. Assisted by the segmentation maps, stabilized regions are processed with a temporal median filter to recover the image intensity maps of the "tracked" regions. Because these layer intensity maps are obtained by processing data from all the frames, regions that are occluded in some frames of the sequence can be recovered and composited into these layer maps. Likewise, the accumulation of data results in image mosaics. Furthermore, this sort of motion-compensated temporal processing can produce image maps that are higher in resolution and lower in noise than the image of any one frame.
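The temporal median step can be sketched as follows, assuming the frames have already been motion compensated for one layer and that a per-frame boolean support mask (derived from the segmentation maps) marks where the layer is visible. The array layout and the use of NaN for never-observed pixels are illustrative assumptions.

```python
import numpy as np

def layer_intensity_map(stabilized, support):
    """Median over time of the pixels that belong to one layer.

    stabilized: (T, H, W) motion-compensated frames for this layer.
    support:    (T, H, W) boolean, True where the layer is visible.
    Returns (layer, seen): the (H, W) intensity map, with NaN where the
    layer was never observed, and the (H, W) mask of observed pixels.
    """
    data = np.where(support, stabilized.astype(float), np.nan)
    seen = support.any(axis=0)
    layer = np.full(seen.shape, np.nan)
    # Median over the frames in which each pixel was actually visible.
    layer[seen] = np.nanmedian(data[:, seen], axis=0)
    return layer, seen
```

A pixel that is occluded in some frames is still recovered as long as it is visible in at least one frame, which is how the accumulated map can cover more of the scene than any single frame does.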

Finally, the depth ordering of these image maps is determined by a verification stage. The resulting representation is shown below.

Layered Image Representation

Applications:

The layered image representation provides a compact representation of an image sequence. The layered decomposition captures the spatial coherence of object motion and the temporal coherence of object shape and texture in a few layers. Because of the efficiency with which the layers encode the sequence, we can obtain a 300 to 1 data reduction with minimal artifacts. A layered description in which each layer represents a coherent moving object provides a more semantic representation of sequences and results in a richer mid-level visual language for sequences. The layered visual language supports coherent moving objects, surfaces, object opacity, occlusions, ordinal depths, image mosaics, and object tracking. These properties make the layered representation attractive for video databases and for applications involving retrieval by content of compressed video data. Furthermore, sequences can be easily synthesized from the layers with standard computer graphics techniques.

Layers facilitate video editing and video manipulation because they are similar to the elements used in computer graphics representations. For special effects, objects can be easily modified and the modifications propagated to the entire sequence. Graphical elements can be added. For video editing, sequences can be synthesized with a subset of the layers to remove unwanted objects from the sequence.
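Synthesizing a frame from layers, and removing an object by omitting its layer, can be illustrated with a minimal back-to-front compositing sketch. It assumes each layer has already been warped to the target frame, that the rearmost layer is fully opaque, and that alpha values lie in [0, 1]. The function name and the (intensity, alpha) tuple layout are hypothetical.

```python
import numpy as np

def composite(layers):
    """Composite layers given as (intensity, alpha) pairs, ordered back to front.

    The rearmost layer is treated as opaque (its alpha is ignored).
    Each nearer layer is blended over the result so far:
        out = alpha * intensity + (1 - alpha) * out
    """
    background, _ = layers[0]
    out = np.array(background, dtype=float)
    for intensity, alpha in layers[1:]:
        out = alpha * intensity + (1.0 - alpha) * out
    return out

# Example: a background plus one partly transparent foreground layer.
background = (np.full((2, 2), 10.0), np.ones((2, 2)))
foreground = (np.full((2, 2), 200.0), np.array([[1.0, 0.0], [0.5, 0.0]]))

full_frame = composite([background, foreground])
edited_frame = composite([background])  # foreground object removed
```

Dropping the foreground layer from the list leaves the background intact behind it, because the background intensity map was assembled from frames in which those pixels were not occluded.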

References:

(PDF copies available for download)

J. Y. A. Wang and E. H. Adelson. Representing Moving Images with Layers. The IEEE Transactions on Image Processing Special Issue: Image Sequence Compression, 3(5):625-638, September 1994.

J. Y. A. Wang and E. H. Adelson. Spatio-Temporal Segmentation of Video Data. Proceedings of SPIE on Image and Video Processing II, Vol. 2182, pp. 120-131, San Jose, February 1994.

J. Y. A. Wang and E. H. Adelson. Applying Mid-level Vision Techniques for Video Data Compression and Manipulation. Proceedings of SPIE on Digital Video Compression on Personal Computers: Algorithms and Technologies, Vol. 2187, pp. 116-127, San Jose, February 1994.

J. Y. A. Wang and E. H. Adelson. Layered Representation for Motion Analysis. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 1993, pp. 361-366, New York, June 1993.

J. Y. A. Wang and E. H. Adelson. Layered Representation for Image Sequence Coding. Proceedings of the 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 5, pp. 221-224, Minneapolis, April 1993.

Patent

Wang and Adelson, System for encoding image data into multiple layers representing regions of coherent motion and associated motion parameters. US Patent #05557684 (1996). [Abstract], [Patent].

Last modified: Thu Jul 24 22:17:05 1997