Layered Image Representation
This demo was created by John Y. A. Wang
Abstract
We describe a system for representing moving images with sets of
overlapping layers. Each layer contains an intensity map that defines
the additive values of each pixel, along with an alpha map that serves
as a mask indicating the transparency. The layers are ordered in
depth and they occlude each other in accord with the rules of
compositing. Velocity maps define how the layers are to be warped
over time. The layered representation is more flexible than standard
image transforms and can capture many important properties of natural
image sequences. We describe some methods for decomposing image
sequences into layers using motion analysis, and we discuss how the
representation may be used for image coding and other applications.
Keywords:
Image coding, motion analysis, image segmentation, image
representation, robust estimation.
Description
We describe the decomposition of a sequence into its layered image
representation and demonstrate the flexibility of the layered
decomposition with applications in image compression and video special
effects.
A general block diagram of our algorithm is shown below. The algorithm
consists of motion estimation, motion segmentation, and temporal
integration.
Block diagram of layer image decomposition
A detailed block diagram of our layered image decomposition algorithm
is shown below. The algorithm consists of a robust iterative framework
that identifies the likely affine motion models present in the
image sequence. Physical constraints are imposed in the loop to
improve stability and produce sensible solutions.
Block diagram of segmentation algorithm
Thirty frames of the MPEG
Flower Garden sequence (1 sec) are processed by an optic flow
estimator to obtain a motion vector for each pixel. Affine motion
parameters are estimated from the optic flow data by the model
estimator for each subregion. We initialize these subregions with
an array of 20x20 pixel blocks. Similar models are merged by the
model merger to improve stability. Thus the model
estimator and the model merger cooperatively determine
the set of likely affine motions present in the optic flow maps.
Affine motion segmentation results from
applying these motion models in a classification framework on the
motion map. Additional constraints on physical connectivity and
region size are enforced by the region splitter and region
filter. The segmentation results are iteratively refined.
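The model estimation and classification steps described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes a six-parameter affine model (vx = a0 + a1*x + a2*y, vy = b0 + b1*x + b2*y), an ordinary least-squares fit in place of the robust estimator, and a squared-residual classifier. The function names `fit_affine`, `predict_flow`, and `classify` are hypothetical, and the model merger, region splitter, and region filter are omitted.

```python
import numpy as np

def fit_affine(flow, mask):
    """Least-squares affine fit to the flow vectors inside a boolean mask.

    flow: (H, W, 2) array of per-pixel (vx, vy); mask: (H, W) bool.
    Returns a (3, 2) array P such that [1, x, y] @ P approximates (vx, vy).
    """
    ys, xs = np.nonzero(mask)
    A = np.column_stack([np.ones(xs.size), xs, ys])  # (N, 3) design matrix
    V = flow[ys, xs]                                 # (N, 2) observed flow
    P, *_ = np.linalg.lstsq(A, V, rcond=None)
    return P

def predict_flow(P, shape):
    """Evaluate the affine model P at every pixel of an image of the given shape."""
    H, W = shape
    ys, xs = np.mgrid[0:H, 0:W]
    A = np.stack([np.ones((H, W)), xs, ys], axis=-1)  # (H, W, 3)
    return A @ P                                      # (H, W, 2)

def classify(flow, models):
    """Label each pixel with the index of the model whose flow fits it best."""
    residuals = np.stack([
        np.sum((flow - predict_flow(P, flow.shape[:2])) ** 2, axis=-1)
        for P in models
    ])                                                # (K, H, W)
    return np.argmin(residuals, axis=0)

# Synthetic flow field (an assumption for illustration): the left half
# translates, the right half follows a different affine motion.
H, W = 40, 40
ys, xs = np.mgrid[0:H, 0:W]
right = xs >= 20
flow = np.zeros((H, W, 2))
flow[..., 0] = 1.0
flow[right, 0] = -2.0 + 0.05 * ys[right]
flow[right, 1] = 0.5

# Fit one model per 20x20 block, then classify every pixel.
left_block = np.zeros((H, W), dtype=bool)
left_block[0:20, 0:20] = True
right_block = np.zeros((H, W), dtype=bool)
right_block[0:20, 20:40] = True
models = [fit_affine(flow, left_block), fit_affine(flow, right_block)]
labels = classify(flow, models)
```

In a full pipeline the labels would feed back into the next round of model fitting, with each region re-estimated from the pixels assigned to it.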
Once the affine motions and the corresponding regions are identified,
data are collected from all the frames in the sequence and layer
components are obtained. For example, when the estimated affine models
accurately describe the motions of the coherent regions, these regions
can be "tracked" by motion
compensation. The stability of the various regions after motion
compensation verifies that the affine motion parameters have been
correctly estimated for these regions.
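As a rough illustration of "tracking" a region by motion compensation, the sketch below backward-warps a frame with an affine motion model so that content following that model lines up with the reference frame. It is a simplified stand-in under stated assumptions: the parameter layout (a (3, 2) array applied to [1, x, y]), nearest-neighbor sampling, and the names `affine_flow` and `compensate` are all hypothetical choices, not details taken from the source.

```python
import numpy as np

def affine_flow(P, shape):
    """Per-pixel displacement (vx, vy) given affine parameters P of shape (3, 2)."""
    H, W = shape
    ys, xs = np.mgrid[0:H, 0:W]
    A = np.stack([np.ones((H, W)), xs, ys], axis=-1)
    return A @ P

def compensate(frame, P):
    """Align `frame` with the reference frame under affine motion P.

    P gives, for each reference pixel (x, y), its displacement into `frame`.
    Sampling is nearest-neighbor; pixels that fall outside `frame` become NaN.
    """
    H, W = frame.shape
    flow = affine_flow(P, (H, W))
    ys, xs = np.mgrid[0:H, 0:W]
    src_x = np.rint(xs + flow[..., 0]).astype(int)
    src_y = np.rint(ys + flow[..., 1]).astype(int)
    valid = (src_x >= 0) & (src_x < W) & (src_y >= 0) & (src_y < H)
    out = np.full((H, W), np.nan)
    out[valid] = frame[src_y[valid], src_x[valid]]
    return out

# Synthetic example: the whole image shifts 3 pixels to the right.
rng = np.random.default_rng(0)
reference = rng.random((8, 10))
frame = np.roll(reference, 3, axis=1)
P = np.array([[3.0, 0.0], [0.0, 0.0], [0.0, 0.0]])  # pure translation
stabilized = compensate(frame, P)
```

If the model is correct, the compensated region stays fixed from frame to frame, which is the stability check described above.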
The motion-compensated sequences help us determine the intensity and
color textures of the "tracked" regions. Assisted by the segmentation
maps, stabilized regions are processed with a temporal median filter
to recover the image intensity maps of
"tracked" regions. Because these layer
intensity maps are obtained by processing data in all the frames,
occluded regions in the sequence can be recovered and composited in
these layer maps. Likewise, the accumulation of data results in image
mosaics. Furthermore, this sort of motion-compensated temporal
processing can produce image maps that are higher in resolution and
lower in noise than the image of any single frame.
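The temporal median step can be sketched as below, assuming the frames have already been motion compensated and that a per-frame boolean support map (from the segmentation) marks where the layer is visible. The function name `layer_intensity` and the use of NaN for never-observed pixels are assumptions made for this illustration.

```python
import warnings
import numpy as np

def layer_intensity(stabilized, support):
    """Temporal median of a stabilized layer.

    stabilized: (T, H, W) motion-compensated frames.
    support:    (T, H, W) bool, True where the layer is visible in that frame.
    Returns an (H, W) map; pixels never observed in any frame are NaN.
    """
    data = np.where(support, stabilized, np.nan)
    with warnings.catch_warnings():
        # nanmedian warns on pixels that are NaN in every frame; that is expected here.
        warnings.simplefilter("ignore", RuntimeWarning)
        return np.nanmedian(data, axis=0)

# Synthetic example: 5 stabilized frames of a 3x4 layer.
truth = np.arange(12, dtype=float).reshape(3, 4)
frames = np.repeat(truth[None], 5, axis=0)
support = np.ones(frames.shape, dtype=bool)

frames[0:2, 1, 1] = 99.0       # pixel occluded in the first two frames
support[0:2, 1, 1] = False
frames[2, 0, 0] = 50.0         # single-frame outlier, rejected by the median
support[:, 2, 3] = False       # pixel never visible in any frame

recovered = layer_intensity(frames, support)
```

Because the median is taken only over frames where the pixel is visible, a pixel that is occluded in some frames is still recovered from the others.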
Finally, the depth ordering of these image maps is determined by a
verification stage. The resulting representation is shown below.
Layered Image Representation
Applications:
The layered image representation provides a compact representation of
an image sequence. The layered decomposition captures the spatial coherence
of object motion and the temporal coherence of object shape and texture in
a few layers. Because of the efficiency with which the layers encode
the sequence, we can obtain a 300 to 1 data reduction with minimal
artifacts. A layered description in which each layer represents a
coherent moving object provides a more semantic representation of
sequences and results in a richer mid-level visual language for
sequences. The layered visual language supports coherent moving
objects, surfaces, object opacity, occlusions, ordinal depths, image
mosaics, and object tracking. These properties make the layered
representation attractive for video databases and applications
involving retrieval by content of compressed video data. Furthermore,
sequences can be easily synthesized from
the layers with standard computer graphics techniques.
Layers facilitate video editing and video manipulation because they
are similar to the elements used in computer graphics representations. For
special effects, objects can be easily modified and the changes propagated to the
entire sequence. Graphical elements can be added. For video editing,
sequences can be synthesized with a subset
of layers to remove unwanted objects from the sequences.
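Synthesizing a frame from layers, with or without a given layer, can be sketched with a standard back-to-front compositing loop. The sketch below is illustrative only: it assumes each layer has already been warped to the target frame, treats the intensity map as an additive (premultiplied) contribution as the abstract suggests, and uses hypothetical names (`Layer`, `composite`, and the layer names "background" and "tree").

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Layer:
    name: str
    additive: np.ndarray  # (H, W) additive intensity contribution
    alpha: np.ndarray     # (H, W) opacity in [0, 1]; 1 hides what lies behind

def composite(layers, skip=()):
    """Composite layers ordered back to front, omitting any named in `skip`."""
    out = np.zeros_like(layers[0].additive, dtype=float)
    for layer in layers:
        if layer.name in skip:
            continue
        # Attenuate what lies behind, then add this layer's contribution.
        out = out * (1.0 - layer.alpha) + layer.additive
    return out

# Toy scene: a uniform background and an opaque vertical strip in front of it.
H, W = 4, 5
background = Layer("background", np.full((H, W), 0.2), np.ones((H, W)))
tree_alpha = np.zeros((H, W))
tree_alpha[:, 1] = 1.0
tree = Layer("tree", 0.9 * tree_alpha, tree_alpha)

full = composite([background, tree])
without_tree = composite([background, tree], skip={"tree"})
```

Dropping a layer from the list reveals whatever the layers behind it contain at those pixels, which is why recovering occluded regions in the layer maps matters for this kind of editing.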
References:
- J. Y. A. Wang and E. H. Adelson.
Representing Moving Images with Layers.
The IEEE Transactions on Image Processing Special
Issue: Image Sequence Compression,
3(5):625-638, September 1994.
- J. Y. A. Wang and E. H. Adelson.
Spatio-Temporal Segmentation of Video Data.
Proceedings of SPIE on Image and Video Processing II,
Vol. 2182, pp. 120-131, San Jose, February 1994.
- J. Y. A. Wang and E. H. Adelson.
Applying Mid-level Vision Techniques for Video Data Compression
and Manipulation.
Proceedings of SPIE on Digital Video Compression on Personal
Computers: Algorithms and Technologies,
Vol. 2187, pp. 116-127, San Jose, February 1994.
- J. Y. A. Wang and E. H. Adelson.
Layered Representation for Motion Analysis.
Proceedings of the IEEE Computer Society Conference on
Computer Vision and Pattern Recognition 1993,
pp. 361-366, New York, June 1993.
- J. Y. A. Wang and E. H. Adelson.
Layered Representation for Image Sequence Coding.
Proceedings of the 1993 IEEE International Conference on
Acoustics, Speech, and Signal Processing,
Vol. 5, pp. 221-224, Minneapolis, April 1993.
Patent
Wang and Adelson, System for encoding image data into multiple
layers representing regions of coherent motion and associated motion
parameters. US Patent #05557684 (1996).
MIT BCS Perceptual Science Group.
Demo by John Y. A. Wang.
<jyawang@psyche.mit.edu>
Copyright 1995. All rights reserved.
Last modified: Thu Jul 24 22:17:05 1997