
How It Works

By Andrew Mihal, 17 October 2004

Note: This page refers to the older 2.X releases of Enblend. The new 3.X versions feature a new seam line optimization algorithm that tries to automatically avoid placing the seam in areas where the input images mismatch. This reduces the chance of having ghosts and cut-off people in the final output. This feature will be documented in a future update to this page.

Enblend is a tool for compositing images. Given a set of images that overlap in some irregular way, Enblend overlays them in such a way that the seam between the images is invisible, or at least very difficult to see. Enblend does not line up the images for you. Use a tool like Hugin to do that.

Enblend uses a multiresolution spline to blend images together [1,2]. The basic idea is that different image features should be blended across a transition zone proportional in size to the spatial frequency of the features. Big, smooth objects like the sky and clouds have low spatial frequency and should be blended across a very wide region. Our eyes expect the sky to be very uniform in appearance, so any sudden color change will be very noticeable. So it is important to smooth out the difference over as large a zone as possible. On the other hand, areas of the image with high spatial frequency, such as trees and windowpanes, have sudden changes from light to dark. Our eyes expect to see color changes here, and if you try to blend over a wide area there is the possibility of noticeable ghosting. So high-frequency components are blended across a narrow transition zone. The separate treatment of different image components leads to better results than what I was ever able to do by hand in the Gimp.

Finding a Transition Line

The first step is to calculate a transition line between the images. This line will be used as a template for creating narrow blending masks (for high-frequency details) and wide blending masks (for low-frequency areas). Ideally, the transition line should be near the middle of the intersection region between the images. This way, there will be plenty of room on the left side of the line for the right image to fade out, and plenty of room on the right side for the left image to fade out.

Enblend uses an algorithm suggested in [4] based on the Nearest Feature Transform [3] to find the transition line. The algorithm finds a line which is as far away as possible from the edges of the area where two images intersect. Here is an example:

The red and green outlines show the alpha channel of the input images. The black area of the mask indicates where the left image will have priority over the right image. The white area of the mask indicates where the right image will have priority over the left image. To avoid spatial confusion, I will usually refer to the input images as the "black" image and the "white" image from now on. Now you know where the Enblend logo comes from.

You can see from the double image that Jerry moved in between these two photos. In the black image, he is half missing. By making some adjustments to the alpha channels of the input images, we can make sure Jerry appears whole in the output, and that the half-Jerry won't adversely affect the blending. I'll erase Jerry from the left image by making the alpha channel transparent in the affected areas. It's important to get the entire area where the images disagree. Here is the result:

You can see how Enblend re-routed the transition line to avoid the part of the image I cut out.
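The seam search in this section can be sketched in a few lines of plain Python. This is a brute-force illustration, not Enblend's actual code. It assumes one simple reading of the nearest feature idea: every pixel in the overlap goes to whichever image has the closer exclusive (non-overlapping) pixel. The function name `seam_mask` and the tiny 3x10 alpha grids are hypothetical.

```python
import math

def seam_mask(alpha_black, alpha_white):
    """Label each pixel 0 (black image), 1 (white image) or None (no image).

    alpha_black / alpha_white are equally sized grids of booleans marking
    where each input image is opaque. Brute force, for illustration only.
    """
    rows, cols = len(alpha_black), len(alpha_black[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    # "Features": pixels covered by exactly one of the two images.
    only_black = [(r, c) for r, c in cells
                  if alpha_black[r][c] and not alpha_white[r][c]]
    only_white = [(r, c) for r, c in cells
                  if alpha_white[r][c] and not alpha_black[r][c]]

    def nearest_sq(r, c, features):
        # Squared distance to the closest feature pixel (inf if there is none).
        return min(((r - fr) ** 2 + (c - fc) ** 2 for fr, fc in features),
                   default=math.inf)

    mask = [[None] * cols for _ in range(rows)]
    for r, c in cells:
        in_black, in_white = alpha_black[r][c], alpha_white[r][c]
        if in_black and in_white:
            # Overlap: follow whichever image's exclusive region is closer.
            closer_to_black = (nearest_sq(r, c, only_black)
                               <= nearest_sq(r, c, only_white))
            mask[r][c] = 0 if closer_to_black else 1
        elif in_black:
            mask[r][c] = 0
        elif in_white:
            mask[r][c] = 1
    return mask

# Black image covers columns 0-6, white image covers columns 3-9 (overlap: 3-6).
black = [[c <= 6 for c in range(10)] for _ in range(3)]
white = [[c >= 3 for c in range(10)] for _ in range(3)]
centered = seam_mask(black, white)

# Make the black image transparent in columns 3-4, as if erasing a moving subject.
black_cut = [[c <= 6 and c not in (3, 4) for c in range(10)] for _ in range(3)]
rerouted = seam_mask(black_cut, white)
```

In `centered`, the switch from 0 to 1 falls in the middle of the overlap. In `rerouted`, the erased columns belong only to the white image, so the switch moves to the edge of the cut-out area.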

Creating the Laplacian Pyramids

Next, Enblend makes three pyramids from the black image, the white image, and the blend mask. The black and white images are turned into Laplacian pyramids. A Laplacian pyramid breaks up an image into components based on spatial frequency. The top level of the pyramid will contain just the highest spatial frequency components - the edgiest of the edges. The bottom level will contain the lowest spatial frequency components - smooth areas like the sky. The intermediate levels contain features gradually decreasing in spatial frequency from high to low.

A Laplacian pyramid is made by repeatedly applying a high-pass filter to the image. The high-pass filter picks out all of the high spatial frequency components of the image and passes everything else down to the next level. The image that gets passed down actually contains less information (because the edges have been removed) so we can downsample it. This reduces the size of the next level by half in each dimension. This shrinking is what gives the pyramid its pyramidal shape.

At the next level, the filter picks out the next-highest spatial frequency components, and so on. After we have created the number of levels we want, the bottom level is left with only the lowest spatial frequency components.
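As a rough illustration of this construction, here is a one-dimensional sketch in plain Python. It assumes the formulation from [2], where the high-pass output of a level is the signal minus a blurred, downsampled and re-expanded copy of itself. The 5-tap kernel, the clamped borders, the repeat-then-blur expansion and the helper names (`blur`, `shrink`, `expand`, `laplacian_pyramid`) are my own simplifications and are not taken from the Enblend source.

```python
KERNEL = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)  # 5-tap low-pass weights

def blur(signal):
    """Low-pass filter a list of numbers, clamping indices at the borders."""
    last = len(signal) - 1
    return [
        sum(w * signal[min(max(i + k - 2, 0), last)] for k, w in enumerate(KERNEL))
        for i in range(len(signal))
    ]

def shrink(signal):
    """Blur, then keep every second sample (half the size)."""
    return blur(signal)[::2]

def expand(signal, size):
    """Grow back to `size` samples: repeat each sample, then blur."""
    return blur([signal[i // 2] for i in range(size)])

def laplacian_pyramid(signal, levels):
    """Return [level 0 (finest detail), ..., bottom level (smooth residual)]."""
    pyramid = []
    current = list(signal)
    for _ in range(levels - 1):
        smaller = shrink(current)
        smooth = expand(smaller, len(current))
        # High-pass output: what the smooth copy fails to reproduce.
        pyramid.append([c - s for c, s in zip(current, smooth)])
        current = smaller  # pass the low frequencies down a level
    pyramid.append(current)
    return pyramid

# A step edge: 8 dark samples followed by 8 bright ones.
step = [0.0] * 8 + [10.0] * 8
pyramid = laplacian_pyramid(step, 4)
sizes = [len(level) for level in pyramid]  # [16, 8, 4, 2]
```

Level 0 of `pyramid` is zero away from the edge and nonzero only around it, which is the one-dimensional analogue of "the edgiest of the edges".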

[Figure: levels 0 through 4 of the black Laplacian pyramid and the white Laplacian pyramid, side by side]

These scaled-down images don't look like much. Here is a bigger version of the black pyramid level zero. This should make it clear that the biggest level of the Laplacian pyramid contains only the highest spatial frequency features in the image. I have enhanced the contrast of all of these images to bring out the detail.

The deepest levels of the pyramids have the smallest number of pixels, but they represent the biggest, smoothest features in the image. Consequently, each pixel in these bottom levels will influence a large number of pixels in the final result. To demonstrate this, I will reverse the Laplacian pyramid process. This is called "collapsing" the pyramid. This recombines all of the pyramid levels and gives you back the original image. Turning an image into a Laplacian pyramid and then collapsing it is a lossless transformation. However, to show how the bottom level contains the lowest spatial frequency components in the image, I will first zero out all of the pyramid levels except for the bottom one. This is equivalent to throwing away all but the lowest spatial frequency components of the image. When this is collapsed, we will see only the contribution of the bottom level. Here is what you get:

[Figure: level 4 of the black Laplacian pyramid and level 4 of the white Laplacian pyramid, each collapsed on its own]
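The collapse step and the bottom-level-only experiment can be tried on a one-dimensional signal. The sketch below is an assumption-laden toy, not Enblend's code: the helper names (`blur`, `expand`, `build`, `collapse`), the 5-tap kernel and the repeat-then-blur expansion are all hypothetical choices. The only requirement for the round trip to work is that `collapse` uses the same `expand` as `build`.

```python
KERNEL = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)  # 5-tap low-pass weights

def blur(signal):
    """Low-pass filter a list of numbers, clamping indices at the borders."""
    last = len(signal) - 1
    return [sum(w * signal[min(max(i + k - 2, 0), last)] for k, w in enumerate(KERNEL))
            for i in range(len(signal))]

def expand(signal, size):
    """Grow back to `size` samples: repeat each sample, then blur."""
    return blur([signal[i // 2] for i in range(size)])

def build(signal, levels):
    """Laplacian pyramid: detail levels first, smooth bottom level last."""
    pyramid, current = [], list(signal)
    for _ in range(levels - 1):
        smaller = blur(current)[::2]
        smooth = expand(smaller, len(current))
        pyramid.append([c - s for c, s in zip(current, smooth)])
        current = smaller
    return pyramid + [current]

def collapse(pyramid):
    """Undo build(): start at the bottom and add each detail level back."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = [d + s for d, s in zip(detail, expand(current, len(detail)))]
    return current

step = [0.0] * 8 + [10.0] * 8
pyramid = build(step, 3)

# Round trip: collapsing the untouched pyramid gives the input back
# (up to floating-point rounding).
restored = collapse(pyramid)
round_trip_error = max(abs(a - b) for a, b in zip(step, restored))

# Zero every level except the bottom one, then collapse.
bottom_only = [[0.0] * len(level) for level in pyramid[:-1]] + [pyramid[-1]]
smooth = collapse(bottom_only)
largest_jump = max(abs(b - a) for a, b in zip(smooth, smooth[1:]))
```

`smooth` has as many samples as `step`, but the sharp jump of 10 between neighbouring samples is gone: only a gentle ramp survives, because the bottom level carries only the lowest frequencies.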

Creating the Gaussian Pyramid

The key to the multiresolution spline technique is to blend image features across a transition zone proportional in size to the spatial frequency of the features. This is accomplished by blending the black Laplacian pyramid and the white Laplacian pyramid together, one level at a time. Each level will use a different blending mask. At the top level we want to use a sharp blend mask so that high-frequency details are blended over a narrow region. At the bottom level we can use a wide blend mask so that low-frequency details are blended over a large region.

These blend masks are constructed from the transition line template we calculated above by creating a Gaussian pyramid. This process is similar to making a Laplacian pyramid. Instead of high-pass filtering each level, we use a low-pass filter. At the top level we start with the sharp blend mask we get from the transition line template itself. The low-pass filter makes this transition line blurrier and blurrier as we go down the levels. This gives us the progression of blending zones that we want. We get a sharp blending zone at the top and a wide blending zone at the bottom.

Since the low-pass filter removes detail from the image, we still do the downsampling as before. Here is the result:

[Figure: levels 0 through 4 of the mask Gaussian pyramid]

As before, the downsampling makes it hard to see that the blending mask is really getting wider. Here is the actual influence that the bottom mask exerts on the final result:

[Figure: level 4 of the mask Gaussian pyramid, collapsed]
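To make the widening of the blend zone concrete, here is a small one-dimensional sketch. It is only an illustration under my own assumptions: a 5-tap low-pass kernel with clamped borders, and hypothetical helper names (`blur`, `gaussian_pyramid`, `transition_width`). The width of the transition at each level is measured in full-resolution samples, to undo the effect of the downsampling.

```python
KERNEL = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)  # 5-tap low-pass weights

def blur(signal):
    """Low-pass filter a list of numbers, clamping indices at the borders."""
    last = len(signal) - 1
    return [sum(w * signal[min(max(i + k - 2, 0), last)] for k, w in enumerate(KERNEL))
            for i in range(len(signal))]

def gaussian_pyramid(mask, levels):
    """Level 0 is the mask itself; each further level is blurred and halved."""
    pyramid = [list(mask)]
    for _ in range(levels - 1):
        pyramid.append(blur(pyramid[-1])[::2])
    return pyramid

def transition_width(level, scale):
    """Samples that are neither fully black (0) nor fully white (1),
    converted to full-resolution samples by multiplying with `scale`."""
    eps = 1e-9
    return scale * sum(1 for v in level if eps < v < 1 - eps)

# Sharp transition line in the middle: black (0) on the left, white (1) on the right.
mask = [0.0] * 32 + [1.0] * 32
pyramid = gaussian_pyramid(mask, 4)
widths = [transition_width(level, 2 ** k) for k, level in enumerate(pyramid)]
```

`widths` starts at zero for the sharp level 0 mask and grows at every level below it, which is the progression of blending zones described above.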

Blending the Pyramids

The next step is to blend the Laplacian pyramids together, one level at a time. Each level will use the corresponding blend mask from the mask Gaussian pyramid. Here is the result:

 
[Figure: for each of levels 0 through 4, the result of blending the black Laplacian pyramid and the white Laplacian pyramid using the mask Gaussian pyramid]
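Putting the pieces together, the whole blend can be sketched on a one-dimensional signal. This is a toy under stated assumptions, not Enblend's implementation: the kernel, the border handling, the repeat-then-blur expansion and all function names are hypothetical. It also assumes the mask convention used on this page, where 0 (black) selects the black image and 1 (white) selects the white image, so each level is computed as mask * white + (1 - mask) * black.

```python
KERNEL = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)  # 5-tap low-pass weights

def blur(signal):
    """Low-pass filter a list of numbers, clamping indices at the borders."""
    last = len(signal) - 1
    return [sum(w * signal[min(max(i + k - 2, 0), last)] for k, w in enumerate(KERNEL))
            for i in range(len(signal))]

def expand(signal, size):
    """Grow back to `size` samples: repeat each sample, then blur."""
    return blur([signal[i // 2] for i in range(size)])

def laplacian_pyramid(signal, levels):
    pyramid, current = [], list(signal)
    for _ in range(levels - 1):
        smaller = blur(current)[::2]
        smooth = expand(smaller, len(current))
        pyramid.append([c - s for c, s in zip(current, smooth)])
        current = smaller
    return pyramid + [current]

def gaussian_pyramid(mask, levels):
    pyramid = [list(mask)]
    for _ in range(levels - 1):
        pyramid.append(blur(pyramid[-1])[::2])
    return pyramid

def collapse(pyramid):
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = [d + s for d, s in zip(detail, expand(current, len(detail)))]
    return current

def blend(black, white, mask, levels):
    """Multiresolution blend: mask 0 selects black, mask 1 selects white."""
    lap_black = laplacian_pyramid(black, levels)
    lap_white = laplacian_pyramid(white, levels)
    masks = gaussian_pyramid(mask, levels)
    blended = [
        [m * w + (1 - m) * b for b, w, m in zip(level_b, level_w, level_m)]
        for level_b, level_w, level_m in zip(lap_black, lap_white, masks)
    ]
    return collapse(blended)

# Two exposures of the same flat sky that disagree in brightness.
black = [100.0] * 64
white = [120.0] * 64
mask = [0.0] * 32 + [1.0] * 32  # hard transition line in the middle

hard_cut = [w if m else b for b, w, m in zip(black, white, mask)]
result = blend(black, white, mask, 4)
hard_jump = max(abs(q - p) for p, q in zip(hard_cut, hard_cut[1:]))
soft_jump = max(abs(q - p) for p, q in zip(result, result[1:]))
```

Cutting the two signals together along the mask leaves a visible step of 20 at the seam. The blended `result` still runs from 100 on the left to 120 on the right, but the difference is spread over many samples because a flat sky lives entirely in the bottom level, where the mask is widest.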

Collapsing the Result

The final step is to collapse the blended Laplacian pyramid. This is then pasted on top of the parts of the input images that were not involved in the blending. This gives us the output image:

References

[1] P. Burt and E. Adelson. "A Multiresolution Spline With Application to Image Mosaics". ACM Transactions on Graphics, Vol. 2, No. 4, October 1983, pp. 217-236.
[2] P. Burt and E. Adelson. "The Laplacian Pyramid as a Compact Image Code". IEEE Transactions on Communications, April 1983.
[3] M. Alsuwaiyel and M. Gavrilova. "On the Distance Transform of Binary Images". International Conference on Imaging Science, Systems, and Technology. 2000.
[4] Y. Xiong and K. Turkowski. "Registration, Calibration, and Blending in Creating High Quality Panoramas". 4th IEEE Workshop on Applications of Computer Vision. October, 1998.
[5] J. Cychosz. "Efficient Binary Image Thinning using Neighborhood Maps". In Graphics Gems IV, P. Heckbert editor. Academic Press, 1994.