
Depth Map Generation for 2D to 3D Video

Jagadeesh B.K., Assistant Professor, Dept. of E&C, HKBKCE, Bangalore, jbkanade_99@yahoo.com
Ravindra A., Assistant Professor, Dept. of E&C, Sir M.V.I.T., Bangalore, rva2000@gmail.com

Abstract: The main purpose of 2D-to-3D video conversion is to generate a second-view video based on the content of the 2D video. 3D video is an illusion created in the human visual system: the difference in perception between the left- and right-eye images produces the stereo effect. The purpose of this work is to generate low-quality stereoscopic video from monoscopic video. The steps involved are depth map generation and generation of the left and right videos based on depth-image-based rendering (DIBR). In this work a depth map is estimated from the source video based on motion vectors, together with a segmentation map based on colour. The depth map and segmentation map are then fused to obtain the final depth map. Filtering is applied to the final depth map to avoid the clipping effect in the next step, constructing the stereoscopic video with DIBR.

Keywords: 2D, 3D, MV, DIBR

Introduction

With the development of stereoscopic technology, stereoscopic video has become more and more popular in daily life, since it offers the viewer a more realistic sense than 2D video. The maturity of 3D technology enabled a convincing stereoscopic effect, which led 3D movies to obtain better returns than 2D movies in the movie industry, and in Europe more and more researchers now focus on 3D TV development. The main objective of this work is to generate a depth map from monoscopic video based on motion estimation and motion cues. For the depth map generation, a depth-from-motion estimation method and a colour-based segmentation method are developed. The process of converting 2D to 3D video includes two parts: one is the depth map estimation, and the other is the generation of the left- and right-eye videos from the source video and its associated depth map based on depth-image-based rendering (DIBR). In this work we have done the depth map estimation; DIBR will be taken up as the continuation of the current work.

Proposed Method

Fig. 1(a): Depth map generation

Depth Map

The depth map is a grey-scale image in which grey level 255 indicates the minimum depth value, i.e. the nearest object, while grey level 0 specifies the maximum depth value. The depth map translates the real depth value into a corresponding relative depth level (a shift-sense model). The depth value of the background is usually assumed to be 0, since the background is the farthest point in the scene, so after the transformation its value should be 0.
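As a concrete illustration of this convention, here is a minimal Python sketch (the helper name depth_to_grey and the linear normalization are our own illustrative choices, not from the paper) that maps real depth values to the grey levels described above:

import numpy as np

def depth_to_grey(depth, max_depth):
    """Map real depth values to the 8-bit convention above: grey 255
    for the nearest object (minimum depth), grey 0 for the farthest
    point / background (maximum depth)."""
    d = np.clip(depth, 0.0, max_depth)
    # Invert so that small (near) depths map to large grey values.
    return (255.0 * (1.0 - d / max_depth)).astype(np.uint8)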

Fig. 2: Depth map

Motion Estimation and Compensation

Fig. 3: Motion estimation and compensation

Motion estimation (ME) creates a model of the current frame based on available data in one or more previously encoded frames, known as reference frames. These reference frames may be past frames (earlier than the current frame in temporal order) or future frames (later in temporal order). The design goals for an ME algorithm are to model the current frame as accurately as possible while maintaining acceptable computational complexity. As shown in Fig. 3, the ME module creates a model by modifying one or more reference frames to match the current frame as closely as possible according to a matching criterion. Motion compensation describes a picture in terms of where each section of that picture came from in a previous picture. It is often employed in video compression and can also be used for de-interlacing. A video sequence consists of a number of pictures, usually called frames. Subsequent frames are very similar and therefore contain a lot of redundancy; removing this redundancy helps achieve better compression ratios. A first approach would be simply to subtract a reference frame from a given frame. The difference, called the residual, usually contains less energy (or information) than the original frame, so it can be encoded at a lower bit-rate with the same quality; the decoder reconstructs the original frame by adding the reference frame back. In this work we have used the exhaustive block-matching algorithm (EBMA), as sketched below.
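The paper gives no code for EBMA, so the following is a minimal Python sketch of exhaustive block matching under a SAD criterion, plus the motion-parallax depth cue used later. The block size, search range, and the linear magnitude-to-grey mapping are illustrative assumptions, not values from the paper.

import numpy as np

def ebma(anchor, target, block=16, search=8):
    """Exhaustive block matching (EBMA): for each block of the anchor
    frame, scan a +/- `search` pixel window in the reference (target)
    frame and keep the displacement with the smallest SAD."""
    h, w = anchor.shape
    mvs = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            cur = anchor[by:by + block, bx:bx + block].astype(float)
            best_sad, best_mv = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue
                    ref = target[y:y + block, x:x + block].astype(float)
                    sad = np.abs(cur - ref).sum()
                    if sad < best_sad:
                        best_sad, best_mv = sad, (dy, dx)
            mvs[by // block, bx // block] = best_mv
    return mvs

def motion_depth(mvs):
    """Motion-parallax depth cue: with a static background, larger
    motion magnitude is taken to mean a nearer object, so it maps to
    a higher grey level (255 = nearest, as defined above)."""
    mag = np.hypot(mvs[..., 0], mvs[..., 1])
    if mag.max() > 0:
        mag = mag / mag.max()
    return (255 * mag).astype(np.uint8)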

Color Segmentation

Colour segmentation includes two processes: colour quantization and segmentation. Colour quantization aims at reducing the number of colours and finding the dominant colours in the image. The segmentation map provides the object information that the motion depth map lacks. The quantization image provides the distribution of the M dominant colours, but this alone is not yet what this step needs: the distribution of a given colour does not by itself distinguish the different parts of the objects; it only gives a rough idea of where each colour occurs. Based on this distribution, separating the connected areas of the same colour identifies each object region of similar colour and achieves the segmentation. In this work a basic segmentation algorithm is used:

    while there are unlabelled points in the segmentation map:
        index = next unlabelled point in the segmentation map
        find the connected area starting at index in Qmap whose value equals Qmap[index]
        assign the next segmentation value to that area in Smap

where Qmap is the quantization map and Smap is the segmentation map.
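The pseudocode above is essentially connected-component labelling on the quantized colour map. A minimal Python rendering could look as follows; the uniform quantizer, 4-connectivity, and an H x W x 3 colour input are illustrative assumptions the paper does not specify.

import numpy as np
from collections import deque

def quantize(img, levels=8):
    """Uniform colour quantization to `levels` values per channel --
    a simple stand-in for the dominant-colour quantization in the text."""
    step = 256 // levels
    return (img // step) * step

def segment(qmap):
    """Connected-component labelling on the quantization map Qmap:
    each 4-connected region of one quantized colour receives the next
    segmentation value in Smap, as in the pseudocode above."""
    h, w = qmap.shape[:2]
    smap = np.full((h, w), -1, dtype=int)
    label = 0
    for sy in range(h):
        for sx in range(w):
            if smap[sy, sx] != -1:        # already labelled
                continue
            colour = qmap[sy, sx].copy()
            smap[sy, sx] = label
            queue = deque([(sy, sx)])
            while queue:                  # flood-fill the region
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and smap[ny, nx] == -1
                            and np.array_equal(qmap[ny, nx], colour)):
                        smap[ny, nx] = label
                        queue.append((ny, nx))
            label += 1
    return smap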

Depth Fusion

Fusion is a key step for the final depth map generation, since the information is not sufficient if only block motion estimation or only colour segmentation is performed. By combining data obtained under different assumptions, fusion brings more accurate information to the depth estimation. As the results show, after block motion estimation some parts of an object can have a depth value very different from their neighbours, which causes a clipping effect in the next step, the DIBR processing. Fusing the motion depth map with the colour segmentation can eliminate the effect of such incorrect motion estimates. In the current research area, few researchers describe how to fuse a block-motion depth map with a colour segmentation, and we believe this fusion process can be a breakthrough for depth map generation; the fusion steps are investigated in this project. Since the segmentation map provides information about the different parts of the objects, and in most situations the pixels of one part of an object should have the same or a similar depth value, the main problem becomes how to decide that depth value.
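Under the stated assumption that all pixels of a segment share one depth value, a simple fusion rule is to give every segment a single representative statistic of its motion depths. The sketch below uses the median, which is our own choice; the paper leaves the exact rule open.

import numpy as np

def fuse(motion_depth_map, smap):
    """Fuse the block-motion depth map with the segmentation map:
    every pixel of a segment gets one depth value (here the median of
    the motion depths inside that segment), which suppresses isolated
    wrong motion estimates."""
    fused = np.zeros_like(motion_depth_map)
    for label in np.unique(smap):
        mask = (smap == label)
        fused[mask] = np.median(motion_depth_map[mask])
    return fused

# The block-wise motion depth must first be brought to pixel
# resolution, e.g. by repeating each block value:
#   pixel_depth = np.kron(block_depth,
#                         np.ones((16, 16), dtype=block_depth.dtype))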

Depth Map Filtering

This is the final step of the depth map generation, and the filtering performs an important role for the subsequent 3D video reconstruction based on DIBR. If the depth map has sharp edges at the objects, a large amount of data is lost in the process of 3D image warping. To make the clipping effect clearly visible, every depth value greater than 0 was changed to 255; the figures illustrate the output video for the left eye both without and with filtering. The gap (shown in black) without filtering is much larger than with filtering. Because so much information is missing, the subsequent hole-filling process produces larger distortion and cannot fill the holes completely; compared with this, the filtered depth map leads to an output image with only a small amount of data loss, and with linear interpolation for hole filling the resulting 3D video is much better than without filtering. On the other hand, another key benefit of filtering is that it reduces the noise caused by incorrect motion estimation. This step is used only to enhance the quality of the depth map: the median filter is a nonlinear filter that reduces noise considerably, and it is more effective than linear convolution when the goal is to reduce noise and preserve edges at the same time.
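A minimal sketch of such a filtering stage, assuming SciPy is available; the paper names the median filter but not its parameters, so the kernel sizes here are illustrative.

import numpy as np
from scipy.ndimage import median_filter, gaussian_filter

def smooth_depth(depth, median_size=5, sigma=3.0):
    """Post-filter the fused depth map: the median filter suppresses
    noise from wrong motion estimates while preserving edges, and a
    mild Gaussian blur softens the remaining sharp depth edges so
    that the DIBR warping step produces smaller holes."""
    d = median_filter(depth, size=median_size)
    return gaussian_filter(d.astype(float), sigma=sigma).astype(np.uint8)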




Fig. 7: Anchor image

Fig. 8: Motion field

Experimental Results
Figures 7 to 13 capture the results of the various stages of the algorithm: the anchor image, the estimated motion field, the prediction error, the quantized image, the segmented foreground and background, and the fused depth map. The measured PSNR of the predicted video was 42.63 dB, 38.49 dB, and 38.32 dB.
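For reference, the PSNR figures above follow the standard definition, which can be computed as in this short sketch:

import numpy as np

def psnr(reference, predicted, peak=255.0):
    """PSNR in dB between a reference frame and a predicted frame:
    10 * log10(peak^2 / MSE)."""
    mse = np.mean((reference.astype(float) - predicted.astype(float)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)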

Fig. 9: Prediction error

Fig. 10: Quantized image

Fig. 11: Segmented image (foreground)

Fig. 12: Segmented image (background)

Fig. 13: Fused image

Conclusion and Future Work

In this project we proposed a novel method to obtain a depth map from video input. The results show that, based on motion parallax, it is possible to obtain depth information for source video with a static background. Depth from block motion estimation alone produces a clipping effect in the stereo video reconstruction and causes unstable depth values within the same part of an object. Colour segmentation eliminates the clipping effect by providing the valuable object information that the depth map from motion lacks, and the fusion process makes the depth map smooth and avoids the clipping effect. We have integrated methods from several sources to obtain the best results in terms of PSNR. For example, we used EBMA for motion estimation; replacing it with a diamond search pattern should, we believe, give only slightly degraded results in a much smaller amount of time. However, generating the depth map is not the final step in obtaining a stereoscopic video: the next step is the process of depth-image-based rendering (DIBR).

References

[1] M. N. M. van Lieshout, "Depth Map Calculation for a Variable Number of Moving Objects Using Markov Sequential Object Processes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 7, pp. 1308-1312, July 2008.
[2] C. Fehn, "3D-TV Using Depth-Image-Based Rendering (DIBR)," Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut, Einsteinufer 37, 10587 Berlin, Germany.


[3] ISO/IEC JTC1/SC29/WG11, "Survey of Algorithms Used for Multi-view Video Coding (MVC)," Doc. N6909, Jan. 2005.
[4] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, 2004.

[5] F. Xiao, "DCT-based video quality evaluation," Final Project for EE392J, Stanford University, 2000.
[6] M. Bosi and R. E. Goldberg, Introduction to Digital Audio Coding and Standards, Kluwer Academic Publishers, New York, NY, USA, 2003.
[7] A. Telea, "An image inpainting technique based on the fast marching method," Journal of Graphics Tools, vol. 9, no. 1, pp. 25-36, 2004.
[8] D. Kim, D. Min, and K. Sohn, "A Stereoscopic Video Generation Method Using Stereoscopic Display Characterization and Motion Analysis," IEEE Transactions on Broadcasting, vol. 54, no. 2, pp. 188-197, June 2008.
[9] Yin Zhao et al., "Perceptual measurement for evaluating quality of view synthesis," MPEG Doc. M16407, Maui, USA, April 2009.

Jagadeesh B.K. holds a Master's degree from VTU, Karnataka, and is currently pursuing a PhD at PRIST University, Thanjavur, India. He earlier worked at several software companies on audio and video codecs and audio post-processing algorithms, and is currently working as an Assistant Professor at HKBK College of Engineering, Bangalore, Karnataka, India.

Ravindra A holds a Master's degree from Pune University, Maharashtra, is currently pursuing a PhD at Vellore University, Tamil Nadu, India, and is working as an Assistant Professor at Sir MVIT, Bangalore.
