Type of Document: Dissertation
Author: Donate, Arturo
Author's Email Address: firstname.lastname@example.org
URN: etd-08022011-184912
Title: Three-Dimensional Scene Estimation From Monocular Videos with Applications in Video Analysis
Degree: Doctor of Philosophy
Department: Computer Science

Advisory Committee:
- Xiuwen Liu, Committee Chair
- Gary Tyson, Committee Member
- Piyush Kumar, Committee Member
- Victor Patrangenaru, University Representative

Keywords:
- structure from motion
- video analysis
- shot boundary detection
- activity recognition
- image processing
Date of Defense: 2011-06-24
Availability: unrestricted

Abstract

This research explores the idea of extracting three-dimensional features from video clips in order to aid various video analysis and mining tasks. Although video analysis problems are well established in the literature, the use of three-dimensional information is scarce due to the inherent difficulties of building such a system. When the only input to the system is a video stream, with no previous knowledge of the scene or camera (a typical scenario in video analysis), extracting meaningful and accurate 3D representations becomes a very difficult task.
However, several recently proposed methods have made progress toward this goal by applying techniques from related topics, including simultaneous localization and mapping (SLAM), structure from motion (SFM), and 3D reconstruction. In the research presented here, I present two main contributions toward solving this problem. First, I propose a method capable of generating a three-dimensional representation of a scene observed by a monocular video, using no previous information. The method exploits the movement of the camera while robustly tracking features over time in order to obtain multiple views of the scene and perform 3D reconstruction. The system performs automatic camera calibration, estimates the three-dimensional structure of the scene, and tracks the scene across time, refining its results as new frames are obtained. Additionally, the system can track a scene even in the presence of moving people, a limitation of most SLAM and SFM approaches available in the literature. Second, I present a method for extracting the three-dimensional pose and motion of a person in a video. The method extends previously published work on two-dimensional human pose estimation by incorporating a human motion model, and expands the two-dimensional pose into three dimensions using several heuristics. Together, these methods yield an intrinsic 3D representation of the static background and the people in a scene, which can be used to solve various video analysis tasks.
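The core of multi-view reconstruction from tracked features is triangulation: once a feature has been tracked into two views with known (or estimated) camera geometry, its 3D position can be recovered. The sketch below is a minimal linear (DLT) triangulation of a single point; it assumes the two projection matrices are already known, whereas the dissertation's system would estimate them via automatic self-calibration. The specific cameras and point are illustrative, not taken from the dissertation.

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views.

    P1, P2: 3x4 camera projection matrices (assumed known here).
    x1, x2: 2D image observations of the same tracked feature.
    """
    # Each observation contributes two linear constraints on the
    # homogeneous 3D point X, stacked into a 4x4 system A X = 0.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of A with the
    # smallest singular value (the null vector of A).
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize

# Toy example: two cameras one unit apart observing the point (0, 0, 5).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.0, 0.0, 5.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]
print(triangulate_point(P1, P2, x1, x2))  # ≈ [0. 0. 5.]
```

In a full pipeline this step runs per tracked feature and per new frame, with the recovered points refined as additional views arrive.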
To demonstrate the feasibility of my proposed methods, I show how they can be used to solve a selection of video analysis tasks. First, I show how a three-dimensional point cloud of the scene can be used along with robust feature tracking to detect shot boundaries in the video. Next, I present an automatic approach to stereoscopic video conversion that requires no prior knowledge of the input video. Finally, I illustrate how a three-dimensional human model can be combined with simple linear classifiers to perform human action recognition with high classification accuracy.
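One simple way feature tracking supports shot boundary detection: across a hard cut, almost no features survive from one frame to the next, so a sharp collapse in the per-frame track-survival ratio signals a boundary. The sketch below illustrates that idea only; the ratio signal, the threshold value, and the function name are assumptions for illustration, not the dissertation's actual detector.

```python
def detect_shot_boundaries(track_survival, threshold=0.3):
    """Flag frames where the fraction of features successfully
    tracked from the previous frame collapses below a threshold.

    track_survival: per-frame survival ratios in [0, 1], where
    1.0 means every feature was tracked; a hard cut destroys
    nearly all tracks.  The threshold is an illustrative choice.
    """
    return [i for i, ratio in enumerate(track_survival) if ratio < threshold]

# A hard cut at frame 3: survival collapses, then recovers.
ratios = [0.95, 0.92, 0.90, 0.05, 0.88, 0.91]
print(detect_shot_boundaries(ratios))  # → [3]
```

A real detector would also have to distinguish cuts from fast camera motion or occlusion, which is where the 3D point cloud of the scene provides additional evidence.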
Filename: Donate_A_Dissertation_2011.pdf (29.30 Mb)

Approximate Download Time (Hours:Minutes:Seconds):
- 28.8 Modem: 02:15:38
- 56K Modem: 01:09:45
- ISDN (64 Kb): 01:01:02
- ISDN (128 Kb): 00:30:31
- Higher-speed Access: 00:02:36