
Title page for ETD etd-08022011-184912


Type of Document Dissertation
Author Donate, Arturo
Author's Email Address ad06h@my.fsu.edu
URN etd-08022011-184912
Title Three-Dimensional Scene Estimation From Monocular Videos with Applications in Video Analysis
Degree Doctor of Philosophy
Department Computer Science, Department of
Advisory Committee
  Advisor Name          Title
  Xiuwen Liu            Committee Chair
  Gary Tyson            Committee Member
  Piyush Kumar          Committee Member
  Victor Patrangenaru   University Representative
Keywords
  • 3D
  • structure from motion
  • video analysis
  • shot boundary detection
  • stereoscopic
  • activity recognition
  • image processing
Date of Defense 2011-06-24
Availability unrestricted
Abstract
This research explores the idea of extracting three-dimensional features from video clips in order to aid various video analysis and mining tasks. Although video analysis problems are well established in the literature, the use of three-dimensional information is scarce due to the inherent difficulties of building such a system. When the only input to the system is a video stream with no prior knowledge of the scene or camera (a typical scenario in video analysis), extracting meaningful and accurate 3D representations becomes a very difficult task.

However, several recently proposed methods have made some progress towards this goal by applying techniques from related topics, including simultaneous localization and mapping (SLAM), structure from motion (SFM), and 3D reconstruction. In this research, I present two main contributions towards solving this problem. First, I propose a method capable of generating a three-dimensional representation of a scene as observed by a monocular video, using no prior information. The method exploits the movement of the camera while robustly tracking features over time in order to obtain multiple views of a scene and perform 3D reconstruction. This system performs automatic camera calibration, estimates the three-dimensional structure of the scene, and tracks the scene across time while refining its results as new frames are obtained. Additionally, the system can track a scene even in the presence of moving people, which is a limitation of most SLAM and SFM approaches available in the literature. Second, I present a method for extracting the three-dimensional pose and motion of a person in a video. The method extends previously published work on two-dimensional human pose estimation by incorporating a human motion model, and lifts the two-dimensional pose into three dimensions using several heuristics. Together, these methods yield an intrinsic 3D representation of the static background and the people in a scene, which can be used to solve various video analysis tasks.

To demonstrate the feasibility of the proposed methods, I show how they can be used to solve a selection of video analysis tasks. First, I show how a three-dimensional point cloud of the scene can be used along with robust feature tracking to detect shot boundaries in the video. Next, I present an automatic approach to stereoscopic video conversion using no prior knowledge of the input video. Finally, I illustrate how a three-dimensional human model can be combined with simple linear classifiers to perform human action recognition with high classification accuracy.

Files
  Approximate Download Time (Hours:Minutes:Seconds)

  Filename                         Size       28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access
  Donate_A_Dissertation_2011.pdf   29.30 Mb   02:15:38     01:09:45    01:01:02       00:30:31        00:02:36


If you have more questions or technical problems, please Contact the FSU Digital Library Center.