Type of Document Dissertation Author Yao, Bin Author's Email Address firstname.lastname@example.org URN etd-07112011-215416 Title Spatial Approximate String Search Degree Doctor of Philosophy Department Computer Science, Department of Advisory Committee
Advisor Name Title Feifei Li Committee Chair Ashok Srinivasan Committee Member Piyush Kumar Committee Member Washington Mio University Representative Keywords
- Selectivity Estimation
Date of Defense 2011-06-24 Availability unrestricted AbstractMany applications are featured with both text and location information, which leads to a novel type of search: spatial approximate string search (Sas). The Sas is gaining attention from the database community only recently. A large variety of problems remain open. Spatial and text data have been independently studied for decades. Spatial data have many unique features that are drastically different from text data. As a result, most of the existing techniques for string processing are either inapplicable or inefficient when adapted to spatial databases. We have investigated four issues in the general area of Sas. They are: (i) Spatial approximate string search in Euclidean space (Esas); (ii) Spatial approximate string search on road networks (Rsas); (iii) Selectivity Estimation for Esas Range Queries; (iv) Multi-Approximate-Keyword Routing query on road networks.
For efficiently answering spatial approximate string queries in Euclidean space, we propose a novel index structure, MhR-tree, which is based on the R-tree augmented with the min-wise signature and the linear hashing technique. The min-wise signature for an index node u keeps a concise representation of the union of q-grams from strings under the subtree of u. We analyze the pruning functionality of such signatures based on set resemblance between the query string and the q-grams from the sub-trees of index nodes.
For the Rsas, we propose a novel method, RsasSol, which is superior to the base line algorithm in practice. The RsasSol is a combination technique of inverted list and reference points. Extensive experiments on large real data sets demonstrate the efficiency and effectiveness of our methods.
We also discuss how to estimate range query selectivity in Euclidean space since the query selectivity estimation is always important for the query optimization. Based on the spatial uniformity principle and the minimum number of neighborhoods principle for strings, we define a partitioning metric to facilitate the construction of buckets. We present a novel adaptive algorithm for finding balanced partitions using both the spatial and string information stored in the R-tree, which works well in practice.
Lastly, we go further beyond the basic kNN and range queries for Sas and consider an interesting application of Sas on the road network: find the shortest path with keywords constraints on the road network. We propose both exact and approximate solutions for this issue. For small set of query keywords (m ≤ 6), our exact solutions can quickly find the top-k shortest paths. For large set of query keywords, our approximate solutions can still return the answer on real time with good approximate quality.
Filename Size Approximate Download Time (Hours:Minutes:Seconds)
28.8 Modem 56K Modem ISDN (64 Kb) ISDN (128 Kb) Higher-speed Access Yao_B_Dissertation_2011.pdf 1.40 Mb 00:06:27 00:03:19 00:02:54 00:01:27 00:00:07