Type of Document: Dissertation
Author: Kim, Jungkee
URN: etd-04142005-183110
Title: Hybrid Keyword Search across Peer-to-Peer Federated Data
Degree: Doctor of Philosophy
Department: Computer Science, Department of
Advisory Committee:
- Gregory Riccardi, Committee Chair
- Geoffrey C. Fox, Committee Co-Chair
- David Whalley, Committee Member
- Gordon Erlebacher, Committee Member
- Lawrence Dennis, Committee Member
Keywords:
- Keyword Search
- Data Integration
- Peer-To-Peer
- Information Retrieval
Date of Defense: 2005-04-06
Availability: unrestricted
Abstract:
The Internet provides a general communication environment for distributed resource sharing. XML has become a key technology for information representation and exchange on the Internet, increasing the opportunity for integration of the various data formats. The World Wide Web (WWW) is the example par excellence of a document-based distributed system on the Internet. As the size of the Web has increased, various problems with looking up a resource location on the Internet have emerged. Web search engines provide clues for resource location, but they have no semantic schema and often produce meaningless keyword search results. The Semantic Web suggests an alternative solution for the semantic problem on the Web. It provides multiple relation links with directed labeled graphs, and machines like Web crawlers can understand the relationship between different resources. But due to the need for sophisticated domain description and lack of unified definitions, many Web pages are not part of the Semantic Web. Meanwhile, recent public attention to peer-to-peer (P2P) networks has stimulated research on overlay P2P networks on top of the Internet. Those studies open possibilities for another form of distributed resource sharing on the Internet.
In this dissertation we describe the design of a hybrid search that combines metadata search with a traditional keyword search over unstructured context data. This hybrid search paradigm provides the inquirer additional options to narrow the search with some semantic aspects through the XML metadata query. We tackle the scalability limitations of a single-machine implementation by adopting a distributed architecture. This scalable hybrid search provides a total query result from the collection of individual inquiries against independent data fragments distributed in a computer cluster. We demonstrate that our architecture extends the scalability of a native XML query limited to a single machine and improves the performance of queries. Finally, we generalize our hybrid architecture to more scalable searches over a P2P overlay network. This generalization may give an intermediate search paradigm on the Internet, providing semantic value through XML metadata that are simpler than those of the Semantic Web.
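The idea of a hybrid query evaluated independently against data fragments and then collected into one result can be illustrated with a minimal sketch. This is not the dissertation's implementation: the names `Document`, `search_fragment`, and `hybrid_search` are hypothetical, a plain dictionary stands in for XML metadata, and exact-match metadata filtering with all-keywords-must-match semantics is an assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    metadata: dict  # assumption: flat key/value pairs stand in for XML metadata
    text: str       # unstructured content searched by keyword


def search_fragment(fragment, metadata_query, keywords):
    """Return ids of documents in one fragment that satisfy both query parts."""
    hits = []
    for doc in fragment:
        # Metadata part: every requested field must match exactly (assumed semantics).
        meta_ok = all(doc.metadata.get(k) == v for k, v in metadata_query.items())
        # Keyword part: every keyword must occur as a whole word in the text.
        words = set(doc.text.lower().split())
        text_ok = all(kw.lower() in words for kw in keywords)
        if meta_ok and text_ok:
            hits.append(doc.doc_id)
    return hits


def hybrid_search(fragments, metadata_query, keywords):
    """Query each fragment independently and collect the partial results."""
    results = []
    for fragment in fragments:
        results.extend(search_fragment(fragment, metadata_query, keywords))
    return sorted(results)
```

In this sketch each fragment is just a list held in memory; in a cluster or P2P setting each call to `search_fragment` would presumably run on the node holding that fragment, with only the partial results sent back for merging.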
Files
Filename: JKimFEDis.pdf
Size: 959.19 Kb
Approximate Download Time (Hours:Minutes:Seconds):
- 28.8 Modem: 00:04:26
- 56K Modem: 00:02:17
- ISDN (64 Kb): 00:01:59
- ISDN (128 Kb): 00:00:59
- Higher-speed Access: 00:00:05