Type of Document: Dissertation
Author: Huang, Hong
URN: etd-06192010-060533
Title: Perception of Quality in Genome Annotation Work
Degree: Doctor of Philosophy
Department: Accounting, Department of
Advisory Committee (Advisor Name, Title):
- Corinne Jörgensen, Committee Chair
- Besiki Stvilia, Committee Member
- Paul Marty, Committee Member
- Hank Bass, University Representative
Keywords:
- Biomedical Informatics
Date of Defense: 2010-04-19
Availability: unrestricted
Abstract
The rapid accumulation of genome annotation data, as well as their widespread re-use in clinical and scientific practice, poses new challenges to scientific data quality management. In particular, there is a lack of understanding of the need for quality in genome annotation, and of the requirements of end users and intermediaries such as data curators, which makes it difficult to systematize effective methods and approaches for the quality assessment and management of genome annotation data. This study addresses the above-mentioned gap by identifying perceptions of quality, as well as quality skill requirements, in genome annotation processes.
The study was guided by Activity Theory and Scenario-Based Task Analysis, and the survey method was used to collect data. Two groups of stakeholders (end users and data curators) were identified based on hypothesized differences in their quality needs for genome annotations. The study used a previously developed general framework of information quality assessment, and a taxonomy of data quality skills, to develop survey questions. In addition, to contextualize the questions and motivate subjects to provide answers, the survey questionnaire included two genome annotation scenarios, each consisting of a series of genome annotation actions. When considering these scenarios, subjects were asked to rank the importance of both data quality dimensions and related data quality skills. The survey data collected from 158 subjects were further explored by factor analysis in order to identify the quality concepts or criteria that stakeholders considered important in the genome annotation process.
Seventeen data quality dimensions were reduced to five factor constructs, and seventeen data quality skills were aggregated into four factor constructs. End users and data curators prioritized data quality aspects differently. The end users cared more about whether annotation records came from reliable sources (Believability) and whether information was the most current (Up-to-date), while the data curators focused more on the displays, formats, and usefulness of the annotation products (Consistency and Interpretability). Both groups believed that Accessibility and Accuracy were the most important data quality dimensions in genome annotation work, and that Security was a trivial issue because genome annotation environments were highly publicly sharable. Both groups thought that data quality error detection skills, as well as data quality literacy skills, were essential for improving data quality work in genome annotation.
Analysis of the annotation user survey results revealed that users have specific sets of “virtues” or criteria in the genome annotation context. Differences among respondents are interpreted to indicate that users in different roles value data quality dimensions and skills differently. The end users, as data consumers, indirectly assess the data quality of annotation records by relying on source credibility. The curators, as data intermediaries and/or producers, directly assess data quality and virtue in genome annotation. Since users ultimately decide the usefulness and value of annotation information, a tighter collaboration among the users is required to incorporate their data quality needs into annotation process management. Subject matter experts might not have sufficient knowledge of data quality. Because resources are limited, the highly open and dynamic annotation environment requires users to adapt in order to optimize annotation strategies. Identifying the data quality requirements of different stakeholders, and defining models for quality, should help maximize the efficiency of resource management not only in this genome bioinformatics example but also in other information systems that combine data, users, and contexts, which should ultimately help systematize quality assurance activities.
Filename: Huang_H_Dissertation_2010.pdf
Size: 1.01 Mb
Approximate Download Time (Hours:Minutes:Seconds):
- 28.8 Modem: 00:04:39
- 56K Modem: 00:02:23
- ISDN (64 Kb): 00:02:05
- ISDN (128 Kb): 00:01:02
- Higher-speed Access: 00:00:05