Type of Document: Dissertation
Author: Lakner, Clemens
URN: etd-02012011-193639
Title: Phylogenetic models as seen from protein space
Degree: Doctor of Philosophy
Department: Biological Science
Advisory Committee:
- Gavin J. P. Naylor, Committee Chair
- Darin R. Rokyta, Committee Member
- Fredrik Ronquist, Committee Member
- Mark T. Holder, Committee Member
- Peter Beerli, Committee Member
- Timothy A. Cross, University Representative
Keywords:
- Phylogenetics
- Bayesian Inference
- Molecular Systematics
- Evolutionary Biology
Date of Defense: 2010-12-08
Availability: unrestricted
Abstract:
This is a dissertation on protein evolution and the probabilistic models used to study it. It explores the interaction of data and model during statistical phylogenetic inference and evaluates the biological realism of simple amino-acid substitution models.
Following a general introduction in chapter one, chapter two presents a method for developing informative prior distributions for Bayesian phylogenetic analysis. Dirichlet distributions are fitted to a posterior sample of equilibrium frequencies and substitution rates obtained by analyzing a reference data set under a general time-reversible (GTR) model; the fitted distributions can then serve as informative priors in subsequent analyses. The approach is demonstrated for amino-acid sequences of mammalian mitochondrial genes, and its effects are evaluated on small data sets, where the prior is expected to have the greatest impact. If the data agree with the prior, the execution time of the analysis can be significantly shorter than with a non-informative prior. Conversely, if the data conflict with the prior, the resulting effect on the posterior can slow the analysis down dramatically. With large amounts of data, the influence of the prior is predicted to be negligible.
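A minimal sketch of the prior-fitting step, assuming a method-of-moments estimator (the dissertation's actual fitting procedure may differ, for example maximum likelihood). Each row of `posterior_sample` stands for one posterior draw of a quantity that sums to 1, such as the 20 equilibrium frequencies or the normalized GTR exchangeability rates. Here the rows are simulated from a known Dirichlet so that the fit can be checked. `fit_dirichlet_moments` and `sample_dirichlet` are hypothetical helper names.

```python
import random
from statistics import fmean, pvariance


def fit_dirichlet_moments(samples):
    """Method-of-moments Dirichlet fit (hypothetical helper).

    samples: list of rows, each a probability vector summing to 1.
    Returns the estimated concentration vector alpha.
    """
    k = len(samples[0])
    columns = [[row[j] for row in samples] for j in range(k)]
    means = [fmean(col) for col in columns]
    variances = [pvariance(col) for col in columns]
    # Under a Dirichlet, Var(x_j) = m_j * (1 - m_j) / (alpha0 + 1), so each
    # component yields its own estimate of the total concentration alpha0.
    estimates = [m * (1 - m) / v - 1 for m, v in zip(means, variances) if v > 0]
    alpha0 = fmean(estimates)
    return [alpha0 * m for m in means]


def sample_dirichlet(alpha, rng):
    # Normalized independent gamma draws follow a Dirichlet(alpha) law.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


# Stand-in for a posterior sample from a reference-data analysis.
rng = random.Random(1)
true_alpha = [20.0, 10.0, 5.0, 5.0]
posterior_sample = [sample_dirichlet(true_alpha, rng) for _ in range(5000)]
alpha_hat = fit_dirichlet_moments(posterior_sample)
print([round(a, 1) for a in alpha_hat])
```

Averaging the per-component estimates of the total concentration is one simple choice. With real posterior samples the components may disagree noticeably, which would suggest that a single Dirichlet describes the sample only approximately.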
Chapter three discusses the ramifications of failing to incorporate structural constraints into models of protein evolution. Most phylogenetic models ignore interactions between sites, which allows the likelihood to be calculated as the product of the individual site likelihoods. The likelihood is the integral of the probability densities of all transition paths (substitution histories) that are consistent with the observed data. The chapter investigates the extent to which substitution histories that are incompatible with a protein's three-dimensional structure contribute to the likelihood. Unconstrained simulations started from a real sequence quickly produce sequences that are incompatible with the protein structure, showing that simple models are unable to capture the constraints on sequence evolution. However, when substitution histories between real sequences are sampled from the posterior probability distribution under the same models, the sampled histories are largely consistent with the structure. Simple empirical substitution models may therefore be adequate for interpolating changes between observed sequences during phylogenetic inference. The study provides insight into the fit between data and models and into how this knowledge can be used to improve the models. It also offers a quantitative assessment of the biological realism of substitution models from the perspective of protein structure.
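To make the site-independence assumption concrete, here is a minimal sketch for the simplest case: two aligned sequences separated by a single branch of length `t`. It assumes the equal-rates (Poisson) amino-acid model rather than the empirical models studied in the dissertation, because its transition probabilities have a closed form. Note that `P_xy(t)` already integrates over every substitution history from `x` to `y`, including histories that a real protein structure would not tolerate. All function names and the example sequences are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_STATES = len(AMINO_ACIDS)  # 20 amino-acid states


def poisson_transition_prob(x, y, t):
    """P_xy(t) under the equal-rates (Poisson) amino-acid model.

    t is the branch length in expected substitutions per site.
    """
    decay = math.exp(-N_STATES * t / (N_STATES - 1))
    if x == y:
        return 1.0 / N_STATES + (N_STATES - 1) / N_STATES * decay
    return 1.0 / N_STATES - decay / N_STATES


def site_log_likelihood(x, y, t):
    # pi_x * P_xy(t) with uniform stationary frequencies pi_x = 1/20.
    # P_xy(t) already sums over every substitution path from x to y.
    return math.log(poisson_transition_prob(x, y, t) / N_STATES)


def pairwise_log_likelihood(seq_a, seq_b, t):
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Site independence: log of a product of site likelihoods = sum of logs.
    return sum(site_log_likelihood(a, b, t) for a, b in zip(seq_a, seq_b))


# Hypothetical aligned fragments differing at 2 of 10 sites.
seq_a = "MKTAYIAKQR"
seq_b = "MKTVYIAKQK"
grid = [i / 1000 for i in range(1, 2001)]
t_hat = max(grid, key=lambda t: pairwise_log_likelihood(seq_a, seq_b, t))
print(f"grid-search MLE of branch length: {t_hat:.3f}")
```

For this model the maximum-likelihood branch length also has a closed form, t = -(19/20) ln(1 - 20p/19), where p is the proportion of differing sites. The grid search should land close to it.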
Chapter four provides a specific example of likelihood-based inference of ancestral states. Joint and marginal reconstruction are used in an attempt to pinpoint a functional change at the base of the eukaryotic lineage during the evolution of inosine monophosphate dehydrogenase (IMPDH). The results of the phylogenetic analysis are consistent with molecular dynamics simulations and laboratory experiments performed by co-authors of the study. According to these simulations, different forms of IMPDH use one of two pathways for water activation, and one of these pathways is likely to have been lost at the base of the eukaryotic lineage. This chapter describes the phylogenetic part of that study.
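The abstract does not specify the tree or substitution model used in chapter four. The following brute-force sketch uses a hypothetical three-taxon tree ((A, B)n, C)r and the equal-rates (Poisson) amino-acid model purely to illustrate the two reconstruction types for one alignment column. Joint reconstruction picks the single most probable assignment of states to all internal nodes at once; marginal reconstruction picks, for each node separately, the state with the highest posterior probability after summing over the states of the other nodes. Enumeration grows exponentially with the number of internal nodes, so realistic trees call for dynamic-programming algorithms instead. `reconstruct` and `p_trans` are hypothetical names.

```python
import itertools
import math

STATES = "ACDEFGHIKLMNPQRSTVWY"
K = len(STATES)  # 20 amino-acid states


def p_trans(x, y, t):
    # Equal-rates (Poisson) model; t in expected substitutions per site.
    decay = math.exp(-K * t / (K - 1))
    return 1 / K + (K - 1) / K * decay if x == y else 1 / K - decay / K


def reconstruct(leaves, branches):
    """Brute-force joint and marginal reconstruction on ((A, B)n, C)r.

    leaves:   dict with the observed states at "A", "B", "C"
    branches: dict with branch lengths "rn", "rC", "nA", "nB"
    """
    joint = {}
    for r, n in itertools.product(STATES, repeat=2):
        # P(leaf states, root = r, internal node = n)
        joint[(r, n)] = (
            (1 / K)  # uniform stationary frequency at the root
            * p_trans(r, n, branches["rn"])
            * p_trans(r, leaves["C"], branches["rC"])
            * p_trans(n, leaves["A"], branches["nA"])
            * p_trans(n, leaves["B"], branches["nB"])
        )
    site_likelihood = sum(joint.values())
    # Joint: the single best assignment of states to both internal nodes.
    best_joint = max(joint, key=joint.get)
    # Marginal: posterior of each node's state, summing over the other node.
    root_post = {r: sum(joint[(r, n)] for n in STATES) / site_likelihood for r in STATES}
    node_post = {n: sum(joint[(r, n)] for r in STATES) / site_likelihood for n in STATES}
    return best_joint, root_post, node_post, site_likelihood


leaves = {"A": "K", "B": "K", "C": "R"}
branches = {"rn": 0.1, "rC": 0.3, "nA": 0.05, "nB": 0.05}
best_joint, root_post, node_post, lik = reconstruct(leaves, branches)
print("joint (root, n):", best_joint)
print("marginal root:", max(root_post, key=root_post.get))
print("marginal n:", max(node_post, key=node_post.get))
```

In this toy column the two methods agree; on larger trees with more ambiguous data they can return different states for the same node.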
Chapter five discusses probabilistic models of evolution from the perspective of protein space, reviewing the models and their assumptions about evolutionary constraints at a conceptual level.
Files
- Lakner_C_Dissertation_2011.pdf (18.12 MB)