
Title page for ETD etd-02012011-193639


Type of Document Dissertation
Author Lakner, Clemens
URN etd-02012011-193639
Title Phylogenetic models as seen from protein space
Degree Doctor of Philosophy
Department Biological Science, Department of
Advisory Committee
  Advisor Name          Title
  Gavin J. P. Naylor    Committee Chair
  Darin R. Rokyta       Committee Member
  Fredrik Ronquist      Committee Member
  Mark T. Holder        Committee Member
  Peter Beerli          Committee Member
  Timothy A. Cross      University Representative
Keywords
  • Phylogenetics
  • Bayesian Inference
  • Molecular Systematics
  • Evolutionary Biology
Date of Defense 2010-12-08
Availability unrestricted
Abstract
This is a dissertation on protein evolution and the probabilistic models used to study it. The interaction of data and model during statistical phylogenetic inference is explored, and the biological realism of simple amino acid models is evaluated.

Following a general introduction in chapter one, chapter two presents a method for developing informative prior distributions for Bayesian phylogenetic analysis. Dirichlet distributions are fitted to a posterior sample of equilibrium frequencies and substitution rates obtained from an analysis of a reference data set under a general time-reversible model. These distributions can be used as informative priors in subsequent analyses. The approach is demonstrated for amino-acid sequences of mammalian mitochondrial genes, and its effects on subsequent analyses are evaluated for small data sets, where the prior is expected to have the most impact. If the data agree with the prior, the execution time of the analysis can be significantly shortened compared to analyses that use a non-informative prior. If, on the other hand, the data conflict with the prior, the effects on the posterior may slow the analysis down dramatically. With large amounts of data, the influence of the prior is predicted to be negligible.
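The fitting step described above can be illustrated with a short sketch. The abstract does not say which estimator the dissertation uses, so the moment-matching approach below is an assumption chosen for brevity (maximum likelihood would be an alternative), and the function name `fit_dirichlet_moments` and the simulated "posterior sample" are hypothetical stand-ins. Substitution rates would first have to be normalized to sum to one before a Dirichlet could be fitted to them.

```python
import numpy as np


def fit_dirichlet_moments(samples):
    """Estimate Dirichlet parameters from points on the simplex by moment matching.

    For Dirichlet(alpha) with precision s = sum(alpha) and mean m = alpha / s,
    each component has variance m_j * (1 - m_j) / (s + 1). Solving for s gives
    one estimate per component; the median of these is used here.
    """
    x = np.asarray(samples, dtype=float)
    mean = x.mean(axis=0)
    var = x.var(axis=0, ddof=1)
    precision = np.median(mean * (1.0 - mean) / var - 1.0)
    return precision * mean


# Hypothetical stand-in for an MCMC posterior sample of four frequencies.
rng = np.random.default_rng(0)
true_alpha = np.array([8.0, 4.0, 2.0, 6.0])
posterior_sample = rng.dirichlet(true_alpha, size=5000)

alpha_hat = fit_dirichlet_moments(posterior_sample)
print(alpha_hat)
```

The fitted vector could then serve as the parameter of an informative Dirichlet prior in a later analysis. A large precision (the sum of the parameters) corresponds to a tightly concentrated prior, which is where the agreement or conflict with a small data set matters most.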

The ramifications of failing to incorporate structural constraints into models of protein evolution are discussed in chapter three. Most phylogenetic models ignore interactions between sites, which allows the likelihood to be calculated as the product of the individual site likelihoods. The likelihood is the integral of the probability densities of all transition paths that are consistent with the observed data. The extent to which substitution histories that are incompatible with a protein's three-dimensional structure contribute to the likelihood is investigated. Unconstrained simulations started from a real sequence quickly produce sequences that are incompatible with the protein structure. Thus, simple models are unable to capture the constraints on sequence evolution. However, when substitution histories between real sequences are sampled from the posterior probability distribution under the same models, the sampled histories are largely consistent with the structure. Therefore, simple empirical substitution models may be adequate for interpolating changes between observed sequences during phylogenetic inference. The significance of this study lies in the insight it provides into the fit between data and models, and in how this knowledge can be used to improve the models. Moreover, it provides a quantitative assessment of the biological realism of substitution models from the perspective of protein structure.
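The site-independence assumption mentioned above can be made concrete with a minimal sketch. It uses the equal-rates (Poisson) amino acid model and only two aligned sequences separated by a branch of length `t`, which is far simpler than the empirical models and trees studied in the dissertation; the function names and the example sequences are hypothetical.

```python
import math

N_STATES = 20  # number of amino acids


def poisson_transition_prob(same, t):
    """Transition probability under the equal-rates (Poisson) model.

    `t` is the branch length in expected substitutions per site.
    """
    decay = math.exp(-N_STATES / (N_STATES - 1) * t)
    if same:
        return 1.0 / N_STATES + (N_STATES - 1) / N_STATES * decay
    return 1.0 / N_STATES - decay / N_STATES


def pairwise_log_likelihood(seq_a, seq_b, t):
    """Log-likelihood of two aligned sequences, assuming independent sites.

    Because sites do not interact, the likelihood is a product over sites,
    so the log-likelihood is a sum of per-site terms.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    total = 0.0
    for a, b in zip(seq_a, seq_b):
        # Equilibrium frequency of the first state times the transition probability.
        site_likelihood = (1.0 / N_STATES) * poisson_transition_prob(a == b, t)
        total += math.log(site_likelihood)
    return total


print(pairwise_log_likelihood("ACDE", "ACDF", 0.1))
```

Nothing in this calculation knows about the protein's three-dimensional structure: each site contributes its own factor regardless of what happens at neighboring or contacting sites, which is the simplification the chapter examines.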

A specific example of likelihood inference of ancestral states is provided in chapter four. Joint and marginal reconstructions are used in an attempt to pinpoint a functional change at the base of the eukaryotic lineage during the evolution of inosine monophosphate dehydrogenase (IMPDH). The results of the phylogenetic analysis are consistent with molecular dynamics simulations and laboratory experiments performed by co-authors of the study. According to these simulations, different forms of IMPDH utilize one of two different pathways for water activation. One of these pathways is likely to have been lost at the base of the eukaryotic lineage. Only the phylogenetic analysis of the study is described here.

Chapter five discusses probabilistic models of evolution from the perspective of protein space. Models and their assumptions about evolutionary constraints are reviewed from a conceptual perspective.

Files
  Approximate download times are given as Hours:Minutes:Seconds.

  Filename                         Size       28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access
  Lakner_C_Dissertation_2011.pdf   18.12 Mb   01:23:52     00:43:08    00:37:44       00:18:52        00:01:36
