Type of Document: Dissertation
Author: Li, Xiaoyun
URN: etd-02162011-101818
Title: Analysis of Multivariate Data with Random Cluster Size
Degree: Doctor of Philosophy
Department: Statistics, Department of
Advisory Committee:
Debajyoti Sinha, Committee Chair
Dan McGee, Committee Member
Stuart Lipsitz, Committee Member
Yi Zhou, University Representative
Keywords:
- Clustered data
- Longitudinal data analysis
- Informative missing
- Categorical data analysis
- Logistic regression
- Bridge distribution
Date of Defense: 2010-12-02
Availability: unrestricted
Abstract:
In this dissertation, we examine correlated binary data with present/absent components or missing data that are related to the binary responses of interest.
Depending on the data structure, correlated binary data are referred to as clustered
data when the sampling unit is a cluster of subjects, or as longitudinal
data when they involve repeated measurements of the same subject over time. We propose novel
models for these two data structures and illustrate them with real data applications.
In biomedical studies involving clustered binary responses, the
cluster size can vary because some components of the cluster can be absent.
When both the presence of a cluster component and the binary disease status of a present
component are treated as responses of interest, we propose a novel
two-stage random effects logistic regression framework. For ease
of interpretation of regression effects, both the marginal
probability of presence/absence of a component and the
conditional probability of disease status of a present component
preserve approximate logistic regression forms. We present a
maximum likelihood method of estimation that can be implemented with standard
statistical software. We compare our models and the physical
interpretation of regression effects with competing methods from
the literature. We also present a simulation study to assess the
robustness of our procedure to misspecification of the random
effects distribution and to compare the finite-sample performance of our
estimates with that of existing methods. The methodology is illustrated by
analyzing a study of periodontal health status in a diabetic
Gullah population.
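The abstract does not give the model equations, but the keyword "Bridge distribution" points to the standard device for keeping a logistic form after a random intercept is integrated out: if logit P(R = 1 | b) = η + b and b follows the bridge density f(b; φ) = sin(φπ) / {2π[cosh(φb) + cos(φπ)]} with 0 < φ < 1 (Wang and Louis, 2003), then marginally logit P(R = 1) = φη exactly. The Python sketch below checks this identity by numerical integration. It is a minimal illustration under that assumed density only, not the dissertation's two-stage model or code; the function names `bridge_density` and `marginal_prob` are hypothetical.

```python
import math


def expit(x: float) -> float:
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bridge_density(b: float, phi: float) -> float:
    """Bridge density for the logit link (assumed form); requires 0 < phi < 1."""
    return math.sin(phi * math.pi) / (
        2.0 * math.pi * (math.cosh(phi * b) + math.cos(phi * math.pi))
    )


def marginal_prob(eta: float, phi: float, half_width: float = 200.0, step: float = 0.01) -> float:
    """Integrate expit(eta + b) * f(b; phi) over b with the trapezoid rule
    on [-half_width, half_width]; the bridge tails decay like exp(-phi * |b|),
    so truncating at +/-200 loses a negligible amount of mass."""
    n = int(round(2.0 * half_width / step))
    total = 0.0
    for k in range(n + 1):
        b = -half_width + k * step
        weight = 0.5 if k in (0, n) else 1.0  # trapezoid end-point weights
        total += weight * expit(eta + b) * bridge_density(b, phi)
    return total * step


# Compare the integrated (marginal) probability with the logistic form expit(phi * eta).
for phi in (0.3, 0.5, 0.8):
    for eta in (-2.0, 0.5, 1.7):
        print(f"phi={phi}, eta={eta}: marginal={marginal_prob(eta, phi):.8f}, "
              f"expit(phi*eta)={expit(phi * eta):.8f}")
```

Each printed pair agrees to within quadrature error: under this assumed random-effects distribution the marginal logistic coefficient is the conditional one scaled by φ, so regression effects keep a logistic interpretation at both levels.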
We extend this model to longitudinal studies with binary longitudinal responses
and informative missing data. In longitudinal studies, when each subject is treated as a cluster, the cluster size is
the total number of observations for that subject.
When data are informatively missing, the cluster size varies across subjects and is related to the binary
response of interest, and the missingness mechanism is itself of interest. This is a variant
of the clustered binary data setting with present/absent components. We modify and adapt our proposed
two-stage random effects logistic regression model so that both the marginal probabilities
and the conditional probabilities of the binary response and the missing indicator
preserve logistic regression forms. We present a Bayesian framework for this model
and illustrate it on an AIDS data example.
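To make the informative-missingness setting concrete, the following Python sketch simulates a hypothetical shared-random-intercept model in which a bridge-distributed subject effect b_i enters both the logit of the observation indicator and the logit of the binary response. This is an illustrative assumption, not the dissertation's model or its Bayesian fitting procedure; `draw_bridge` (which inverts the bridge CDF), `simulate`, `pearson`, and all parameter values are made up for the example. The sketch shows that the observed cluster size is correlated with a subject's observed response rate, so cluster size carries information about the response, while the marginal observation probability stays close to expit(φ·α0).

```python
import math
import random


def expit(x: float) -> float:
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def draw_bridge(phi: float, rng: random.Random) -> float:
    """Draw b ~ bridge(phi), 0 < phi < 1, by inverting its CDF (hypothetical helper)."""
    u = rng.random()
    while u == 0.0:  # random() can return 0.0, where log(0) would fail
        u = rng.random()
    return math.log(math.sin(phi * math.pi * u) / math.sin(phi * math.pi * (1.0 - u))) / phi


def simulate(n_subjects, n_visits, phi, alpha0, beta0, beta1, seed=2011):
    """Hypothetical model for subject i at visit t = 0, ..., n_visits - 1:
        logit P(R_it = 1 | b_i) = alpha0 + b_i             (R_it = 1: visit observed)
        logit P(Y_it = 1 | b_i) = beta0 + beta1 * t + b_i
    Y_it is kept only when R_it = 1, so the cluster size is n_i = sum_t R_it.
    Returns one list of observed binary responses per subject."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_subjects):
        b = draw_bridge(phi, rng)  # shared random intercept
        observed_ys = []
        for t in range(n_visits):
            r = rng.random() < expit(alpha0 + b)
            y = rng.random() < expit(beta0 + beta1 * t + b)
            if r:
                observed_ys.append(int(y))
        data.append(observed_ys)
    return data


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((c - my) ** 2 for c in y)
    return sxy / math.sqrt(sxx * syy)


PHI, ALPHA0 = 0.6, 0.5
N_SUBJECTS, N_VISITS = 4000, 6
data = simulate(N_SUBJECTS, N_VISITS, PHI, ALPHA0, beta0=-0.5, beta1=0.1)

# Informative missingness: cluster size vs. observed response rate (subjects with n_i > 0).
nonempty = [ys for ys in data if ys]
corr = pearson([len(ys) for ys in nonempty], [sum(ys) / len(ys) for ys in nonempty])

# Marginal observation probability should be close to expit(PHI * ALPHA0).
obs_frac = sum(len(ys) for ys in data) / (N_SUBJECTS * N_VISITS)

print(f"corr(cluster size, observed response rate) = {corr:.3f}")
print(f"observed fraction = {obs_frac:.3f}, expit(phi * alpha0) = {expit(PHI * ALPHA0):.3f}")
```

Drawing the shared effect from the bridge distribution rather than a normal one is what keeps the marginal observation probability exactly logistic in this sketch; the remaining gap between the two printed fractions is Monte Carlo error.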
Files: Li_X_Dissertation_2011.pdf (616.03 Kb)