Title page for ETD etd-06292006-153249


Type of Document Dissertation
Author Sharma, Dinesh R.
URN etd-06292006-153249
Title Logistic Regression, Measures of Explained Variation and the Base Rate Problem
Degree Doctor of Philosophy
Department Statistics, Department of
Advisory Committee
Advisor Name Title
Daniel L. McGee, Sr. Committee Chair
Eric Chicken Committee Member
Myra Hurt Committee Member
Xu-Feng Niu Committee Member
Keywords
  • Logistic Regression
  • Explained Variation
  • Base Rate
  • Base Rate Problem
  • Coefficient of Determinant
  • R^2 Statistics
  • Latent Variable
Date of Defense 2006-06-26
Availability unrestricted
Abstract
One of the desirable properties of the coefficient of determination (the R^2 measure) is that its values for different models should be comparable, whether the models differ in one or more predictors, differ in the dependent variable, or are specified separately for different subsets of a dataset. This allows researchers to compare the adequacy of models across subgroups of the population, or of models with different but related dependent variables. However, the various analogs of the R^2 measure used in logistic regression analysis are highly sensitive to the base rate (the proportion of successes in the sample) and thus do not possess this property. An R^2 measure that is sensitive to the base rate is not suitable for comparing the same or different models across different datasets, different subsets of a dataset, or different but related dependent variables. We evaluated 14 R^2 measures that have been suggested, or might be useful, for measuring explained variation in logistic regression models, using three criteria: 1) intuitively reasonable interpretability; 2) numerical consistency with the rho^2 of the underlying model; and 3) base rate sensitivity. We carried out a Monte Carlo simulation study to examine the numerical consistency and base rate dependency of the various R^2 measures for logistic regression analysis. We found all of the parametric R^2 measures to be substantially sensitive to the base rate. The magnitude of the base rate sensitivity of these measures tends to be further influenced by the rho^2 of the underlying model.

None of the measures considered in our study performed equally well on all three evaluation criteria. While R^2_L stands out for its intuitively reasonable interpretability as a measure of explained variation as well as its independence from the base rate, it appears to severely underestimate the underlying rho^2. We found R^2_CS to be numerically most consistent with the underlying rho^2, with R^2_N its nearest competitor. In addition, the base rate sensitivity of these two measures appears to be very close to that of R^2_L, the most base rate invariant parametric R^2 measure. Therefore, we suggest using R^2_CS and R^2_N for logistic regression modeling, especially when it is reasonable to believe that an underlying latent variable exists. However, when the latent variable does not exist, comparability with the underlying rho^2 is not an issue, and R^2_L might be a better choice than all the other R^2 measures.
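
The following sketch is illustrative only and is not taken from the dissertation. It assumes that R^2_L, R^2_CS and R^2_N denote the commonly used likelihood-ratio (McFadden), Cox-Snell and Nagelkerke measures; the abstract itself does not define them. The function names and the example data are hypothetical. The null model predicts the base rate for every observation, which is how the base rate enters each measure.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y under probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def r2_measures(y, p):
    """Return assumed definitions of R^2_L, R^2_CS and R^2_N.

    y: list of 0/1 outcomes (must contain both classes).
    p: fitted success probabilities, each strictly between 0 and 1.
    """
    n = len(y)
    base_rate = sum(y) / n  # proportion of successes in the sample

    # Intercept-only (null) model: every observation gets the base rate.
    ll_null = log_likelihood(y, [base_rate] * n)
    ll_model = log_likelihood(y, p)

    # Likelihood-ratio (McFadden) measure: 1 - lnL_model / lnL_null.
    r2_l = 1.0 - ll_model / ll_null

    # Cox-Snell measure: 1 - (L_null / L_model)^(2/n).
    r2_cs = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)

    # Nagelkerke measure: Cox-Snell rescaled by its maximum attainable value.
    r2_cs_max = 1.0 - math.exp(2.0 * ll_null / n)
    r2_n = r2_cs / r2_cs_max

    return {"base_rate": base_rate, "R2_L": r2_l, "R2_CS": r2_cs, "R2_N": r2_n}


if __name__ == "__main__":
    # Hypothetical outcomes and fitted probabilities, for illustration only.
    y = [0, 0, 1, 1]
    p = [0.2, 0.3, 0.7, 0.8]
    for name, value in r2_measures(y, p).items():
        print(f"{name}: {value:.4f}")
```

Under these assumed definitions, the upper bound of R^2_CS depends on the null likelihood and therefore on the base rate, while R^2_N divides that bound out.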

Files
  Filename              Size       Approximate Download Time (Hours:Minutes:Seconds)
                                   28.8 Modem  56K Modem  ISDN (64 Kb)  ISDN (128 Kb)  Higher-speed Access
  dissertation_drs.pdf  959.42 Kb  00:04:26    00:02:17   00:01:59      00:00:59       00:00:05
