Sunday, December 12, 2021

Why we need to switch our emphasis from “What is the best R-Squared for the Logistic Regression?” to “What is the best combination of good R-Squared statistics?”

1. Introduction

Many different R-Squared statistics have been proposed for logistic regression over the past four and a half decades (see, e.g., the review publications Windmeijer (1995), Cameron and Windmeijer (1997), Mittlböck and Schemper (1996), Menard (2000, 2010), Smith and McKenna (2013), and Walker and Smith (2016)). These statistics can be divided into three main categories:

 

(1) R-Squared measures based on likelihoods;

(2) R-Squared measures based on the sum of squares;

(3) Measures based on squared correlations of observed output and estimated probabilities. 

 

Standing alone is the so-called Tjur R-Squared (R2Tjur). Indeed, it will be seen that R2Tjur has no apparent similarity with the other R2 measures of predictive power. This abundance of R-Squared measures is very confusing: it looks like a jungle, and finding a pathway out of this jungle is a problem in itself.

 

2. What is the best R2 for logistic regression?

The opinions are very different. Menard (2000, 2010) and Cameron and Windmeijer (1997) prefer the McFadden R-Squared (R2MF), also known as the log likelihood ratio R2 (R2L). Allison (2013, 2014) switches in his preferences from the Cox-Snell measure R2CS to R2MF and then shows some inclination to favor R2Tjur. In turn, Mittlböck and Schemper (1996) favor an OLS analog (R2OLS) or the R2 based on the Pearson correlation (R2cor). For notational brevity, we will mostly use the following notation: R2L instead of R2MF, R2O instead of R2OLS, R2C instead of R2cor, and R2T instead of R2Tjur. The motivations for choosing R2L, R2O, or R2T, their drawbacks, and the formulas for them are provided below.

 

2a. Advantageous properties of R2L

They are:

 

(1) an explicit, direct and intuitively clear interpretation in terms of information theory: the information content of the data, the potentially recoverable information, and the information gain due to added predictors (see Cameron and Windmeijer (1997) and Shtatland (2018));

(2) a near independence from the base rate (Menard (2000));

(3) R2L is the centerpiece of the category of R-squared analogs based on the likelihood statistic: all the other members of this category can be expressed in terms of R2L. Of special interest is a very simple and natural relationship between R2L and the Cox-Snell R2CS (see Shtatland et al. (2002) and Menard (2010)); the corresponding formula is presented below.

 

All these features of R2L are well known. In addition, two novel, important and exclusive properties of R2L will be introduced and discussed later in the article.

 

2b. R2OLS / R2O: pros and cons

Being the strongest proponent of R2L as the best R2 measure, Menard (2010), p. 56 nevertheless notes that there are certain benefits in using R2OLS. Those benefits are:

“First, using R2OLS permits direct comparison of logistic regression models with linear probability, ANOVA, and discriminant analysis models if predicting the observed value (instead of predicting the observed probability that the dependent variable is equal to that value) is of interest. <…> Second, R2OLS is useful in calculating standardized logistic regression coefficients <…> Third, R2OLS can be used in models using methods other than maximum likelihood estimation for the logistic regression model, particularly IRLS”. IRLS here stands for the iteratively reweighted least squares method. Nevertheless, R2OLS has two serious disadvantages: it does not automatically increase when the model is extended by an additional predictor, and it can be negative in some rare, rather degenerate cases. The latter drawback alone is enough to prevent R2OLS from being the best R2 measure. As to the critique by Mittlböck and Schemper (1996) that R2MF (or R2L) lacks an intuitive interpretation compared with R2OLS, Menard rightly notes that “the issue here is less one of intuition than of greater familiarity with the more widely used R2OLS statistic”.

 

2c. R2T: pros and cons

This measure has a lot of intuitive appeal. Importantly, R2T is independent of the base rate, and this is a theoretically proven statement, not an empirically observed fact as in the case of R2L. As a drawback, it does not automatically increase (or even remain non-decreasing) when the model is extended by an additional predictor. More about the advantages and drawbacks of R2T will be shown later.

 

3. From the best R2 to the best combination of good ones

With this discord in choosing the best R2, it is natural to follow the proposal by DeMaris (1992) p. 56: “In sum, it may not be prudent to rely on only one measure for assessing predictive efficacy - particularly in view of the lack of consensus on which measure is most appropriate. Perhaps the best strategy is to report more than one measure for any given analysis”.

 

By the way, Menard (2010), p. 56 recommends using R2O not instead of R2L but as a supplemental measure. Following the advice of DeMaris and Menard's suggestions, it is proposed in Shtatland (2018) to report three measures, R2L, R2O and R2T, altogether, hoping that they make up a good and natural trio of R-Squared measures and a proper way out of “the jungle” of R2 statistics for binary logistic regression.

 

Below, we will show that there exist very deep and natural relations between R2L, R2O and R2T. So, the choice of measures R2L, R2O and R2T in Shtatland (2018) is not accidental, but well-grounded.

 

4. Formulas for R2L, R2O, and R2T

First, we start with the necessary notation and definitions. The binary logistic regression model with dichotomous outcome Y is defined by the equation

 

ln[P(Y = 1)/P(Y = 0)] = ln[p/(1 - p)] = b0 + b1X1 + b2X2 + … + bkXk            (1)

 

where p is the probability of EVENT (Y = 1) predicted by model (1) and 1 – p is the predicted probability of NONEVENT (Y = 0) (for example, EVENT = {Sick}, NONEVENT = {Healthy}, etc.); X = (X1, X2, … Xk) is a vector of predictors or explanatory variables, and b = (b0, b1, b2, … bk) are the corresponding coefficients. Maximum likelihood is the basic method for estimating the parameters b = (b0, b1, b2, … bk) in (1). This means that, given statistically independent observations y1, y2, ..., yi, ... yn of the output variable Y and the corresponding vectors of predictors X1, X2, … Xn, we have to find the values of the coefficients that maximize the likelihood

 

L( y1, y2,... yi ... yn; X1, X2, … Xn; b0 , b1, b2, … bk) = Πni=1 p(yi |Xi)        (2)

 

We have dropped b0, b1, b2, … bk for notational brevity in the right side of (2).

According to Hosmer and Lemeshow (1989) p. 9, the log likelihood can be written as:

 

lnL(y1, y2,... yn; X1, X2, … Xn) = Σni=1 ln[p( yi | Xi)] =

Σni=1{yi ln[p(yi | Xi)] + (1 - yi) ln[1 – p(yi | Xi)]}
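For concreteness, this log likelihood is easy to compute directly. The following minimal Python sketch is ours and purely illustrative (the function name and the toy data are assumptions, not from the original); it also evaluates the intercept-only case, where every fitted probability equals the base rate:

```python
import math

def log_likelihood(y, p):
    # Bernoulli log likelihood: sum over observations of
    # y_i*ln(p_i) + (1 - y_i)*ln(1 - p_i), where p_i = P(Y = 1 | X_i).
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

# Toy data: 4 observations, 3 EVENTs.
y = [1, 0, 1, 1]
y_bar = sum(y) / len(y)                     # base rate, here 0.75

# Null (intercept-only) model: every fitted probability is the base rate.
lnL0 = log_likelihood(y, [y_bar] * len(y))
```

For this toy sample, lnL0 = 3·ln(0.75) + ln(0.25), the maximized log likelihood of the model without predictors.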

 

Now, let us introduce formulas for the chosen R2 measures:

 

R2L = 1 – lnLM / lnL0,                                                                                   (3)

 

where L0 is the maximized likelihood for the model containing only the intercept, without any predictor, and LM is the maximized likelihood of the current model M containing all available predictors;

 

R2O = 1 - Σni=1(yi - p̂(yi | Xi))2 / Σni=1(yi - ȳ)2,

 

where p̂(yi | Xi) is a maximum likelihood estimate of the actual (but unknown) probability p(yi | Xi) of the event (yi = 1), and ȳ = (Σni=1yi) / n is the average of yi;

 

R2T = [Σ(y = 1) p̂(yi | Xi)] / n1 - [Σ(y = 0) p̂(yi | Xi)] / n0,                                        (4)

 

where n1 is the number of EVENTs and n0 is the number of NONEVENTs, so that n1 + n0 = n. R2T is a comparatively new measure, named after the author of Tjur (2009). In his comment on Allison (2013), William Greene (July 6, 2016) notes that this measure was derived earlier in Cramer (1999). In turn, Ernest Shtatland, in his comment on Allison (2013) (December 30, 2017), notes that the results of Cramer (1999) and Tjur (2009) are mutually complementary and that it would be fair to use the double name Cramer-Tjur, R2CT. Strictly speaking, formula (4) is correct only if n1 > n0; otherwise the signs on the right side of (4) should be interchanged. This is mentioned in Cramer (1999) but not in Tjur (2009). The asymmetry in the prediction is not evident, but it is well known to practitioners according to Cramer (1999). To avoid confusion, the absolute value can be used on the right side of (4).
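The three measures can be computed side by side from the observed outcomes and the fitted probabilities. The Python sketch below is ours and illustrative: the p̂ vector stands in for the output of an actual maximum likelihood fit (its sum equals the number of EVENTs, as the ML score equation for the intercept guarantees for a real fit):

```python
import math

# Illustrative data: y is the observed 0/1 outcome; p_hat stands in for
# ML-fitted probabilities (note sum(p_hat) == sum(y)).
y     = [1, 1, 0, 0, 1, 0]
p_hat = [0.8, 0.7, 0.3, 0.2, 0.6, 0.4]

n  = len(y)
n1 = sum(y)                      # number of EVENTs
n0 = n - n1                      # number of NONEVENTs
y_bar = n1 / n                   # base rate

lnL0 = n1 * math.log(y_bar) + n0 * math.log(1 - y_bar)       # null model
lnLM = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
           for yi, pi in zip(y, p_hat))                      # current model

# Formula (3): likelihood-ratio (McFadden) R-Squared
R2L = 1 - lnLM / lnL0

# OLS-analog R-Squared
R2O = 1 - sum((yi - pi) ** 2 for yi, pi in zip(y, p_hat)) \
        / sum((yi - y_bar) ** 2 for yi in y)

# Formula (4): Tjur's R-Squared, the difference of mean fitted
# probabilities between EVENTs and NONEVENTs
R2T = sum(pi for yi, pi in zip(y, p_hat) if yi == 1) / n1 \
    - sum(pi for yi, pi in zip(y, p_hat) if yi == 0) / n0
```

On this toy sample R2T = 0.7 - 0.3 = 0.4, while R2L and R2O take different values: the three measures agree in direction but not in magnitude.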

 

Though R2CS does not belong to our trio, we present a formula for this measure:

 

R2CS = 1 – (L0 / LM)^(2/n)                                                                                  (5)

 

As mentioned earlier, all R2 measures based on the likelihood can be expressed in terms of R2L, and of special interest is a very simple relationship between R2L and the Cox-Snell R2CS (see Shtatland et al. (2002) and Menard (2010)). Formula (5) can be rewritten as

 

R2CS = 1 – exp(-2H·R2MF),                                                                         (6)

 

where the term H is nothing other than the entropy of the Bernoulli distribution with parameter ȳ = (Σni=1yi) / n, which is the base rate. Formula (6) was first derived in Shtatland et al. (2002). Similar formulas can easily be obtained for the other R2 measures based on the likelihood, but formula (6) is the simplest one, and it connects two very popular R2 statistics, former competitors for the title of “the best R2 for logistic regression” (see Allison (2013) and (2014)).
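Relation (6) is an exact algebraic identity, since H = -lnL0/n for the intercept-only ML fit; it can be checked numerically. A small Python sketch with illustrative data (the y and p̂ vectors are our toy numbers, not from the original):

```python
import math

# Toy data; p_hat plays the role of fitted probabilities of the current model.
y     = [1, 1, 0, 0, 1, 0]
p_hat = [0.8, 0.7, 0.3, 0.2, 0.6, 0.4]
n, n1 = len(y), sum(y)
y_bar = n1 / n

lnL0 = n1 * math.log(y_bar) + (n - n1) * math.log(1 - y_bar)
lnLM = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
           for yi, pi in zip(y, p_hat))

R2MF = 1 - lnLM / lnL0                                       # i.e., R2L
H = -(y_bar * math.log(y_bar)
      + (1 - y_bar) * math.log(1 - y_bar))                   # Bernoulli entropy

R2CS_direct  = 1 - math.exp((2 / n) * (lnL0 - lnLM))         # formula (5)
R2CS_from_MF = 1 - math.exp(-2 * H * R2MF)                   # formula (6)
```

The two computations of R2CS agree to machine precision, which is exactly the content of (6).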

 

5. What do the classical R2L and R2O have in common with the new R2T?

At first glance, nothing: the formulas defining them are entirely different. And yet, it can be shown that there exist deep and very interesting relationships between them.

 

5.1 R2O and R2T

The relations between R2T and R2O are quite evident. According to Tjur (2009), R2T can be expressed through three other R2-like measures:

 

R2T = (R2mod + R2res) / 2                                                                              (7)

 

and

 

R2T = √(R2mod · R2cor),                                                                                     (8)

 

where

 

R2res = 1 - Σni=1(yi - p̂(yi | Xi))2 / Σni=1(yi - ȳ)2

 

R2mod = Σni=1(p̂(yi | Xi) - ȳ)2 / Σni=1(yi - ȳ)2

 

R2cor = [Σni=1 (yi - ȳ)(p̂(yi | Xi) - ȳ)]2 / [Σni=1(yi - ȳ)2 · Σni=1(p̂(yi | Xi) - ȳ)2]

 

Obviously, R2res is nothing but the familiar R2O, and R2cor is the well-known R2 measure based on the squared Pearson correlation between the observed outcomes and the estimated probabilities. Since both R2mod and R2cor are nonnegative by definition, R2T is also nonnegative. Thus, averaging R2res (i.e., R2O) with R2mod corrects the drawback of R2O of being negative under some circumstances. A very interesting conclusion also follows from equations (7) and (8): if we know R2O and R2T (two members of our chosen trio), then we actually know both R2cor and R2mod as well. We can add that since R2res, R2mod and R2cor are asymptotically equivalent (see Tjur (2009), p. 368), so are R2T and R2O. Thus, there indeed exist very deep and natural relations between R2T and R2O.
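Identities (7) and (8) can be verified numerically. In the Python sketch below (our illustrative numbers), the vector p̂ satisfies Σp̂ = Σy, the condition guaranteed by the ML score equation for the intercept, under which (7) and (8) hold exactly:

```python
import math

# Illustrative outcomes and fitted probabilities; sum(p_hat) == sum(y),
# as an intercept in an ML logistic fit guarantees.
y     = [1, 1, 0, 0, 1, 0]
p_hat = [0.8, 0.7, 0.3, 0.2, 0.6, 0.4]

n, n1 = len(y), sum(y)
n0 = n - n1
y_bar = n1 / n
sst = sum((yi - y_bar) ** 2 for yi in y)         # total sum of squares

R2mod = sum((pi - y_bar) ** 2 for pi in p_hat) / sst
R2res = 1 - sum((yi - pi) ** 2 for yi, pi in zip(y, p_hat)) / sst
R2cor = (sum((yi - y_bar) * (pi - y_bar) for yi, pi in zip(y, p_hat)) ** 2
         / (sst * sum((pi - y_bar) ** 2 for pi in p_hat)))

# Formula (4): Tjur's R-Squared
R2T = sum(pi for yi, pi in zip(y, p_hat) if yi == 1) / n1 \
    - sum(pi for yi, pi in zip(y, p_hat) if yi == 0) / n0
```

Here (R2mod + R2res)/2 and √(R2mod·R2cor) both reproduce R2T, confirming (7) and (8).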

 

5.2 R2L and R2T; a new, symmetrical form of R2T

Now we will show that there also exist important relations between R2T and R2L, though not as obvious as between R2O and R2T.

 

Since one of the main goals of the logistic regression model is good prediction, we would like to make p(yi = 1) as close to 1 as possible for those subjects who do have yi = 1, and at the same time to have p(yi = 1) as close to 0 as possible for those with yi = 0. All the R2 statistics for logistic regression discussed above serve as measures of predictive power. It is easily seen (Tjur (2009)) that R2T and R2L (as well as R2O, R2mod and R2cor) are always ≤ 1, with equality if and only if yi = p̂(yi | Xi) for all i. Of course, this is an ideal, purely theoretical situation, but it is quite possible (though very seldom) that p̂(yi | Xi) is sufficiently close to yi for all i. In this case we can use a very simple and useful approximation based on the first-order Taylor expansion of the logarithm: ln(1 - x) ≈ -x for small positive x, or equivalently ln(x) ≈ x - 1 for x close to 1. In general, for x smaller than 0.10 this approximation is quite practical, and for x smaller than 0.01 it is very close. We apply this approximation to the following formula for the log likelihood

 

lnL = Σni=1[yi ln[p̂(yi | Xi)] + (1 - yi)ln[1 - p̂(yi | Xi)]] =

Σ(y = 1)  ln[p̂(yi | Xi)] + Σ (y = 0)  ln[1 - p̂(yi | Xi)]:

 

ln[p̂(yi | Xi)] ≈ p̂(yi | Xi) – 1 in the first sum above, and ln[1 - p̂(yi | Xi)] ≈ - p̂(yi | Xi) in the second one. As a result, we arrive at the formula

 

lnL ≈ Σ(y = 1) p̂(yi | Xi) – n1 – Σ (y = 0)  p̂(yi | Xi),                                                (9)

 

where n1 is the number of events. Thus, the log likelihood of logistic regression is approximately equal (up to the additive constant n1) to the difference between the sum of probabilities of correctly predicted events and the sum of probabilities of incorrect predictions. This approximation is linear in terms of p̂(yi | Xi). It reminds us of the formula for R2T, though there the sums of probabilities are replaced by the averages of the probabilities. Since R2L is a linear function of lnL (see formula (3) defining R2L), we have a very interesting and surprising similarity between R2L (in terms of p̂(yi | Xi)) and R2T. Note, though, that this similarity is meaningful and useful only in the advanced stages of model building, when we already work with “good” models, particularly those close to the saturated one.
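Approximation (9) can be checked numerically for a nearly saturated model. A small Python sketch with our illustrative probabilities close to the observed outcomes:

```python
import math

# A model close to saturation: fitted probabilities near the observed
# 0/1 outcomes (illustrative numbers, all errors under 0.10).
y     = [1, 1, 0, 0, 1, 0]
p_hat = [0.97, 0.95, 0.04, 0.03, 0.96, 0.05]
n1 = sum(y)

# Exact Bernoulli log likelihood
lnL_exact = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                for yi, pi in zip(y, p_hat))

# Formula (9): first-order approximation, linear in the probabilities
lnL_approx = (sum(pi for yi, pi in zip(y, p_hat) if yi == 1) - n1
              - sum(pi for yi, pi in zip(y, p_hat) if yi == 0))
```

For these numbers the relative error of (9) is only a few percent, consistent with the accuracy of the Taylor approximation for x < 0.10.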

 

Also, because by definition p̂(yi | Xi) is the probability of yi = 1, 1 - p̂(yi | Xi) is the probability of yi = 0, i.e., p̂(yi = 0 | Xi), and the previous formula can be rewritten in the following way:

 

lnL ≈ Σ(y = 1) p̂(yi = 1 | Xi) - Σ(y = 0) [1 - p̂(yi = 0 | Xi)] – n1 =

Σ(y = 1) p̂(yi = 1 | Xi) + Σ(y = 0) p̂(yi = 0 | Xi) – n0 – n1 =

Σ(y = 1) p̂(yi = 1 | Xi) + Σ(y = 0) p̂(yi = 0 | Xi) – n

 

Consequently,

 

lnL / n ≈ [Σ(y = 1) p̂(yi = 1 | Xi)] / n + [Σ(y = 0) p̂(yi = 0 | Xi)] / n – 1 =

{[Σ(y = 1) p̂(yi = 1 | Xi)] / n1}·(n1 / n) + {[Σ(y = 0) p̂(yi = 0 | Xi)] / n0}·(n0 / n) – 1

 

Note that p̂(yi | Xi) can be seen as a “true positive” if yi = 1 and a “false positive” if yi = 0, on the probability scale (relative to the observation yi). The quantities:

 

[Σ(y = 1) p̂(yi | Xi)] / n1

 

and

 

[Σ(y = 0) (1 - p̂(yi | Xi))] / n0 = [Σ(y = 0) p̂(yi = 0 | Xi)] / n0

 

can be considered as sensitivity and specificity, respectively. With this new notation, we have

 

lnL / n ≈ sensitivity·(n1 / n) + specificity·(n0 / n) - 1                               (10)

 

Now, we can rewrite the formula for R2T in a more symmetrical form. Indeed,

 

R2T = [Σ(y = 1) p̂(yi = 1 | Xi)] / n1 - [Σ(y = 0) p̂(yi = 1 | Xi)] / n0 =

[Σ(y = 1) p̂(yi = 1 | Xi)] / n1 - [Σ(y = 0) (1 - p̂(yi = 0 | Xi))] / n0 =

[Σ(y = 1) p̂(yi = 1 | Xi)] / n1 + [Σ(y = 0) p̂(yi = 0 | Xi)] / n0 - 1 =

 

sensitivity + specificity - 1                                                                       (11)

 

The last quantity is the well-known Youden's Index J, introduced in Youden (1950) as a measure of the performance of a dichotomous diagnostic test. Thus, we can date the origin of R2T back from 2009 (Tjur) to 1999 (Cramer), and even to 1950, though the term “logistic regression” was not yet established in the statistical literature in 1950, let alone R2 measures for logistic regression.
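The identity R2T = J can be confirmed in a few lines of Python (the fitted probabilities below are our illustrative numbers; the identity itself holds for any probability vector):

```python
# R2T equals Youden's J = sensitivity + specificity - 1 computed on the
# probability scale, exactly, for any vector of fitted probabilities.
y     = [1, 1, 0, 0, 1, 0]
p_hat = [0.8, 0.7, 0.3, 0.2, 0.6, 0.4]     # illustrative fitted probabilities
n1 = sum(y)
n0 = len(y) - n1

sensitivity = sum(pi for yi, pi in zip(y, p_hat) if yi == 1) / n1
specificity = sum(1 - pi for yi, pi in zip(y, p_hat) if yi == 0) / n0
youden_J = sensitivity + specificity - 1    # formula (11)

# Formula (4): Tjur's difference of mean fitted probabilities
R2T = sum(pi for yi, pi in zip(y, p_hat) if yi == 1) / n1 \
    - sum(pi for yi, pi in zip(y, p_hat) if yi == 0) / n0
```

For these numbers sensitivity = specificity = 0.7, so J = R2T = 0.4.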

 

Note that formula (11) for R2T in terms of sensitivity and specificity looks rather similar to formula (10) for lnL / n, with two exceptions:

 

1. Formula (11) is exact, but (10) is approximate;

2. Equal weights are assigned to sensitivity and specificity in formula (11), while in (10) sensitivity and specificity enter with weights n1/n and n0/n, respectively.

 

It should be noted that the equivalence of R2T and Youden's Index was first mentioned in the two-page article Hughes (2017), but the author's arguments there are not fully clear.

 

5.3 R2T and the base rate

Another rather striking property of R2T demonstrates a very interesting association of R2T with R2L. Indeed, Cramer (1999) has shown that both terms in formula (4), [Σ(y = 1) p̂(yi | Xi)] / n1 and [Σ(y = 0) p̂(yi | Xi)] / n0, are linear functions of the base rate n1/n (the proportion of cases with yi = 1) with equal slopes (see ibid., equations (11) and (12); note that the notation in Cramer (1999) differs from ours). As a result, R2T itself is independent of the base rate (ibid., equation (14)). This is a theoretically founded statement, not an empirically observed fact as in the case of R2L. Due to this property, R2T is closer to R2L than to any other version of R2 for logistic regression. Some researchers (for example, Menard (2000) and (2010)) consider this property very important. However, not everybody in the statistical community thinks this way: other researchers consider the dependence on the base rate a strength, not a weakness. The main supporters of this point of view are Mittlböck and Schemper (1996). But their arguments are related to the question of whether R2L or R2O is the best measure, and this question is no longer relevant because R2O can take negative values under some circumstances. As said in Tjur (2009), this is more than enough to disqualify R2O as a candidate for the title of “best choice of R2 in logistic regression”. Here we are not going to discuss the arguments in favor of independence of R2 from the base rate; they can readily be found in Menard (2010), pp. 50, 54 – 55, 62. The mathematically proven independence of the very popular R2T statistic from the base rate is a strong point in favor of Menard's point of view. Again, recall that averaging R2res (i.e., R2O) with R2mod causes R2T to be always nonnegative, even though R2O can sometimes take negative values.

 

6. The unique properties of R2L

 

6.1 R2L and Adj-R2L vs AIC

So far, we have dealt only with R2 measures unadjusted for the number of predictors. Here we will make an exception for R2L and define an adjusted R2L as follows:

 

Adj-R2L = 1 - (lnLM – K) / lnL0 = R2L + K / lnL0                                              (12)

 

where, as before, L0 is the maximized likelihood of the model containing only the intercept, without any predictors, and LM is the maximized likelihood of the current model M containing all K available predictors. There are several variants of formulas for the adjusted R2L (see, for example, Menard (2010), pp. 48-49; Smith and McKenna (2013); Walker and Smith (2016)). They are very close, almost equivalent; we have chosen the simplest one. Note that formula (12) can be rewritten as follows:

 

Adj-R2L = R2L - K / (n[-ȳ ln ȳ - (1 - ȳ) ln(1 - ȳ)]) = R2L - K / (n·H(ȳ)),

 

where ȳ = (Σni=1yi) / n is the base rate, H(ȳ) is the entropy of the Bernoulli distribution with parameter ȳ, and n/K is the number of observations per predictor.

Also, define the Akaike Information Criterion, AIC as:

 

AIC = -2lnLM + 2K

 

Additionally, we introduce the test statistic T for testing the overall significance of the logistic model coefficients:

 

T = 2(lnLM – lnL0)                                                                                       (13)

 

It is easily seen that there exist simple linear relationships between these three statistics, Adj-R2L, AIC, and T, for example:

 

Adj-R2L = 1 + AIC / (2lnL0),

AIC = 2lnL0·[Adj-R2L – 1],

T = -AIC + 2K - 2lnL0,

T = -2lnL0·Adj-R2L + 2K.

 

Due to the linear relationship between Adj-R2L and AIC, the two statistics are equivalent: if AIC decreases then Adj-R2L increases, and vice versa. So Adj-R2L and AIC can be used interchangeably, which is why Adj-R2L can be referred to as an AIC-type adjustment. Note also that the value of AIC by itself tells nothing about the quality of a model, only its quality relative to other models; thus, if all candidate models fit poorly, AIC gives no warning of that. When comparing two models M1 and M2, we only need to know which AIC value is smaller (the smaller, the better). At the same time, Adj-R2L alone characterizes the predictive power of a model M considered separately. For this reason Adj-R2L, being equivalent to AIC, should be preferred over AIC, which is probably the most popular statistic in logistic regression applications. This makes R2L a unique R2 statistic.
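The linear relationships above are purely algebraic and can be verified with illustrative numbers (the log likelihood values below are made up for the sketch, not from a real fit):

```python
# Illustrative log likelihoods and predictor count (not from a real model).
lnL0, lnLM, K = -60.0, -45.0, 3

AIC    = -2 * lnLM + 2 * K                  # Akaike Information Criterion
T      = 2 * (lnLM - lnL0)                  # formula (13)
AdjR2L = 1 - (lnLM - K) / lnL0              # formula (12)

# The four linear relationships from the text:
AdjR2L_via_AIC = 1 + AIC / (2 * lnL0)
AIC_via_Adj    = 2 * lnL0 * (AdjR2L - 1)
T_via_AIC      = -AIC + 2 * K - 2 * lnL0
T_via_Adj      = -2 * lnL0 * AdjR2L + 2 * K
```

Each recomputed value matches the direct one, illustrating that Adj-R2L, AIC and T carry the same information up to known constants.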

 

6.2 R2L and a seeming paradox: Statistical significance vs Scientific or Practical significance, and P-values vs predictive power

Note from the very beginning that the American Statistical Association (ASA) has released a “Statement on Statistical Significance and P-Values” (see Wasserstein and Lazar (2016)). The statement recommends against using the concept and the term “statistical significance” at all, as misleading. P-values do not measure the size of an effect, the evidence for significance of any kind (scientific or practical), or the importance of a model. P-values can only indicate how incompatible the data are with a specified null model (without predictors), and they tell nothing about the alternative model (with the available predictors).

 

Nevertheless, since the so-called “Statistical Significance War” has not yet affected the statistical literature on logistic regression, we will continue using the traditional terminology. We do hope that using the term “significance” below will not cause any confusion.

 

It has been shown in Shtatland (2018) that the quantity lnLM - lnL0 can be interpreted as the information gain due to switching from the null model with intercept only to the current model with all available predictors. Thus, the test statistic (13) is nothing but doubled information gain. The reason for the multiplier 2 is purely technical: just to get a statistic with a chi-square distribution. Therefore, we reject the null hypothesis only if the information gain due to the additional explanatory variables exceeds some critical number, which sounds quite natural. Usually, the values of chi-square statistics themselves are not considered meaningful. Note also that it is said in Hosmer and Lemeshow (1989), p. 149 that:

 

“The quantity R2L is nothing more than an expression of the likelihood ratio test and, as such, is not a measure of goodness-of-fit”. In other words, the authors decline to interpret R2L as a measure of the quality of the model (in particular, as a measure of predictive power). However, in the 3rd edition of their book (Hosmer, Lemeshow and Sturdivant (2013)), the authors drop this statement.

 

This situation looks like a paradox. But the paradox is only apparent: in reality, it is a manifestation of the unique property of R2L of being a fusion of two distinct functions, as the test statistic for testing the overall statistical significance of the model and as a measure of the practical/scientific importance (in particular, the predictive power) of the same model. In general, P-values by themselves, as a technique for measuring statistical significance, say nothing about the size of the scientific or practical significance.

 

Summarizing, we can see that R2L has a number of undeniable advantages over the other R2 statistics, including R2O, R2T and R2CS. Moreover, it has no evident drawbacks. Naturally, there is a temptation to call it the best R2. Nevertheless, we prefer to consider R2L not the best R2 measure, but simply the leading member of the trio R2O, R2T and R2L.

 

7. Conclusion

The abundance of R-Squared measures for logistic regression, combined with the lack of consensus on the criteria for choosing the best R2, is very confusing for researchers and other practitioners. In such a situation, it is natural to search for the best combination of R2 measures that provides good performance. Our choice is the trio R2L, R2O, and R2T. All of them have advantages of their own. But R2L stands out for established, well-known properties such as: an explicit and straightforward interpretation in terms of information theory; its near independence from the base rate; and the fact that all the other R2 measures based on the log likelihood statistic can easily be expressed in terms of R2L. On top of that, R2L has two extraordinary features: a very close and natural relationship with AIC (arguably the most popular statistic in logistic regression modeling), and the fact that it integrates two different functions, serving both as the test statistic for testing the overall statistical significance of the model and as a measure of the predictive power of the same model. Also, R2L has no apparent disadvantages, unlike R2O and R2T. However, R2O and R2T have important properties that are complementary to the R2L features mentioned above: for example, R2T is equal to the famous Youden's Index, and R2T is exactly independent of the base rate. That is why the trio of R2L, R2O, and R2T can rightly be called the best combination of R2 measures for logistic regression.

 

Acknowledgments

I am indebted to Dr. Scott Menard for originating my interest in R2 measures for Logistic Regression, and for his helpful comments.

 

References

Allison, P.D. (2013) “What’s the best R-Squared for logistic regression?” http://www.statisticalhorizons.com/r2logistic

 

Allison, P.D. (2014) “Measures of fit for logistic regression”. https://support.sas.com/resources/papers/proceedings14/1485-2014.pdf

 

Cameron, C. A. and Windmeijer, F. A. G. (1997) "An R-squared measure of goodness of fit for some common nonlinear regression models", Journal of Econometrics, Vol. 77, No.2, pp. 329-342.

 

Cramer, J. S. (1999) “Predictive performance of the binary logit model in unbalanced samples”, The Statistician, 48, 85 – 94.

 

DeMaris, A. (1992) “Logit modeling” Sage University Paper Series.

 

Hosmer, D.W. and Lemeshow, S. (1989) “Applied Logistic Regression”. New York: Wiley.

 

Hosmer, D.W., Lemeshow, S. and Sturdivant, R.X. (2013) “Applied Logistic Regression”, 3rd ed. New York: Wiley.

 

Hughes, G. (2017) “Tjur's R2 for logistic regression models is the same as Youden's index for a 2x2 diagnostic table”, Annals of Epidemiology, Vol. 27, 801 – 802.

 

McFadden, D. (1974), “Conditional logit analysis of qualitative choice behavior”, pp. 105 -142 in Zarembka (ed.), Frontiers in Econometrics. Academic Press.

 

Menard, S. (2000) “Coefficients of determination for multiple logistic regression analysis”, The American Statistician, 54, 17 – 24.

 

Menard, S. (2002), “Applied logistic regression analysis”, Sage University Paper Series (2nd edition).

 

Menard, S. (2010), “Logistic regression: From introductory to advanced concepts and applications”, Sage University Paper Series.

 

Mittlböck, M. and Schemper, M. (1996) “Explained variation in logistic regression”, Statistics in Medicine, 15, 1987 – 1997.

 

Shtatland, E. S., Kleinman, K. and Cain, E. M. (2002) “One more time about R^2 measures of fit in logistic regression”, NESUG 2002 Proceedings.

 

Shtatland, E.S. (2018) “Do we really need more than one R-Squared in logistic regression?”   http://statisticalmiscellany.blogspot.com/2018/

 

Smith, T. J. and McKenna, C. M. (2013) “A comparison of logistic regression pseudo R2 indices”, Multiple Linear Regression Viewpoints, 39, 17 - 26.

 

Tjur, T. (2009) “Coefficients of determination in logistic regression models – a new proposal: the coefficient of discrimination”, The American Statistician, 63, 366 – 372.

 

Walker, D. A. and Smith, T. J. (2016) “Nine pseudo R^2 indices for binary logistic regression models”, Journal of Modern Applied Statistical Methods, 15, 848 – 854.

 

Wasserstein, R. L. and Lazar, N. A. (2016) “The ASA's statement on P-values: Context, process, and purpose”, The American Statistician, 70, 129 – 133.

 

Windmeijer, F. A. G. (1995) “Goodness of fit measures in binary choice models”, Econometric Reviews, 14, 101 – 116.

 

Youden, W. J. (1950) “Index for rating diagnostic tests”, Cancer, 3, 32 – 35.