Sunday, December 12, 2021

Why we need to switch our emphasis from “What is the best R-Squared for the Logistic Regression?” to “What is the best combination of good R-Squared statistics?”

1. Introduction

Many different R-Squared statistics have been proposed for logistic regression over the past four and a half decades (see, e.g., the review publications Windmeijer (1995), Cameron and Windmeijer (1997), Mittlböck and Schemper (1996), Menard (2000, 2010), Smith and McKenna (2013), and Walker and Smith (2016)). These statistics can be divided into three main categories:

 

(1) R-Squared measures based on likelihoods;

(2) R-Squared measures based on the sum of squares;

(3) Measures based on squared correlations of observed output and estimated probabilities. 

 

Standing alone is the so-called Tjur R-Squared (R2Tjur). Indeed, it will be seen that R2Tjur has no apparent similarity with the other R2 measures of predictive power. This abundance of R-Squared measures is very confusing: it looks like a jungle, and finding a pathway out of this jungle is a problem in itself.

 

2. What is the best R2 for logistic regression?

The opinions are very different. Menard (2000, 2010) and Cameron and Windmeijer (1997) prefer the McFadden R-Squared (R2MF), also known as the log likelihood ratio R2 (R2L). Allison (2013, 2014) switches in his preferences from the Cox-Snell measure R2CS to R2MF and then shows some inclination to favor R2Tjur. In turn, Mittlböck and Schemper (1996) favor an OLS analog (R2OLS) or the R2 based on the Pearson correlation (R2cor). For notational brevity, we will mostly use the following notation: R2L instead of R2MF, R2O instead of R2OLS, R2C instead of R2cor, and R2T instead of R2Tjur. The motivations for choosing R2L, R2O, or R2T, their drawbacks, and the formulas for them are provided below.

 

2a. Advantageous properties of R2L

They are:

 

(1) an explicit, direct and intuitively clear interpretation in terms of information theory: the information content of the data, the potentially recoverable information, and the information gain due to added predictors (see Cameron and Windmeijer (1997) and Shtatland (2018));

(2) a near independence from the base rate (Menard (2000));

(3) R2L is the centerpiece of the category of R-squared analogs based on the likelihood statistic: all the other members of this category can be expressed in terms of R2L. Of special interest is a very simple and natural relationship between R2L and the Cox-Snell R2CS (see Shtatland et al. (2002) and Menard (2010)); the corresponding formula is presented below.

 

All these features of R2L are well known. In addition, two novel, important and exclusive properties of R2L will be introduced and discussed later in the article.

 

2b. R2OLS / R2O: pros and cons

Being the strongest proponent of R2L as the best R2 measure, Menard (2010), p. 56 nevertheless notes that there are certain benefits in using R2OLS. Those benefits are:

“First, using R2OLS permits direct comparison of logistic regression models with linear probability, ANOVA, and discriminant analysis models if predicting the observed value (instead of predicting the observed probability that the dependent variable is equal to that value) is of interest. <…> Second, R2OLS is useful in calculating standardized logistic regression coefficients <…> Third, R2OLS can be used in models using methods other than maximum likelihood estimation for the logistic regression model, particularly IRLS”. IRLS here stands for the iteratively reweighted least squares method. Nevertheless, R2OLS has two serious disadvantages: it does not automatically increase when the model is extended by an additional predictor, and it can be negative in some rare, rather degenerate cases. The latter drawback alone is enough to prevent R2OLS from being the best R2 measure. As to the critique by Mittlböck and Schemper (1996) that R2MF (or R2L) lacks an intuitive interpretation compared with R2OLS, Menard rightly notes that “the issue here is less one of intuition than of greater familiarity with the more widely used R2OLS statistic”.

 

2c. R2T: pros and cons

This measure has a lot of intuitive appeal. Importantly, R2T is independent of the base rate, and this is a theoretically proven statement, not an empirically observed fact as in the case of R2L. As a drawback, it does not automatically increase (or even remain non-decreasing) when the model is extended by an additional predictor. More about the advantages and drawbacks of R2T will be shown later.

 

3. From the best R2 to the best combination of good ones

With this discord in choosing the best R2, it is natural to follow the proposal by DeMaris (1992) p. 56: “In sum, it may not be prudent to rely on only one measure for assessing predictive efficacy - particularly in view of the lack of consensus on which measure is most appropriate. Perhaps the best strategy is to report more than one measure for any given analysis”.

 

By the way, Menard (2010), p. 56 recommends using R2O not instead of R2L but as a supplemental measure. Following the advice of DeMaris and Menard's suggestions, it is proposed in Shtatland (2018) to report three measures, R2L, R2O and R2T, altogether, hoping that they make up a good and natural trio of R-Squared measures and a proper way out of “the jungle” of R2 statistics for binary logistic regression.

 

Below, we will show that there exist very deep and natural relations between R2L, R2O and R2T. So, the choice of measures R2L, R2O and R2T in Shtatland (2018) is not accidental, but well-grounded.

 

4. Formulas for R2L, R2O, and R2T

First, we start with the necessary notation and definitions. The binary logistic regression model with dichotomous outcome Y is defined by the equation

 

ln[P(Y = 1)/P(Y = 0)] = ln[p/(1 - p)] = b0 + b1X1 + b2X2 + … + bkXk            (1)

 

where p is the probability of EVENT (Y = 1) predicted by model (1) and 1 – p is the predicted probability of NONEVENT (Y = 0) (for example, EVENT = {Sick}, NONEVENT = {Healthy}, etc.); X = (X1, X2, … Xk) is a vector of predictors or explanatory variables, and b = (b0, b1, b2, … bk) are the corresponding coefficients. Maximum likelihood is the basic method for estimating the parameters b = (b0, b1, b2, … bk) in (1). This means that, given statistically independent observations y1, y2, ..., yi, ... yn of the output variable Y and the corresponding vectors of predictors X1, X2, … Xn, we have to find the values of the coefficients that maximize the likelihood

 

L( y1, y2,... yi ... yn; X1, X2, … Xn; b0 , b1, b2, … bk) = Πni=1 p(yi |Xi)        (2)

 

We have dropped b0, b1, b2, … bk for notational brevity in the right side of (2).

According to Hosmer and Lemeshow (1989) p. 9, the log likelihood can be written as:

 

lnL(y1, y2,... yn; X1, X2, … Xn) = Σni=1 ln[p( yi | Xi)] =

Σni=1{yi ln[p(yi | Xi)] + (1 - yi) ln[1 – p(yi | Xi)]}
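For concreteness, this log likelihood is easy to compute directly. The following minimal Python sketch is ours and purely illustrative (the function name and the toy data are assumptions, not from the original); it also evaluates the intercept-only case, where every fitted probability equals the base rate:

```python
import math

def log_likelihood(y, p):
    # Bernoulli log likelihood: sum over observations of
    # y_i*ln(p_i) + (1 - y_i)*ln(1 - p_i), where p_i = P(Y = 1 | X_i).
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

# Toy data: 4 observations, 3 EVENTs.
y = [1, 0, 1, 1]
y_bar = sum(y) / len(y)                     # base rate, here 0.75

# Null (intercept-only) model: every fitted probability is the base rate.
lnL0 = log_likelihood(y, [y_bar] * len(y))
```

For this toy sample, lnL0 = 3·ln(0.75) + ln(0.25), the maximized log likelihood of the model without predictors.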

 

Now, let us introduce formulas for the chosen R2 measures:

 

R2L = 1 – lnLM / lnL0,                                                                                   (3)

 

where L0 is the maximized likelihood for the model containing only the intercept, without any predictor, and LM is the maximized likelihood of the current model M containing all available predictors;

 

R2O = 1 - Σni=1(yi - p̂(yi | Xi))2 / Σni=1(yi - ȳ)2,

 

where p̂(yi | Xi) is a maximum likelihood estimate of the actual (but unknown) probability p(yi | Xi) of the event (yi = 1), and ȳ = (Σni=1yi) / n is the average of yi;

 

R2T = [Σ(y = 1) p̂(yi | Xi)] / n1 - [Σ(y = 0) p̂(yi | Xi)] / n0,                                        (4)

 

where n1 is the number of EVENTs and n0 is the number of NONEVENTs, so that n1 + n0 = n. R2T is a comparatively new measure, named after the author of Tjur (2009). In his comment on Allison (2013), William Greene (July 6, 2016) notes that this measure was derived earlier in Cramer (1999). In turn, Ernest Shtatland, in his comment on Allison (2013) (December 30, 2017), notes that the results of Cramer (1999) and Tjur (2009) are mutually complementary and that it would be fair to use the double name Cramer-Tjur, R2CT. Strictly speaking, formula (4) is correct only if n1 > n0; otherwise the signs on the right side of (4) should be interchanged. This is mentioned in Cramer (1999) but not in Tjur (2009). The asymmetry in the prediction is not evident, but it is well known to practitioners according to Cramer (1999). To avoid confusion, the absolute value can be used on the right side of (4).
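The three measures can be computed side by side from the observed outcomes and the fitted probabilities. The Python sketch below is ours and illustrative: the p̂ vector stands in for the output of an actual maximum likelihood fit (its sum equals the number of EVENTs, as the ML score equation for the intercept guarantees for a real fit):

```python
import math

# Illustrative data: y is the observed 0/1 outcome; p_hat stands in for
# ML-fitted probabilities (note sum(p_hat) == sum(y)).
y     = [1, 1, 0, 0, 1, 0]
p_hat = [0.8, 0.7, 0.3, 0.2, 0.6, 0.4]

n  = len(y)
n1 = sum(y)                      # number of EVENTs
n0 = n - n1                      # number of NONEVENTs
y_bar = n1 / n                   # base rate

lnL0 = n1 * math.log(y_bar) + n0 * math.log(1 - y_bar)       # null model
lnLM = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
           for yi, pi in zip(y, p_hat))                      # current model

# Formula (3): likelihood-ratio (McFadden) R-Squared
R2L = 1 - lnLM / lnL0

# OLS-analog R-Squared
R2O = 1 - sum((yi - pi) ** 2 for yi, pi in zip(y, p_hat)) \
        / sum((yi - y_bar) ** 2 for yi in y)

# Formula (4): Tjur's R-Squared, the difference of mean fitted
# probabilities between EVENTs and NONEVENTs
R2T = sum(pi for yi, pi in zip(y, p_hat) if yi == 1) / n1 \
    - sum(pi for yi, pi in zip(y, p_hat) if yi == 0) / n0
```

On this toy sample R2T = 0.7 - 0.3 = 0.4, while R2L and R2O take different values: the three measures agree in direction but not in magnitude.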

 

Though R2CS does not belong to our trio, we present a formula for this measure:

 

R2CS = 1 – (L0 / LM)^(2/n)                                                                                  (5)

 

As mentioned earlier, all R2 measures based on the likelihood can be expressed in terms of R2L, and of special interest is a very simple relationship between R2L and the Cox-Snell R2CS (see Shtatland et al. (2002) and Menard (2010)). Formula (5) can be rewritten as

 

R2CS = 1 – exp(-2H·R2MF),                                                                         (6)

 

where the term H is nothing other than the entropy of the Bernoulli distribution with parameter ȳ = (Σni=1yi) / n, which is the base rate. Formula (6) was first derived in Shtatland et al. (2002). Similar formulas can easily be obtained for the other R2 measures based on the likelihood, but formula (6) is the simplest one, and it connects two very popular R2 statistics, former competitors for the title of “the best R2 for logistic regression” (see Allison (2013) and (2014)).
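Relation (6) is an exact algebraic identity, since H = -lnL0/n for the intercept-only ML fit; it can be checked numerically. A small Python sketch with illustrative data (the y and p̂ vectors are our toy numbers, not from the original):

```python
import math

# Toy data; p_hat plays the role of fitted probabilities of the current model.
y     = [1, 1, 0, 0, 1, 0]
p_hat = [0.8, 0.7, 0.3, 0.2, 0.6, 0.4]
n, n1 = len(y), sum(y)
y_bar = n1 / n

lnL0 = n1 * math.log(y_bar) + (n - n1) * math.log(1 - y_bar)
lnLM = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
           for yi, pi in zip(y, p_hat))

R2MF = 1 - lnLM / lnL0                                       # i.e., R2L
H = -(y_bar * math.log(y_bar)
      + (1 - y_bar) * math.log(1 - y_bar))                   # Bernoulli entropy

R2CS_direct  = 1 - math.exp((2 / n) * (lnL0 - lnLM))         # formula (5)
R2CS_from_MF = 1 - math.exp(-2 * H * R2MF)                   # formula (6)
```

The two computations of R2CS agree to machine precision, which is exactly the content of (6).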

 

5. What do the classical R2L and R2O have in common with the new R2T?

At first glance, nothing: the formulas defining them are entirely different. And yet, it can be shown that there exist deep and very interesting relationships between them.

 

5.1 R2O and R2T

The relations between R2T and R2O are quite evident. According to Tjur (2009), R2T can be expressed through three other R2-like measures:

 

R2T = (R2mod + R2res) / 2                                                                              (7)

 

and

 

R2T = √(R2mod · R2cor),                                                                                     (8)

 

where

 

R2res = 1 - Σni=1(yi - p̂(yi | Xi))2 / Σni=1(yi - ȳ)2

 

R2mod = Σni=1(p̂(yi | Xi) - ȳ)2 / Σni=1(yi - ȳ)2

 

R2cor = [Σni=1 (yi - ȳ)(p̂(yi | Xi) - ȳ)]2 / [Σni=1(yi - ȳ)2 · Σni=1(p̂(yi | Xi) - ȳ)2]

 

Obviously, R2res is nothing but the familiar R2O, and R2cor is the well-known R2 measure based on the squared Pearson correlation between the observed outcomes and the estimated probabilities. Since both R2mod and R2cor are nonnegative by definition, R2T is also nonnegative. Thus, averaging R2res (i.e., R2O) with R2mod corrects the drawback of R2O of being negative under some circumstances. A very interesting conclusion also follows from equations (7) and (8): if we know R2O and R2T (two members of our chosen trio), then we actually know both R2cor and R2mod as well. We can add that since R2res, R2mod and R2cor are asymptotically equivalent (see Tjur (2009), p. 368), so are R2T and R2O. Thus, there indeed exist very deep and natural relations between R2T and R2O.
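Identities (7) and (8) can be verified numerically. In the Python sketch below (our illustrative numbers), the vector p̂ satisfies Σp̂ = Σy, the condition guaranteed by the ML score equation for the intercept, under which (7) and (8) hold exactly:

```python
import math

# Illustrative outcomes and fitted probabilities; sum(p_hat) == sum(y),
# as an intercept in an ML logistic fit guarantees.
y     = [1, 1, 0, 0, 1, 0]
p_hat = [0.8, 0.7, 0.3, 0.2, 0.6, 0.4]

n, n1 = len(y), sum(y)
n0 = n - n1
y_bar = n1 / n
sst = sum((yi - y_bar) ** 2 for yi in y)         # total sum of squares

R2mod = sum((pi - y_bar) ** 2 for pi in p_hat) / sst
R2res = 1 - sum((yi - pi) ** 2 for yi, pi in zip(y, p_hat)) / sst
R2cor = (sum((yi - y_bar) * (pi - y_bar) for yi, pi in zip(y, p_hat)) ** 2
         / (sst * sum((pi - y_bar) ** 2 for pi in p_hat)))

# Formula (4): Tjur's R-Squared
R2T = sum(pi for yi, pi in zip(y, p_hat) if yi == 1) / n1 \
    - sum(pi for yi, pi in zip(y, p_hat) if yi == 0) / n0
```

Here (R2mod + R2res)/2 and √(R2mod·R2cor) both reproduce R2T, confirming (7) and (8).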

 

5.2 R2L and R2T; a new, symmetrical form of R2T

Now we will show that there also exist important relations between R2T and R2L, though not as obvious as between R2O and R2T.

 

Since one of the main goals of the logistic regression model is good prediction, we would like to make p(yi = 1) as close to 1 as possible for those subjects who do have yi = 1, and at the same time to have p(yi = 1) as close to 0 as possible for those with yi = 0. All the R2 statistics for logistic regression discussed above serve as measures of predictive power. It is easily seen (Tjur (2009)) that R2T and R2L (as well as R2O, R2mod and R2cor) are always ≤ 1, with equality if and only if yi = p̂(yi | Xi) for all i. Of course, this is an ideal, purely theoretical situation, but it is quite possible (though very seldom) that p̂(yi | Xi) is sufficiently close to yi for all i. In this case we can use a very simple and useful approximation based on the first-order Taylor expansion of the logarithm: ln(1 - x) ≈ -x for small positive x, or equivalently ln(x) ≈ x - 1 for x close to 1. In general, for x smaller than 0.10 this approximation is quite practical, and for x smaller than 0.01 it is very close. We apply this approximation to the following formula for the log likelihood

 

lnL = Σni=1[yi ln[p̂(yi | Xi)] + (1 - yi)ln[1 - p̂(yi | Xi)]] =

Σ(y = 1)  ln[p̂(yi | Xi)] + Σ (y = 0)  ln[1 - p̂(yi | Xi)]:

 

ln[p̂(yi | Xi)] ≈ p̂(yi | Xi) – 1 in the first sum above, and ln[1 - p̂(yi | Xi)] ≈ - p̂(yi | Xi) in the second one. As a result, we arrive at the formula

 

lnL ≈ Σ(y = 1) p̂(yi | Xi) – n1 – Σ (y = 0)  p̂(yi | Xi),                                                (9)

 

where n1 is the number of events. Thus, the log likelihood of logistic regression is approximately equal (up to the additive constant n1) to the difference between the sum of probabilities of correctly predicted events and the sum of probabilities of incorrect predictions. This approximation is linear in terms of p̂(yi | Xi). It reminds us of the formula for R2T, though there the sums of probabilities are replaced by the averages of the probabilities. Since R2L is a linear function of lnL (see formula (3) defining R2L), we have a very interesting and surprising similarity between R2L (in terms of p̂(yi | Xi)) and R2T. Note, though, that this similarity is meaningful and useful only in the advanced stages of model building, when we already work with “good” models, particularly those close to the saturated one.
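Approximation (9) can be checked numerically for a nearly saturated model. A small Python sketch with our illustrative probabilities close to the observed outcomes:

```python
import math

# A model close to saturation: fitted probabilities near the observed
# 0/1 outcomes (illustrative numbers, all errors under 0.10).
y     = [1, 1, 0, 0, 1, 0]
p_hat = [0.97, 0.95, 0.04, 0.03, 0.96, 0.05]
n1 = sum(y)

# Exact Bernoulli log likelihood
lnL_exact = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                for yi, pi in zip(y, p_hat))

# Formula (9): first-order approximation, linear in the probabilities
lnL_approx = (sum(pi for yi, pi in zip(y, p_hat) if yi == 1) - n1
              - sum(pi for yi, pi in zip(y, p_hat) if yi == 0))
```

For these numbers the relative error of (9) is only a few percent, consistent with the accuracy of the Taylor approximation for x < 0.10.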

 

Also, because by definition p̂(yi | Xi) is the probability of yi = 1, 1 - p̂(yi | Xi) is the probability of yi = 0, i.e., p̂(yi = 0 | Xi), and the previous formula can be rewritten in the following way:

 

lnL ≈ Σ(y = 1) p̂(yi = 1 | Xi) - Σ(y = 0) [1 - p̂(yi = 0 | Xi)] – n1 =

Σ(y = 1) p̂(yi = 1 | Xi) + Σ(y = 0) p̂(yi = 0 | Xi) – n0 – n1 =

Σ(y = 1) p̂(yi = 1 | Xi) + Σ(y = 0) p̂(yi = 0 | Xi) – n

 

Consequently,

 

lnL / n ≈ [Σ(y = 1) p̂(yi = 1 | Xi)] / n + [Σ(y = 0) p̂(yi = 0 | Xi)] / n – 1 =

{[Σ(y = 1) p̂(yi = 1 | Xi)] / n1}·(n1 / n) + {[Σ(y = 0) p̂(yi = 0 | Xi)] / n0}·(n0 / n) – 1

 

Note that p̂(yi | Xi) can be seen as a “true positive” if yi = 1 and a “false positive” if yi = 0, on the probability scale (relative to the observation yi). The quantities:

 

[Σ(y = 1) p̂(yi | Xi)] / n1

 

and

 

[Σ(y = 0) (1 - p̂(yi | Xi))] / n0 = [Σ(y = 0) p̂(yi = 0 | Xi)] / n0

 

can be considered as sensitivity and specificity, respectively. With this new notation, we have

 

lnL / n ≈ sensitivity·(n1 / n) + specificity·(n0 / n) - 1                               (10)

 

Now, we can rewrite the formula for R2T in a more symmetrical form. Indeed,

 

R2T = [Σ(y = 1) p̂(yi = 1 | Xi)] / n1 - [Σ(y = 0) p̂(yi = 1 | Xi)] / n0 =

[Σ(y = 1) p̂(yi = 1 | Xi)] / n1 - [Σ(y = 0) (1 - p̂(yi = 0 | Xi))] / n0 =

[Σ(y = 1) p̂(yi = 1 | Xi)] / n1 + [Σ(y = 0) p̂(yi = 0 | Xi)] / n0 - 1 =

 

sensitivity + specificity - 1                                                                       (11)

 

The last quantity is the well-known Youden's Index J, introduced in Youden (1950) as a measure of the performance of a dichotomous diagnostic test. Thus, we can date the origin of R2T back from 2009 (Tjur) to 1999 (Cramer), and even to 1950, though the term “logistic regression” was not yet established in the statistical literature in 1950, let alone R2 measures for logistic regression.
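The identity R2T = J can be confirmed in a few lines of Python (the fitted probabilities below are our illustrative numbers; the identity itself holds for any probability vector):

```python
# R2T equals Youden's J = sensitivity + specificity - 1 computed on the
# probability scale, exactly, for any vector of fitted probabilities.
y     = [1, 1, 0, 0, 1, 0]
p_hat = [0.8, 0.7, 0.3, 0.2, 0.6, 0.4]     # illustrative fitted probabilities
n1 = sum(y)
n0 = len(y) - n1

sensitivity = sum(pi for yi, pi in zip(y, p_hat) if yi == 1) / n1
specificity = sum(1 - pi for yi, pi in zip(y, p_hat) if yi == 0) / n0
youden_J = sensitivity + specificity - 1    # formula (11)

# Formula (4): Tjur's difference of mean fitted probabilities
R2T = sum(pi for yi, pi in zip(y, p_hat) if yi == 1) / n1 \
    - sum(pi for yi, pi in zip(y, p_hat) if yi == 0) / n0
```

For these numbers sensitivity = specificity = 0.7, so J = R2T = 0.4.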

 

Note that formula (11) for R2T in terms of sensitivity and specificity looks rather similar to formula (10) for lnL / n, with two exceptions:

 

1. Formula (11) is exact, but (10) is approximate;

2. Equal weights are assigned to sensitivity and specificity in formula (11), while in (10) sensitivity and specificity enter with weights n1/n and n0/n, respectively.

 

It should be noted that the equivalence of R2T and Youden's Index was first mentioned in the two-page article Hughes (2017), but the author's arguments there are not fully clear.

 

5.3 R2T and the base rate

Another rather striking property of R2T demonstrates a very interesting association of R2T with R2L. Indeed, Cramer (1999) has shown that both terms in formula (4), [Σ(y = 1) p̂(yi | Xi)] / n1 and [Σ(y = 0) p̂(yi | Xi)] / n0, are linear functions of the base rate n1/n (the proportion of cases with yi = 1) with equal slopes (see ibid., equations (11) and (12); note that the notation in Cramer (1999) differs from ours). As a result, R2T itself is independent of the base rate (ibid., equation (14)). This is a theoretically founded statement, not an empirically observed fact as in the case of R2L. Due to this property, R2T is closer to R2L than to any other version of R2 for logistic regression. Some researchers (for example, Menard (2000) and (2010)) consider this property very important. However, not everybody in the statistical community thinks this way: other researchers consider the dependence on the base rate a strength, not a weakness. The main supporters of this point of view are Mittlböck and Schemper (1996). But their arguments are related to the question of whether R2L or R2O is the best measure, and this question is no longer relevant because R2O can take negative values under some circumstances. As said in Tjur (2009), this is more than enough to disqualify R2O as a candidate for the title of “best choice of R2 in logistic regression”. Here we are not going to discuss the arguments in favor of independence of R2 from the base rate; they can readily be found in Menard (2010), pp. 50, 54 – 55, 62. The mathematically proven independence of the very popular R2T statistic from the base rate is a strong point in favor of Menard's point of view. Again, recall that averaging R2res (i.e., R2O) with R2mod causes R2T to be always nonnegative, even though R2O can sometimes take negative values.

 

6. The unique properties of R2L

 

6.1 R2L and Adj-R2L vs AIC

So far, we have dealt only with R2 measures unadjusted for the number of predictors. Here we will make an exception for R2L and define an adjusted R2L as follows:

 

Adj-R2L = 1 - (lnLM – K) / lnL0 = R2L + K / lnL0                                              (12)

 

where, as before, L0 is the maximized likelihood of the model containing only the intercept, without any predictors, and LM is the maximized likelihood of the current model M containing all K available predictors. There are several variants of formulas for the adjusted R2L (see, for example, Menard (2010), pp. 48-49; Smith and McKenna (2013); Walker and Smith (2016)). They are very close, almost equivalent; we have chosen the simplest one. Note that formula (12) can be rewritten as follows:

 

Adj-R2L = R2L - K / (n[-ȳ ln ȳ - (1 - ȳ) ln(1 - ȳ)]) = R2L - K / (n·H(ȳ)),

 

where ȳ = (Σni=1yi) / n is the base rate, H(ȳ) is the entropy of the Bernoulli distribution with parameter ȳ, and n/K is the number of observations per predictor.

Also, define the Akaike Information Criterion, AIC as:

 

AIC = -2lnLM + 2K

 

Additionally, we introduce the test statistic T for testing the overall significance of the logistic model coefficients:

 

T = 2(lnLM – lnL0)                                                                                       (13)

 

It is easily seen that there exist simple linear relationships between these three statistics, Adj-R2L, AIC, and T, for example:

 

Adj-R2L = 1 + AIC / (2lnL0),

AIC = 2lnL0·[Adj-R2L – 1],

T = -AIC + 2K - 2lnL0,

T = -2lnL0·Adj-R2L + 2K.

 

Due to the linear relationship between Adj-R2L and AIC, the two statistics are equivalent: if AIC decreases then Adj-R2L increases, and vice versa. So Adj-R2L and AIC can be used interchangeably, which is why Adj-R2L can be referred to as an AIC-type adjustment. Note also that the value of AIC by itself tells nothing about the quality of a model, only its quality relative to other models; thus, if all candidate models fit poorly, AIC gives no warning of that. When comparing two models M1 and M2, we only need to know which AIC value is smaller (the smaller, the better). At the same time, Adj-R2L alone characterizes the predictive power of a model M considered separately. For this reason Adj-R2L, being equivalent to AIC, should be preferred over AIC, which is probably the most popular statistic in logistic regression applications. This makes R2L a unique R2 statistic.
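The linear relationships above are purely algebraic and can be verified with illustrative numbers (the log likelihood values below are made up for the sketch, not from a real fit):

```python
# Illustrative log likelihoods and predictor count (not from a real model).
lnL0, lnLM, K = -60.0, -45.0, 3

AIC    = -2 * lnLM + 2 * K                  # Akaike Information Criterion
T      = 2 * (lnLM - lnL0)                  # formula (13)
AdjR2L = 1 - (lnLM - K) / lnL0              # formula (12)

# The four linear relationships from the text:
AdjR2L_via_AIC = 1 + AIC / (2 * lnL0)
AIC_via_Adj    = 2 * lnL0 * (AdjR2L - 1)
T_via_AIC      = -AIC + 2 * K - 2 * lnL0
T_via_Adj      = -2 * lnL0 * AdjR2L + 2 * K
```

Each recomputed value matches the direct one, illustrating that Adj-R2L, AIC and T carry the same information up to known constants.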

 

6.2 R2L and a seeming paradox: Statistical significance vs Scientific or Practical significance, and P-values vs predictive power

Note from the very beginning that the American Statistical Association (ASA) has released a “Statement on Statistical Significance and P-Values” (see Wasserstein and Lazar (2016)). The statement recommends against using the concept and the term “statistical significance” at all, as misleading. P-values do not measure the size of an effect, the evidence for significance of any kind (scientific or practical), or the importance of a model. P-values can only indicate how incompatible the data are with a specified null model (without predictors), and they tell nothing about the alternative model (with the available predictors).

 

Nevertheless, since the so-called “Statistical Significance War” has not yet affected the statistical literature on logistic regression, we will continue using the traditional terminology. We do hope that using the term “significance” below will not cause any confusion.

 

It has been shown in Shtatland (2018) that the quantity lnLM - lnL0 can be interpreted as the information gain due to switching from the null model with intercept only to the current model with all available predictors. Thus, the test statistic (13) is nothing but doubled information gain. The reason for the multiplier 2 is purely technical: just to get a statistic with a chi-square distribution. Therefore, we reject the null hypothesis only if the information gain due to the additional explanatory variables exceeds some critical number, which sounds quite natural. Usually, the values of chi-square statistics themselves are not considered meaningful. Note also that it is said in Hosmer and Lemeshow (1989), p. 149 that:

 

“The quantity R2L is nothing more than an expression of the likelihood ratio test and, as such, is not a measure of goodness-of-fit”. In other words, the authors decline to interpret R2L as a measure of the quality of the model (in particular, as a measure of predictive power). However, in the 3rd edition of their book (Hosmer, Lemeshow and Sturdivant (2013)), the authors drop this statement.

 

This situation looks like a paradox. But the paradox is only apparent: in reality, it is a manifestation of the unique property of R2L of being a fusion of two distinct functions, as the test statistic for testing the overall statistical significance of the model and as a measure of the practical/scientific importance (in particular, the predictive power) of the same model. In general, P-values by themselves, as a technique for measuring statistical significance, say nothing about the size of the scientific or practical significance.

 

Summarizing, we can see that R2L has a number of undeniable advantages over the other R2 statistics, including R2O, R2T and R2CS. Moreover, it has no evident drawbacks. Naturally, there is a temptation to call it the best R2. Nevertheless, we prefer to consider R2L not the best R2 measure, but simply the leading member of the trio R2O, R2T and R2L.

 

7. Conclusion

The abundance of R-Squared measures for logistic regression, combined with the lack of consensus on the criteria for choosing the best R2, is very confusing for researchers and other practitioners. In such a situation, it is natural to search for the best combination of R2 measures that provides good performance. Our choice is the trio R2L, R2O, and R2T. All of them have advantages of their own. But R2L stands out for established, well-known properties such as: an explicit and straightforward interpretation in terms of information theory; its near independence from the base rate; and the fact that all the other R2 measures based on the log likelihood statistic can easily be expressed in terms of R2L. On top of that, R2L has two extraordinary features: a very close and natural relationship with AIC (arguably the most popular statistic in logistic regression modeling), and the fact that it integrates two different functions, serving both as the test statistic for testing the overall statistical significance of the model and as a measure of the predictive power of the same model. Also, R2L has no apparent disadvantages, unlike R2O and R2T. However, R2O and R2T have important properties that are complementary to the R2L features mentioned above: for example, R2T is equal to the famous Youden's Index, and R2T is exactly independent of the base rate. That is why the trio of R2L, R2O, and R2T can rightly be called the best combination of R2 measures for logistic regression.

 

Acknowledgments

I am indebted to Dr. Scott Menard for originating my interest in R2 measures for Logistic Regression, and for his helpful comments.

 

References

Allison, P.D. (2013) “What’s the best R-Squared for logistic regression?” http://www.statisticalhorizons.com/r2logistic

 

Allison, P.D. (2014) “Measures of fit for logistic regression”. https://support.sas.com/resources/papers/proceedings14/1485-2014.pdf

 

Cameron, C. A. and Windmeijer, F. A. G. (1997) "An R-squared measure of goodness of fit for some common nonlinear regression models", Journal of Econometrics, Vol. 77, No.2, pp. 329-342.

 

Cramer, J. S. (1999) “Predictive performance of the binary logit model in unbalanced samples”, The Statistician, 48, 85 – 94.

 

DeMaris, A. (1992) “Logit modeling” Sage University Paper Series.

 

Hosmer, D.W. and Lemeshow, S. (1989) “Applied Logistic Regression”. New York: Wiley.

 

Hosmer, D.W., Lemeshow, S. and Sturdivant, R.X. (2013) “Applied Logistic Regression”, 3rd ed. New York: Wiley.

 

Hughes, G. (2017) “Tjur's R2 for logistic regression models is the same as Youden's index for a 2x2 diagnostic table”, Annals of Epidemiology, Vol. 27, 801 – 802.

 

McFadden, D. (1974), “Conditional logit analysis of qualitative choice behavior”, pp. 105 -142 in Zarembka (ed.), Frontiers in Econometrics. Academic Press.

 

Menard, S. (2000) “Coefficients of determination for multiple logistic regression analysis”, The American Statistician, 54, 17 – 24.

 

Menard, S. (2002), “Applied logistic regression analysis”, Sage University Paper Series (2nd edition).

 

Menard, S. (2010), “Logistic regression: From introductory to advanced concepts and applications”, Sage University Paper Series.

 

Mittlböck, M. and Schemper, M. (1996) “Explained variation in logistic regression”, Statistics in Medicine, 15, 1987 – 1997.

 

Shtatland, E. S., Kleinman, K. and Cain, E. M. (2002) “One more time about R^2 measures of fit in logistic regression”, NESUG 2002 Proceedings.

 

Shtatland, E.S. (2018) “Do we really need more than one R-Squared in logistic regression?”   http://statisticalmiscellany.blogspot.com/2018/

 

Smith, T. J. and McKenna, C. M. (2013) “A comparison of logistic regression pseudo R2 indices”, Multiple Linear Regression Viewpoints, 39, 17 - 26.

 

Tjur, T. (2009) “Coefficients of determination in logistic regression models – a new proposal: the coefficient of discrimination”, The American Statistician, 63, 366 – 372.

 

Walker, D. A. and Smith, T. J. (2016) “Nine pseudo R^2 indices for binary logistic regression models”, Journal of Modern Applied Statistical Methods, 15, 848 – 854.

 

Wasserstein, R. L. and Lazar, N. A. (2016) “The ASA's statement on P-values: Context, process, and purpose”, The American Statistician, 70, 129 – 133.

 

Windmeijer, F. A. G. (1995) “Goodness of fit measures in binary choice models”, Econometric Reviews, 14, 101 – 116.

 

Youden, W. J. (1950) “Index for rating diagnostic tests”, Cancer, 3, 32 – 35.