AUC = 0 : perfect prediction

For the model, see:

but I post it in a separate thread because the question is different.

plot(vs, stats=c("auc"))

How should I read this plot? In my world, an AUC of 0 means perfect prediction; the researcher only got the sign wrong. An AUC of 1 is a perfect decision, and 0.5 is random guessing. I expected a curve in the range from 0.5 to 1.
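To make the scale concrete, here is a minimal sketch in base R of the rank-based (Mann-Whitney) form of the ROC AUC. The helper name `auc_rank` and the toy data are my own and are not part of projpred; the point is only that a reversed score gives 0, a perfect score gives 1, and an uninformative constant score gives 0.5.

```r
# Rank-based (Mann-Whitney) AUC: the probability that a randomly chosen
# positive case gets a higher score than a randomly chosen negative case,
# with ties counted as one half. Hypothetical helper, not a projpred function.
auc_rank <- function(y, score) {
  n_pos <- sum(y == 1)
  n_neg <- sum(y == 0)
  r <- rank(score)  # midranks are used for tied scores
  (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

y     <- c(0, 0, 0, 1, 1, 1)
score <- c(0.1, 0.2, 0.3, 0.7, 0.8, 0.9)

auc_perfect <- auc_rank(y, score)       # every positive ranked above every negative: 1
auc_flipped <- auc_rank(y, 1 - score)   # same information, sign reversed: 0
auc_const   <- auc_rank(y, rep(0.5, 6)) # constant, uninformative score: 0.5
```

Under this definition, an AUC of 0 carries exactly as much information as an AUC of 1, because reversing the score maps any value a to 1 - a.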

Note: The Cross Validated thread "Example when using accuracy as an outcome measure will lead to a wrong conclusion" (tagged machine learning) holds anyway, but let’s skip that for a moment.


I think it’s the other way around. If the area under the ROC curve is $\mathrm{AUC} = 1$, then we like our model and we might even consider taking it out on a dinner :) The plot indicates that having more than 4 covariates is probably not necessary.

Finding the right model was not the question, that’s evident. It was the scaling which I believe is not what I am used to from my work in radar detection.

I assume they compute the area above the diagonal, but the definition I know is the area under the full curve, i.e. 0.5 for pure guessing.

When you are predicting the outcome on held-out data (as in cross-validation), you could also get values below 0.5. That’s because without looking at the data (and so at the correct answer), you would not be able to state that the sign was incorrect.
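A small base-R sketch of how held-out predictions can end up below 0.5, and even at exactly 0. It assumes an intercept-only predictor evaluated by leave-one-out, where each case is predicted by the mean of the remaining outcomes. The helper `auc_rank` is hypothetical, and this is only an illustration of the mechanism, not a statement about what cv_varsel does internally.

```r
# Rank-based (Mann-Whitney) AUC with ties counted as one half.
# Hypothetical helper, not a projpred function.
auc_rank <- function(y, score) {
  n_pos <- sum(y == 1)
  n_neg <- sum(y == 0)
  r <- rank(score)  # midranks are used for tied scores
  (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

set.seed(1)
y <- rbinom(50, 1, 0.5)
n <- length(y)

# Leave-one-out prediction from an intercept-only model:
# the mean of the other n - 1 outcomes.
pred_loo <- (sum(y) - y) / (n - 1)

# Leaving out a 1 lowers the mean and leaving out a 0 raises it, so every
# positive case gets a lower prediction than every negative case.
auc_loo <- auc_rank(y, pred_loo)  # exactly 0
```

The predictor knows nothing about the individual case, yet the held-out predictions are perfectly anti-correlated with the outcome, which is why an AUC of 0 on held-out data does not have to mean "perfect knowledge with the wrong sign".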

I would not be surprised about random variation around 0.5, but the value of 0 at size 0 is definitely wrong and could be 0.5. And to me it looks like the value at size 1 is also not just 0.5 minus variation, because normally the first term makes a big jump above randomness.

I have to explain the outcome to people who know well how to read AUC plots, and they will be confused to see this.

I suspect that there is a calibration error and that the AUC was computed above the diagonal, not over the bottom/right coordinates as usual.

What’s your output of summary(vs, stats="auc")? I’m wondering if it’s a problem with the computation or with the plotting.

Plotting is correct. So op_group could be 0.49 when we use 2*, but that’s VERY unlikely from the data I know; the first predictor is definitely higher. I think the normalization should be something like auc_correct = auc/2+0.5, just a wild guess…

size  solution_terms          auc
   0                          0.00  0.00
   1  op_group                0.44  0.05
   2  reflux_preop            0.57  0.05
   3  z_age_op                0.67  0.05
   4  op_group:z_age_op       0.71  0.04
   5  z_bmi_preop             0.71  0.04
   6  op_group:z_bmi_preop    0.71  0.04
   7  op_group:reflux_preop   0.70  0.04
   8  (1 | subject_id)        0.70  0.04
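For reference, the guessed rescaling auc_correct = auc/2+0.5 is the inverse of the Gini-type transform 2*AUC - 1, which maps the ROC AUC range [0.5, 1] onto [0, 1]. The sketch below applies it to the values in the summary output. Whether the reported numbers are actually on such a scale is purely an assumption of the guess, and the variable names are mine.

```r
# AUC column as printed in the summary output above (sizes 0 to 8).
reported <- c(0.00, 0.44, 0.57, 0.67, 0.71, 0.71, 0.71, 0.70, 0.70)

# Hypothetical rescaling from the post: treat the reported value as a
# Gini-type score in [0, 1] and map it back to the usual ROC AUC scale.
auc_guess <- reported / 2 + 0.5

# Applying the Gini transform 2 * AUC - 1 recovers the reported values,
# which confirms that the two maps are inverses of each other.
gini_back <- 2 * auc_guess - 1

print(round(auc_guess, 3))
```

Under this guess the size-0 model would sit at 0.5, which is what one would expect from a model without predictors.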

@AlejandroCatalina Can you have a look at this?

Thanks @mcol for tagging me.

I believe we may be looking at the same thing under two different definitions. Our implementation corresponds to an AUC between 0 and 1, where a value of 0 corresponds to 100% incorrect predictions and 1 to 100% correct predictions. Would this help you interpret the plot?

I fully agree with your definition. In a binomial situation, 100% incorrect prediction at 0 is equivalent to perfect prediction as I said above, the researcher just got the interpretation wrong. In your plot, the best prediction would be with 0 parameters, we just have to tell the researcher that he is down-under. Your measure might be helpful, but it is not what AUC of ROC in the general literature is. I have worked in radar detection for some years, but if you do not believe me, check the literature on the subject.

To cite Wikipedia (bold by me):

Whereas ROC AUC varies between 0 and 1 — with an uninformative classifier yielding 0.5 — the alternative measures known as informedness…

I discussed this with a reviewer who correctly informed me that the definition of AUC does not agree with the standard. She suggested that I ask you how AUC was defined. The standard is the AUC of the ROC, but that one definitely has 0.5 as pure guessing, and 0 and 1 as full knowledge in a binary decision.

Maybe you have reasons not to use a standard, but please give us the definition.


I think that the computation of the AUC for a random guess is correct:

x <- cbind(V1=rbinom(100, 1, 0.5), V2=runif(100), V3=1)
[1] 0.5172276

What data enters the auc() function for a model of size 0 I don’t know, so perhaps the problem is in that area.
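As an independent cross-check that does not rely on any package internals, here is a base-R sketch that repeats the random-guess experiment many times with the same kind of matrix as above. The helper `auc_rank` is my own rank-based (Mann-Whitney) implementation and ignores the weights column V3, which is constant here anyway.

```r
# Rank-based (Mann-Whitney) AUC with ties counted as one half.
# Hypothetical helper, not a projpred function.
auc_rank <- function(y, score) {
  n_pos <- sum(y == 1)
  n_neg <- sum(y == 0)
  r <- rank(score)  # midranks are used for tied scores
  (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

set.seed(42)

# 200 replicates of a random binary response with an unrelated uniform score.
sim <- replicate(200, {
  x <- cbind(V1 = rbinom(100, 1, 0.5), V2 = runif(100), V3 = 1)
  auc_rank(x[, "V1"], x[, "V2"])
})

mean_auc  <- mean(sim)   # should be close to 0.5
range_auc <- range(sim)  # spread of single-run values around 0.5

print(mean_auc)
print(range_auc)
```

Single runs scatter around 0.5 with 100 observations, so one value like the one above is consistent with random guessing, but it says nothing yet about what the size-0 model feeds into the computation.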

Fine progress, so we agree on the definition! Knowing nothing is pure prior. AUC of 0 is NOT know-nothing, but know all and being headstrong.

It is a bug in cv_varsel; varsel works as expected. See
