AUC = 0: perfect prediction

For the model, see:

but I am posting it in a separate thread because the question is different.

plot(vs, stats=c("auc"))

How should I read this plot? In my world, an AUC of 0 means perfect prediction where the researcher only got the sign wrong; 1 is a perfect decision, and 0.5 is random guessing. I expected a curve in the range from 0.5 to 1.

Note: the Cross Validated question "Example when using accuracy as an outcome measure will lead to a wrong conclusion" holds anyway, but let’s skip that for a moment.
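To make the sign-flip point concrete, here is a minimal sketch in base R (simulated data, not the model above) showing that negating a score turns an AUC of a into 1 - a, so an AUC of 0 is as informative as an AUC of 1:

set.seed(1)
y <- rbinom(200, 1, 0.5)           # simulated binary outcome
score <- y + rnorm(200, sd = 0.5)  # informative score

# Mann-Whitney form of the AUC: P(score of a positive > score of a negative),
# with ties counted as half
auc_mw <- function(score, y) {
  pos <- score[y == 1]
  neg <- score[y == 0]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

auc_mw(score, y)   # close to 1: good ranking
auc_mw(-score, y)  # exactly 1 minus the above: same information, sign flipped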

Hi,

I think it’s the other way around. If the area under the ROC curve is \mathrm{AUC} = 1, then we like our model and we might even consider taking it out to dinner :) The plot indicates that having more than 4 covariates is probably not necessary.

Finding the right model was not the question; that much is evident. It was the scaling, which is not what I am used to from my work in radar detection.

I assume they compute the area above the diagonal, but the definition I know is the area under the full curve, i.e. 0.5 for pure guessing.
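For a pure guesser the ROC curve is the diagonal \mathrm{TPR} = \mathrm{FPR}, so the two conventions give exactly

\mathrm{AUC}_\text{under} = \int_0^1 t \, \mathrm{d}t = \tfrac{1}{2}, \qquad \mathrm{AUC}_\text{above diagonal} = \mathrm{AUC}_\text{under} - \tfrac{1}{2} = 0.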

When you are predicting the outcome on held-out data (as in cross-validation), you can also get values below 0.5. That is because without looking at the data (and thus at the correct answer), you would not be able to tell that the sign was incorrect.

I would not be surprised by random variation around 0.5, but the value of 0 at size 0 is definitely wrong and should be 0.5. And to me it looks like the value at size 1 is also not just 0.5 minus random variation, because normally the first selected term makes a big jump above randomness.

I have to explain the outcome to people who know how to handle AUC plots, and they will be confused to see this.

I suspect that there is a calibration error and that the AUC was computed above the diagonal, not from the bottom/right coordinates as usual.

What’s your output of summary(vs, stats="auc")? I’m wondering if it’s a problem with the computation or with the plotting.

The plotting is correct. So op_group could be 0.49 if we allow 2*auc.se, but that is VERY unlikely from the data I know; the first predictor is definitely higher. I think the normalization should be something like auc_correct = auc/2 + 0.5, but that is just a wild guess…
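To spell out that guess in code (my conjecture only, not documented package behaviour), applied to a few of the reported values below:

auc_reported <- c(0.00, 0.44, 0.71)  # size 0, op_group, op_group:z_age_op
auc_reported / 2 + 0.5               # conjectured back-transformation
[1] 0.500 0.720 0.855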

size  solution_terms          auc  auc.se
   0                         0.00    0.00
   1  op_group               0.44    0.05
   2  reflux_preop           0.57    0.05
   3  z_age_op               0.67    0.05
   4  op_group:z_age_op      0.71    0.04
   5  z_bmi_preop            0.71    0.04
   6  op_group:z_bmi_preop   0.71    0.04
   7  op_group:reflux_preop  0.70    0.04
   8  (1 | subject_id)       0.70    0.04

@AlejandroCatalina Can you have a look at this?

Thanks @mcol for tagging me.

I believe we may be looking at the same thing under two different definitions. Our implementation corresponds to an AUC between 0 and 1, where a value of 0 corresponds to 100% incorrect predictions and 1 to 100% correct predictions. Would this help you interpret the plot?

I fully agree with your definition. In a binomial setting, 100% incorrect prediction at an AUC of 0 is equivalent to perfect prediction, as I said above; the researcher just got the interpretation wrong. In your plot, the best prediction would then be the one with 0 parameters, and we would just have to tell the researcher that he is down-under. Your measure might be helpful, but it is not what the AUC of the ROC means in the general literature. I have worked in radar detection for some years, but if you do not believe me, check the literature on the subject.

To quote Wikipedia (bold by me):

Whereas ROC AUC varies between 0 and 1 — **with an uninformative classifier yielding 0.5** — the alternative measures known as informedness…

I discussed this with a reviewer, who correctly informed me that this definition of AUC does not agree with the standard. She suggested that I ask you how the AUC is defined here. The standard is the AUC of the ROC, but that one definitely has 0.5 as pure guessing and 0 and 1 as full knowledge in a binary decision.

Maybe you have reasons not to use the standard, but please give us the definition.


I think that the computation of the AUC for a random guess is correct:

set.seed(1)
# columns: observed binary outcome, predicted value, observation weight
x <- cbind(V1 = rbinom(100, 1, 0.5), V2 = runif(100), V3 = 1)
projpred:::auc(x)
[1] 0.5172276
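Extending the same check (the column layout of outcome, prediction, weight is my reading of the snippet above) to a perfect and a perfectly flipped predictor should show whether 0 really means 100% incorrect:

projpred:::auc(cbind(V1 = x[, "V1"], V2 = x[, "V1"], V3 = 1))      # expect 1 (perfect)
projpred:::auc(cbind(V1 = x[, "V1"], V2 = 1 - x[, "V1"], V3 = 1))  # expect 0 (sign flipped)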

I don’t know what data enters the auc() function for a model of size 0, so perhaps the problem is in that area.
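A quick hedged probe of that, assuming the same column layout: a size-0 model predicts a constant, so every pair is tied, and a half-win tie rule should give 0.5 rather than 0:

y <- x[, "V1"]
x0 <- cbind(V1 = y, V2 = rep(0.5, 100), V3 = 1)  # constant prediction, as a size-0 model would produce
projpred:::auc(x0)  # 0.5 expected if ties count as half-wins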

Good progress, so we agree on the definition! Knowing nothing means pure prior. An AUC of 0 is NOT knowing nothing, but knowing everything and being headstrong about the sign.

It is a bug in cv_varsel; varsel works as expected. See:
