Projpred

I have a stan_lm object - 15 covariates or factors/33 predictors.

There are no missing values in the data set.

I wish to try model selection.

The following is returned when I do get_refmodel(object) or varsel(object):

Error: NAs are not allowed in ‘newdata’.

The console output with traceback is:

StandardizedOME.stan_lm.5c.refmodel ← get_refmodel(StandardizedOME.stan_lm.5c)
Error: NAs are not allowed in ‘newdata’.

traceback()
12: stop(“NAs are not allowed in ‘newdata’.”, call. = FALSE)
11: validate_newdata(object, newdata = newdata, m = NULL)
10: posterior_linpred.stanreg(object, newdata = data.frame(zt), transform = T,
offset = rep(0, nrow(zt)))
9: rstanarm::posterior_linpred(object, newdata = data.frame(zt),
transform = T, offset = rep(0, nrow(zt)))
8: t(rstanarm::posterior_linpred(object, newdata = data.frame(zt),
transform = T, offset = rep(0, nrow(zt))))
7: predfun(z)
6: family$linkfun(predfun(z))
5: family$linkinv(family$linkfun(predfun(z)) + offset)
4: predmu(z, offset)
3: init_refmodel(z = z, y = y, family = fam, x = x, predfun = predfun,
dis = dis, offset = offset, wobs = wobs, wsample = wsample,
intercept = intercept, cvfits = NULL, cvfun = cvfun)
2: get_refmodel.stanreg(StandardizedOME.stan_lm.5c)
1: get_refmodel(StandardizedOME.stan_lm.5c)

sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] projpred_1.1.5 performance_0.4.3 parameters_0.4.1 bayestestR_0.5.1
[5] cowplot_1.0.0 shinystan_2.5.0 shiny_1.4.0 bayesplot_1.7.1
[9] rstanarm_2.19.2 Rcpp_1.0.3 rstantools_2.0.0 rstan_2.19.2
[13] StanHeaders_2.21.0-1 emmeans_1.4.4 lmerTest_3.1-1 lme4_1.1-21

I can’t a mention of this error in documents pertaining to projpred.

Nathan

projpred with stan_lm works for me, so I need more information to be able to help

How many observations?
Does it work without factors?
Can you post the R code showing how you call stan_lm and projpred?
If you can send me the data (privately is ok), I can also try to reproduce the error.

What version of rstanarm do you have? I seem to remember that there was a bug that caused this, and it was fixed last year, so make sure you have the latest version.

OK, I rembered well: https://github.com/stan-dev/projpred/issues/21
If the issue is still present in the latest rstanarm, let us see an example so we can try to reproduce and fix.

It was in the session info

which is the latest in CRAN release October 8, so should have that fix

Here is the lead paragraph of the summary(object) statement.

Model Info:
function: stan_lm
family: gaussian [identity]
formula: log(StandardizedOME) ~ Age + AnesthesiaDuration + AnesthesiaTechniqueBlock +
AnesthesiaTechniqueGeneral + AnesthesiaTechniqueNeuraxial +
o.ASAClassFactor + EmergencyStatusYN + t.Race + Sex + REMI +
poly(NonOpioidAnalgesicsNumericCount, degree = 2) + BaseUnit +
AIM1YearQuarter + MPOGInstitutionID + PrimaryAnesthesiaCPTBucket
algorithm: sampling
sample: 32000 (posterior sample size)
priors: see help(‘prior_summary’)
observations: 1104324
predictors: 33

I attempted the varsel call:

vs ← varsel(StandardizedOME.stan_lm.5c)
Error: NAs are not allowed in ‘newdata’.

traceback()
13: stop(“NAs are not allowed in ‘newdata’.”, call. = FALSE)
12: validate_newdata(object, newdata = newdata, m = NULL)
11: posterior_linpred.stanreg(object, newdata = data.frame(zt), transform = T,
offset = rep(0, nrow(zt)))
10: rstanarm::posterior_linpred(object, newdata = data.frame(zt),
transform = T, offset = rep(0, nrow(zt)))
9: t(rstanarm::posterior_linpred(object, newdata = data.frame(zt),
transform = T, offset = rep(0, nrow(zt))))
8: predfun(z)
7: family$linkfun(predfun(z))
6: family$linkinv(family$linkfun(predfun(z)) + offset)
5: predmu(z, offset)
4: init_refmodel(z = z, y = y, family = fam, x = x, predfun = predfun,
dis = dis, offset = offset, wobs = wobs, wsample = wsample,
intercept = intercept, cvfits = NULL, cvfun = cvfun)
3: get_refmodel.stanreg(object, …)
2: get_refmodel(object, …)
1: varsel(StandardizedOME.stan_lm.5c)

I ran the toy example shown in projpred Quick Start
2018-10-16 for Gaussian data.

It completed as expected.

This is an academic, not commerical project. But I am not allowed to move the data off the server.

Nathan

My project has two data sets. The second is n about 1000. I have a stan_glm model on a Windows server, 6 covariates, 1 binary, and 1 factor with 12 predictors.
Convergence diagnostics seem ok.

Model Info:
function: stan_glm
family: binomial [logit]
formula: OpioidTaking ~ AnxietyTScore + DepressionTScore + Worst + Average +
PhysicalFunctionTScore + ReducedBodyMap + SitePainLastWeek +
SleepRawScore
algorithm: sampling
sample: 40000 (posterior sample size)
priors: see help(‘prior_summary’)
observations: 958
predictors: 12

I get the same error with varsel(object):

Error: NAs are not allowed in ‘newdata’

R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server >= 2012 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] projpred_1.1.5 emmeans_1.4.4 loo_2.2.0 brms_2.11.1
[5] tidybayes_2.0.1 bayestestR_0.5.1 rstanarm_2.19.2 Rcpp_1.0.3

I ran the toy gaussian example without difficulty.

Nathan

Do you have NAs in your data in any column?

Can you try without factors and poly? I’m asking because the current version was not designed to work with them. The next release is designed to handle them correctly.

I remodeled (binomial family) the smaller data set (Windows server) including only the columns in the model and removing all rows with any NAs. I used the same variables (continuous, binary, factor).

I am able to run varsel(fit) successfully.

Later I’ll rerun the larger dataset model (Linux server).

Thanks for the suggestion.

Nathan

2 Likes

Factors are important in my models. Is there an ETA for the version of projpred that can handle factors correctly?

A pull request https://github.com/stan-dev/projpred/pull/38 was made last week,
so you can try it right now. If you are interested to try, but don’t know how to install from a github branch, please ask and I’ll give more instructions. PR is quite big so it will take some time to review and merge to master. We haven’t yet decided whether other features will be merged before making a new CRAN release. This is overall a big change for projpred.

Thanks for pointing me to github. I read 100+ comments with replies about minor code tweaking of projpred.

For the moment I’ll wait until the master is ready.

I have a simple (naive) question for which I don’t recognize an explanation in the projpred pdf file.

In varsel_plot there is a dashed red line in the elpd figure. How is that related to the reference model elpd?

Similar question concerning the dashed red line in the acc plot.

Also, in varsel_plot the default alpha is 0.32. Is that a better choice than 0.95 (2 SEs)?

I did warn you that it’s a big PR and in progress :D

We’ll post in discourse when it’s merged and ready for testing

With deltas=FALSE it’s the elpd of the reference model, with deltas=TRUE it’s 0.

Correspondingly,

It depends what you want. We wanted by default models which are more similar to the reference model. alpha=0.32 corresponds to 68% interval (1 SE). alpha=0.95 is not valid. alpha=0.05 would correspond to 95% interval (approx 2 SEs). Smaller alpha would suggest selecting models which are more likely to be worse than the full model, which is fine if you want smaller models. We are considering an alternative criterion, but more about that in a few weeks.

Appreciate the insights.

Reading the various papers about projpred, it doesn’t appear to handle models created by stan_glmer.

Correct?

Soon, very soon. There is already a pull request if you’re brave to start testing. We’ll announce when it’s merged.

Hehehe, isn’t it nice, Aki, when you have so many people on the fence for things you guys develop? ;)

1 Like

Yes, it’s great to know there is interest before we start our selling pitch :D

I have installed version 1.1.6.

I am assuming that this can handle multilevel factors.

Correct?

Nathan

Multilevel factors are available in develop branch in github, but not yet in CRAN. As usual refactoring has taken more time than expected. At least few more weeks.