Loo with k_threshold error for stan_polr()

Hmmm, sorry that didn’t fix it! Maybe there’s another issue in addition to the one I found. I’m not actually able to reproduce the error message

Error in .subset(x, j) : invalid subscript type ‘list’

so here are a few things could help narrow things down:

  • Does this only happen for you with stan_polr() or does it also happen with other rstanarm modeling functions, e.g., stan_glm()?

  • Are you able to share your data and code so I can reproduce this? If you can’t share them then is it possible to share a toy version using simulated data that also gives you this error? Anything that could help me be able to reproduce this particular error message.

Yes, I get the same issue when using glm. Below is a code that reproduces the error on my computer both with glm and polr. The mc.core=1 was to used to circumvent the issure raised in this thread: Problems running LOO on >1 core
Here is the required data.
reducedvilt.csv (11.4 KB)

df<-read.csv("reducedvilt.csv")
options(mc.cores = 1)
require(rstanarm)
require(rstan)
rstan_options(auto_write = TRUE)

moddef="Inno_art ~ Vall_ha + Spann_ha + Socker_ha + Olje_ha + Balj_ha + Potatis_ha + Gron_ha + Majs_ha + Annat_ha + Lan"


WL<-stan_glm(moddef , data = df)
loo(WL,k_threshold = 0.7)


df$Inno_art<-ordered(df$Inno_art,levels=1:5,labels=LETTERS[1:5])
WL<-stan_polr(moddef , data = df, method = "logistic",
                      prior = R2(0.25,what = "median"), init_r = 0.1, seed = 12346,
                      algorithm = "sampling")
loo(WL,k_threshold = 0.7)
1 Like

Thanks for sharing the code and data. I was able to reproduce this and I figured out that the problem is that one of the internal functions called by loo when k_threshold is specified assumes the model formula is of class formula as opposed to just a string. I will fix this so in the future it works, but until then you should be able to avoid the error by changing

to be a formula like this

moddef=as.formula("Inno_art ~ Vall_ha + Spann_ha + Socker_ha + Olje_ha + Balj_ha + Potatis_ha + Gron_ha + Majs_ha + Annat_ha + Lan")

If you do that does it avoid the error? My hunch is that it avoids the error when you use loo with k_threshold after stan_glm() but when you use it with stan_polr() you might still also need this

which is fixed on GitHub but not in your installed version of rstanarm.

Actually there may be another problem with loo stan_polr that I’m trying to sort out now, but the advice above should work for stan_glm. Will report back.

Ok so this is the issue that keeps giving. I’ve already fixed two bugs as a result of this and just found a third:

The reason this wasn’t detected sooner is that it requires some particular features in the data to trigger it.

So my post above has a fix for stan_glm() but with this particular dataset it’s not yet going to work with stan_polr() until we fix this new issue. Just need to discuss with @bgoodri to figure out the best way to fix it.

1 Like

Thanks for looking into the issue. I can confirm that implementing as.formula() made it possible to run stan_glm without issues. It did however not solve running stan_polr, even if it seemed to allow the process to proceed a bit further. Without the fit$zeta <- NULL hack, I got

3 problematic observation(s) found.
Model will be refit 3 times.

Fitting model 1 out of 3 (leaving out observation 22)
Error in x$z[omitted, , drop = FALSE] : incorrect number of dimensions

With the fit$zeta <- NULL hack, I got

3 problematic observation(s) found.
Model will be refit 3 times.

Fitting model 1 out of 3 (leaving out observation 22)

Fitting model 2 out of 3 (leaving out observation 25)

Fitting model 3 out of 3 (leaving out observation 173)
Error in draws$zeta[, y_i] : subscript out of bounds

…so it seems the refitting was now done for each flagged observation, but something went wrong after that.

Yeah exactly. I’ve found the problem (the issue described in the github link in my previous post) but I don’t know the exact fix yet. Once we sort it out (hopefully very soon) I’ll post back here.

Maybe a bit of progress. Are you able to install rstanarm from github and see if my potential fix works? Unfortunately because it’s installing from github it will have to compile all of rstanarm at install time (unlike CRAN when everything is pre-compiled):

Sys.setenv(MAKEFLAGS = "-j4") # number of cores to use (you can change)
devtools::install_github("stan-dev/rstanarm", build_vignettes = FALSE)

Unfortunately, I’m having trouble installing from github. This is what I keep getting, and I’ve tried different options for the “Enter one or more numbers…” stage with similar [lack of] progress:

Downloading GitHub repo stan-dev/rstanarm@master
stan-dev-rstanarm-5b9b826/tests/testthat/include: Can't create '\\\\?\\C:\\Users\\tomli96\\AppData\\Local\\Temp\\RtmpwtUaHX\\remotes6c5826649a7\\stan-dev-rstanarm-5b9b826\\tests\\testthat\\include'
stan-dev-rstanarm-5b9b826/tests/testthat/stan_files: Can't create '\\\\?\\C:\\Users\\tomli96\\AppData\\Local\\Temp\\RtmpwtUaHX\\remotes6c5826649a7\\stan-dev-rstanarm-5b9b826\\tests\\testthat\\stan_files'
tar.exe: Error exit delayed from previous errors.
These packages have more recent versions available.
It is recommended to update all of them.
Which would you like to update?

 1: All                                  
 2: CRAN packages only                   
 3: None                                 
 4: tibble    (3.0.1   -> 3.0.3  ) [CRAN]
 5: vctrs     (0.3.0   -> 0.3.2  ) [CRAN]
 6: pillar    (1.4.4   -> 1.4.6  ) [CRAN]
 7: isoband   (0.2.1   -> 0.2.2  ) [CRAN]
 8: pkgload   (1.0.2   -> 1.1.0  ) [CRAN]
 9: pkgbuild  (1.0.8   -> 1.1.0  ) [CRAN]
10: backports (1.1.7   -> 1.1.8  ) [CRAN]
11: processx  (3.4.2   -> 3.4.3  ) [CRAN]
12: jsonlite  (1.6.1   -> 1.7.0  ) [CRAN]
13: httpuv    (1.5.2   -> 1.5.4  ) [CRAN]
14: htmltools (0.4.0   -> 0.5.0  ) [CRAN]
15: later     (1.0.0   -> 1.1.0.1) [CRAN]
16: promises  (1.1.0   -> 1.1.1  ) [CRAN]
17: dplyr     (0.8.5   -> 1.0.1  ) [CRAN]
18: shiny     (1.4.0.2 -> 1.5.0  ) [CRAN]
19: xfun      (0.13    -> 0.16   ) [CRAN]
20: openssl   (1.4.1   -> 1.4.2  ) [CRAN]
21: sys       (3.3     -> 3.4    ) [CRAN]
22: DT        (0.13    -> 0.14   ) [CRAN]

Enter one or more numbers, or an empty line to skip updates:

Installing 1 packages: rlang
Installing package into ‘C:/Users/tomli96/Documents/R/win-library/4.0’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/rlang_0.4.7.zip'
Content type 'application/zip' length 1129581 bytes (1.1 MB)
downloaded 1.1 MB

package ‘rlang’ successfully unpacked and MD5 sums checked
Error: Failed to install 'rstanarm' from GitHub:
  (converted from warning) cannot remove prior installation of package ‘rlang’
In addition: Warning message:
In utils::untar(tarfile, ...) :
  ‘tar.exe -xf "C:\Users\tomli96\AppData\Local\Temp\RtmpwtUaHX\file6c58a13c27.tar.gz" -C "C:/Users/tomli96/AppData/Local/Temp/RtmpwtUaHX/remotes6c5826649a7"’ returned error code 1

Any suggestion for what to try?

Hmm, what does the error message look like when you tell it to install none of them (option 3)? Right now it looks like there’s a message about installing the ‘rlang’ package, but that shouldn’t be necessary.

It says:

Error: Failed to install 'rstanarm' from GitHub:
  (converted from warning) cannot remove prior installation of package ‘rlang’
In addition: Warning message:
In utils::untar(tarfile, ...) :
  ‘tar.exe -xf "C:\Users\tomli96\AppData\Local\Temp\RtmpwtUaHX\file6c58d141724.tar.gz" -C "C:/Users/tomli96/AppData/Local/Temp/RtmpwtUaHX/remotes6c582f7e608b"’ returned error code 1

Weird. I don’t know what’s up with ‘rlang’, that doesn’t really have anything to do with rstanarm. I’m just guessing but maybe try remove.packages("rlang") and then install.packages("rlang") before trying to install rstanarm?

Thanks jonah, that seem to have worked. I got a warning message in the rstanarm installation:

Warning message:
In utils::untar(tarfile, ...) :
  ‘tar.exe -xf "C:\Users\tomli96\AppData\Local\Temp\RtmpmcxkkY\fileb9742b1c15e5.tar.gz" -C "C:/Users/tomli96/AppData/Local/Temp/RtmpmcxkkY/remotesb97427a4815"’ returned error code 1

…but it appears to have installed correctly. I can now run the example code I provided above in its entirety (without previously suggested hacks) without encountering any issues. Thank you very much!

1 Like

Glad that works. I’m not sure about that warning message, but if everything seems to be working I think you can safely ignore it.

Sorry @jonah, but I think there is still some issue. Uncertain if it’s with loo, rstanarm, or something else, but this code…

df<-read.csv("reducedvilt.csv")

options(mc.cores = 1)
require(rstanarm)
require(rstan)
require(loo)
rstan_options(auto_write = TRUE)

moddef="Inno_art ~ Vall_ha + Spann_ha + Socker_ha + Olje_ha + Balj_ha + Potatis_ha + Gron_ha + Majs_ha + Annat_ha + Lan"


df$Inno_art<-ordered(df$Inno_art,levels=1:5,labels=LETTERS[1:5])
WL<-stan_polr(moddef , data = df, method = "logistic",
                      prior = R2(0.25,what = "median"), init_r = 0.1, seed = 12346,
                      algorithm = "sampling")


moddef="Inno_art ~ Vall_ha + Spann_ha + Socker_ha + Olje_ha + Balj_ha + Potatis_ha + Gron_ha + Majs_ha + Annat_ha"


WoL<-stan_polr(moddef , data = df, method = "logistic",
              prior = R2(0.25,what = "median"), init_r = 0.1, seed = 12346,
              algorithm = "sampling")
LOOWith<-loo(WL,k_threshold = 0.7)
LOOWithout<-loo(WoL,k_threshold = 0.7)

…gives the error message…

Error in stanmat[, colnames(x), drop = FALSE] : subscript out of bounds

Swapping the order of which model is fitted first, i.e…



moddef="Inno_art ~ Vall_ha + Spann_ha + Socker_ha + Olje_ha + Balj_ha + Potatis_ha + Gron_ha + Majs_ha + Annat_ha"


WoL<-stan_polr(moddef , data = df, method = "logistic",
              prior = R2(0.25,what = "median"), init_r = 0.1, seed = 12346,
              algorithm = "sampling")

moddef="Inno_art ~ Vall_ha + Spann_ha + Socker_ha + Olje_ha + Balj_ha + Potatis_ha + Gron_ha + Majs_ha + Annat_ha + Lan"

WL<-stan_polr(moddef , data = df, method = "logistic",
              prior = R2(0.25,what = "median"), init_r = 0.1, seed = 12346,
              algorithm = "sampling")

LOOWithout<-loo(WoL,k_threshold = 0.7)
LOOWith<-loo(WL,k_threshold = 0.7)

…instead yields:

Error in eval(predvars, data, env) : object 'Lan' not found

In both cases, the first loo calculation works, but the second fail after refitting for pareto-flagged observations. I get the same results when running in parallel, but wanted to run serially to make sure there isn’t some issue related to previous problems with parallel chains on Windows. However, the following, which moves the loo calculation to immediately after stan_polr, works:

moddef="Inno_art ~ Vall_ha + Spann_ha + Socker_ha + Olje_ha + Balj_ha + Potatis_ha + Gron_ha + Majs_ha + Annat_ha + Lan"



WL<-stan_polr(moddef , data = df, method = "logistic",
                      prior = R2(0.25,what = "median"), init_r = 0.1, seed = 12346,
                      algorithm = "sampling")

LOOWith<-loo(WL,k_threshold = 0.7)
moddef="Inno_art ~ Vall_ha + Spann_ha + Socker_ha + Olje_ha + Balj_ha + Potatis_ha + Gron_ha + Majs_ha + Annat_ha"


WoL<-stan_polr(moddef , data = df, method = "logistic",
              prior = R2(0.25,what = "median"), init_r = 0.1, seed = 12346,
              algorithm = "sampling")

LOOWithout<-loo(WoL,k_threshold = 0.7)

Not sure what to make of it. It seems the loo calculation “remembers” the lastly fitted object and expects that object structure.

Hmm that’s weird, thanks for letting me know. I’ll try to find some time later today to look into this.

Thanks. For my focal problem, of which the published code and data above are subsets, it wasn’t a big deal to restructure the order of calculations to get it to work once I figured out that loo has to follow immediately after stan_polr, so it seems to be working now. But it’s not always convenient to set things up that way, and it would be great to be certain that there isn’t something fishy happening under the hood. Again, thanks for looking into it. Let me know if you want me to test-run something.

1 Like

Ok so I made a bit of progress. I found why this is messing it up but not yet how to fix it fully. But there should be another work around: it should avoid this error if you don’t use the same name (moddef) for the formula for both models. If you use moddef2 for the second one then I think that should avoid this error. It should work either way, but I haven’t totaly fixed it yet.

Ok I think it’s actually fixed now on GitHub. We needed to make sure moddef the symbol was evaluated and converted to the actual formula before calling the update() method. Some weirdness related to nested environments and lazy evaluation.

If you reinstall from GitHub it shouldn’t matter anymore what order you run the models and the calls to loo().

Thanks for your persistence with this, it helped me track down a tricky bug. Let me know if you still think there’s something broken.

Thanks jonah, that all seems to work fine now.

1 Like