Generated Quantiles / sampler in R with parameter vector of length 1

Can’t run a gqs on a model with parameter of variable vector length when that length becomes 1:

test.stan

data {
  int N;
  int K;
  int<lower=0,upper=1> y[N];
  matrix[N,K] X;
}
parameters {
  vector[K] b;
}
model {
  y ~ bernoulli_logit(X*b);
}

test_sim.stan

data {
  int N;
  int K;
  matrix[N,K] X;
}
parameters {
  vector[K] b;
}
generated quantities {
    vector[N] y_est = rep_vector(0,N);
    {
      y_est = X*b;
    }
}

R code:

require(rstan)

# R code

data<-list()  
data$y <- c(rep(1,10),rep(0,5))
data$N <- length(data$y)
data$K <- 1
data$X <- array (1, c(data$N,data$K))


test_m<-stan_model(file="test.stan")

fit <- sampling(test_m, data)

test_sim_m<-stan_model(file="test_sim.stan")
as.matrix(fit)
fit_proj <- gqs(test_sim_m, data=data, draws=as.matrix(fit))

out<-rstan::extract(fit_proj)

The above works for K>1 but breaks when K=1. Specifically the gqs step says:
Exception: Variable b missing (in 'model573821a07f15_test_sim' at line 7)

The variable is shown in as.matrix(fit).

> as.matrix(fit)[1:5,]
          parameters
iterations      b[1]       lp__
      [1,] 0.4345688  -9.662128
      [2,] 0.4345688  -9.662128
      [3,] 1.1326903  -9.852514
      [4,] 1.3721195 -10.250517
      [5,] 0.2968098  -9.819760

Session Info:

> sessionInfo()
R version 4.0.1 (2020-06-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default
...
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rstan_2.19.3         ggplot2_3.3.1        StanHeaders_2.21.0-5

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6       pillar_1.4.4       compiler_4.0.1     prettyunits_1.1.1  tools_4.0.1        pkgbuild_1.0.8     lifecycle_0.2.0   
 [8] tibble_3.0.1       gtable_0.3.0       pkgconfig_2.0.3    rlang_0.4.6        cli_2.0.2          rstudioapi_0.11    parallel_4.0.1    
[15] loo_2.2.0          gridExtra_2.3      withr_2.2.0        dplyr_1.0.0        generics_0.0.2     vctrs_0.3.1        stats4_4.0.1      
[22] grid_4.0.1         tidyselect_1.1.0   glue_1.4.1         inline_0.3.15      R6_2.4.1           processx_3.4.2     fansi_0.4.1       
[29] callr_3.4.3        purrr_0.3.4        magrittr_1.5       scales_1.1.1       ps_1.3.3           codetools_0.2-16   ellipsis_0.3.1    
[36] matrixStats_0.56.0 assertthat_0.2.1   colorspace_1.4-1   RcppParallel_5.0.1 munsell_0.5.0      crayon_1.3.4  

Looks like it may be related to this:

I’m reasonably sure this is a bug? gqs should really be able to interpret the output of sampler using essentially the same model?

Thanks for reporting this. I agree this seems like a bug and I can confirm that I can reproduce this with rstan. I’m not yet sure if it’s a bug in rstan’s gqs() function or in the underlying implementation in Stan.

@mitzimorris The example from @lrossouw works in rstan with K=2 but not with K=1, but when I tried in cmdstanpy (as a way of telling if this was an issue with the underlying standalone generated quantities implementation as opposed to an issue with rstan) I couldn’t even get it to work with K=2. Am I making a mistake in my cmdstanpy usage?

import cmdstanpy
import os
import numpy as np

cmdstanpy.set_cmdstan_path(os.path.join("/Users/jgabry/.cmdstanr/cmdstan-2.23.0"))

# these are the two Stan programs from the original post
test_code = os.path.join("/Users/jgabry/Desktop/cmdstanr-test/gqs-test/test.stan")
test_ppc_code = os.path.join("/Users/jgabry/Desktop/cmdstanr-test/gqs-test/test-sim.stan")

test_model = cmdstanpy.CmdStanModel(stan_file=test_code)
test_ppc_model = cmdstanpy.CmdStanModel(stan_file=test_ppc_code)

data1 = {"N" : 15,  "y" : [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0], 
         "K" : 1,  "X" : np.ones((15, 1))}
data2 = {"N" : 15,  "y" : [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0], 
         "K" : 2,  "X" : np.ones((15, 2))}

# sampling runs fine
fit1 = test_model.sample(data = data1) 
fit2 = test_model.sample(data = data2) 

# both of these error but with rstan only the first one errors
gqs1 = test_ppc_model.generate_quantities(data=data1, mcmc_sample=fit1) 
gqs2 = test_ppc_model.generate_quantities(data=data2, mcmc_sample=fit2) 

Any idea where I’m going wrong?

Sorry it’s a dumb example. It’s not really sensible. Just mashed down a minimal model copied from another bug report. Not sure if that causes the problem you are having.

My real use cases uses new data to project on test-sim.stan using draws from test.stan.

I’ve just added a dummy variable on my use case as a work around to ensure K>=2.

No problem, it’s actually a pretty good example for debugging!

1 Like

Shall I file an issue on GitHub?

this is a CmdStan problem, and I’ll file an issue and will investigate.

2 Likes

Ah so RStan uses CmdStan? Not sure how it fits together. I was using RStan.

you’re right - the error is most likely in the Stan services layer.
issue should have been filed on the Stan repo.

3 Likes

Thanks @lrossouw and @mitzimorris!

Marking @mitzimorris comment as solution as bug filed and fix merged. Thanks!

2 Likes