Generated quantities for 0-dim vectors/matrices

Dear Stan users, I'm running into a problem with gqs() when generating quantities from a Stan model that may or may not contain some parameters.

I fit a model that uses a TRUE/FALSE flag to control the size of a parameter: matrix[flag ? nt : 0, flag ? nt : 0] phi;. If flag = FALSE, phi is not estimated. As expected, in the fitted Stan model it shows up as ..$ phi : num[0 , 0 ].
However, phi does not show up in as.matrix(stan_fit) at all.
As a result, gqs() complains: Exception: Variable phi missing ...

Any idea how to get around this problem?

Cheers,
Philippe

Here's a minimal example that illustrates my problem (I'm on the rstan development branch):

library(rstan)
m <- stan_model(model_code ='
data {
  int<lower = 0, upper = 1> flag;
}
parameters { 
  real y; 
  vector[flag ? 1 : 0] phi;
}
model {
  if ( flag == 1 ) {
    phi ~ normal(10, 0.1);
    y ~ normal(phi, 1);
  } else if (flag == 0) {
    y ~ normal(0, 1);
  }
}')

f <- sampling(m, data = list( flag = 1 ))

f
## Inference for Stan model: c8cae5506057d06542f71d41e99c8846.
## 4 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##         mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
## y      10.01    0.02 0.99  8.12  9.34  9.99 10.65 11.99  3757    1
## phi[1] 10.00    0.00 0.10  9.81  9.94 10.00 10.07 10.20  3024    1
## lp__   -0.96    0.02 0.96 -3.54 -1.33 -0.65 -0.27 -0.02  1578    1
## 
## Samples were drawn using NUTS(diag_e) at Tue Oct 22 10:11:13 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

And with flag = 0:

f <- sampling(m, data =  list( flag =  0 ))
f
## Inference for Stan model: c8cae5506057d06542f71d41e99c8846.
## 4 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##       mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
## y     0.02    0.03 1.03 -1.99 -0.68  0.02  0.72  1.98  1175    1
## lp__ -0.53    0.02 0.75 -2.58 -0.68 -0.24 -0.06  0.00  1445    1
## 
## Samples were drawn using NUTS(diag_e) at Tue Oct 22 10:11:18 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

So far, so good: when flag = 0, phi is a zero-length vector and is not included in the output, nor in as.matrix(f).

GQS part:

First, the model is compiled with vector[flag ? 1 : 0] phi; commented out. flag is always 0:

mc <-'
data {
  int<lower = 0, upper = 1> flag;
}
parameters {
  real y;
  //  vector[flag ? 1 : 0] phi;
}
generated quantities {
    real y_rep;
    y_rep = normal_rng(y, 1);
}'

m2 <- stan_model(model_code = mc)

f2 <- rstan::gqs(m2, draws = as.matrix(f), data = list( flag = 0))

f2
## Inference for Stan model: 17e2d78e13de006222877a7663a21f94.
## 1 chains, each with iter=4000; warmup=0; thin=1; 
## post-warmup draws per chain=4000, total post-warmup draws=4000.
## 
##       mean se_mean   sd  2.5%   25%  50%  75% 97.5% n_eff Rhat
## y_rep 0.02    0.04 1.46 -2.83 -0.96 0.02 1.03  2.87  1641    1
## 
## Samples were drawn using  at Tue Oct 22 10:11:25 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

This works because phi was commented out by hand.

Now vector[flag ? 1 : 0] phi; is left in the model, and flag is still 0:

mc <-'
data {
  int<lower = 0, upper = 1> flag;
}
parameters {
  real y;
  vector[flag ? 1 : 0] phi;
}
generated quantities {
    real y_rep;
    y_rep = normal_rng(y, 1);
}'

m2 <- stan_model(model_code = mc)

## recompiling to avoid crashing R session

Problem:

Now let's call gqs() with flag = 0:

f2 <- rstan::gqs(m2, draws = as.matrix(f), data =  list( flag =  0))

This fails with: Exception: Variable phi missing (in 'model49802b31ca9a_ea9a19bd60c8bdb46970b36a506c1c97' at line 7)

The summary indicates that nothing gets evaluated.

I’ve changed the tag to more accurately reflect the content of the question. Please let me know if you disagree. I’ll also tag @bgoodri to make sure he sees this [he might be busy, though].

Yeah, I don’t know how to handle that case at the moment.

So this looks like a bug in rstan. Could you please file an issue for it (unless it has already been reported)? Thanks, and sorry we can't help you more. A workaround might be to declare the parameter with size 1 when it is not being estimated and give it a normal(0, 1) distribution, so that it is as easy as possible for the sampler.
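For the minimal example above, the size-1 workaround could look roughly like this. It is a sketch, not tested against the rstan development branch: phi is always declared as vector[1], gets a standard normal prior and is otherwise ignored when flag = 0, so as.matrix() of the fit always contains a phi[1] column for gqs() to read. The names mc_fit, mc_gq, m_fit and m_gq are illustrative only.

```r
# Workaround sketch: phi always has size 1, so the draws matrix always
# has a "phi[1]" column and gqs() can find phi.
mc_fit <- '
data {
  int<lower = 0, upper = 1> flag;
}
parameters {
  real y;
  vector[1] phi;
}
model {
  if (flag == 1) {
    phi ~ normal(10, 0.1);
    y ~ normal(phi[1], 1);
  } else {
    phi ~ normal(0, 1);  // dummy phi: unused, but trivial for the sampler
    y ~ normal(0, 1);
  }
}'

# Generated quantities model: declares the same (always size-1) phi
mc_gq <- '
data {
  int<lower = 0, upper = 1> flag;
}
parameters {
  real y;
  vector[1] phi;
}
generated quantities {
  real y_rep;
  y_rep = normal_rng(y, 1);
}'

# Only run the Stan part when rstan is installed (compilation takes a while)
if (requireNamespace("rstan", quietly = TRUE)) {
  m_fit <- rstan::stan_model(model_code = mc_fit)
  m_gq  <- rstan::stan_model(model_code = mc_gq)
  f  <- rstan::sampling(m_fit, data = list(flag = 0))
  f2 <- rstan::gqs(m_gq, draws = as.matrix(f), data = list(flag = 0))
}
```

The price is one extra, trivially sampled dimension per "switched off" parameter; for a matrix such as phi in the original question, the dummy would be a 1 x 1 matrix.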

What happens if you run sampling with flag = 0 and also specify an initial value for y?
Does this produce a similar error?

I think it is possible to specify an initial value as an empty vector/matrix/array, but the interface to standalone_gqs takes one big matrix of draws, which has no column at all when the parameter vector/matrix/array is empty.
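To see why the draws matrix loses the information, here is a hypothetical helper, flatten_names() (not an rstan function), that mimics how a parameter's dimensions turn into flattened column names such as phi[1], in column-major order. Any zero dimension yields no names at all, so nothing in as.matrix(f) records that phi exists, and a lookup by name cannot tell "empty" from "missing".

```r
# Hypothetical helper (not part of rstan): mimic how a parameter with the
# given dimensions is flattened into draws-matrix column names.
flatten_names <- function(par, dims) {
  if (length(dims) == 0) return(par)        # scalar, e.g. "y"
  if (any(dims == 0)) return(character(0))  # empty container: no columns
  # expand.grid varies the first index fastest, i.e. column-major order
  idx <- expand.grid(lapply(dims, seq_len))
  paste0(par, "[", apply(idx, 1, paste, collapse = ","), "]")
}

flatten_names("y", integer(0))  # "y"
flatten_names("phi", 1)         # flag = 1: "phi[1]"
flatten_names("phi", 0)         # flag = 0: character(0)
flatten_names("phi", c(0, 0))   # 0 x 0 matrix: character(0)
```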

I took a closer look at the code and added my comments to the issue: https://github.com/stan-dev/rstan/issues/708

We could implement a workaround that sets missing parameters to 0 when the sample doesn't contain a column for that parameter. This puts the burden on the user to provide a valid sample, that is, a sample generated from a model that corresponds to whatever is going on in the generated quantities block.
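A minimal sketch of that idea in plain R, assuming the declared size of each parameter is known (here passed in by hand as dims). draw_to_values() is hypothetical, not rstan's internal code: it turns one row of draws into a named list of parameter values and zero-fills any parameter whose columns are absent. For an empty phi this gives numeric(0), which is exactly what is needed; for a size-1 phi that is accidentally missing, it silently gives 0, which is the burden on the user mentioned above.

```r
# Hypothetical sketch of the proposed workaround (not rstan's actual code):
# build the named list of parameter values for one draw, zero-filling any
# parameter that has no column in the draws.
# `dims` gives each parameter's declared size, e.g. list(y = integer(0), phi = 0).
draw_to_values <- function(draw, dims) {
  vals <- lapply(names(dims), function(p) {
    # columns for p are named either "p" (scalar) or "p[...]"
    idx <- names(draw) == p | startsWith(names(draw), paste0(p, "["))
    if (any(idx)) {
      unname(draw[idx])
    } else {
      rep(0, prod(dims[[p]]))  # missing: zeros; prod(0) = 0 gives numeric(0)
    }
  })
  names(vals) <- names(dims)
  vals
}

# One row of draws from the flag = 0 fit has no phi column
draw <- c(y = 0.3, lp__ = -1)
str(draw_to_values(draw, list(y = integer(0), phi = 0)))
```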

Had coffee; now I see the problem and the solution:

I guess this request is obsolete? I can still try it, though.

Request obsolete, many thanks!