How can I change the way estimates are printed by fit$draws()

Hi,

I’m working with the following model:

data{
int N;
vector[N] obs_times;
vector[N] t;
}
parameters{
real<lower=0> lambda;
}
model{
obs_times~exponential(lambda);
lambda~normal(0,5);
}
generated quantities{
  //Draw survival function
  vector[N] surv;
  for(i in 1:N){
    surv[i]=exp(-lambda*t[i]);
  }
}

which I run as follows:

library(cmdstanr)
dat<-list(N=100,obs_times=data$x,t=seq(0,1.5,length=100))
mod<-cmdstan_model("surv_exp.stan")
fit<-mod$sample(data=dat,
                chains=4,
                parallel_chains=4)

I would like to extract “surv” values to make a plot. However, I found that “surv” values are not printed in the right order:

> fit$draws("surv",format="list")
# A draws_list: 1000 iterations, 4 chains, and 100 variables

[chain = 1]
$`surv[1]`
 [1] 1 1 1 1 1 1 1 1 1 1

$`surv[2]`
 [1] 0.96 0.96 0.96 0.96 0.96 0.96 0.96 0.96 0.96 0.96

$`surv[3]`
 [1] 0.92 0.92 0.93 0.93 0.91 0.92 0.92 0.92 0.92 0.92

$`surv[4]`
 [1] 0.88 0.89 0.89 0.89 0.87 0.88 0.88 0.88 0.88 0.88

I would expect to the see the values ordered like this: 1, 0.96, 0.92, 0.88…

If I convert the cmdstanr object into an rstan object and then extract “surv” values, then the values are printed in the right order:

library(rstan)
stanfit <- rstan::read_stan_csv(fit$output_files())
post<-extract(stanfit)

post$surv[1:2,]
          
iterations [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]     [,9]
      [1,]    1 0.960775 0.923088 0.886879 0.852091 0.818668 0.786555 0.755702 0.726060
      [2,]    1 0.951013 0.904426 0.860121 0.817987 0.777916 0.739809 0.703568 0.669102
          
iterations    [,10]    [,11]    [,12]    [,13]    [,14]    [,15]    [,16]    [,17]    [,18]
      [1,] 0.697580 0.670217 0.643927 0.618669 0.594402 0.571086 0.548685 0.527163 0.506484
      [2,] 0.636325 0.605154 0.575509 0.547317 0.520505 0.495008 0.470759 0.447698 0.425767

How can I change the way the values are printed?
data_surv.csv (1.8 KB)

Something similar to the extract rstan capability is available through the draws_rvars functionality of the posterior R package.

Example:

library(cmdstanr)

fit <- cmdstanr_example()

a <- fit$draws(format = "list")

b <- posterior::as_draws_rvars(a)

in the example there are 3 beta parameters:

> posterior::subset_draws(a, variable = "beta")
# A draws_array: 1000 iterations, 4 chains, and 3 variables
, , variable = beta[1]

         chain
iteration     1     2     3     4
        1 -0.17 -0.52 -0.75 -1.01
        2 -0.73 -0.74 -1.00 -0.78
        3 -0.45 -0.67 -1.13 -0.72
        4 -0.71 -0.37 -0.44 -0.53
        5 -0.65 -0.41 -1.00 -1.16

, , variable = beta[2]

         chain
iteration       1      2     3      4
        1 -0.2524 -0.093 -0.30 -0.513
        2 -0.5772 -0.369 -0.40 -0.359
        3 -0.3713 -0.298 -0.38 -0.376
        4 -0.4515 -0.269 -0.37 -0.083
        5  0.0029  0.094 -0.29 -0.273

, , variable = beta[3]

         chain
iteration    1    2    3    4
        1 0.40 0.50 0.62 1.15
        2 1.20 0.61 0.32 0.85
        3 0.96 0.31 0.72 0.62
        4 0.92 0.79 1.02 0.96
        5 0.66 0.43 0.63 0.35

# ... with 995 more iterations
> b$beta
rvar<1000,4>[3] mean ± sd:
[1] -0.67 ± 0.25  -0.28 ± 0.23   0.68 ± 0.27 

Also see rvar: The Random Variable Datatype • posterior

1 Like

I tried it:

post<-fit$draws(“surv”,format=“list”)
a ← posterior::as_draws_rvars(post)
posterior::as_draws_rvars(a)

A draws_rvars: 1000 iterations, 4 chains, and 1 variables
$surv: rvar<1000,4>[100] mean ± sd:
[1] 1.000 ± 0.0000 0.955 ± 0.0044 0.913 ± 0.0084 0.872 ± 0.0120 0.833 ± 0.0152
[6] 0.796 ± 0.0182 0.760 ± 0.0209 0.726 ± 0.0232 0.694 ± 0.0254 0.663 ± 0.0272
[11] 0.633 ± 0.0289 0.605 ± 0.0304 0.578 ± 0.0317 0.552 ± 0.0327 0.528 ± 0.0337
[16] 0.504 ± 0.0345 0.482 ± 0.0351 0.460 ± 0.0357 0.440 ± 0.0361 0.420 ± 0.0364
[21] 0.402 ± 0.0366 0.384 ± 0.0367 0.367 ± 0.0367 0.351 ± 0.0367 0.335 ± 0.0366

but I couldn’t find a way of not getting the summarized estimates (ie 0.955 ± 0.0044). For my workflow I would need to be able to access individual “surv” objects like this:

$surv[2]
[1] 0.96 0.96 0.96 0.96 0.96 0.96 0.96 0.96 0.96 0.96

$surv[3]
[1] 0.92 0.92 0.93 0.93 0.91 0.92 0.92 0.92 0.92 0.92

$surv[4]
[1] 0.88 0.89 0.89 0.89 0.87 0.88 0.88 0.88 0.88 0.88

I believe

posterior::draws_of(a$surv[2])

returns you that.

post<-fit$draws(format="list")
a<-posterior::as_draws_rvars(post)
posterior::draws_of(a$surv[2])

Returns:

2    0.947864
3    0.956244
4    0.950469
5    0.957255
6    0.951365

The output is still written out of order. Did I write the model in a way that is not expected?

Quick question. Do you want ‘all draws for one surv’ item or ‘one draw’ with all surv items?

Each surv[i] contains values from a survival curve. I would like to obtain the values in the following order:

0.960775 0.923088 0.886879 0.852091 0.818668 0.786555 0.755702 0.726060

Right now cmdstanr is printing the values like this:

$surv[2]
[1] 0.96 0.96 0.96 0.96 0.96 0.96 0.96 0.96 0.96 0.96

$surv[3]
[1] 0.92 0.92 0.93 0.93 0.91 0.92 0.92 0.92 0.92 0.92

$surv[4]
[1] 0.88 0.89 0.89 0.89 0.87 0.88 0.88 0.88 0.88 0.88

So I think the right answer is “one draw with all surv items”.

This case study has examples of obtaining expectation of the curve or draws from the posterior of the curve Gaussian process demonstration with Stan. It seems you are looking for something similar. See also Using tidybayes with the posterior package • tidybayes for approach combined with tidybayes plotting.

2 Likes

I wrote this Add rstan::extract()-like feature to fit$draws() · Issue #632 · stan-dev/cmdstanr · GitHub, but pasting it here as well so its all in once place:

library(cmdstanr)

fit <- cmdstanr_example()

library(rstan)
stanfit <- rstan::read_stan_csv(fit$output_files())
post<-extract(stanfit)

post$beta[1:2,]

will print:

> post$beta[1:2,]
          
iterations      [,1]      [,2]     [,3]
      [1,] -0.314607 -0.345438 0.913696
      [2,] -0.630833 -0.188254 0.426538

You can get the exact same thing by running:

a <- posterior::as_draws_rvars(fit$draws())

posterior::draws_of(a$beta)[1:2,]

and get

> posterior::draws_of(a$beta)[1:2,]
       [,1]      [,2]    [,3]
1 -0.508033 -0.731428 1.07239
2 -0.544134 -0.641063 1.05044

The only difference is that rstan::extract permutes the samples (I don’t have enough background in rstan to know why is that so, I am sure there is a valid reason). So running

library(rstan)
stanfit <- rstan::read_stan_csv(fit$output_files())
post<-extract(stanfit)

post$beta[1:2,]

will print different results every time for the same CSV files.

2 Likes

Historical reasons, making a wrong solution to a valid problem, and it has been much regretted choice for a long time. It’s just better to use posterior package for everything and if it doesn’t do something you like, ask if that could be added there. There are already a lot of useful functions for working with dras that are easier than trying to figure out indexing of the draws object directly Function reference • posterior

3 Likes

Thanks @rok_cesnovar @avehtari . This piece of code answers my question:

a <- posterior::as_draws_rvars(fit$draws())
posterior::draws_of(a$surv)[1:2,]
1 Like