How to run a Stan program recursively?

Adriana · April 16, 2021, 9:25pm

Hi,

I have the following program in Stan:

data {
int<lower=0> N;
int<lower=0> y[N];
real my;
}
parameters {
real<lower=0> omega;
real<lower=0, upper=1> alpha;
real<lower=0, upper=1> beta;
}
model {
real theta[N];
theta[1] = omega+alphamy+betamy;
y[1] ~ poisson(theta[1]);
for (n in 2:N){
theta[n] = omega+alphay[n-1]+betatheta[n-1];
y[n] ~ poisson(theta[n]);
}
}

which runs just fine in cmdstan, rstan and cmdstanr. I would like to run it from “n in 2:N-100” and save the posterior, then from “n in 2:N-99” and save the posterior, …, and finally from “n in 2:N” and save the posterior, by having everything in one program and calling the same dataset of length N. Is this possible? Currently the only way I manage to do this is to run the above program separately for each “n in 2:N-i”, i=0,1,…,100 and also create separate data files of the correct length for each case i. This if of course time consuming if N is large.

Thank you! Adriana

jsocolar · April 16, 2021, 9:52pm

Hi Adriana,
Unless I’m missing something, almost none of the computation is redundant between the different fits that you’re looking for, so we’re not looking for any super-clever speedup tricks in the Stan code itself. The main trick is to come up with a way to format the data so that the dataset changes minimally between fits, and so that the model definition does not change (and therefore does not need recompilation). And you’re pretty much there already. The model file requires no changes at all. All you need to do is iteratively update the data file, and the only change you need to make is in the value of N. Thus, you can do something like:

my_model <- cmdstanr::cmdstan_model(...)
max_N <- ... #whatever the highest value of N is
for(i in 0:100){
  my_data_list$N <- max_N - i
  my_model$sample(data = my_data_list)
}

Cheers
Jacob

Edit: That said, you can also get speedup by vectorizing the loop in your code, regardless of this issue of needing to run the model multiple times.

Charles_Driver · April 16, 2021, 9:54pm

it would be nice if stan could handle this sort of thing but I don’t think it does. The only way I’ve achieved something like this is by creating an extra parameter, which I convert to an integer and use for indexing in some form, and ignore the gradient/s from that element/s. This only works if you’ve also got your own optimizer / sampler going though, at which point stan is ‘just’ a log prob and gradient calculator.
edit: maybe I’m wrong, I thought you wanted to do it without updating the data file (which then also requires creating a new stan object for fitting).

Funko_Unko · April 16, 2021, 10:12pm

You could make your parameters into a vector of 100 each and then fit each parameter only on the corresponding data. But this is the worst possible solution.

You can probably save some/much of the time by reusing the state of the sampler after warmup (if the data are identical / subsets).

Adriana · April 21, 2021, 12:12pm

Thank you Jacob. But it doesn’t work. I get the error message:

mismatch in dimension declared and found in context; processing stage=data initialization; variable name=y1; position=0; dims declared=(352); dims found=(354).

So here max_N=354 and I have (i in 0:2).

Adriana

Adriana · April 21, 2021, 12:34pm

And this is my data with max_N=354:

my_data_list ← list(N=354,
y = c(0,2,3,5,2,1,1,5,3,8,18,10,14,12,12,9,28,31,47,55,59,77,94,144,157,151,178,153,205,228,285,319,316,241,274,202,216,320,338,318,304,299,260,268,325,314,474,373,351,365,418,400,349,563,450,591,414,422,510,489,714,575,620,385,384,351,441,414,458,352,245,297,425,375,405,355,336,259,244,261,388,295,382,267,235,205,254,231,254,274,200,165,177,191,197,206,179,155,121,151,187,173,226,197,247,193,159,189,216,230,257,202,163,116,156,172,156,143,152,149,98,133,112,131,104,107,77,90,120,103,126,97,92,85,98,135,133,119,86,100,97,101,115,100,103,106,111,88,91,143,150,163,140,99,81,85,144,146,157,177,120,104,97,221,185,171,179,161,108,79,126,122,140,165,129,89,93,150,135,119,192,161,170,143,192,263,373,335,302,312,305,524,380,393,461,384,313,297,386,419,544,569,557,559,665,615,714,728,922,801,822,1022,1449,1381,1786,2131,2202,1656,1677,2266,2414,2658,2425,1868,1917,1622,2743,2531,2773,2484,2432,2366,2219,3829,3565,3493,3301,2975,2248,2149,3850,3580,3578,3459,3467,2628,2820,4667,3948,3505,3586,3634,2832,2890,4491,3773,3704,3200,3076,2503,2093,3276,3059,2584,2090,1938,1366,1350,2058,1761,1605,1396,1315,1094,1038,1705,1485,1338,1151,1052,877,864,1403,1344,1338,1219,1233,950,1027,1718,1491,1228,1173,1450,1017,1413,1980,1867,1629,1252,590,1659,1958,2121,3221,2946,2165,1477,2736,2429,3467,2814,2576,2441,2094,1852,1746,2864,2512,2231,1201,1961,1694,1643,2526,2268,2100,1823,1657,1338,1326,1973,1747,1817,1595,1488,1231,1265,1709,1028,1747,1564,1369,1122,1134,1383,1308,1239,1216,1185,900,954,1375,1208),
my = 973.4915254)

jsocolar · April 21, 2021, 10:04pm

Yup, you’re right, I overlooked that the Stan program uses N to declare the dimension of y. To avoid this issue, try the following Stan program and R script:

data {
  int<lower=0> N;
  int<lower=0> n_max;
  int<lower=0> y[n_max];
  real my;
}
parameters {
  real<lower=0> omega;
  real<lower=0, upper=1> alpha;
  real<lower=0, upper=1> beta;
}
model {
  real theta[N];
  theta[1] = omega+alpha*my+beta*my;
  y[1] ~ poisson(theta[1]);
  for (n in 2:N){
    theta[n] = omega+alpha*y[n-1]+beta*theta[n-1];
    y[n] ~ poisson(theta[n]);
  }
}

library(cmdstanr)  
my_model <- cmdstan_model(...)
my_data_list <-list(N=NA,
                    n_max = 354,
                    y = c(0,2,3,5,2,1,1,5,3,8,18,10,14,12,12,9,28,31,47,55,59,77,94,144,157,151,178,153,205,228,285,319,316,241,274,202,216,320,338,318,304,299,260,268,325,314,474,373,351,365,418,400,349,563,450,591,414,422,510,489,714,575,620,385,384,351,441,414,458,352,245,297,425,375,405,355,336,259,244,261,388,295,382,267,235,205,254,231,254,274,200,165,177,191,197,206,179,155,121,151,187,173,226,197,247,193,159,189,216,230,257,202,163,116,156,172,156,143,152,149,98,133,112,131,104,107,77,90,120,103,126,97,92,85,98,135,133,119,86,100,97,101,115,100,103,106,111,88,91,143,150,163,140,99,81,85,144,146,157,177,120,104,97,221,185,171,179,161,108,79,126,122,140,165,129,89,93,150,135,119,192,161,170,143,192,263,373,335,302,312,305,524,380,393,461,384,313,297,386,419,544,569,557,559,665,615,714,728,922,801,822,1022,1449,1381,1786,2131,2202,1656,1677,2266,2414,2658,2425,1868,1917,1622,2743,2531,2773,2484,2432,2366,2219,3829,3565,3493,3301,2975,2248,2149,3850,3580,3578,3459,3467,2628,2820,4667,3948,3505,3586,3634,2832,2890,4491,3773,3704,3200,3076,2503,2093,3276,3059,2584,2090,1938,1366,1350,2058,1761,1605,1396,1315,1094,1038,1705,1485,1338,1151,1052,877,864,1403,1344,1338,1219,1233,950,1027,1718,1491,1228,1173,1450,1017,1413,1980,1867,1629,1252,590,1659,1958,2121,3221,2946,2165,1477,2736,2429,3467,2814,2576,2441,2094,1852,1746,2864,2512,2231,1201,1961,1694,1643,2526,2268,2100,1823,1657,1338,1326,1973,1747,1817,1595,1488,1231,1265,1709,1028,1747,1564,1369,1122,1134,1383,1308,1239,1216,1185,900,954,1375,1208),
                    my = 973.4915254)

outputs <- list()
for(i in 0:100){
  my_data_list$N <- my_data_list$n_max - i
  outputs[[i+1]] <- my_model$sample(data = my_data_list)
}

Adriana · April 21, 2021, 10:38pm

It works! Thank you Jacob!

Topic		Replies	Views
Is there any way to re use stan program? General	2	506	December 14, 2022
Can a compiled program accept data arguments of varying length between runs? General	2	345	February 18, 2023
Recreating results in Stan (FIXED) General	4	809	July 17, 2017
How to speed up my Stan code? Modeling rstan , fitting-issues	8	744	June 14, 2021
Overwriting of variable - correct way? Modeling	3	323	July 26, 2022

How to run a Stan program recursively?

Related topics