Help Translating Model from BUGS

I am trying to convert a model from the BUGS language (implemented in JAGS) to Stan, but am having trouble, maybe in part because I'm not clear on exactly what's happening in the JAGS model. The model is from Schofield et al. 2016: https://www.tandfonline.com/doi/abs/10.1080/01621459.2015.1110524

The model is:

\log(y_{it}) \sim \mathcal{N}(\alpha_0 + \alpha_1 \, \mathrm{age}_{it} + \eta_t, \sigma_y^2)
\eta_t \sim \mathcal{N}(\beta X_t, \sigma_{\eta}^2)
X_t \sim \mathcal{N}(\mu_x, \sigma_x^2)

The current (with a bit more hierarchical complexity than described above) JAGS model is:

model{
  for(i in 1:M){
    for(j in f[i]:l[i]) {
      y[i,j] ~ dnorm(alpha0[i] + alpha1[i]*age[i,j] + eta[j], 1/sd_y[i]/sd_y[i])
    }
    alpha0[i] ~ dnorm(mu_a0, 1/sd_a0/sd_a0)
    alpha1[i] ~ dnorm(mu_a1, 1/sd_a1/sd_a1)T( , 0) # truncated normal to force a negative slope
    sd_y[i] ~ dt(0, 0.04, 3)T(0, )                 # residual SD differs by tree
  }

  mu_a0 ~ dnorm(0, 0.0001)
  mu_a1 ~ dnorm(0, 0.0001)T( , 0)
  sd_a0 ~ dt(0, 0.04, 3)T(0, )
  sd_a1 ~ dt(0, 0.04, 3)T(0, )

  for(t in 1:Tea) {
    x[t] ~ dnorm(mu_x, 1 / sd_x / sd_x)
    eta[t] ~ dnorm(beta0 * x[t], 1 / sd_eta / sd_eta)
  }

  mu_x ~ dnorm(0, 0.0001)
  sd_x ~ dt(0, 0.04, 3)T(0, )
  beta0 ~ dnorm(0, 0.0001)
  sd_eta ~ dt(0, 0.04, 3)T(0, )
}

y_{it} is a matrix of observations across all 500 values of t. The primary interest of the model is backcasting X_t (reconstructing historic climate). In the JAGS model, X_t comes in as data, with the values for t = 1-400 set to NA and the values for t = 401-500 observed. With this formulation the missing values are estimated as the historic climate. I am not sure whether sd_x is being estimated from just the observed values or somehow across all the potential values of X_t, given the observed values of y_{it} and age_{it}.

However, in Stan, with its separate data, model, and generated quantities blocks, I am not sure how to handle estimating the historic (unobserved) X_t values. I assume that since X_t is brought in as data, I can't also have X_t ~ normal(mu_x, sd_x). But if I wanted to put it in generated quantities, I'm not sure where sd_x would come from. Maybe the model needs to be reparameterized. Any suggestions would be appreciated.

This isn’t the easiest model to start with, but I think it would be something like this in Stan:

data {
  int<lower = 0> N_obs;   // observations on x, apparently 100
  vector[N_obs] x_obs;
  int<lower = 0> N_miss;  // missing observations on x, apparently 400
  vector[N_obs + N_miss] log_y;
  vector[N_obs + N_miss] age;

  // I'm not really sure what these are
  int<lower = 0> M; // number of trees?
  int<lower = 1, upper = M> tree_ID[N_obs + N_miss]; // N is not in scope until transformed data
}
transformed data {
  int N = N_obs + N_miss;
}
parameters {
  vector[N_miss] x_miss;
  vector[N] eta_raw;

  vector[M] alpha0_raw;
  vector<upper = 0>[M] alpha1;
  vector<lower = 0>[M] sigma_y;

  // need good priors for all of these
  real mu_x;
  real<lower = 0> sigma_x;
  real<lower = 0> sigma_eta;
  real beta;
  real mu_a0;
  real<upper = 0> mu_a1;
  real<lower = 0> sd_a0;
  real<lower = 0> sd_a1;
}
model {
  vector[N] x = append_row(x_miss, x_obs);
  vector[N] eta = beta * x + sigma_eta * eta_raw;
  vector[M] alpha0 = mu_a0 + sd_a0 * alpha0_raw;

  log_y ~ normal(alpha0[tree_ID] + alpha1[tree_ID] .* age + eta, sigma_y[tree_ID]);
  x ~ normal(mu_x, sigma_x);
  eta_raw ~ std_normal(); // implies eta ~ normal(beta * x, sigma_eta)

  alpha0_raw ~ std_normal(); // implies alpha0 ~ normal(mu_a0, sd_a0)
  alpha1 ~ normal(mu_a1, sd_a1);
  target += -M * normal_lcdf(0 | mu_a1, sd_a1); // accounts for the truncation at zero
  sigma_y ~ student_t(3, 0, 0.02); // do not need to account for truncation

  // put good priors here on all those scalars in the parameters block
}

So, to answer your question: the missing values of x go in the parameters block (unless there is some way to integrate them out of the posterior distribution analytically, although you could also try integrate_1d to integrate them out numerically). They then get joined to the observed values of x at the top of the model block to form the complete-data likelihood.
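In Bayesian terms there is nothing special about the missing values: they are just more unknowns, so (writing \theta for all the other parameters) the joint posterior is

p(x_{miss}, \theta \mid y, x_{obs}) \propto p(y \mid x_{miss}, x_{obs}, \theta) \, p(x_{miss}, x_{obs} \mid \theta) \, p(\theta)

which is what the model block above evaluates, up to a constant, once x_miss is declared in the parameters block.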

Perhaps the biggest adjustment when translating a model from BUGS to Stan is not the differences in syntax (although there are some; for example, Stan uses the standard deviation rather than the precision in the univariate normal distribution) but that the priors in the BUGS version are terrible. Take, for example, the normal prior on mu_x, which has a mean of zero (which might be okay) and a precision of 0.0001, which works out to a standard deviation of 100. That is way too loose to be useful when 80% of the x values are missing. The same goes for many of the other priors in the original, which you will have to tighten up when you put them into the model block of your Stan program.
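For example, with purely illustrative numbers, and assuming x is a climate index on roughly unit scale (rescale to whatever your data actually look like), the corresponding model block lines could be more like:

  mu_x ~ normal(0, 1);          // assumes x is roughly unit-scale
  sigma_x ~ student_t(3, 0, 1); // half-t, thanks to the lower = 0 declaration
  beta ~ normal(0, 1);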

You mostly do not need explicit truncation of the priors in this case because the parameters are declared with lower or upper bounds where applicable in the parameters block, and the amount of probability mass being truncated away is a constant. The exception is alpha1, which you assert is negative but whose prior depends on the unknown mu_a1 and sd_a1, so the amount of mass truncated away is not a constant.
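To spell that out, the density of alpha1 truncated from above at zero is

p(\alpha_1 \mid \mu_{a1}, \sigma_{a1}) = \frac{\mathcal{N}(\alpha_1 \mid \mu_{a1}, \sigma_{a1})}{\Phi\left(\frac{0 - \mu_{a1}}{\sigma_{a1}}\right)}, \quad \alpha_1 < 0

and the log of the denominator, normal_lcdf(0 | mu_a1, sd_a1), involves unknowns, so it has to be subtracted from target once for each of the M elements of alpha1. That is what the target += line in the model block above does.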

You will find in hierarchical models, especially when the outcome is noisy, that non-centered parameterizations of priors in the location-scale family of distributions tend to work better in Stan (fewer divergent transitions, better effective sample size per second, etc.). That is what I have done with eta and alpha0 by writing each as a function of a "raw" primitive variable that has a standard normal prior.
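Schematically, the two equivalent ways of writing the prior on eta are:

  // centered: eta itself is a parameter, drawn directly
  eta ~ normal(beta * x, sigma_eta);

  // non-centered: eta_raw is the parameter; eta is a deterministic
  // transform that implies the same distribution but gives the sampler
  // a better-behaved geometry
  eta_raw ~ std_normal();
  vector[N] eta = beta * x + sigma_eta * eta_raw;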


Thanks so much! There is a lot for me to unpack here as a Stan noob. That said, I think I get the gist of the critical parts for my problem. You also wrote the code in long form (y_i) rather than wide form (y_{it}). I think that makes things easier and faster, allowing for vectorization, since the individual trees are recruited and die in different years and live for different numbers of years. In the JAGS version I had loops over just the years that each tree was living. I'm going to try it out now. Thanks again. My next challenge will be to figure out how to make \alpha_0 and \alpha_1 vary by individual tree within species.

I ran into a challenge with the implementation in the long form. There are many more values of log_y than there are of x, because each tree is assumed to experience the same climate: I'm reconstructing 400 years of past climate using observations from many overlapping trees of different ages. So in long form, x would be repeated many times for different tree-ring widths (log_y). That is where the matrix form of y_{it} was handy in the JAGS model; there I just looped through each tree for the years it was growing (f[i] = first year of growth for tree i). Any ideas how to handle this in Stan while maintaining vectorization?

I think the model block would just be

vector[N] x = append_row(x_miss, x_obs)[tree_ID];

or replace tree_ID with whatever one-dimensional integer array, of the same size as log_y, expands the unique values of x_miss and x_obs to the length of log_y. The indexing syntax is the same as in R:

letters[rep(1:26, each = 2)]

yields

 [1] "a" "a" "b" "b" "c" "c" "d" "d" "e" "e" "f" "f" "g" "g" "h" "h" "i" "i" "j" "j" "k" "k" "l" "l" "m" "m"
[27] "n" "n" "o" "o" "p" "p" "q" "q" "r" "r" "s" "s" "t" "t" "u" "u" "v" "v" "w" "w" "x" "x" "y" "y" "z" "z"

I guess it's not a problem having x_obs + x_miss expanded to the same length as log_y. But with this setup, if I understand it correctly, I will estimate x_miss for each tree in each year. Since climate would be the same across all trees in a given year, I really just want to estimate 400 values of x_miss rather than 400 x N_tree_obs. Maybe part of the problem is that the number of x_obs differs by tree.

Just for completeness I’ll post a link to the github repo with the code and full data: https://github.com/djhocking/dendro_bayes

You’ve already helped more than could be expected. Thank you so much.

It is fine if you only want one x_miss value per year. So, I guess there would be 400 of those. You just need an integer array, which you can create in R, with 400 unique values but the same size as log_y. For example, if the first three observations in log_y pertain to tree 1 and the next two observations in log_y pertain to tree 2, then the tree_ID variable would start like

1, 1, 1, 2, 2, ...

So, you can use that tree_ID to expand shorter vectors (with unique elements) into longer vectors (with duplicated elements) as needed, such as when evaluating the log-likelihood.


Ok, I think I get that now and how it's used in this case. I'm finally at the point of trying to run it, but I get a vector * vector error associated with alpha1[tree_id] * age.

  error in 'model341345fd0ed_negexp_linear' at line 41, column 63
  -------------------------------------------------
    39:   vector[N] alpha0 = mu_a0 + sd_a0 * alpha0_raw; 
    40:  
    41:   log_y ~ normal(alpha0[tree_id] + alpha1[tree_id] * age + eta, sigma_y);
                                                                      ^
    42:   x ~ normal(mu_x, sigma_x);
  -------------------------------------------------

I thought alpha1[tree_id] would be a scalar so that it could be multiplied by the vector of ages. But I'm struggling to do this without loops and indexing. Oh, how I love for loops. Maybe I should have sacrificed speed for ease of understanding with this leap forward in my Stan (and stats) difficulty.

If I replace the vectorized sampler with a loop:

for (n in 1:N) {
  log_y[n] ~ normal(alpha0[tree_id[n]] + alpha1[tree_id[n]] * age[n] + eta[n], sd_y[tree_id[n]]);
}

Then I get the error:

Chain 1: Unrecoverable error evaluating the log probability at the initial value.
Chain 1: Exception: assign: Rows of left-hand-side (44311) and rows of right-hand-side (247) must match in size  (in 'model341618dfa2b_negexp_linear' at line 40)

[1] "Error in sampler$call_sampler(args_list[[i]]) : "                                                                                                          
[2] "  Exception: assign: Rows of left-hand-side (44311) and rows of right-hand-side (247) must match in size  (in 'model341618dfa2b_negexp_linear' at line 40)"
error occurred during calling the sampler; sampling not done

The numbers make sense in that the number of observations (N) for log_y and age is 44311 and the number of unique trees (M) is 247, which is the length of alpha0, alpha1, and sd_y. I'm just not sure why this is happening.

If alpha1 is a vector such that alpha1[tree_id] is a vector of the same size as age, then you need elementwise multiplication, which is implemented with the .* operator, rather than regular multiplication, which is implemented with the * operator and is here being interpreted as a vector product between vectors that are not conformable for multiplication.
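For example, in Stan:

  vector[3] a = [1, 2, 3]';
  vector[3] b = [4, 5, 6]';
  vector[3] c = a .* b; // elementwise product: [4, 10, 18]'
  // a * b does not compile because no vector-vector product is defined;
  // a' * b (row_vector times vector) is the scalar dot product instead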


Oh, of course. I figured that was the issue, which is why I wrote the loop to be explicit. I had missed the .* operator, which is good to know about so I can keep the vectorized form.

I also found the other issue, the mismatch of dimensions between the left- and right-hand sides: I had alpha0 and alpha1 declared as length M, but they needed to be length N, with indexing by tree_id so that they have M unique values.

Thank you for all the help! For completeness, here is the "working" Stan code (not yet evaluated for fit, pathologies, or good priors):

data {
  int<lower = 0> N_obs;   // observations on x (climate)
  vector[N_obs] x_obs;
  int<lower = 0> N_miss; // missing observations on x to estimate
  vector[N_obs + N_miss] log_y;
  vector[N_obs + N_miss] age;

  int<lower = 0> M; // number of trees
  int<lower = 0> N; // number of measured tree ring widths
  int<lower = 1, upper = M> tree_id[N];
  int<lower = 1, upper = N_obs + N_miss> year[N]; // year of each ring measurement
}
transformed data {
  // int N = N_obs + N_miss; // brought in as data instead because used above
}
parameters {
  vector[N_miss] x_miss;
  vector[N] eta_raw;

  vector[N] alpha0_raw;
  vector<upper = 0>[N] alpha1;
  vector<lower = 0>[N] sd_y;
 
  // need good priors for all of these
  real mu_x;
  real<lower = 0> sd_x;
  real<lower = 0> sd_eta;
  real beta0;
  real mu_a0;
  real<upper = 0> mu_a1;
  real<lower = 0> sd_a0;
  real<lower = 0> sd_a1;
}
model {
  vector[N] x = append_row(x_miss, x_obs)[year];
  vector[N] eta = beta0 * x + sd_eta * eta_raw;
  vector[N] alpha0 = mu_a0 + sd_a0 * alpha0_raw;

  log_y ~ normal(alpha0[tree_id] + alpha1[tree_id] .* age + eta, sd_y[tree_id]);
  x ~ normal(mu_x, sd_x);
  eta_raw ~ std_normal(); // implies eta ~ normal(beta0 * x, sd_eta)

  alpha0_raw ~ std_normal(); // implies alpha0 ~ normal(mu_a0, sd_a0)
  alpha1 ~ normal(mu_a1, sd_a1);
  target += -M * normal_lcdf(0 | mu_a1, sd_a1); // accounts for the truncation at zero
  sd_y ~ student_t(3, 0, 0.02); // do not need to account for truncation

  // priors on the scalars in the parameters block
  sd_x ~ cauchy(0, 2.5);
  sd_eta ~ cauchy(0, 2.5);
  beta0 ~ normal(0, 3);
  mu_a0 ~ normal(6, 3); // weakly informative prior centered at the mean of log(y)
  mu_a1 ~ normal(0, 3);
  sd_a0 ~ cauchy(0, 2.5);
  sd_a1 ~ cauchy(0, 2.5);
}


Hmm, I will have to do some model troubleshooting, because in its current form the model takes 12 hours to run 500 warmup and 1000 total iterations, and the chains don't move (flat traceplots, n_eff = 2). It seems like some part of my current model (above) is over-specified or redundant. Still lots to unpack. My next step should probably be to simulate some simple datasets to use in troubleshooting the model.
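In case it helps, here is a minimal, untested sketch of such a simulator written in Stan itself. Every numeric value below is a made-up placeholder, and the program can be run with algorithm = "Fixed_param" (it has no parameters block) to generate one fake dataset per iteration:

data {
  int<lower = 0> N_year;                    // e.g., 500
  int<lower = 0> M;                         // number of trees
  int<lower = 0> N;                         // number of simulated ring widths
  int<lower = 1, upper = M> tree_id[N];
  int<lower = 1, upper = N_year> year[N];
  vector[N] age;
}
generated quantities {
  vector[N_year] x_sim;   // "true" climate to try to recover
  vector[N_year] eta_sim;
  vector[M] alpha0_sim;
  vector[M] alpha1_sim;
  vector[N] log_y_sim;
  for (t in 1:N_year) {
    x_sim[t] = normal_rng(0, 1);                   // mu_x = 0, sd_x = 1
    eta_sim[t] = normal_rng(0.5 * x_sim[t], 0.2);  // beta0 = 0.5, sd_eta = 0.2
  }
  for (i in 1:M) {
    alpha0_sim[i] = normal_rng(6, 0.5);            // mu_a0 = 6, sd_a0 = 0.5
    alpha1_sim[i] = -fabs(normal_rng(0.01, 0.005)); // crude stand-in for the truncated normal
  }
  for (n in 1:N)
    log_y_sim[n] = normal_rng(alpha0_sim[tree_id[n]]
                              + alpha1_sim[tree_id[n]] * age[n]
                              + eta_sim[year[n]], 0.3); // sd_y = 0.3
}

If a stripped-down version of the model recovers these known values, you can add the hierarchical pieces back one at a time to find where things break.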