Coding an interaction term between a categorical variable and a continuous variable in Stan, where the continuous variable is a vector of continuous data

Hello, I am new to Stan and I am trying to model the relationship between fish length and scale size using a measurement error model that accounts for individual-level variation in fish scale size (individual-level variation was measured externally and included in the model as “tau”). In this model I also include a random year intercept, and a categorical effect of fish age. I was able to run this model without interactions, but I am struggling to get the syntax right to code an interaction between the true average scale radius estimated by the model and the categorical variable of fish age. In particular, Stan seems to be having an issue with the fact that “x” (the estimated true mean scale radius) is a vector when I attempt to code the interaction between “x” and fish age.

I would like to know if a) the code below is correctly specifying a continuous X categorical interaction, and if so, b) how I can work around “x” being a vector (whether that requires transforming “x” or using a different coding approach…).

Thanks so much for your time!


data {
  int<lower=0> N;                  // number of cases
  int j;                           // number of years
  int k;                           // number of levels of ocean age
  vector[N] fl;                    // log fish forklength
  vector[N] rad_scale;             // log scale radius
  int o_age[N];                    // ocean age (categorical)
  int year[N];                     // year ID (categorical)
  real<lower=0> tau;               // measurement noise
}
parameters {
  vector[N] x;             // unknown true value to be estimated
  vector[j] u;             // year intercepts
  vector[k] b;             // ocean age effects
  vector[k] b2;            // scale radius X ocean age interaction effects
  real mu_x;               // prior location
  real<lower=0> sigma_x;   // prior scale
  real alpha;              // intercept
  real beta;               // slope
  real<lower=0> sigma;     // outcome noise
  real<lower=0> sigma_u;   // year noise
}
model {
  x ~ normal(mu_x, sigma_x);     // prior 
  rad_scale ~ normal(x, tau);    // measured scale radius is a function of the true average scale radius with *tau* variation
  fl ~ normal(beta * x + u[year] + b[o_age] + b2[x, o_age], sigma);  // Stan indicates there is a model misspecification here, specifically for the term "b2[x, o_age]"
  alpha ~ normal(0, 5);
  beta ~ normal(0, 5);
  b ~ normal(0, 5);
  b2 ~ normal(0, 5);
  sigma ~ cauchy(0, 5);
  mu_x ~ normal(0,1);   // prior for  estimated true average scale radius
  sigma_x ~ cauchy(0,0.1);   // prior for  estimated true average scale radius error
  sigma_u ~ cauchy(0, 5);      // prior for random year intercept error
  u ~ normal(alpha, sigma_u);        //prior for year random intercept effects
}
generated quantities {
  vector[N] nuY;
  vector[N] nuX;
  for(i in 1:N){
    nuY[i] = normal_rng(beta * x[i] + u[year[i]] + b[o_age[i]] + b2[x[i], o_age[i]], sigma);
    nuX[i] = normal_rng(x[i],sigma_x);
  }
  
}


1 Like

Very sorry your message fell through :)

Whenever I’m hesitant about raw Stan code I usually write my model in a more high-level language (e.g., brms or rethinking) and then look at the generated Stan code to see if I’ve understood things correctly. In brms you simply call stancode() to get the model specification.

Have you tried that?

1 Like

No worries. I got some help managed to get the model to compile and it is currently running. I am waiting to see if it works before posting the code. I learned that Stan does not like to multiply 2 vectors that are oriented the same (i.e., horizontal x horizontal or vertical x vertical). Sometimes all that is needed to make the model run is an apostrophe to transpose a vector.

2 Likes

If it’s of any help: you can do element-wise multiplication of two column vectors (or two row vecters) by using u .* v.

3 Likes