Log Likelihood for Structural Equation Modeles

Hello!

I am working on a Dynamic Structural Equation model and I am having trouble generating log likelihood values for model comparisons. I am new to SEM in Stan and I am lost about how to proceed, I tried doing multidimensional arrays of likelihood values for each “node” but it ended up going nowhere. I am looking for some advice on how to proceed or any resources on how to properly program in the information I am looking for.


If anyone is curious, the code I am using so far looks like this:

data {
  int<lower=1> N_tanks;
  int<lower=1> N_time;
  int<lower=1> N_sal;

  matrix[N_tanks, N_time] log_chl;
  matrix[N_tanks, N_time] log_zoop;

  int<lower=1> N_obs_anuran;
  vector[N_obs_anuran] log_anuran_obs;
  array[N_obs_anuran] int<lower=1> tank_id_anuran;
  array[N_obs_anuran] int<lower=1> time_id_anuran;

  array[N_tanks] int<lower=1, upper=N_sal> sal_idx;
}

parameters {
  // --- Chl-a (tank level) ---
  vector[N_sal] alpha_chl;        // intercept per salinity level
  real phi_chl;                   // autoregression coefficient
  vector[N_tanks] u_chl;          // tank random intercepts
  real<lower=0> tau_chl;          // SD of tank random intercepts
  real<lower=0> sigma_chl;        // residual SD

  // --- Zooplankton (tank level) ---
  vector[N_sal] alpha_zoop;       // intercept per salinity level
  real phi_zoop;                  // autoregression coefficient
  vector[N_tanks] u_zoop;         // Tank random intercepts
  real<lower=0> tau_zoop;         // Tank Random SD
  real<lower=0> sigma_zoop;       // residual SD
  real beta_t_chl_to_zoop;        // Chl(t) -> Zoop(t)
  real beta_t1_chl_to_zoop;       // cross-lagged: Chl(t-1) -> Zoop(t)


  // --- Anuran (latent tank mean + enclosure noise) ---
  vector[N_sal] alpha_anuran;     // intercept oer salinity level
  real phi_anuran;                // autoregression coeff
  vector[N_tanks] u_anuran;       // Tank random intercepts
  real<lower=0> tau_anuran;       // Tank Random SD
  real<lower=0> sigma_anuran;     // enclosure-level residual SD
  real beta_t_chl_to_anuran;        // Chl(t) -> Anuran(t)
  real beta_t_zoop_to_anuran;       // Zoop(t) -> Anuran(t)
  real beta_t1_chl_to_anuran;      // cross-lagged: Chl(t-1) -> Anuran(t)
  real beta_t1_zoop_to_anuran;     // cross-lagged: Zoop(t-1) -> Anuran(t)
  
  // Latent tank-level anuran means across all tanks and timepoints.
  // These are unobserved quantities Stan will estimate. Enclosure
  // observations are then modeled as noisy draws from these means.
  matrix[N_tanks, N_time] mu_anuran;
}

model {
  // --- Priors ---
  // AR coefficients: centered on 0, SD=0.5 — allows moderate persistence
  // but is skeptical of near-unit-root processes with only 9 timepoints.
  phi_chl    ~ normal(0, 0.5);
  phi_zoop   ~ normal(0, 0.5);
  phi_anuran ~ normal(0, 0.5);

  // Cross-lagged effects: weakly informative
  beta_t_chl_to_zoop    ~ normal(0, 1);
  beta_t_chl_to_anuran  ~ normal(0, 1);
  beta_t_zoop_to_anuran ~ normal(0, 1);
  beta_t1_chl_to_zoop    ~ normal(0, 1);
  beta_t1_chl_to_anuran ~ normal(0, 1);
  beta_t1_zoop_to_anuran ~ normal(0, 1);
  
  // Intercepts: weakly informative on the log scale
  alpha_chl    ~ normal(0, 2);
  alpha_zoop   ~ normal(0, 2);
  alpha_anuran ~ normal(0, 2);

  // SDs: exponential(1) puts most mass below ~2, reasonable for log-scale data
  sigma_chl    ~ exponential(1);
  sigma_zoop   ~ exponential(1);
  sigma_anuran ~ exponential(1);


  // Tank random intercepts: drawn from a normal with estimated SD (tau)
  u_chl    ~ normal(0, tau_chl);
  u_zoop   ~ normal(0, tau_zoop);
  u_anuran ~ normal(0, tau_anuran);
  tau_chl      ~ exponential(1);
  tau_zoop     ~ exponential(1);
  tau_anuran   ~ exponential(1);

  // --- Likelihood: Chl-a ---
  // t=1: no lag available, model as intercept + random effect only
  for (k in 1:N_tanks) {
    log_chl[k, 1] ~ normal(alpha_chl[sal_idx[k]] 
                       + u_chl[k], sigma_chl);     
  }
  // t=2..T: autoregression on previous timepoint
  for (k in 1:N_tanks) {
    for (t in 2:N_time) {
      real mu_chl_kt = alpha_chl[sal_idx[k]]                     
                       + phi_chl * log_chl[k, t-1]                  
                       + u_chl[k];                                  // random effect
      log_chl[k, t] ~ normal(mu_chl_kt, sigma_chl);                
      }
  }

  // --- Likelihood: Zooplankton ---
  for (k in 1:N_tanks) {
    log_zoop[k, 1] ~ normal(alpha_zoop[sal_idx[k]] 
                        + beta_t_chl_to_zoop   * log_chl[k, 1] 
                        + u_zoop[k], sigma_zoop);
  }
  for (k in 1:N_tanks) {
    for (t in 2:N_time) {
      real mu_zoop_kt = alpha_zoop[sal_idx[k]]
                        + phi_zoop * log_zoop[k, t-1]
                        + beta_t_chl_to_zoop   * log_chl[k, t] 
                        + beta_t1_chl_to_zoop  * log_chl[k, t-1] 
                        + u_zoop[k];
      log_zoop[k, t] ~ normal(mu_zoop_kt, sigma_zoop);
    }
  }

  // --- Likelihood: Anuran latent means ---
  // The latent mu_anuran evolves over time via AR + cross-lagged effects.
  // We place a normal prior on the t=1 state, then model t=2..T dynamically.
  for (k in 1:N_tanks) {
    mu_anuran[k, 1] ~ normal(alpha_anuran[sal_idx[k]] 
                          + beta_t_chl_to_anuran  * log_chl[k, 1]
                          + beta_t_zoop_to_anuran * log_zoop[k, 1]
                          + u_anuran[k], 0.5);
  }
  for (k in 1:N_tanks) {
    for (t in 2:N_time) {
      real mu_anuran_kt = alpha_anuran[sal_idx[k]]
                          + phi_anuran   * mu_anuran[k, t-1]
                          + beta_t_chl_to_anuran  * log_chl[k, t]
                          + beta_t_zoop_to_anuran * log_zoop[k, t]
                          + beta_t1_chl_to_anuran * log_chl[k, t-1]
                          + beta_t1_zoop_to_anuran* log_zoop[k, t-1]
                          + u_anuran[k];
      mu_anuran[k, t] ~ normal(mu_anuran_kt, 0.5);
    }
  }

  // --- Likelihood: Enclosure-level anuran observations ---
  // Each observed enclosure value is a noisy draw from its tank-time mean.
  for (n in 1:N_obs_anuran) {
    log_anuran_obs[n] ~ normal(
      mu_anuran[tank_id_anuran[n], time_id_anuran[n]],
      sigma_anuran
    );
  }
}

generated quantities {
  vector[N_obs_anuran] rep_anuran;
  for (n in 1:N_obs_anuran) {
    rep_anuran[n] = normal_rng(
      mu_anuran[tank_id_anuran[n], time_id_anuran[n]],
      sigma_anuran
    );
  }
}

Hi, I’m not 100% sure of what is causing you problems, so I apologize if the issue is more complicated.

If you’re interested in log likelihood values for each observation, I would do something like this for the log_zoop observations.

generated quantities {
  matrix[N_tanks, N_time] log_lik_zoop;
  ...
  // --- Likelihood: Zooplankton ---
  for (k in 1:N_tanks) {
    log_lik_zoop[k, 1] = normal_lpdf(log_zoop[k, 1] | alpha_zoop[sal_idx[k]] 
                        + beta_t_chl_to_zoop   * log_chl[k, 1] 
                        + u_zoop[k], sigma_zoop);
  }
  for (k in 1:N_tanks) {
    for (t in 2:N_time) {
      real mu_zoop_kt = alpha_zoop[sal_idx[k]]
                        + phi_zoop * log_zoop[k, t-1]
                        + beta_t_chl_to_zoop   * log_chl[k, t] 
                        + beta_t1_chl_to_zoop  * log_chl[k, t-1] 
                        + u_zoop[k];
      log_lik_zoop[k, t] = normal_lpdf(log_zoop[k,t] | mu_zoop_kt, sigma_zoop);
    }
  }
  ... 
}

And so on.

But, since this is a time series model, you may be more interested in leave-future-out cross validation. If so, I’d start with this paper: https://arxiv.org/pdf/1902.06281

You can avoid the double computation by defining the log_lik values in the transformed parameters block.

And then don’t forget to add

model {
  target += sum(log_lik_zoop);
}

And you do need to worry not only about the fact that you have time series data, but the fact that you have two species. That’s two ways the data are not i.i.d., and most of the model comparison stuff like leave-one-out cross-validation assumes i.i.d. data.

On the other hand, we’re big fans of posterior predictive checks for validating model fits.