Non-nested multilevel model

Hello. I am having trouble specifying a non-nested multi-level model. I want to be up front and tell everybody that this is for part of my disseration. I feel I have a good idea of what I’m trying to do theoretically, but unfortunately, my model output appears much worse than I would expect from the data. I am getting various warnings regarding Rhat, tree depth, and adapt_delta. I believe I am coding the model incorrectly. The response is a gamma distribution, because it must be positive. I have a conceptual DAG that I am using to base the code from, but I am not sure how to attach that here. Hopefully, you can find it at this link:

https://smttt-my.sharepoint.com/:b:/g/personal/w130883_usm_edu/EczDQFhrdL9ApsJ7wqNt4PQBbloKwQsiDEbhPHPlooZyKQ?e=fBdPVI

I can put the posterior formula here it seems (sorry, but the align commands aren’t working as expected, so it may look a little messy):

[\alpha_0, \alpha_1, \beta_1, \beta_2, \beta_3, \gamma_0, \gamma_1, \gamma_2, \sigma^2, \varsigma_m^2, \varsigma_s^2| SPM_{mse}] \propto\

\prod_{e=1}^{n} \prod_{m=1}^{M} \prod_{s=1}^{S} \text{gamma}(SPM_{mse}|g(\alpha_0, \gamma_0, \beta_1, \beta_2, \beta_3), \sigma^2)\\
\times \text{normal}(\alpha_0|\alpha_1, \varsigma_m^2)\ \text{normal}(\gamma_0|h(\gamma_1, \gamma_2), \varsigma_s^2) \\
\times \text{gamma}(\alpha_1 | 0.001, 0.001)\ \text{gamma}(\beta_1 | 0.001, 0.001)\\
\times \text{gamma}(\beta_2 | 0.001, 0.001)\ \text{normal}(\beta_3 | 0, 100)\\
\times \text{gamma}(\gamma_1 | 0.001, 0.001)\ \text{gamma}(\gamma_2 | 0.001, 0.001)\\
\times \text{inverse gamma}(\sigma^2 | 0.001, 0.001)\\
\times \text{inverse gamma}(\varsigma_m^2 | 0.001, 0.001)\\
\times \text{inverse gamma}(\varsigma_s^2 | 0.001, 0.001)

This is my model code:

    data {
      int<lower=0> N; //rows of data
      int<lower=0> D_e; // number of event covariates
      int<lower=0> D_m; // number of month covariates
      int<lower=0> D_s; // number of station covariates
      int<lower=0> L_m; // number of months
      int<lower=0> L_s; // number of stations
      matrix[N,D_e] x_e; // event design matrix (predictors)
      matrix[N,D_m] x_m; // month design matrix (predictors)
      matrix[N,D_s] x_s; // station design matrix (predictors)
      vector[N] y; // response (outcomes)
      int<lower=1, upper=L_m> m_groups[N]; // month IDs
      int<lower=1, upper=L_s> s_groups[N]; // station IDs
    }
    parameters {
      real alpha_0; // month intercept
      real gamma_0; // station intercept
      vector[D_e] beta; // event slope parameters
      vector[D_m] alpha; // month slope parameters
      vector[D_s] gamma; // station slope parameters
      real<lower=0, upper=10> sigma_sq; // population variance
      vector[L_m] u_m; // month random effects
      vector[L_s] u_s; // station random effects
      real<lower=0, upper=10> sigma_u_m_sq; // variance of station effects
      real<lower=0, upper=10> sigma_u_s_sq; // variance of month effects
    }
    transformed parameters {
      vector[N] mu_e; // individual event means
      vector[N] mu_m; // individual month means
      vector[N] mu_s; // individual station means
      vector[L_m] alpha_0j; // varying month intercept
      vector[L_s] gamma_0j; // varying station intecept

      alpha_0j = alpha_0 + u_m;
      gamma_0j = gamma_0 + u_s;
      mu_m = alpha_0j[m_groups] + x_m * alpha;
      mu_s = gamma_0j[s_groups] + x_s * gamma;
      mu_e = alpha_0j[m_groups] + gamma_0j[s_groups] + x_e * beta;
    }
    model {
      // Priors
      alpha_0 ~ normal(0, 10);
      gamma_0 ~ normal(0, 10);
      alpha[1] ~ gamma(0.001, 0.001); // Prior for discharge
      gamma[1] ~ gamma(0.001, 0.001); // Prior for distance
      gamma[2] ~ gamma(0.001, 0.001); // Prior for depth
      beta[1] ~ gamma(0.001, 0.001); // Prior for wind stress
      beta[2] ~ gamma(0.001, 0.001); // Prior for PAR
      beta[3] ~ normal(0, 10); // Prior for temperature
      sigma_sq ~ inv_gamma(0.001, 0.001);
      sigma_u_m_sq ~ inv_gamma(0.001, 0.001);
      sigma_u_s_sq ~ inv_gamma(0.001, 0.001);
      // Random effects distribution
      u_m ~ normal(0, sqrt(sigma_u_m_sq));
      u_s ~ normal(0, sqrt(sigma_u_s_sq));
      // Data Model
      y ~ normal(mu_m, sqrt(sigma_u_m_sq));
      y ~ normal(mu_s, sqrt(sigma_u_s_sq));
      y ~ gamma(mu_e, sqrt(sigma_sq));
    }

Hopefully, the data can be downloaded here.

https://smttt-my.sharepoint.com/:x:/g/personal/w130883_usm_edu/EThVIZUA3uRCkElidH-VY90BHPAu0PcUGvsTTPMWolXr2A

Does anyone see what I might be doing wrong?

I’ve never seen a model with the outcome on the left-hand-side of a sampling statement multiple times like this; I don’t think this implies a proper generative model. I think you’re trying to express that y informs on multiple domains of parameters, but you should be combining those domains explicitly to derive a single thing that y informs.

Ah yes. That doesn’t make much sense does it. Interestingly, if I remove

y ~ normal(mu_m, sqrt(sigma_u_m_sq));
y ~ normal(mu_s, sqrt(sigma_u_s_sq));

I get an “Initialization failed” error. I will try to think harder about what my code is doing and what it should be doing.

Also be sure to run some prior-predictive checks; I know gamma/inv-gamma are popular for historical reasons (related to conjugacy and therefore not really necessary in the context of MC samplers), but they look pretty heavy-tailed to me.