Bayesian analysis model - implicit function, priors and parameter constraints

Dear Community,

I hope you are all doing well.

I am joining this forum to reach out for help with the following Bayesian analysis application. I am working on replicating the methodology presented in the paper by Feng et al. (2014). I posted the paper for reference, but I am breaking it out here so the post stands on its own. The underlying model I am estimating is:

\begin{align} y &= \beta_0 + \beta_1 (b + y) + \sum_{n=1}^{3} \gamma_n x_n + \beta_{\tau} t + \frac{1}{2} \beta_{11} (b + y)^2 + \frac{1}{2} \sum_{n=1}^{3} \sum_{n'=1}^{3} \gamma_{nn'} x_n x_{n'} \nonumber\\ &+ \frac{1}{2} \beta_{\tau \tau} t^2 + \sum_{n=1}^{3} \phi_{n1} x_n (b + y) + \beta_{\tau 1} t (b + y) + \sum_{n=1}^{3} \gamma_{\tau n} t x_n - u + v \end{align}

After rearranging, renaming, and adding observation and time subscripts, this can be written as:

\begin{equation} q_{it} = s'_{it} \theta - u_{it} + v_{it}, \quad i = 1, \dots, K, \quad t = 1, \dots, T \end{equation}

Where:

  • q_{it} = y_{it}
  • \theta is the vector of parameters to be estimated (all the “betas”, “gammas”, and “phis”)
  • s'_{it} is a vector containing the data (all the x_{it,...}, t, b_{it}, y_{it}; all data are strictly positive)
  • Assume v_{it} \sim \text{iid} \; N(0, \sigma^{2})

My first question is theoretical:

  1. I wonder whether, when estimating this model using Bayesian methods, there is any potential challenge or consequence arising from the fact that the equation defines y implicitly (y appears on both sides).

Further, I have some questions relating to the actual implementation of the previously defined model. Let’s assume we know the following priors:

  • p (u_{it} | \lambda^{-1}) = f_{\text{Gamma}} (u_{it} | 1, \lambda^{-1})

  • p (\lambda^{-1}) = f_{\text{Gamma}} (\lambda^{-1} | 1,\rho ) \quad (\rho is just a known constant, say \rho = 0.1)

  • p(h) = f_{\text{Gamma}} (h | 1, c) \quad where \quad h = \frac{1}{\sigma^2} \quad (c is just a known constant, say c = 0.0001)

  • Let’s also assume weakly informative priors for the parameters \theta (e.g. each parameter is distributed N(0, 1))
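For concreteness, here is how I currently imagine these priors translating into Stan (just a sketch; I am assuming the second Gamma argument is a rate, since the shape/rate convention is not stated, and `P` is a placeholder for the number of coefficients):

```stan
parameters {
  vector[P] theta;            // all betas, gammas, and phis (P assumed known)
  vector<lower=0>[N] u;       // one inefficiency term per observation
  real<lower=0> inv_lambda;   // lambda^{-1}
  real<lower=0> h;            // precision h = 1 / sigma^2
}
transformed parameters {
  real sigma = inv_sqrt(h);   // standard deviation implied by h
}
model {
  theta ~ normal(0, 1);
  u ~ exponential(inv_lambda);   // Gamma(1, rate) is exponential(rate)
  inv_lambda ~ gamma(1, 0.1);    // rho = 0.1
  h ~ gamma(1, 0.0001);          // c = 0.0001
}
```

Declaring h (rather than sigma) as the parameter lets the Gamma prior be stated directly on the precision, with no Jacobian adjustment needed.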

Finally, and here is the most critical aspect, following the paper by Feng et al. (2014), I want to impose some restrictions on the estimated parameters. For example, I want the parameters to satisfy the following inequality:

-1 + \beta_{1} + \beta_{11}(b+y) + \sum_{n=1}^{3} \phi_{n1} x_n + \beta_{\tau 1} t \leq 0

Ideally, the above inequality should be satisfied for all the x_{it,...}, t, b_{it}, y_{it}, but in a less restrictive application, I could also assume that the inequality is met only at the means.

This takes me to the next question:

  2. Is there any way to impose this type of condition in a Stan program? I have read that the restriction may be imposed during sampling by using a Metropolis-Hastings algorithm (see, for example, the paper on which Feng et al. 2014 is based). Can this type of procedure be implemented in Stan? Is there any other way of doing this?

My final questions relate to the implementation of a Stan program for this application. For now, I am working on the unrestricted case, and I have made progress with the Stan code pasted below.

  3. Can this code be adjusted to include the restriction from the second question? The model also seems to take a very long time to run if I use 4 chains, 4,000 iterations, and 1,000 warmup iterations. Any advice for making it more efficient?

  4. How can the prior for \sigma^2 be specified?

P.S. The code below was generated by the brms package in R. I posted the Stan code because, given my application, I think I need a more flexible approach (i.e., raw Stan) than the available R packages provide.

// generated with brms 2.22.0
functions {
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K_beta0;  // number of population-level effects
  matrix[N, K_beta0] X_beta0;  // population-level design matrix
  int<lower=1> K_beta1;  // number of population-level effects
  matrix[N, K_beta1] X_beta1;  // population-level design matrix
  int<lower=1> K_gamma1;  // number of population-level effects
  matrix[N, K_gamma1] X_gamma1;  // population-level design matrix
  int<lower=1> K_gamma2;  // number of population-level effects
  matrix[N, K_gamma2] X_gamma2;  // population-level design matrix
  int<lower=1> K_gamma3;  // number of population-level effects
  matrix[N, K_gamma3] X_gamma3;  // population-level design matrix
  int<lower=1> K_gamma4;  // number of population-level effects
  matrix[N, K_gamma4] X_gamma4;  // population-level design matrix
  int<lower=1> K_gamma5;  // number of population-level effects
  matrix[N, K_gamma5] X_gamma5;  // population-level design matrix
  int<lower=1> K_betatau;  // number of population-level effects
  matrix[N, K_betatau] X_betatau;  // population-level design matrix
  int<lower=1> K_beta11;  // number of population-level effects
  matrix[N, K_beta11] X_beta11;  // population-level design matrix
  int<lower=1> K_gamma11;  // number of population-level effects
  matrix[N, K_gamma11] X_gamma11;  // population-level design matrix
  int<lower=1> K_gamma12;  // number of population-level effects
  matrix[N, K_gamma12] X_gamma12;  // population-level design matrix
  int<lower=1> K_gamma13;  // number of population-level effects
  matrix[N, K_gamma13] X_gamma13;  // population-level design matrix
  int<lower=1> K_gamma14;  // number of population-level effects
  matrix[N, K_gamma14] X_gamma14;  // population-level design matrix
  int<lower=1> K_gamma15;  // number of population-level effects
  matrix[N, K_gamma15] X_gamma15;  // population-level design matrix
  int<lower=1> K_gamma22;  // number of population-level effects
  matrix[N, K_gamma22] X_gamma22;  // population-level design matrix
  int<lower=1> K_gamma23;  // number of population-level effects
  matrix[N, K_gamma23] X_gamma23;  // population-level design matrix
  int<lower=1> K_gamma24;  // number of population-level effects
  matrix[N, K_gamma24] X_gamma24;  // population-level design matrix
  int<lower=1> K_gamma25;  // number of population-level effects
  matrix[N, K_gamma25] X_gamma25;  // population-level design matrix
  int<lower=1> K_gamma33;  // number of population-level effects
  matrix[N, K_gamma33] X_gamma33;  // population-level design matrix
  int<lower=1> K_gamma34;  // number of population-level effects
  matrix[N, K_gamma34] X_gamma34;  // population-level design matrix
  int<lower=1> K_gamma35;  // number of population-level effects
  matrix[N, K_gamma35] X_gamma35;  // population-level design matrix
  int<lower=1> K_gamma44;  // number of population-level effects
  matrix[N, K_gamma44] X_gamma44;  // population-level design matrix
  int<lower=1> K_gamma45;  // number of population-level effects
  matrix[N, K_gamma45] X_gamma45;  // population-level design matrix
  int<lower=1> K_gamma55;  // number of population-level effects
  matrix[N, K_gamma55] X_gamma55;  // population-level design matrix
  int<lower=1> K_betatautau;  // number of population-level effects
  matrix[N, K_betatautau] X_betatautau;  // population-level design matrix
  int<lower=1> K_phi1;  // number of population-level effects
  matrix[N, K_phi1] X_phi1;  // population-level design matrix
  int<lower=1> K_phi2;  // number of population-level effects
  matrix[N, K_phi2] X_phi2;  // population-level design matrix
  int<lower=1> K_phi3;  // number of population-level effects
  matrix[N, K_phi3] X_phi3;  // population-level design matrix
  int<lower=1> K_phi4;  // number of population-level effects
  matrix[N, K_phi4] X_phi4;  // population-level design matrix
  int<lower=1> K_phi5;  // number of population-level effects
  matrix[N, K_phi5] X_phi5;  // population-level design matrix
  int<lower=1> K_betatau1;  // number of population-level effects
  matrix[N, K_betatau1] X_betatau1;  // population-level design matrix
  int<lower=1> K_gammatau1;  // number of population-level effects
  matrix[N, K_gammatau1] X_gammatau1;  // population-level design matrix
  int<lower=1> K_gammatau2;  // number of population-level effects
  matrix[N, K_gammatau2] X_gammatau2;  // population-level design matrix
  int<lower=1> K_gammatau3;  // number of population-level effects
  matrix[N, K_gammatau3] X_gammatau3;  // population-level design matrix
  int<lower=1> K_gammatau4;  // number of population-level effects
  matrix[N, K_gammatau4] X_gammatau4;  // population-level design matrix
  int<lower=1> K_gammatau5;  // number of population-level effects
  matrix[N, K_gammatau5] X_gammatau5;  // population-level design matrix
  int<lower=1> K_u;  // number of population-level effects
  matrix[N, K_u] X_u;  // population-level design matrix
  // covariates for non-linear functions
  vector[N] C_1;
  vector[N] C_2;
  vector[N] C_3;
  vector[N] C_4;
  vector[N] C_5;
  vector[N] C_6;
  vector[N] C_7;
  vector[N] C_8;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
}
parameters {
  vector[K_beta0] b_beta0;  // regression coefficients
  vector[K_beta1] b_beta1;  // regression coefficients
  vector[K_gamma1] b_gamma1;  // regression coefficients
  vector[K_gamma2] b_gamma2;  // regression coefficients
  vector[K_gamma3] b_gamma3;  // regression coefficients
  vector[K_gamma4] b_gamma4;  // regression coefficients
  vector[K_gamma5] b_gamma5;  // regression coefficients
  vector[K_betatau] b_betatau;  // regression coefficients
  vector[K_beta11] b_beta11;  // regression coefficients
  vector[K_gamma11] b_gamma11;  // regression coefficients
  vector[K_gamma12] b_gamma12;  // regression coefficients
  vector[K_gamma13] b_gamma13;  // regression coefficients
  vector[K_gamma14] b_gamma14;  // regression coefficients
  vector[K_gamma15] b_gamma15;  // regression coefficients
  vector[K_gamma22] b_gamma22;  // regression coefficients
  vector[K_gamma23] b_gamma23;  // regression coefficients
  vector[K_gamma24] b_gamma24;  // regression coefficients
  vector[K_gamma25] b_gamma25;  // regression coefficients
  vector[K_gamma33] b_gamma33;  // regression coefficients
  vector[K_gamma34] b_gamma34;  // regression coefficients
  vector[K_gamma35] b_gamma35;  // regression coefficients
  vector[K_gamma44] b_gamma44;  // regression coefficients
  vector[K_gamma45] b_gamma45;  // regression coefficients
  vector[K_gamma55] b_gamma55;  // regression coefficients
  vector[K_betatautau] b_betatautau;  // regression coefficients
  vector[K_phi1] b_phi1;  // regression coefficients
  vector[K_phi2] b_phi2;  // regression coefficients
  vector[K_phi3] b_phi3;  // regression coefficients
  vector[K_phi4] b_phi4;  // regression coefficients
  vector[K_phi5] b_phi5;  // regression coefficients
  vector[K_betatau1] b_betatau1;  // regression coefficients
  vector[K_gammatau1] b_gammatau1;  // regression coefficients
  vector[K_gammatau2] b_gammatau2;  // regression coefficients
  vector[K_gammatau3] b_gammatau3;  // regression coefficients
  vector[K_gammatau4] b_gammatau4;  // regression coefficients
  vector[K_gammatau5] b_gammatau5;  // regression coefficients
  vector<lower=0>[K_u] b_u;  // inefficiency terms (lower bound required by the exponential prior)
  real<lower=0> sigma;  // dispersion parameter
}
transformed parameters {
  real lprior = 0;  // prior contributions to the log posterior
  lprior += normal_lpdf(b_beta0 | 0,1);
  lprior += normal_lpdf(b_beta1 | 0,1);
  lprior += normal_lpdf(b_gamma1 | 0,1);
  lprior += normal_lpdf(b_gamma2 | 0,1);
  lprior += normal_lpdf(b_gamma3 | 0,1);
  lprior += normal_lpdf(b_gamma4 | 0,1);
  lprior += normal_lpdf(b_gamma5 | 0,1);
  lprior += normal_lpdf(b_betatau | 0,1);
  lprior += normal_lpdf(b_beta11 | 0,1);
  lprior += normal_lpdf(b_gamma11 | 0,1);
  lprior += normal_lpdf(b_gamma12 | 0,1);
  lprior += normal_lpdf(b_gamma13 | 0,1);
  lprior += normal_lpdf(b_gamma14 | 0,1);
  lprior += normal_lpdf(b_gamma15 | 0,1);
  lprior += normal_lpdf(b_gamma22 | 0,1);
  lprior += normal_lpdf(b_gamma23 | 0,1);
  lprior += normal_lpdf(b_gamma24 | 0,1);
  lprior += normal_lpdf(b_gamma25 | 0,1);
  lprior += normal_lpdf(b_gamma33 | 0,1);
  lprior += normal_lpdf(b_gamma34 | 0,1);
  lprior += normal_lpdf(b_gamma35 | 0,1);
  lprior += normal_lpdf(b_gamma44 | 0,1);
  lprior += normal_lpdf(b_gamma45 | 0,1);
  lprior += normal_lpdf(b_gamma55 | 0,1);
  lprior += normal_lpdf(b_betatautau | 0,1);
  lprior += normal_lpdf(b_phi1 | 0,1);
  lprior += normal_lpdf(b_phi2 | 0,1);
  lprior += normal_lpdf(b_phi3 | 0,1);
  lprior += normal_lpdf(b_phi4 | 0,1);
  lprior += normal_lpdf(b_phi5 | 0,1);
  lprior += normal_lpdf(b_betatau1 | 0,1);
  lprior += normal_lpdf(b_gammatau1 | 0,1);
  lprior += normal_lpdf(b_gammatau2 | 0,1);
  lprior += normal_lpdf(b_gammatau3 | 0,1);
  lprior += normal_lpdf(b_gammatau4 | 0,1);
  lprior += normal_lpdf(b_gammatau5 | 0,1);
  lprior += exponential_lpdf(b_u | 0.1);

}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] nlp_beta0 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_beta1 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_gamma1 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_gamma2 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_gamma3 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_gamma4 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_gamma5 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_betatau = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_beta11 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_gamma11 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_gamma12 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_gamma13 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_gamma14 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_gamma15 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_gamma22 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_gamma23 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_gamma24 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_gamma25 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_gamma33 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_gamma34 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_gamma35 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_gamma44 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_gamma45 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_gamma55 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_betatautau = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_phi1 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_phi2 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_phi3 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_phi4 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_phi5 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_betatau1 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_gammatau1 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_gammatau2 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_gammatau3 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_gammatau4 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_gammatau5 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_u = rep_vector(0.0, N);
    // initialize non-linear predictor term
    vector[N] mu;
    nlp_beta0 += X_beta0 * b_beta0;
    nlp_beta1 += X_beta1 * b_beta1;
    nlp_gamma1 += X_gamma1 * b_gamma1;
    nlp_gamma2 += X_gamma2 * b_gamma2;
    nlp_gamma3 += X_gamma3 * b_gamma3;
    nlp_gamma4 += X_gamma4 * b_gamma4;
    nlp_gamma5 += X_gamma5 * b_gamma5;
    nlp_betatau += X_betatau * b_betatau;
    nlp_beta11 += X_beta11 * b_beta11;
    nlp_gamma11 += X_gamma11 * b_gamma11;
    nlp_gamma12 += X_gamma12 * b_gamma12;
    nlp_gamma13 += X_gamma13 * b_gamma13;
    nlp_gamma14 += X_gamma14 * b_gamma14;
    nlp_gamma15 += X_gamma15 * b_gamma15;
    nlp_gamma22 += X_gamma22 * b_gamma22;
    nlp_gamma23 += X_gamma23 * b_gamma23;
    nlp_gamma24 += X_gamma24 * b_gamma24;
    nlp_gamma25 += X_gamma25 * b_gamma25;
    nlp_gamma33 += X_gamma33 * b_gamma33;
    nlp_gamma34 += X_gamma34 * b_gamma34;
    nlp_gamma35 += X_gamma35 * b_gamma35;
    nlp_gamma44 += X_gamma44 * b_gamma44;
    nlp_gamma45 += X_gamma45 * b_gamma45;
    nlp_gamma55 += X_gamma55 * b_gamma55;
    nlp_betatautau += X_betatautau * b_betatautau;
    nlp_phi1 += X_phi1 * b_phi1;
    nlp_phi2 += X_phi2 * b_phi2;
    nlp_phi3 += X_phi3 * b_phi3;
    nlp_phi4 += X_phi4 * b_phi4;
    nlp_phi5 += X_phi5 * b_phi5;
    nlp_betatau1 += X_betatau1 * b_betatau1;
    nlp_gammatau1 += X_gammatau1 * b_gammatau1;
    nlp_gammatau2 += X_gammatau2 * b_gammatau2;
    nlp_gammatau3 += X_gammatau3 * b_gammatau3;
    nlp_gammatau4 += X_gammatau4 * b_gammatau4;
    nlp_gammatau5 += X_gammatau5 * b_gammatau5;
    nlp_u += X_u * b_u;
    for (n in 1:N) {
      // compute non-linear predictor values
      // note: use 0.5, not (1 / 2) -- integer division in Stan evaluates to 0
      mu[n] = nlp_beta0[n] + nlp_beta1[n] * (C_1[n] + C_2[n])
              + nlp_gamma1[n] * C_3[n] + nlp_gamma2[n] * C_4[n] + nlp_gamma3[n] * C_5[n]
              + nlp_gamma4[n] * C_6[n] + nlp_gamma5[n] * C_7[n]
              + nlp_betatau[n] * C_8[n]
              + nlp_beta11[n] * (0.5 * (C_1[n] + C_2[n])^2)
              + nlp_gamma11[n] * C_3[n]^2 + nlp_gamma12[n] * C_3[n] * C_4[n]
              + nlp_gamma13[n] * C_3[n] * C_5[n] + nlp_gamma14[n] * C_3[n] * C_6[n]
              + nlp_gamma15[n] * C_3[n] * C_7[n]
              + nlp_gamma22[n] * C_4[n]^2 + nlp_gamma23[n] * C_4[n] * C_5[n]
              + nlp_gamma24[n] * C_4[n] * C_6[n] + nlp_gamma25[n] * C_4[n] * C_7[n]
              + nlp_gamma33[n] * C_5[n]^2 + nlp_gamma34[n] * C_5[n] * C_6[n]
              + nlp_gamma35[n] * C_5[n] * C_7[n]
              + nlp_gamma44[n] * C_6[n]^2 + nlp_gamma45[n] * C_6[n] * C_7[n]
              + nlp_gamma55[n] * C_7[n]^2
              + nlp_betatautau[n] * (0.5 * C_8[n]^2)
              + nlp_phi1[n] * C_3[n] * (C_1[n] + C_2[n]) + nlp_phi2[n] * C_4[n] * (C_1[n] + C_2[n])
              + nlp_phi3[n] * C_5[n] * (C_1[n] + C_2[n]) + nlp_phi4[n] * C_6[n] * (C_1[n] + C_2[n])
              + nlp_phi5[n] * C_7[n] * (C_1[n] + C_2[n])
              + nlp_betatau1[n] * C_8[n] * (C_1[n] + C_2[n])
              + nlp_gammatau1[n] * C_8[n] * C_3[n] + nlp_gammatau2[n] * C_8[n] * C_4[n]
              + nlp_gammatau3[n] * C_8[n] * C_5[n] + nlp_gammatau4[n] * C_8[n] * C_6[n]
              + nlp_gammatau5[n] * C_8[n] * C_7[n]
              - nlp_u[n];
    }
    target += normal_lpdf(Y | mu, sigma);
  }
  // priors including constants
  target += lprior;
}
generated quantities {
}

Many thanks in advance for any insight or support!

Best wishes,
Juan P. Henao
(Ph.D Student)

Hi @martinmodrak, @mitzimorris, @Bob_Carpenter

I hope you’re all doing well!

Please allow me to tag you, as I have seen you are very active in the forum. I just kindly wanted to ask you if you might be able to provide any feedback or information regarding this query. Any resource material or tag to someone who could help me will be hugely appreciated.

Also, if there’s any way I can improve my post to make it more transparent or more helpful, I’d appreciate your feedback.

I truly appreciate the time and effort you put into moderating the forum, and I completely understand that things can get busy.

Looking forward to hearing from you whenever you have the chance.

Thanks again!

I admit I am having trouble parsing all the details, but it appears that you assume that the noise term is a difference of a normally distributed variable (v_{it}) and an exponentially distributed variable (u_{it}), right? If that’s right, then I think u_{it} - v_{it} should follow the exponentially modified normal distribution, and you should be able to avoid estimating u and v directly.
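A minimal sketch of what I mean (assuming `mu`, `sigma` and `lambda` are defined as in your model; note the sign flip, since mu - Y = u - v):

```stan
model {
  // if u ~ exponential(lambda) and v ~ normal(0, sigma), then
  // u - v is exponentially modified normal, and Y = mu - u + v gives
  // mu - Y = u - v ~ exp_mod_normal(0, sigma, lambda)
  target += exp_mod_normal_lpdf(mu - Y | 0, sigma, lambda);
}
```

This removes the N latent u parameters from the sampler entirely, which should also help with speed.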

Parameter restrictions are a much harder problem. Generally, Gibbs/Metropolis-Hastings tricks don’t interact well with Stan’s sampler and cannot be easily implemented. You have basically two ways to approach the problem in a Stan-friendly way:

  1. Find a sufficiently well-behaved parametrization of your model that meets the inequality by construction. For example, to satisfy a constraint that \beta_1 + 2\beta_2 \leq 0, you could have:
parameters {
   real<upper=0> beta_sum; //== beta_1 + 2*beta_2
   real beta_1;
}

transformed parameters {
   real beta_2 = (beta_sum - beta_1) / 2;
}

This is a bit tedious to generalize for your case (but could definitely be done); the bigger problem is that there are many ways to build such parametrizations, and many of them could sample pretty badly with Stan. But it allows you to enforce the constraint directly and fully.
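To make this concrete for the inequality in your post, evaluated at the sample means, one hypothetical way to set it up would be (names like `mean_by` are assumed data, and solving for beta1 is an arbitrary choice):

```stan
data {
  real mean_by;   // sample mean of (b + y)
  real mean_x1;
  real mean_x2;
  real mean_x3;
  real mean_t;
}
parameters {
  // -1 + beta1 + beta11*(b+y) + sum_n phi_n1*x_n + betatau1*t <= 0 at the
  // means is equivalent to lhs_at_means <= 1, where
  // lhs_at_means == beta1 + beta11*mean_by + sum_n phi_n1*mean_xn + betatau1*mean_t
  real<upper=1> lhs_at_means;
  real beta11;
  real phi11;
  real phi21;
  real phi31;
  real betatau1;
}
transformed parameters {
  // back out beta1 so the inequality holds by construction
  real beta1 = lhs_at_means - beta11 * mean_by
             - phi11 * mean_x1 - phi21 * mean_x2 - phi31 * mean_x3
             - betatau1 * mean_t;
}
```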

  2. Use a soft constraint/penalization. In this setting you give up on satisfying the inequality exactly and instead do something like:
real constraint_lhs = ... //your complex formula here
if(constraint_lhs > 0) {
   target += normal_lpdf(constraint_lhs | 0, penalization_scale);
}

(The code above is for a \leq 0 constraint.) The lower the penalization_scale, the more closely you mimic a hard constraint, and the worse your model will sample. In some cases soft constraints can perform reasonably well (e.g. Test: Soft vs Hard sum-to-zero constrain + choosing the right prior for soft constrain)

brms generated code is very hard to read, could you share the brms formula as well if you want some feedback on that?

Hope that helps a bit.

Dear @martinmodrak,

Thanks so much for your answer. This helps a lot, indeed.

Regarding the error term, yes, your understanding is correct. That is a good point regarding the distribution of the difference. In principle, I am interested in the estimates for the “u” part, so I guess I need to keep them separate for now, but in another round, I might consider the model assuming the distribution of the difference. Thanks for noticing this!

I am also looking forward to testing the two alternatives you suggest for the parameter constraints; now I understand the issue, and it makes sense to me. May I ask what you mean by the model sampling well or not well? Do you mean the convergence of the model? This seems to be a critical aspect for me at the moment, as I have not managed to run a model that converges well. If I run 4,000 iterations with 1,000 warmup, it takes very, very long.

I would indeed appreciate some feedback on the brms code; I am pasting it below. My main questions concern the implementation of the prior for sigma (the standard deviation of “v”) and, related to my last question, the possibilities for improving efficiency.

Many thanks again!

formula <- bf(
  
  # Main Equation: 
  pr_1 ~ 
    
    beta0 +                                         
    
    beta1 * (GHG_Ag + pr_1) +                       
    
    gamma1 * water_1 + 
    gamma2 * ag_land_1 + 
    gamma3 * ct_1 +                                
    gamma4 * v1_1 + 
    gamma5 * v6_1 +
    
    betatau * t +
    
    beta11 * ((1/2) * (GHG_Ag + pr_1)^2) +
    
    gamma11 * water_1^2 +
    gamma12 * water_1 * ag_land_1 +
    gamma13 * water_1 * ct_1 +
    gamma14 * water_1 * v1_1 +
    gamma15 * water_1 * v6_1 + 
    gamma22 * ag_land_1^2 +
    gamma23 * ag_land_1 * ct_1 + 
    gamma24 * ag_land_1 * v1_1 + 
    gamma25 * ag_land_1 * v6_1 + 
    gamma33 * ct_1^2 +
    gamma34 * ct_1 * v1_1 + 
    gamma35 * ct_1 * v6_1 + 
    gamma44 * v1_1^2 +
    gamma45 * v1_1 * v6_1 + 
    gamma55 * v6_1^2 +
    
    betatautau * ((1/2) * (t)^2) +
    
    phi1 * water_1 * (GHG_Ag + pr_1) + 
    phi2 * ag_land_1 * (GHG_Ag + pr_1) + 
    phi3 * ct_1 * (GHG_Ag + pr_1) + 
    phi4 * v1_1 * (GHG_Ag + pr_1) + 
    phi5 * v6_1 * (GHG_Ag + pr_1) + 
    
    betatau1 * t * (GHG_Ag + pr_1) + 
    
    gammatau1 * t * water_1 +
    gammatau2 * t * ag_land_1 +
    gammatau3 * t * ct_1 +
    gammatau4 * t * v1_1 +
    gammatau5 * t * v6_1 - u,
  
  
   beta0 +                                    
    beta1 +                       
    gamma1 + 
    gamma2 + 
    gamma3 +                              
    gamma4 + 
    gamma5 +
    betatau +
    beta11 +
    gamma11 + 
    gamma12 + 
    gamma13 +
    gamma14 +
    gamma15 + 
    gamma22 + 
    gamma23 +
    gamma24 +
    gamma25 +
    gamma33 +
    gamma34 +
    gamma35 +
    gamma44 +
    gamma45 +
    gamma55 +
    betatautau +
    phi1  + 
    phi2 +
    phi3 +
    phi4 + 
    phi5 +
    betatau1 +
    gammatau1 +
    gammatau2 +
    gammatau3 +
    gammatau4 +
    gammatau5 + u ~ 1,
  
  
  nl = TRUE
)


priors <- c(
  
  set_prior("normal(0,1)", nlpar = "beta0"),
  set_prior("normal(0,1)", nlpar = "beta1"),
  
  set_prior("normal(0,1)", nlpar = "gamma1"),
  set_prior("normal(0,1)", nlpar = "gamma2"),
  set_prior("normal(0,1)", nlpar = "gamma3"),
  set_prior("normal(0,1)", nlpar = "gamma4"),
  set_prior("normal(0,1)", nlpar = "gamma5"),
  
  set_prior("normal(0,1)", nlpar = "betatau"), 
  
  set_prior("normal(0,1)", nlpar = "beta11"),
  
  set_prior("normal(0,1)", nlpar = "gamma11"),
  set_prior("normal(0,1)", nlpar = "gamma12"),
  set_prior("normal(0,1)", nlpar = "gamma13"),
  set_prior("normal(0,1)", nlpar = "gamma14"),
  set_prior("normal(0,1)", nlpar = "gamma15"),
  set_prior("normal(0,1)", nlpar = "gamma22"),
  set_prior("normal(0,1)", nlpar = "gamma23"),
  set_prior("normal(0,1)", nlpar = "gamma24"),
  set_prior("normal(0,1)", nlpar = "gamma25"),
  set_prior("normal(0,1)", nlpar = "gamma33"),
  set_prior("normal(0,1)", nlpar = "gamma34"),
  set_prior("normal(0,1)", nlpar = "gamma35"),
  set_prior("normal(0,1)", nlpar = "gamma44"),
  set_prior("normal(0,1)", nlpar = "gamma45"),
  set_prior("normal(0,1)", nlpar = "gamma55"),
  
  set_prior("normal(0,1)", nlpar = "betatautau"),
  
  set_prior("normal(0,1)", nlpar = "phi1"),
  set_prior("normal(0,1)", nlpar = "phi2"),
  set_prior("normal(0,1)", nlpar = "phi3"),
  set_prior("normal(0,1)", nlpar = "phi4"),
  set_prior("normal(0,1)", nlpar = "phi5"),
  
  set_prior("normal(0,1)", nlpar = "betatau1"),
  
  set_prior("normal(0,1)", nlpar = "gammatau1"),
  set_prior("normal(0,1)", nlpar = "gammatau2"),
  set_prior("normal(0,1)", nlpar = "gammatau3"),
  set_prior("normal(0,1)", nlpar = "gammatau4"),
  set_prior("normal(0,1)", nlpar = "gammatau5"),
  
  
  set_prior("exponential(0.1)", nlpar = "u", lb = 0)
  
)


fit <- brm(
  formula = formula,
  data = WEFE_Nexus_Prod,
  family = gaussian(),  
  prior = priors,
  chains = 4,           
  iter = 4000,         
  warmup = 1000,       
  seed = 123,       
  control = list(max_treedepth = 12, adapt_delta = 0.9),
  cores = getOption("mc.cores", 4)
)

You should be able to recover estimates for u even when it is marginalized out of the model (I’d need to think a bit more to be confident whether directly sampling from the exponential distribution with the correct rate is enough or you need to be a bit more careful, but I think it should be).
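If it helps, the standard result for exponential-inefficiency frontier models is that u conditional on the composed residual \varepsilon = v - u is normal truncated to [0, \infty) with location -\varepsilon - \lambda\sigma^2 and scale \sigma, so a rough generated-quantities sketch (assuming `mu` is recomputed there, and `lambda`, `sigma` are your parameters) would be:

```stan
generated quantities {
  vector[N] u_draw;
  for (n in 1:N) {
    real eps = Y[n] - mu[n];                      // composed residual v - u
    real mu_star = -eps - lambda * square(sigma); // location of u | eps
    // inverse-CDF draw from normal(mu_star, sigma) truncated to [0, inf)
    real p_lo = normal_cdf(0 | mu_star, sigma);
    u_draw[n] = mu_star
              + sigma * inv_Phi(p_lo + (1 - p_lo) * uniform_rng(0, 1));
  }
}
```

(You may want to guard against p_lo very close to 1, where inv_Phi loses precision.)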

Yes, I mean the model manifesting any of the following: problems initializing, divergent transitions, low ESS, high Rhat or just very slow sampling.

In general I would strongly advise starting with a very simple variant of the model (e.g. treat most of the beta/gamma/phi terms as zero or known constants) and with simulated data matching that simplified model. Now you have too many moving pieces so any problem is hard to diagnose.

If I understand the code correctly, there is a mismatch between brms code and your model - it appears your brms formula estimates just a single shared u for all rows, whereas the math description assumes different u_{it} for each observation.

Many thanks, Martin. All of this sounds very reasonable. I will give some thought and work on these points.

Well noted, I will correct this.

Thanks again!