Modeling group parameter

Hi Stan users!
I’ve been struggling to build a hierarchical model with grouped parameters these days, so I’d like some advice here.
The goal of my model is to estimate the group-level parameters of several rate constants. Below is a simple example of my model.

functions{
  vector example_model(real t, vector x, real r1f, real r1r, real r2f, real r3f, real r4f, real r6f ){
    vector[8] dydt;
    
    real r5f;
    
    r5f = 0.083;
   
    dydt[1] = -r1f*x[1]*x[2] +r1r*x[3] + r6f*x[7];
    dydt[2] = -r1f*x[1]*x[2] + r1r*x[3] + r6f*x[7];
    dydt[3] = r1f*x[1]*x[2] -r1r*x[3] - r2f*x[3]*x[4];
    dydt[4] = -r3f*x[5]*x[4]-r2f*x[5]*x[4];
    dydt[5] = r2f*x[3]*x[4]-r4f*x[5];
    dydt[6] = r4f*x[5] -r5f*x[6];
    dydt[7] = r5f*x[6]-r6f*x[7];
    dydt[8] = r5f*x[7];
    
    return dydt;
  }
}

data{
  int<lower=1> T; // number of rows (time points)
  int<lower=1> N; // number of columns (number of components)
  array[N] vector[7] y0; // initial data (each y0[n] holds component n for the 7 datasets)
  array[20,7] real z; // observed data
  real t0;
  array[T] real ts; // time sequence
}

transformed data{
// convert each dataset's initial condition into a plain vector for the ODE solver
  vector[8] y1 = to_vector(y0[,1]);
  vector[8] y2 = to_vector(y0[,2]);
  vector[8] y3 = to_vector(y0[,3]);
  vector[8] y4 = to_vector(y0[,4]);
  vector[8] y5 = to_vector(y0[,5]);
  vector[8] y6 = to_vector(y0[,6]);
  vector[8] y7 = to_vector(y0[,7]);
}

parameters{
  
  // hyperparameters for r1f ~ r6f (group)

  vector<lower = 0, upper = 1>[6] mu;
  vector<lower = 0, upper = 3>[6] sigma;
  vector<lower = 0, upper = 3>[6] sigma_2;

  //subject-level raw parameters for fitting

  vector<lower = 0>[7] r1f_rp;
  vector<lower = 0>[7] r3f_rp;
  vector<lower = 0>[7] r1r_rp;
  vector<lower = 0>[7] r2f_rp;
  vector<lower = 0>[7] r4f_rp;
  vector<lower = 0>[7] r6f_rp;
  
  }
  
transformed parameters{
  
  // subject-level pars
  array[7] real<lower = 0> r1f;
  array[7] real<lower = 0> r3f;
  array[7] real<lower = 0> r1r;
  array[7] real<lower = 0> r2f;
  array[7] real<lower = 0> r4f;
  array[7] real<lower = 0> r6f;
  
  for(i in 1:7){
    
    r1f[i] = mu[1] + sigma[1]*r1f_rp[i];
    r1r[i] = mu[2] + sigma[2]*r1r_rp[i];
    r2f[i] = mu[3] + sigma[3]*r2f_rp[i];
    r3f[i] = mu[4] + sigma[4]*r3f_rp[i];
    r4f[i] = mu[5] + sigma[5]*r4f_rp[i];
    r6f[i] = mu[6] + sigma[6]*r6f_rp[i];

  }

}

model{

  mu ~ normal(0,1);
  sigma ~ cauchy(3,1);
  
  sigma_2 ~ cauchy(3,1);
  
  //prior distribution
  
  r1f_rp ~ normal(1,1);
  r1r_rp ~ normal(1,1);
  r2f_rp ~ normal(1,1);
  r3f_rp ~ normal(1,1);
  r4f_rp ~ normal(1,1);
  r6f_rp ~ normal(1,1);
  

    r1f ~ normal(0,sigma_2[1]);
    r1r ~ normal(0,sigma_2[2]);
    r2f ~ normal(0,sigma_2[3]);
    r3f ~ normal(0,sigma_2[4]);
    r4f ~ normal(0,sigma_2[5]);
    r6f ~ normal(0,sigma_2[6]);

  // ode solve 
  array[T] vector[8] z_hat1 = ode_bdf(example_model, y1, t0, ts, r1f[1], r1r[1], r2f[1], r3f[1], r4f[1], r6f[1]);
  array[T] vector[8] z_hat2 = ode_bdf(example_model, y2, t0, ts, r1f[2], r1r[2], r2f[2], r3f[2], r4f[2], r6f[2]);
  array[T] vector[8] z_hat3 = ode_bdf(example_model, y3, t0, ts, r1f[3], r1r[3], r2f[3], r3f[3], r4f[3], r6f[3]);
  array[T] vector[8] z_hat4 = ode_bdf(example_model, y4, t0, ts, r1f[4], r1r[4], r2f[4], r3f[4], r4f[4], r6f[4]);
  array[T] vector[8] z_hat5 = ode_bdf(example_model, y5, t0, ts, r1f[5], r1r[5], r2f[5], r3f[5], r4f[5], r6f[5]);
  array[T] vector[8] z_hat6 = ode_bdf(example_model, y6, t0, ts, r1f[6], r1r[6], r2f[6], r3f[6], r4f[6], r6f[6]);
  array[T] vector[8] z_hat7 = ode_bdf(example_model, y7, t0, ts, r1f[7], r1r[7], r2f[7], r3f[7], r4f[7], r6f[7]);

  array[20,7] real y; // predicted x[8] at the 20 observation times, one column per dataset

  // collect the ODE predictions at the observation times
  for(i in 1:20){
      y[i,1] = z_hat1[i*100,8];
      y[i,2] = z_hat2[i*100,8];
      y[i,3] = z_hat3[i*100,8];
      y[i,4] = z_hat4[i*100,8];
      y[i,5] = z_hat5[i*100,8];
      y[i,6] = z_hat6[i*100,8];
      y[i,7] = z_hat7[i*100,8];

    }
  
  // likelihood
  for(i in 1:7){
    
    z[,i] ~ lognormal(log(y[,i]), 5);
    
  }
  
}


I have 7 datasets, each with a different initial condition. For each dataset I run the ODE solver to predict the final component (x[8]), and then use those predictions in the likelihood against the observed data z.

The problem I’m having is that when I run the code below, the estimated parameters come out very low (close to 0).


  for(i in 1:7){
    
    r1f[i] = mu[1] + sigma[1]*r1f_rp[i];
    r1r[i] = mu[2] + sigma[2]*r1r_rp[i];
    r2f[i] = mu[3] + sigma[3]*r2f_rp[i];
    r3f[i] = mu[4] + sigma[4]*r3f_rp[i];
    r4f[i] = mu[5] + sigma[5]*r4f_rp[i];
    r6f[i] = mu[6] + sigma[6]*r6f_rp[i];

  }

The purpose of the lines above is for each of the 7 datasets’ parameters to come from one shared normal distribution, but I think I’m doing something wrong.

Second, the sampling output doesn’t give any information about the group-level parameters at all; I only get results for the raw parameters.
As I understand it, the transformed parameters act as constraints on the raw parameters while MCMC sampling runs. Does that mean the raw-parameter results now reflect the group-level parameters? (I’m not sure I’ve understood this correctly.)

This is my first time building a hierarchical model, so any advice would be a real help.
Thank you for your help!

I can’t answer your actual question without seeing the rest of the model definition. I do have a few suggestions based on what I could see, but they don’t address your main question.

In the ODE system, it’s better to avoid duplicated arithmetic and define intermediate values that can be reused. Something like this:

vector example_model(real t, vector x, real r1f, real r1r, real r2f, real r3f, real r4f, real r6f ){
  real r5f = 0.083;
  vector[8] dydt;
  real neg_r1f_x1x2 = -r1f * x[1] * x[2]; // reused in dydt[1], dydt[2], dydt[3]
  dydt[1] = neg_r1f_x1x2 + r1r*x[3] + r6f*x[7];
  dydt[2] = neg_r1f_x1x2 + r1r*x[3] + r6f*x[7];
  dydt[3] = -neg_r1f_x1x2 - r1r*x[3] - r2f*x[3]*x[4];
  dydt[4] = -r3f*x[5]*x[4] - r2f*x[5]*x[4];
  dydt[5] = r2f*x[3]*x[4] - r4f*x[5];
  dydt[6] = r4f*x[5] - r5f*x[6];
  dydt[7] = r5f*x[6] - r6f*x[7];
  dydt[8] = r5f*x[7];
  return dydt;
}

and so on for the other variables that are re-used.
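For concreteness, here is one way the whole function could look with the repeated products pulled out. This is just a sketch of applying that idea; it keeps your equations exactly as written (including the r5f*x[7] term in dydt[8]):

vector example_model(real t, vector x, real r1f, real r1r, real r2f, real r3f, real r4f, real r6f ){
  real r5f = 0.083;
  vector[8] dydt;
  // products that appear in more than one equation
  real r1f_x1x2 = r1f * x[1] * x[2];
  real r1r_x3 = r1r * x[3];
  real r6f_x7 = r6f * x[7];
  real r2f_x3x4 = r2f * x[3] * x[4];
  real r4f_x5 = r4f * x[5];
  real r5f_x6 = r5f * x[6];
  dydt[1] = -r1f_x1x2 + r1r_x3 + r6f_x7;
  dydt[2] = -r1f_x1x2 + r1r_x3 + r6f_x7;
  dydt[3] = r1f_x1x2 - r1r_x3 - r2f_x3x4;
  dydt[4] = -(r3f + r2f) * x[5] * x[4];
  dydt[5] = r2f_x3x4 - r4f_x5;
  dydt[6] = r4f_x5 - r5f_x6;
  dydt[7] = r5f_x6 - r6f_x7;
  dydt[8] = r5f * x[7];
  return dydt;
}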

Then if the idea of this

r1f[i] = mu[1] + sigma[1] * r1f_rp[i];

is to manually do a non-centered parameterization, we added an affine transform that makes this much easier. It can just be

parameters {
  vector<offset=mu[1], multiplier=sigma[1]>[N] r1f;
...
model {
  r1f ~ normal(mu[1], sigma[1]);

Then if you drop the offset and multiplier you go back to the centered parameterization.
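For comparison, a minimal sketch of the centered version of that declaration (assuming mu and sigma are declared earlier in the parameters block, as in your model):

parameters {
  vector[N] r1f;                     // centered: no offset/multiplier
...
model {
  r1f ~ normal(mu[1], sigma[1]);     // same sampling statement in both versions

The only thing that changes between the two parameterizations is the declaration; the sampling statement stays the same.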

So that means I effectively ended up with a centered parameterization as well, because I added the additional prior declarations on the transformed parameters. I should try it without those statements.

Unless you have a lot of data per parameter being estimated, the non-centered parameterization with offset and multiplier specified is usually better.
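In case it helps, here is one way that could look for this model, with the offset/multiplier declarations using hyperparameters declared earlier in the same block. This is just my sketch adapting the snippet above, not a complete model; the remaining rates would follow the same pattern. Note there is no lower = 0 bound here, since Stan does not allow combining lower/upper bounds with offset/multiplier on the same declaration:

parameters {
  vector<lower = 0, upper = 1>[6] mu;                     // group-level means, as in your model
  vector<lower = 0, upper = 3>[6] sigma;                  // group-level sds, as in your model
  vector<offset = mu[1], multiplier = sigma[1]>[7] r1f;   // one entry per dataset
  vector<offset = mu[2], multiplier = sigma[2]>[7] r1r;
  // ... r2f, r3f, r4f, r6f declared the same way
}
model {
  mu ~ normal(0, 1);
  sigma ~ cauchy(3, 1);              // keeping your original prior; adjust as needed
  r1f ~ normal(mu[1], sigma[1]);     // Stan handles the non-centering internally
  r1r ~ normal(mu[2], sigma[2]);
  // ... and likewise for the other rates;
  // no separate *_rp parameters or extra priors on r1f, r1r, ... are needed
}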