Multivariate growth outcomes with multivariate Gaussian

Hi all,

I am attempting to specify a model that approximates the data-generating process of several traits related to human growth. Let x be a scalar independent variable and y_n a vector of observed continuous responses indexed by k = 1,2,...,K. I assume the response vector is distributed as y_n ~ N(h(x_n), \Sigma(x_n)), where h(x) is a function for the mean and \sigma_k(x) a function for the standard deviation of each response. The mean is a K-length vector with h_k(x) = a_k*x^{r_k} + b_k. For the sd I assume linearity in x: \sigma_k(x) = s_k*[1 + \kappa_k*x]. Essentially, each continuous response has its own parameter vector [a, r, b, s, \kappa]. The ultimate goal is to scale this up to a large K with a mix of continuous and ordinal traits, but for now I have simulated some data and am testing the model with 2 continuous responses.
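To make the setup concrete, here is a small simulation sketch of that data-generating process (in Python rather than R, purely illustrative; all parameter values are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 1000, 2

# made-up parameter values, one per trait
a = np.array([30.0, 25.0])      # scale of the mean power law
r = np.array([0.5, 0.6])        # exponent
b = np.array([5.0, 10.0])       # intercept
s = np.array([3.0, 2.5])        # baseline sd
kappa = np.array([0.4, 0.3])    # sd growth per unit x
rho = np.array([[1.0, 0.7],
                [0.7, 1.0]])    # between-trait correlation

x = rng.uniform(0, 5, size=N)
mu = a * x[:, None] ** r + b           # N x K mean: h_k(x) = a_k * x^r_k + b_k
sd = s * (1 + kappa * x[:, None])      # N x K sd: sigma_k(x) = s_k * (1 + kappa_k * x)

# correlated noise: scale standard-normal draws by the Cholesky factor of rho,
# then by the per-observation sds
L = np.linalg.cholesky(rho)
y = mu + (rng.standard_normal((N, K)) @ L.T) * sd
```

Simulated this way, the residual spread grows with x and the traits stay correlated, which is the structure the model needs to recover.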

I attach a toy model that does run, but throws MANY divergence warnings, so I am curious about the best way to code what I've described above. Note that I have not included the sd function here, because when I do, the diag_pre_multiply() function throws an error. I'm sure I could supply better priors and such, but I am most concerned with the most efficient way to model the mean and sd functions and then the decomposition of \Sigma. From research, I know the noise increases with x (hence the sd function) and that there is dependence between the response traits.

My simulated input data looks something like this:

K:
[1] 2
N:
[1] 1000
head(X):
[1] 0.75311542 1.15415400 0.06741298 2.67019476 3.96470039 3.32218096
head(y):
          [,1]      [,2]
[1,]  49.12179  50.41544
[2,]  66.51528  66.02949
[3,] -15.19807  37.58481
[4,] 130.97413  96.35293
[5,] 152.44352 104.65998
[6,] 138.78272 106.90922
data{
  //multivariate outcomes
  int N; // # of individuals
  int K; // # of responses
  vector[K] y[N]; // vector of responses per individual
  //predictors
  real X[N]; //age
}
parameters{
  vector[K] a;
  vector[K] r;
  vector[K] b;
  vector[K] s_scale;
  vector[K] kappa;
  //cholesky factors for rho
  cholesky_factor_corr[K] L_Omega;
}
transformed parameters{
  vector[K] mu[N];
  matrix[K,K] Sigma;
  for(i in 1:N){
    for(k in 1:K){
      mu[i,k] = a[k]*X[i]^(1-r[k])+b[k];
      }
  }
  Sigma = diag_pre_multiply(s_scale,L_Omega);
}
model{
  for(k in 1:K){
    a[K] ~ gamma(0.001,0.001);
    r[K] ~ gamma(0.001,0.001);
    b[K] ~ normal(0,1000);
    kappa[K] ~ gamma(0.001,0.001);
    s_scale[k] ~ gamma(4,1);
  }
  L_Omega ~ lkj_corr_cholesky(1);
  y ~ multi_normal_cholesky(mu, Sigma);
}

You really shouldn’t have a prior with an SD this big. Either you’re trying to convey genuine prior information, in which case you should rescale your data so that b is on a scale where a normal(0,1) prior makes sense; or you’re trying to convey large uncertainty without realizing that the prior predictive distribution also matters, in which case 1000 is probably going to yield a ridiculous set of outcomes that you’re declaring believable a priori.

I’d also reconsider your gamma(0.001,0.001) priors; those have pretty weird prior predictive consequences, being simultaneously strongly peaked at zero as well as having very heavy tails.
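To see this concretely, here is a quick look (Python, purely illustrative) at draws from gamma(0.001, 0.001); note that Stan parameterizes the gamma by shape and rate, so the numpy equivalent uses scale = 1/rate:

```python
import numpy as np

rng = np.random.default_rng(0)
# gamma(shape=0.001, rate=0.001); numpy takes scale = 1/rate
draws = rng.gamma(shape=0.001, scale=1.0 / 0.001, size=100_000)

# nearly all of the mass is indistinguishable from zero,
# yet the upper tail reaches enormous values
print("median:", np.median(draws))
print("99.9th percentile:", np.quantile(draws, 0.999))
print("max:", draws.max())
```

A prior whose typical draw is effectively zero but whose tail spans several orders of magnitude says something very strange about a growth-curve coefficient a priori.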

Oh! You are looping over the K outcomes, but when indexing you’re using the uppercase K where you want lowercase k (a good argument for not using the for(lowercase in 1:UPPERCASE) naming scheme for loops; I prefer for(this_K in 1:K)). That’s probably where most of your issues are coming from. By the way, you don’t need a loop to express the priors at all; you can just write:

  a ~ gamma(0.001,0.001);
  r ~ gamma(0.001,0.001);
  b ~ normal(0,1000);
  kappa ~ gamma(0.001,0.001);
  s_scale ~ gamma(4,1);

Thank you! I think I got lost translating and thinking between R, JAGS, and now Stan when writing out the model. I will go back and change the for loops. As for the priors, I should have known better and specified more realistic ones.

I do have a question about coding the sd function and the resulting covariance matrix. How do I incorporate the scaling of the standard deviation by the independent variable? That is, we know \Sigma = diag(\sigma) * \rho * diag(\sigma). As written above, the model assumes \sigma is constant across the independent variable, with each response having its own \sigma. But from research we know the noise actually scales with age, where \sigma_k(x) = s_k*[1 + \kappa_k*x]. How can this be incorporated into the larger covariance structure in Stan?
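For reference, that per-observation covariance can be sketched numerically (Python, with made-up parameter values) as \Sigma(x) = diag(\sigma(x)) * \rho * diag(\sigma(x)):

```python
import numpy as np

# made-up values, not from the thread
rho = np.array([[1.0, 0.7],
                [0.7, 1.0]])    # between-trait correlation
s = np.array([3.0, 2.5])        # baseline sds
kappa = np.array([0.4, 0.3])    # sd growth per unit x

def Sigma_at(x):
    """Covariance at a given x: diag(sigma(x)) @ rho @ diag(sigma(x))."""
    sigma = s * (1 + kappa * x)
    return np.diag(sigma) @ rho @ np.diag(sigma)

Sigma0 = Sigma_at(0.0)   # covariance at x = 0
Sigma5 = Sigma_at(5.0)   # covariance at x = 5: larger variances, same correlation
```

The correlation structure \rho stays fixed while the marginal variances grow with x, which is exactly why only the diagonal scaling needs to vary per observation.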

You can simply build a vector for the sd in the same way you build a vector for the mean (mu in your model).

In the code block below I define a new variable s and use it to create a new vector of standard deviations from that equation. However, when I combine the Cholesky correlation factor and the new sds with diag_pre_multiply(), I get an error about unmatched arguments. Is there some reformatting I should do beforehand? Alternatively, have I coded the new variable incorrectly?

transformed parameters{
  vector[K] mu[N];
  vector[K] s[N];
  matrix[K,K] Sigma;
  for(i in 1:N){
    for(k in 1:K){
      mu[i,k] = a[k]*X[i]^(r[k])+b[k];
      s[i,k] = s_scale[k]*(1 + kappa[k]*X[i]);
      }
  }
  Sigma = diag_pre_multiply(s,L_Omega);
}

Ah, darn. You’re going to have to create N covariance factors, indexing into s and mu for each one:

matrix[K,K] Sigma[N] ;
...
for(i in 1:N){
  Sigma[i] = diag_pre_multiply(s[i], L_Omega) ;
}
model{
...
  for(i in 1:N){
    y[i] ~ multi_normal_cholesky(mu[i], Sigma[i]) ;
  }
}

Oh! It’s probably way faster to instead transform the data in the model block:

transformed data{
  vector[K] zeroes = rep_vector(0, K) ;
}
model{
...
  vector[K] y_prime[N] ;
  for(i in 1:N){
    y_prime[i] = (y[i] - mu[i]) ./ s[i] ;  // elementwise division
    target += -sum(log(s[i])) ;            // Jacobian for scaling y by the parameter-dependent s
  }
  y_prime ~ multi_normal_cholesky(zeroes, L_Omega) ;
}
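One subtlety worth flagging with this trick: because y_prime depends on the parameters through s, the log density needs a Jacobian correction of -sum(log(s_k)) per observation, or the two formulations won’t match. A quick numpy/scipy check with made-up values for a single observation:

```python
import numpy as np
from scipy.stats import multivariate_normal

K = 2
rho = np.array([[1.0, 0.6],
                [0.6, 1.0]])        # correlation matrix
mu = np.array([50.0, 55.0])         # mean for one observation
s = np.array([3.0, 4.5])            # per-trait sds for that observation
y = np.array([52.0, 49.0])          # observed response

# direct evaluation: y ~ N(mu, diag(s) @ rho @ diag(s))
Sigma = np.diag(s) @ rho @ np.diag(s)
lp_direct = multivariate_normal(mu, Sigma).logpdf(y)

# standardized evaluation: y' = (y - mu)/s ~ N(0, rho), plus Jacobian -sum(log s)
y_prime = (y - mu) / s
lp_standardized = multivariate_normal(np.zeros(K), rho).logpdf(y_prime) - np.log(s).sum()
```

Without the -np.log(s).sum() term the two log densities differ, and in the Stan model that difference depends on the parameters, so it would bias the posterior rather than drop out as a constant.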