Reparameterize high-dimensional problems

Hi all,

I am attempting to specify a model that approximates the underlying data-generating process of several traits related to human growth. Let x be a scalar independent variable and Y be a vector of observed continuous variables indexed by k = 1, 2, ..., K. I assume the response vector is distributed as y_n ~ N(h(x), \sigma(x)), where h(x) is a function for the mean and \sigma(x) a function for the standard deviation. The mean is a K-length response vector where h_k(x) = a_k*x^r_k + b_k. For the sd I assume linearity, \sigma_k(x) = s_k[1 + \kappa_k*x]. Essentially, each continuous response has a parameter vector [a_k, r_k, b_k, s_k, \kappa_k], with a correlation matrix \Sigma shared across responses.

To start, I tested the below model with simulated data where K = 1, 2, and 6. In all cases, I am able to retrieve my original parameters. However, when I switch over to real data (where x = age and Y = continuous growth traits, e.g. bone lengths in mm), I begin to run into several issues: MANY divergences occur, the R-hat values indicate a lack of convergence, and there are mixing, tree depth, and ESS issues. After going through the forums, the tutorials, and reading some of the work by Betancourt, I am pretty sure the main issue is the geometric complexity of the posterior I am after. I have adjusted adapt_delta and the tree depth but still run into issues, telling me the model may be poorly written or too complex. Now, I know the difference between the simulated and the “real” data rests on the fact that there is significant conditional dependence between the individual response variables, the data are heteroskedastic, scaling with age (hence the sd function), and according to the shinystan output there is significant autocorrelation in all chains that is not present with the simulated data. It could also be that the correlation structure changes with x. Again, I presume this is because of the complex structure of the “real” data.

QUESTION
How can I reparameterize my model to simplify the geometric properties of the posterior while keeping it logically equivalent? Any advice would be wonderful. Essentially, my domain expertise suggests that the traits Y depend on age x, are heteroskedastic, and are conditionally dependent, and that the correlation structure may change with x. The long-term goal is to scale this up to large K (> 50) and to include ordinal responses with a probit link, as well as handling instances where some elements of the Y vector may be missing. It may also be that the multivariate normal is an inappropriate choice; I plan on testing the multivariate generalization of the exponential power distribution (Gomez-Villegas and Marin, 1998), but I am struggling to write this as a custom distribution in Stan based on a Cholesky parameterization for \Sigma… But I am not there yet! Any advice or suggestions would be wonderful!

data{
  //multivariate outcomes
  int<lower=1> N; // # of individuals
  int<lower=1> K; // # of responses
  vector[K] y[N]; // vector of responses per individual
  //predictors
  real X[N]; //age
}
parameters{
  vector<lower=0>[K] a;
  vector<lower=0>[K] r;
  vector[K] b;
  vector<lower=0>[K] s_scale;
  vector<lower=0>[K] kappa;
  //Cholesky factor of the correlation matrix
  cholesky_factor_corr[K] L_Omega;
}
transformed parameters{
  vector[K] mu[N];
  vector[K] s[N];
  //Cholesky factors of the per-individual covariance matrices
  matrix[K,K] L_Sigma[N];

  for(i in 1:N){
    for(k in 1:K){
      mu[i,k] = a[k]*X[i]^(r[k]) + b[k];       // power-law mean
      s[i,k] = s_scale[k]*(1 + kappa[k]*X[i]); // sd linear in age
    }
    L_Sigma[i] = diag_pre_multiply(s[i], L_Omega);
  }
}
model{
  a ~ gamma(4,1);
  r ~ gamma(4,1);
  b ~ normal(0,100);
  kappa ~ gamma(4,1);
  s_scale ~ gamma(4,1);

  L_Omega ~ lkj_corr_cholesky(1);

  for(i in 1:N){
    y[i] ~ multi_normal_cholesky(mu[i], L_Sigma[i]);
  }
}

Hi,
good question! And glad that you tested with simulated data first. This means that the model itself is quite likely OK, but some of the assumptions it makes about the data-generating process are likely wrong.

I definitely don’t see anything obviously problematic about your model. It might help in debugging to test with various subsets of the data (a subset of ages, a subset of responses) and see if you can fit, say, at least the low ages and just two of the traits. Investigating where things stop working could then be useful.

It might also be sensible to check whether an intercept-only version of the model (i.e. having just mu[i,k] = b[k] and fitting only for a single value of age) fits well. I wouldn’t be surprised if the exponential is a big source of trouble.
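A minimal sketch of such an intercept-only check, reusing the data block from the model in the question (the prior choices here are placeholders, not recommendations):

```stan
// Hypothetical stripped-down check: constant mean and sd per trait,
// correlations still estimated. Uses the same data block as the full model.
parameters{
  vector[K] b;                     // trait-specific intercepts
  vector<lower=0>[K] s_scale;      // trait-specific sds
  cholesky_factor_corr[K] L_Omega; // shared correlation structure
}
model{
  b ~ normal(0, 10);               // placeholder priors
  s_scale ~ gamma(4, 1);
  L_Omega ~ lkj_corr_cholesky(1);
  for(i in 1:N)
    y[i] ~ multi_normal_cholesky(b, diag_pre_multiply(s_scale, L_Omega));
}
```

If this version samples cleanly, adding the age terms back one at a time (first in the mean, then in the sd) should help isolate the offending component.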

You might also want to try rescaling your responses to avoid very large or very small numbers.

Also, are your observed values bound to be positive? If so, then maybe b should also be constrained to be positive?

Best of luck with your model!

You may consider changing the scale model to be on a log scale, as is usually done in (mixed-effects) location-scale models.

I.e., a typical LSM looks something like:

logsd = XB

y ~ N(…, exp(logsd))

Multivariate (ME)LSMs are similar: the diagonal elements of Sigma are modeled on the log scale; then you exponentiate and do diag_pre_multiply.

Basically, just try converting your scale model to instead model the log-sds on the real line (no need for lower=0 anymore), then exponentiate and use exp(logsd) in place of the sds.
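Applied to the model in the question, that change could look roughly like the following sketch (log_s_scale is a renamed, now-unconstrained version of s_scale, and kappa also loses its lower bound; everything else stays as before):

```stan
parameters{
  // ... a, r, b, L_Omega as in the original model ...
  vector[K] log_s_scale; // was vector<lower=0>[K] s_scale
  vector[K] kappa;       // was vector<lower=0>[K] kappa
}
transformed parameters{
  vector[K] s[N];
  for(i in 1:N)
    for(k in 1:K)
      // linear model on the log-sd; exp() keeps s positive by construction
      s[i,k] = exp(log_s_scale[k] + kappa[k]*X[i]);
  // then, as before: diag_pre_multiply(s[i], L_Omega)
}
```

The priors on log_s_scale and kappa would then be on the log scale too (e.g. something normal centered near 0 rather than a gamma), since both can now be negative.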