Hello,
I am trying to fit a hierarchical model for longitudinal data. The mean function is specified as a asymmetric Gaussian function (6 parameters, all positive), which is put within a normal distribution to allow it to be stochastic. I have group data for multiple curves and when I fit the model for a single group (curve) it seems to fit fine as long as I specify relatively informative priors. However, there is strong correlation between some of the parameters, and I would like to model the data hierarchically using the LKJ prior. The MCMC takes quite some time (1.5 hours, N=5000) to run and for most of the parameters the effective sample size is < 10 after estimation. I have a few questions that I was hoping someone could help me with.

What is the best way to specify that the parameters should be positive? I have seen non linear examples (e.g., Gelman, Lee, Guo, 2015) where all the parameters are log transformed. I have also seen hard constraints put in the parameters block and then uniform or lognormal priors specified. I have also seen defining the parameters on the real line and then putting normal priors that put little mass on values less than zero. It seems like the speed of the sampling varies considerably depending on the choice here. All of my parameters are greater than zero and some of them are bounded between 0 and 1. Right now I am putting informative normal hyper priors on those (see mu_theta priors in code) but this is possibly causing some problems?

Also, I am getting very small sample sizes for the posterior parameters of the hierarchical covariance matrix. I think I have the LKJ prior specified correctly to ensure speed, and I have a diffuse prior on the correlation structure, but maybe this is part of the problem as well.
Any suggestions would be much appreciated. I can also post a subset of the data if that would be helpful as well.
Thank you,
Colin
data {
int N; //total number of obs
int G; //number of groups
int<lower = 1, upper = G> group[N]; //indicator for Each group
int d[N]; // day of year
real y[N]; // observed value of T
int<lower=1> K; // number of parameters
}
parameters {
cholesky_factor_corr[K] Lcorr;
//corr_matrix[K] Omega; // Correlation Matrix
vector<lower=0>[K] sigma; // Standard Deviations
vector [K] theta[G]; // Parameter for Each Group
real<lower=0> sigma_e[G]; // Separate Error Term for Each Group
//real <lower=0> sigma_error;
vector[K] mu_theta; //Means for Hierarchical Parameters
}
//transformed parameters {
// cov_matrix[K] Sigma;
// Sigma = quad_form_diag(Omega, sigma);
//}
model {
vector[N] f; //Mean at Each Day
sigma ~ cauchy(0, 2.5); //Prior on Standard Deviations
//Omega ~ lkj_corr(1); //LKJ Prior on Correlation Matrix
Lcorr ~ lkj_corr_cholesky(1);
sigma_e ~ cauchy(0,1);
//informative hyper priors
mu_theta[1] ~ normal(.1, .5);
mu_theta[2] ~ normal(.2, .5);
mu_theta[3] ~ normal(.5, .5);
mu_theta[4] ~ normal(225, 30);
mu_theta[5] ~ normal(20, 25);
mu_theta[6] ~ normal(50, 25);
theta ~ multi_normal_cholesky(mu_theta, diag_pre_multiply(sigma, Lcorr));
//theta ~ multi_normal(mu_theta, Sigma); //Hyper Prior on K Parameters
//Likelihood
for(i in 1:N){
if (d[group[i]] < theta[group[i], 4]){
f[i] = theta[group[i], 1] + (theta[group[i], 3]  theta[group[i], 1])*exp(((d[group[i]]  theta[group[i], 4])^2)/(2*(theta[group[i], 5])^2));
}
else{
f[i] = theta[group[i], 2] + (theta[group[i], 3]  theta[group[i], 2])*exp(((d[group[i]]  theta[group[i], 4])^2)/(2*(theta[group[i], 6])^2));
}
//y[i] ~ normal(f[i], sigma_e[group[i]]);
y[i] ~ normal(f[i], sigma_e[group[i]]);
}
}
generated quantities {
matrix[K,K] Omega;
matrix[K,K] Sigma;
Omega = multiply_lower_tri_self_transpose(Lcorr);
Sigma = quad_form_diag(Omega, sigma);
}