I am fitting a multilevel model to a collection of J curves with total N data points. I have a group level predictor grp_level_x for each of the J curves.
And I am fitting the following functional form:
y = (A*((x-B)/(100-B))^(-n))

The following stan code runs very slow and I am unable to run it long enough to see if this works. I had implemented an unpooled version before and that gave me reasonable results so now I was trying to get a multilevel model working. I am relatively new to stan so any help is appreciated. Thanks,

data {
int<lower=4> N;
int<lower=0> J;
vector[N] y;
vector[N] x;
int curveId[N];
vector[J] grp_level_x;
}
transformed data{
}
parameters {
vector<lower = 0.25,upper=5>[J] n;
vector<lower=0>[J] A;
vector[J] B;
real<lower=0> eps;
// B model
real alpha_B;
real beta_B;
real<lower=0> eps_B;
// n model
real alpha_n;
real beta_n;
real<lower=0> eps_n;
// A model
real alpha_A;
real beta_A;
real<lower=0> eps_A;
}
transformed parameters {
}
model {
vector[N] y_hat;
for (i in 1:N){
y_hat[i] = (A[curveId[i]]*((x[i] - B[curveId[i]])/(100. - B[curveId[i]]))^(-n[curveId[i]]));
}
for (j in 1:J){
B[j] ~ lognormal(alpha_B + beta_B*grp_level_x[j],eps_B);
n[j] ~ lognormal(alpha_n + beta_n*grp_level_x[j],eps_n)T[0.25,10];
A[j] ~ lognormal(alpha_A + beta_A*grp_level_x[j],eps_A);
}
y ~ normal(y_hat,eps);
}

First off, add priors for eps_*, alpha_*, and beta_*. If you donâ€™t know anything else, just do normal(0.0, approrpriate_guess_of_scale).

If this doesnâ€™t work, and since your simple models work, what Iâ€™d do is:

Just do J independent fits in one model and see if it works. Something like:

for (j in 1:J){
B[j] ~ normal(0.0, 2.0);
n[j] ~ normal(0.0, 2.0);
A[j] ~ normal(0.0, 2.0);
}

Or whatever your prior was in the original fits. I think lognormals are kinda heavy-tailed, so if you just need a zero avoiding prior, maybe switch to a gamma?

Make it hierarchical, but donâ€™t do group level predictors

for (j in 1:J){
B[j] ~ lognormal(alpha_B, eps_B);
n[j] ~ lognormal(alpha_n, eps_n)T[0.25,10];
A[j] ~ lognormal(alpha_A,eps_A);
}

Add in the group level predictors

Maybe they need centered or something. Hopefully something in there works!

Thanks Ben. I had been following a similar route but as soon as I jump to hierarchical (even #2 above) the code gets stuck. If I run 4 chains, a couple of them run very fast but the remaining stay stuck for multiple hours.
One other question I have is regarding my likelihood function

y_ij ~ A_j ((x - b_j)/(100 - b_j)) ^(-n_j)

Itâ€™s not really defined for xâ€™s smaller than b_j. For each of the J curves the minimum value of x is different.How do I specify that and is this something which might be causing stan to get stuck?

Priors on the hyperparameters didnâ€™t help at all?

It sounds like you want to constrain all the values of B to be greater than the minimum x in each curve fit?

Vector constraints arenâ€™t coded in Stan yet, but you can do this stuff manually. Check the â€śTransformations of Constrained Variablesâ€ť section in the manual.

Something like this, maybe:

data {
real max_x_in_group[J]; // Pass in the maximum x values in each group (B can't go less than this)
}
parameters {
real B_unconstrained[J];
}
// Transform from unconstrained to constrained space
transformed parameters {
real B[J];
for(i in 1:J) {
B[i] = exp(B_unconstrained[i]) + max_x_in_group[i];
}
}
model {
// Add in Jacobian adjustment
for(i in 1:J) {
target += B_unconstrained[i];
}
...
}

Yes Bob, I have been working with the simulated data now for past one week. I am running into issues but have found most of the answers by searching through the forum discussions. I am still not there but making progress one step at a time!

How large are N and J Here? How many observations per group are you simulating? I have a strange feeling that there is some kind of unidentifiability for the (A_{j}, \alpha_{A}, \beta_{A}, \epsilon_{A}) tuples that may be present for small numbers of observations per group. It almost feels like you want some kind of sum-to-zero constraint on A_{j, \text{raw}}, where A_{j} = A_{j, \text{raw}} + \exp\{\alpha_{A} + \beta_{A}z_{j}\} , where z_{j} is your group level predictor. This doesnâ€™t quite work as written though (the distribution for A_{j, \text{raw}} is a truncated normal with lower bound \exp\{\alpha_{A} + \beta_{A}z_{j}\}, which is very strange). Fair chance Iâ€™m talking complete rubbish though.

Do you have some pairs plots of (A_{j}, \alpha_{A}, \beta_{A}, \epsilon_{A}) for a single j? For each of the three curve defining parameters?

I have been able to get the hierarchical model to work.

However, in the current setup I do not have a generated quantities block in my training code. I am using a separate generative code to make predictions on my test data by feeding in the fitted mean values. I would like to keep the training and testing part separate. I am currently doing the following to run the generative code:

Without knowing which warnings, itâ€™s impossible to say whether you should be worried about them or not.

Why? Is it an efficiency concern? Itâ€™s not â€ścheatingâ€ť if thatâ€™s what youâ€™re worried about. In fact, itâ€™s typical to use the inputs of the held-out data, not the outputs, as thereâ€™s information there.

Weâ€™re about to roll out a standalone generated quantities option that will let you run generated quantities on previously fits.

Until then, you can do full Bayesian posterior inference in R or Python on the outside, but you have to be careful with averaging over the draws and with parameterizations of things like densities.

Thanks @Bob_Carpenter@bbbales2
Itâ€™s an efficiency concern. My model is a hierarchical model where I have group level predictors. So, the final use of my model is to generate curves and their bounds for unseen group level predictors. I want to use the fitted model to make these predictions and not have to go through the fitting phase.
The standalone version is what I was looking for. Thanks for the info.