I am fitting a multilevel model to a collection of J curves with total N data points. I have a group level predictor grp_level_x for each of the J curves.
And I am fitting the following functional form:
y = (A*((x-B)/(100-B))^(-n))
The following stan code runs very slow and I am unable to run it long enough to see if this works. I had implemented an unpooled version before and that gave me reasonable results so now I was trying to get a multilevel model working. I am relatively new to stan so any help is appreciated. Thanks,
data {
int<lower=4> N;
int<lower=0> J;
vector[N] y;
vector[N] x;
int curveId[N];
vector[J] grp_level_x;
}
transformed data{
}
parameters {
vector<lower = 0.25,upper=5>[J] n;
vector<lower=0>[J] A;
vector[J] B;
real<lower=0> eps;
// B model
real alpha_B;
real beta_B;
real<lower=0> eps_B;
// n model
real alpha_n;
real beta_n;
real<lower=0> eps_n;
// A model
real alpha_A;
real beta_A;
real<lower=0> eps_A;
}
transformed parameters {
}
model {
vector[N] y_hat;
for (i in 1:N){
y_hat[i] = (A[curveId[i]]*((x[i] - B[curveId[i]])/(100. - B[curveId[i]]))^(-n[curveId[i]]));
}
for (j in 1:J){
B[j] ~ lognormal(alpha_B + beta_B*grp_level_x[j],eps_B);
n[j] ~ lognormal(alpha_n + beta_n*grp_level_x[j],eps_n)T[0.25,10];
A[j] ~ lognormal(alpha_A + beta_A*grp_level_x[j],eps_A);
}
y ~ normal(y_hat,eps);
}
First off, add priors for eps_*, alpha_*, and beta_*. If you don’t know anything else, just do normal(0.0, approrpriate_guess_of_scale).
If this doesn’t work, and since your simple models work, what I’d do is:
Just do J independent fits in one model and see if it works. Something like:
for (j in 1:J){
B[j] ~ normal(0.0, 2.0);
n[j] ~ normal(0.0, 2.0);
A[j] ~ normal(0.0, 2.0);
}
Or whatever your prior was in the original fits. I think lognormals are kinda heavy-tailed, so if you just need a zero avoiding prior, maybe switch to a gamma?
Make it hierarchical, but don’t do group level predictors
for (j in 1:J){
B[j] ~ lognormal(alpha_B, eps_B);
n[j] ~ lognormal(alpha_n, eps_n)T[0.25,10];
A[j] ~ lognormal(alpha_A,eps_A);
}
Add in the group level predictors
Maybe they need centered or something. Hopefully something in there works!
Thanks Ben. I had been following a similar route but as soon as I jump to hierarchical (even #2 above) the code gets stuck. If I run 4 chains, a couple of them run very fast but the remaining stay stuck for multiple hours.
One other question I have is regarding my likelihood function
y_ij ~ A_j ((x - b_j)/(100 - b_j)) ^(-n_j)
It’s not really defined for x’s smaller than b_j. For each of the J curves the minimum value of x is different.How do I specify that and is this something which might be causing stan to get stuck?
Priors on the hyperparameters didn’t help at all?
It sounds like you want to constrain all the values of B to be greater than the minimum x in each curve fit?
Vector constraints aren’t coded in Stan yet, but you can do this stuff manually. Check the “Transformations of Constrained Variables” section in the manual.
Something like this, maybe:
data {
real max_x_in_group[J]; // Pass in the maximum x values in each group (B can't go less than this)
}
parameters {
real B_unconstrained[J];
}
// Transform from unconstrained to constrained space
transformed parameters {
real B[J];
for(i in 1:J) {
B[i] = exp(B_unconstrained[i]) + max_x_in_group[i];
}
}
model {
// Add in Jacobian adjustment
for(i in 1:J) {
target += B_unconstrained[i];
}
...
}
Yes Bob, I have been working with the simulated data now for past one week. I am running into issues but have found most of the answers by searching through the forum discussions. I am still not there but making progress one step at a time!
How large are N and J Here? How many observations per group are you simulating? I have a strange feeling that there is some kind of unidentifiability for the (A_{j}, \alpha_{A}, \beta_{A}, \epsilon_{A}) tuples that may be present for small numbers of observations per group. It almost feels like you want some kind of sum-to-zero constraint on A_{j, \text{raw}}, where A_{j} = A_{j, \text{raw}} + \exp\{\alpha_{A} + \beta_{A}z_{j}\} , where z_{j} is your group level predictor. This doesn’t quite work as written though (the distribution for A_{j, \text{raw}} is a truncated normal with lower bound \exp\{\alpha_{A} + \beta_{A}z_{j}\}, which is very strange). Fair chance I’m talking complete rubbish though.
Do you have some pairs plots of (A_{j}, \alpha_{A}, \beta_{A}, \epsilon_{A}) for a single j? For each of the three curve defining parameters?
I have been able to get the hierarchical model to work.
However, in the current setup I do not have a generated quantities block in my training code. I am using a separate generative code to make predictions on my test data by feeding in the fitted mean values. I would like to keep the training and testing part separate. I am currently doing the following to run the generative code:
Without knowing which warnings, it’s impossible to say whether you should be worried about them or not.
Why? Is it an efficiency concern? It’s not “cheating” if that’s what you’re worried about. In fact, it’s typical to use the inputs of the held-out data, not the outputs, as there’s information there.
We’re about to roll out a standalone generated quantities option that will let you run generated quantities on previously fits.
Until then, you can do full Bayesian posterior inference in R or Python on the outside, but you have to be careful with averaging over the draws and with parameterizations of things like densities.
Thanks @Bob_Carpenter@bbbales2
It’s an efficiency concern. My model is a hierarchical model where I have group level predictors. So, the final use of my model is to generate curves and their bounds for unseen group level predictors. I want to use the fitted model to make these predictions and not have to go through the fitting phase.
The standalone version is what I was looking for. Thanks for the info.