We use Stan for non-linear regression on summary datasets, and we’re having an issue fitting our Weibull model:

$$y = a + (1 - a) \times \left(1 - e^{-c \times x^{b}}\right)$$
Our Stan model is below. The priors for the parameters a, b, and c are uniform distributions with bounds [0, 1], [0, 15], and [0, 50], respectively. In this case `pwr_lbound` is 1.
```stan
data {
  int<lower=0> len;
  array[len] real<lower=0> y;
  array[len] real<lower=0> n;
  array[len] real<lower=0> x;
  real pwr_lbound;
  array[2] real p_a; // prior for a
  array[2] real p_b; // prior for b
  array[2] real p_c; // prior for c
}
parameters {
  real<lower=0, upper=1> a;
  real<lower=pwr_lbound> b;
  real<lower=0> c;
}
model {
  a ~ uniform(p_a[1], p_a[2]);
  b ~ uniform(p_b[1], p_b[2]);
  c ~ uniform(p_c[1], p_c[2]);
  for (i in 1 : len) {
    real theta;
    theta = a + (1 - a) * (1 - exp(-c * x[i] ^ b));
    target += lgamma(n[i] + 1) - lgamma(y[i] + 1) - lgamma(n[i] - y[i] + 1)
              + y[i] * log(theta) + (n[i] - y[i]) * log(1 - theta);
  }
}
```
This works pretty well for 99% of the datasets we’ve used so far, but recently a few datasets have failed in the way described below. Here is one of them:
x | n | y |
---|---|---|
0 | 8 | 0 |
0.03 | 8 | 0 |
0.1 | 8 | 1 |
0.22 | 8 | 2 |
0.56 | 8 | 3 |
1 | 8 | 8 |
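For anyone who wants to reproduce this, the table and the prior bounds given earlier map onto the `data` block roughly as follows. This is a minimal Python sketch that only builds the JSON input; the variable names `stan_data` and `payload` are placeholders for illustration and are not taken from our actual pipeline.

```python
import json

# Rows copied from the table above.
x = [0, 0.03, 0.1, 0.22, 0.56, 1]
n = [8, 8, 8, 8, 8, 8]
y = [0, 0, 1, 2, 3, 8]

# Keys match the names declared in the Stan data block.
stan_data = {
    "len": len(x),
    "y": y,
    "n": n,
    "x": x,
    "pwr_lbound": 1,   # lower bound on b used for this run
    "p_a": [0, 1],     # uniform prior bounds for a
    "p_b": [0, 15],    # uniform prior bounds for b
    "p_c": [0, 50],    # uniform prior bounds for c
}

# Serialize to JSON, the data format CmdStan reads.
payload = json.dumps(stan_data, indent=2)
print(payload)
```

The JSON can be saved to a file and passed to CmdStan, or the `stan_data` dict can be given directly as the `data` argument of CmdStanPy's `CmdStanModel.sample`.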
By the time the sampler finishes warmup, it seems to be stuck: every post-warmup draw has exactly the same value for each parameter. I’ve attached the CSV output, generated with `save_warmup` enabled, here as well.
weibull-20231023120901.csv (1.9 MB)
Among other problems, this causes MCSE, N_Eff, N_Eff/s, and R_hat in the summary to be NaN for every parameter:
Parameter | Mean | MCSE | StdDev | 2.5% | 5% | 50% | 95% | 97.5% | N_Eff | N_Eff/s | R_hat |
---|---|---|---|---|---|---|---|---|---|---|---|
lp__ | -3.275210 | NaN | 1.225730e-13 | -3.275210 | -3.275210 | -3.275210 | -3.275210 | -3.275210 | NaN | NaN | NaN |
a | 0.182526 | NaN | 1.654290e-14 | 0.182526 | 0.182526 | 0.182526 | 0.182526 | 0.182526 | NaN | NaN | NaN |
b | 12.949200 | NaN | 8.526800e-13 | 12.949200 | 12.949200 | 12.949200 | 12.949200 | 12.949200 | NaN | NaN | NaN |
c | 36.967500 | NaN | 2.366190e-12 | 36.967500 | 36.967500 | 36.967500 | 36.967500 | 36.967500 | NaN | NaN | NaN |
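To see what the likelihood is working with at the point where the chain froze, the success probability can be recomputed outside Stan. The following is a minimal Python sketch under stated assumptions: the function name `weibull_theta` is made up for illustration, and the parameter values are copied from the rounded summary above, so the results are only approximate.

```python
import math

def weibull_theta(x, a, b, c):
    """Success probability from the model: a + (1 - a) * (1 - exp(-c * x^b))."""
    return a + (1 - a) * (1 - math.exp(-c * x ** b))

# Values copied from the summary table (rounded there, so approximate).
a, b, c = 0.182526, 12.9492, 36.9675

# Rows copied from the dataset above.
x = [0, 0.03, 0.1, 0.22, 0.56, 1]
n = [8, 8, 8, 8, 8, 8]
y = [0, 0, 1, 2, 3, 8]

thetas = [weibull_theta(xi, a, b, c) for xi in x]
for xi, ni, yi, theta in zip(x, n, y, thetas):
    # 1 - theta is the argument of the log(1 - theta) term in the Stan model.
    print(f"x={xi:<5} y/n={yi}/{ni}  theta={theta:.17g}  1-theta={1 - theta:.3g}")
```

Note that at x = 1 the exponent reduces to -c, so theta for that row depends only on a and c. That is also the only row in this dataset with y = n.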
Does anyone have advice on how to avoid this sort of issue? We’re using Stan 2.33.0 and CmdStanPy 1.2.0 inside a Linux Docker container, in case that matters.