I’m new to both Bayesian Inference and Stan, I’m experimenting a bit and I ran into an error message I don’t understand.
After a successful linear regression I’m now trying to fit a power function: y = a*x^b + Normal(0,\sigma)
I have a very strong reason to believe that b should be around 3. A good value for a would be 4.
\sigma in my case is expected to be around 2000, but I read in the manual that it is better when parameters are of the same scale so I normalize so the expected variance is around 1.
My Stan model is:
data {
int<lower=1> N;
vector[N] x;
vector[N] y;
real nrm_factor;
}
transformed data {
vector[N] y_nrm;
y_nrm <- y/nrm_factor;
}
parameters {
real<lower=0> a;
real<lower=0> b;
real<lower=0> sigma;
}
model {
vector[N] y_pred;
for (i in 1:N)
y_pred[i] <- (a*x[i]^b)/nrm_factor;
//priors
a ~ normal(4, 1);
b ~ normal(3, 0.1);
sigma ~ gamma(2, 2);
//likelihood
y_nrm ~ normal(y_pred, sigma);
}
When I run this (with MatlabStan’s defaults) I get around 20 of the following warnings:
output-4.csv: Exception: normal_lpdf: Location parameter[1] is 1.#INF, but must be finite! (in 'H:/Projects/stan/powercurvetest/anon_model.stan' at line 31)
output-4.csv: If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
output-4.csv: but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
However the model does provide answers that I expect based on the data:
4 chains: each with iter=(1000,1000,1000,1000); warmup=(0,0,0,0); thin=(1,1,1,1); 4000 iterations saved.
Warmup took (13, 13, 14, 12) seconds, 53 seconds total
Sampling took (12, 14, 14, 13) seconds, 53 seconds total
Mean MCSE StdDev 5% 50% 95% N_Eff N_Eff/s R_hat
lp__ -846 3.6e-002 1.3e+000 -849 -846 -845 1266 24 1.0e+000
accept_stat__ 0.93 1.7e-003 1.1e-001 0.70 0.97 1.00 4000 75 1.0e+000
stepsize__ 0.048 5.2e-005 3.3e-003 0.045 0.048 0.053 4000 75 1.$e+000
treedepth__ 4.8 2.4e-002 1.5e+000 2.0 5.0 6.0 3597 68 1.0e+000
n_leapfrog__ 54 6.3e-001 3.6e+001 3.0 59 127 3305 62 1.0e+000
divergent__ 0.00 0.0e+000 0.0e+000 0.00 0.00 0.00 4000 75-1.$e+000
energy__ 848 4.9e-002 1.8e+000 845 847 851 1312 25 1.0e+000
a 12 2.2e-002 7.8e-001 10 12 13 1277 24 1.0e+000
b 2.6 6.9e-004 2.4e-002 2.5 2.6 2.6 1269 24 1.0e+000
sigma 1.1 5.5e-004 2.1e-002 1.0 1.1 1.1 1514 28 1.0e+000
Samples were drawn using hmc with nuts.
My question is; can you provide me some insight where the infinity comes from? The parameters a and b are of course strongly correlated. Is this what is causing the problem with the infinite location parameter?
Also, can you give some pointers on how to resolve the correlation between a and b?
Some googling pointed me into the direction of transforming this into a linear problem: log(y) = log(a) + b*log(x) and then do a QR-decomposition on that linear problem. I’m interested in an explanation of that strategy, but also generally in strategies for uncorrelating parameters in similar parameter estimation problems.
Thanks in advance!