Was this for the default 1000 warmup/1000 sampling iterations in 4 chains (i.e., 4000 posterior draws)? If so, it's fine: that corresponds to an integrated autocorrelation time of roughly 7, which is generally considered good mixing.
The way to evaluate whether it's mixing through the right region of the posterior is to simulate data, fit the model to it, and check whether the posterior intervals have the right coverage (e.g., 50% intervals contain the true parameter values about 50% of the time).
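Here's a minimal sketch of the simulation half of that workflow, using a toy normal model rather than your actual model (the names `N`, `mu_sim`, `sigma_sim`, and `y_sim` are just illustrative). You'd fit your real model to the simulated data and check whether its posterior intervals cover the simulated parameter values.

```stan
// Toy simulation program (not your model): draw parameters from the prior and
// data from the likelihood. Fit the estimation model to y_sim and check
// whether its posterior intervals cover mu_sim and sigma_sim.
data {
  int<lower=0> N;
}
generated quantities {
  real mu_sim = normal_rng(0, 1);
  real sigma_sim = lognormal_rng(0, 1);
  array[N] real y_sim;
  for (n in 1:N) {
    y_sim[n] = normal_rng(mu_sim, sigma_sim);
  }
}
```

Running a program like this with the fixed-parameter sampler gives one simulated data set per draw, which you can then feed back into the estimation model.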
This is a `NaN` issue. `NaN` (not-a-number) is a fixed floating-point value for error results that is part of the IEEE standard; `NA` is what R uses for undefined values. NaN values come up when arithmetic is incoherent, like `0 / 0` or `inf - inf`. I'm guessing that what's happening is that you are rounding `rho` to `1`, so that `nuse[j]` becomes NaN.
You can instrument your code with print statements to see where things are going wrong. For instance, print out the values of `rho`, `ur`, `nuse[j]`, and `util[j + 1]`, and you can see where `util` gets a NaN introduced.
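Something along these lines, where the variable names come from your post and everything else stands in for your actual code:

```stan
// Hypothetical debugging sketch: only the print() call is the suggestion here;
// the loop bounds and the code that computes nuse[j] and util[j + 1] are
// assumed to already exist in your model block.
for (j in 1:(J - 1)) {
  // ... your existing code that computes nuse[j] and util[j + 1] ...
  print("j = ", j,
        "; rho = ", rho,
        "; ur = ", ur,
        "; nuse[j] = ", nuse[j],
        "; util[j+1] = ", util[j + 1]);
}
```

The output is written to the console every time the model's log density is evaluated (every leapfrog step), so keep the number of iterations small while debugging.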
Are you sure you want `-exp(...)` before applying `categorical_logit`? The `categorical_logit` distribution applies softmax (`exp(x) / sum(exp(x))`) to the argument, so it's getting doubly exponentiated. This is a quick route to overflow or underflow.
The thing to do to mitigate this problem is to work on the log scale for as long as you can. This may, unfortunately, require unfolding `categorical_logit` and coding it up manually, because that can be part of the problem. For example, we know that your `target +=` statement will amount to this:
```stan
target += log(softmax(util))[Choice[ob]];
```
Now you can start unfolding the algebra back through the definition of `util`.
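The first step of that unfolding can stay on the log scale by using `log_sum_exp` (a sketch, with `util`, `Choice`, and `ob` as in your model):

```stan
// log(softmax(util))[k] is just util[k] - log_sum_exp(util), which Stan also
// provides directly as log_softmax(util)[k]. Either form avoids building
// softmax(util) and then taking its log:
target += util[Choice[ob]] - log_sum_exp(util);
```

From there you can keep substituting the definition of `util` and simplifying.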
But given that things seem to be mixing OK, I wouldn’t bother with this. I would try to do the validation with simulated data, though.
This is very unlikely given that the existing fit has an MCMC standard error (MCSE) of only 0.02. You wouldn't expect to find results further away than 4 times the MCSE.
You have to be careful about what you mean by the "true" value of `V`. We're trying to find the posterior mean of `V`, not recover the simulated value. Here's an example. If I simulate a bunch of data i.i.d. via `V ~ normal(0, 1)`, it might have a sample mean of 0.3 and a sample standard deviation of 0.9, so you don't expect to recover `mu = 0` and `sigma = 1` by fitting `V ~ normal(mu, sigma)`. In the limit of infinite data, you will get convergence to the true values with a well-behaved model.