I am trying to define a prior distribution's parameters recursively from the model parameter's value. In other words, I want the prior on my parameter to be a uniform distribution whose bounds depend on that parameter's value from the previous NUTS/HMC iteration. Syntactically, that would look something like `mm[i] ~ uniform(mm[i-1] - 0.01, mm[i-1] + 0.01)`. As I understand it, the code should look something like this:
```stan
data {
  int iters;
}
transformed data {
  array[5] unit_vector[3] a = {[ 0.7660444, 0., 0.7660444]',
                               [-0.7660444, 0., 0.6427876]',
                               [ 0., 0.7660444, -0.6427876]',
                               [ 0., -0.6427876, -0.7660444]',
                               [ 0., 0.6427876, 0.6427876]'};
  matrix[5, 3] n;
  for (h in 1:5) {
    n[h] = a[h]';
  }
}
```
```stan
parameters {
  array[5] unit_vector[3] m;
}
transformed parameters {
  matrix[5, 3] mm;
  for (k in 1:5) {
    mm[k] = m[k]';
  }
  array[iters] matrix[5, 3] prev_mm;
  prev_mm[1] = n;
  for (iter in 1:iters-1) {
    prev_mm[iter+1] = mm;
  }
}
```
```stan
model {
  for (iter in 1:iters) {
    for (i in 1:3) {
      for (j in 1:5) {
        mm[j, i] ~ uniform(prev_mm[iter][j, i] - 0.01, prev_mm[iter][j, i] + 0.01);
      }
    }
  }
  target += sum(mm - n).^2;
}
```
However, I am getting the following error:
```
Rejecting initial value:
  Log probability evaluates to log(0), i.e. negative infinity.
  Stan can't start sampling from this initial value.
```
I understand that this is coming from `prev_mm[1] = n;` and `target += sum(mm-n).^2`, but that shouldn’t be a problem, since mm is randomly sampled and should be approximately equal to n. What is the issue here, and how can I use the previous iteration’s mm value to parametrize the current iteration’s prior distribution?
This is not possible. The transformed parameters block is recalculated on every evaluation from the current parameters, so you cannot use it to store information across iterations.
Thanks for the prompt and clear reply! It’s very helpful!
A few quick related follow-ups, though.
Is there any other way to do this in Stan (e.g., through the generated quantities block)?
I have tried implementing this as `mm[j, i] ~ uniform(mm[j, i]-0.01, mm[j, i]+0.01)`; I get no errors, and the results are somewhat sensible. What is happening there, and why does it seem to work? Should I be suspicious?
Is transformed parameters recalculated after prior sampling and before posterior sampling? If not, then how can I have a model block that mentions only transformed parameters variables? (This was suggested to me by Bob Carpenter here, so I have little to no doubt that it’s correct, but I would like to understand why it works and what the order of computation is.)
There is no distinct “prior sampling” phase and “posterior sampling” phase in Stan. The model simply defines a log joint density, and then the algorithm:
1. Supplies a set of parameters
2. Calculates the transformed parameters
3. Calculates the log density (target) as a function of these, and its gradient. The gradient is a function of the original parameters, flowing through the transformed parameters as necessary, which is why the model block can mention only transformed parameters and it will still sample.
These steps may occur many times for one “draw” in the output.
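As a rough sketch of this evaluation loop in Python (the toy model and all names here are illustrative, not Stan internals):

```python
def log_density_one_eval(theta):
    """One evaluation of the log joint density (hypothetical toy model).

    Every call recomputes the transformed parameters from scratch;
    nothing persists between evaluations."""
    # transformed parameters block: recomputed from the current theta each time
    mm = [2.0 * t for t in theta]
    # model block: target depends only on the transformed parameters; in Stan
    # the gradient flows back to theta through mm via autodiff
    return -sum((x - 1.0) ** 2 for x in mm)

# The sampler may evaluate this (and its gradient) many times for one stored
# draw, e.g. once per leapfrog step, always from the current parameters:
theta = [0.0, 0.0, 0.0]
for _ in range(5):
    lp = log_density_one_eval(theta)
```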
Because of this, your code is essentially saying that, in each evaluation, each parameter is distributed roughly around its own current value, without reference to any previous evaluation.
I’m not sure what you’re trying to achieve, but that’s not a coherent Bayesian prior. A Bayesian prior is given by a density p(\theta) over the parameters \theta. This density is a joint density over the parameters, but it should not depend on anything other than constants.
It sounds like what you’re trying to do is constrain the Markov chain from moving too far per step. You can do that by lowering the step size in the algorithm if that’s the goal.
No matter what x is, \log \textrm{uniform}(x \mid x - 0.01, x + 0.01) = \log(1 / 0.02) = \log 50. So this statement adds a constant to the target, which has no effect whatsoever on sampling. The only difference you should see is an additive offset of \log 50 times the number of (i, j) pairs.
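You can check this constant-density claim with a few lines of Python (a hand-rolled uniform log density, since no Stan is needed):

```python
import math

def log_uniform_pdf(x, low, high):
    """Log density of a continuous uniform distribution on (low, high)."""
    if low < x < high:
        return -math.log(high - low)
    return float("-inf")

# mm[j, i] ~ uniform(mm[j, i] - 0.01, mm[j, i] + 0.01) adds the same constant
# log(50) to the target no matter what the current value of mm[j, i] is:
for x in (-5.0, 0.0, 3.7):
    assert math.isclose(log_uniform_pdf(x, x - 0.01, x + 0.01), math.log(50))
```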
Should I be suspicious?
Yes, this statement isn’t doing anything.
What next?
What were you trying to achieve with this statement?
@WardBrian @Bob_Carpenter Thanks, that clarifies things a lot! Given @Bob_Carpenter’s feedback, I have adjusted the code to the more sensible `mm[j, i] ~ uniform(n[j, i]-0.01, n[j, i]+0.01)`. However, I am getting the following error:
```
Rejecting initial value:
  Log probability evaluates to log(0), i.e. negative infinity.
  Stan can't start sampling from this initial value.
```
It seems that Stan is sampling mm such that mm = n, which makes `sum(mm-n).^2` evaluate to zero. But that shouldn’t be the case, since some noise should come from the prior sampling statement. I have tried using the default initial values and supplying sensible arrays explicitly different from n, to make sure mm - n != 0 (at least at the beginning of sampling). I have also tried values other than 0.01, but to no avail. Also, I have tried using `mm[j, i] ~ normal(n[j, i], 0.01)` as the prior sampling statement, and it works! What could be the problem with centring the mm prior around n in uniform()?
What’s happening here is that the elements of mm are initializing out of the bounds of their uniform priors. If you want to place uniform priors on these elements, then you should declare them with upper and lower bound constraints. For what it’s worth, without knowing more about your modeling scenario, I’m pretty skeptical that these narrow uniform priors are actually the best encapsulation of your prior knowledge in this system. Why not just use the tight normal distributions?
@jsocolar you are absolutely right! Normal distribution is a more appropriate prior for this model and I have decided to use it in the end. However, I just wanted to know why sampling fails in this case, in order to understand better what’s happening “behind the scenes” in Stan.
Stan requires the density to have support over all of \mathbb{R}^N on the unconstrained scale.
If you declare a variable constrained to an interval, then it needs to be declared with the constraint. In other words, this is OK:
```stan
parameters {
  real alpha;
  real<lower=alpha - 0.01, upper=alpha + 0.01> beta;
}
model {
  beta ~ uniform(alpha - 0.01, alpha + 0.01); // consistent with constraints
}
```
If you don’t have the constraint, then beta can get initialized outside of the range (alpha-0.01, alpha+0.01), at which point the uniform log density is negative infinity, and Stan rejects and fails to initialize.
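For intuition, here is a minimal Python sketch of the lower/upper-bound transform Stan applies to a declared `real<lower=..., upper=...>` parameter (the log-Jacobian adjustment is omitted, and `constrain`/`inv_logit` are illustrative names, not Stan's internals):

```python
import math

def inv_logit(u):
    """Standard inverse-logit, mapping the reals into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-u))

def constrain(u, lower, upper):
    """Map an unconstrained real u into (lower, upper), as Stan does for
    interval-constrained parameters."""
    return lower + (upper - lower) * inv_logit(u)

alpha = 1.0
# Whatever unconstrained value the sampler proposes, beta stays in bounds,
# so the uniform log density above is never negative infinity:
for u in (-100.0, 0.0, 100.0):
    beta = constrain(u, alpha - 0.01, alpha + 0.01)
    assert alpha - 0.01 <= beta <= alpha + 0.01
```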
So I should set the constraints on the parameters in this case to avoid the prior density evaluating to zero (and therefore lp__ to negative infinity) when beta gets initialized outside the range. That makes sense, thanks!
Yes. You need to do this in all cases. Stan expects any parameter values that respect the constraints to have a finite log density (sometimes you can’t avoid overflow/underflow in the tails, but if it happens where you sample, it can cause serious problems like the inability to initialize).
Overall, I’m still not sure why you’re trying to jitter alpha a little bit here. There may be alternative ways to achieve your goal.
@Bob_Carpenter Taking into consideration responses in this thread I realized that jittering alpha wouldn’t make much sense (especially in Stan), so I decided to use normal prior with mean n.