Changing default inits

Stan uses a default init of 2; that is, initial values for parameters are sampled from uniform(-2, 2) on the internal (unconstrained) scale.
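For reference, the mechanism is roughly this (a minimal sketch, not Stan’s actual implementation; the real logic lives in Stan’s services layer):

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Sketch of the default random initialization: each unconstrained
// parameter is drawn independently from uniform(-init_radius, init_radius),
// where init_radius defaults to 2.
std::vector<double> random_init(std::size_t num_params, double init_radius,
                                std::mt19937& rng) {
  std::uniform_real_distribution<double> unif(-init_radius, init_radius);
  std::vector<double> theta(num_params);
  for (double& t : theta)
    t = unif(rng);
  return theta;  // later mapped to the constrained scale by the model
}
```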

I think it would be better to use a default of 0.1.

It would be even better to use inits from Pathfinder, and that’s fine too. My thought is that switching to a default of 0.1 is very low cost, and it can be done while waiting for Pathfinder inits to be put in.

P.S. See my comment below. My concern here is not Stan crashing or being unable to draw any samples (I think that’s what Steve and Aki are talking about when they refer to the sampler “failing”). Rather, my concern is that the default starting values are in many problems so far away from the mass of the distribution that Stan gets lost trying to get anywhere. My concern is not Stan failing in the sense of it not running at all, but rather Stan having a practical failure in the sense of going very slowly and having poor mixing because it’s wasting time spinning its wheels out in the boondocks.

Do you mean sampling from a range of -0.1 to 0.1? I think we could change the default to try (-2, 2) and then, if that fails, try (-0.1, 0.1).

My suggestion is to start with the current default (-2, 2) so that when it works, the behavior doesn’t change. But in case of a reject, halve the range to (-1, 1) and keep halving until success. This keeps the original idea of aiming for large variation in the initial values, but by the 5th try we’re already in the range @andrewgelman proposes. We talked about this with Andrew yesterday, and he thinks the halving would be too complicated to implement, but I think changing the default init that much is too big a change with unknown consequences, too. @stevebronder’s suggestion of trying (-2, 2) and then (-0.1, 0.1) is less flexible than what I propose. Of course, in my approach it is still possible that even an (-ε, ε) init doesn’t work, because 0 is bad for some parameters.

Implementation isn’t difficult; all you need to do is adjust the user-supplied init_radius on each loop iteration:
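Something along these lines, sketched against the retry loop in Stan’s initialization code (model, rng, init_radius, and MAX_INIT_TRIES are assumed in scope as in stan::services::util::initialize; random_init is the sketch above, and is_valid_init is a hypothetical stand-in for the check that the log density and gradient are finite):

```cpp
// Sketch only, not a patch: halve the radius on every rejected try.
// (std::pow needs <cmath>.)
for (int num_init_tries = 0; num_init_tries < MAX_INIT_TRIES;
     ++num_init_tries) {
  double init_radius_adjusted = init_radius * std::pow(0.5, num_init_tries);
  std::vector<double> theta
      = random_init(model.num_params_r(), init_radius_adjusted, rng);
  if (is_valid_init(model, theta))  // log density and gradient finite?
    return theta;
}
throw std::domain_error("Initialization failed.");
```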

I don’t like the halving strategy, though.
The current code retries up to 100 rejections, but floating-point arithmetic doesn’t really support more than about 50 halvings (2⁻⁵⁰ ≈ 10⁻¹⁵ is already at the edge of double precision). Actually, I’d guess that even 20 halvings remove so much entropy that you might as well zero-initialize at that point.
I propose instead shrinking the radius linearly until it reaches zero:

init_radius_adjusted = init_radius * (MAX_INIT_TRIES - num_init_tries) / MAX_INIT_TRIES

That’s what I thought :D

The benefit of halving is that we don’t need to retry 100 times; I think it would be fine to stop after 20 retries when using halving. I guess 100 tries has been used just because [-2, 2] was such a bad choice originally, and increasing the number of retries was just the wrong fix. Shrinking the radius linearly will waste more retries.
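For concreteness, a throwaway comparison of the two schedules, assuming init_radius = 2 and MAX_INIT_TRIES = 100 for the linear rule:

```cpp
#include <cmath>
#include <cstdio>

// Radius after k rejected tries under the two proposed schedules.
int main() {
  const double init_radius = 2.0;
  const int max_tries = 100;
  const int ks[] = {0, 1, 5, 10, 20, 50, 95};
  for (int k : ks) {
    double halving = init_radius * std::pow(0.5, k);
    double linear = init_radius * (max_tries - k) / max_tries;
    std::printf("try %3d: halving %-10.4g linear %.4g\n", k, halving, linear);
  }
  return 0;
}
```

Halving is inside the proposed (-0.1, 0.1) range by try 5, while the linear schedule doesn’t get there until try 95; that’s the sense in which the linear rule spends most of its retry budget at large radii.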


Just to clarify the above discussion . . . There are two issues:

  1. Often with the default inits, Stan moves very slowly, has trouble converging, and gets stuck in bad parts of parameter space. This problem sometimes goes away when we use more reasonable inits. I suspect that if we replace the default init setting of 2.0 with a new default of 0.1, we will get better performance on average.

As a first step here, I think it could make sense to do some experimentation, starting with the examples in posteriordb, with different default init values to see what happens.

  2. Sometimes with the default inits, Stan can never find a starting point that avoids underflow/overflow, and it just gives up. In that case it seems to make sense to try smaller inits.

The above discussion is all about issue 2, which is fine. But my concern is about issue 1.

I added a P.S. to my post above to clarify.
