Comparison: simple AR(1) model in Stan vs. PyMC

I’m trying to replicate an AR model I once coded in PyMC. The Stan version yields somewhat different results, and I would like to know why this is the case.

This is the PyMC code:

with pm.Model() as pm_model:
    rho = pm.Normal("rho", mu=[0, 0], sigma=[10, 1])
    sigma = pm.HalfNormal("sigma", 1)
    
    ar = pm.AR("ar", rho=rho, sigma=sigma, constant=True, init_dist=pm.Normal.dist(4, 1), steps=n_samples-1)
    y_hat = pm.Normal("y_hat", mu=ar, sigma=sigma, observed=data) 
                      
    pm_data = pm.sample()
    pm_data.extend(pm.sample_posterior_predictive(pm_data))
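
For reference, `data` and `n_samples` are not defined in the snippet above. A minimal sketch of how AR(1) data with a constant term could have been generated (the thread later mentions the data were simulated with rho1 = 4; the other values here are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples = 100
rho1, rho2, sigma = 4.0, 0.5, 1.0  # intercept, AR coefficient, noise scale (assumed)

# y[t] = rho1 + rho2 * y[t-1] + noise, started from the init distribution N(4, 1)
data = np.empty(n_samples)
data[0] = rng.normal(4.0, 1.0)
for t in range(1, n_samples):
    data[t] = rho1 + rho2 * data[t - 1] + rng.normal(0.0, sigma)
```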

And this is the result of translating the above to Stan:

data {
  int<lower=1> N;
  vector[N] y;
}

parameters {
  real init_dist;
  real rho1;
  real<lower=-1, upper=1> rho2;
  real<lower=0> sigma;
}

model {
  init_dist ~ normal(4, 1);
  rho1 ~ normal(0, 10);
  rho2 ~ normal(0, 1);
  sigma ~ normal(0, 1);

  y[1] ~ normal(rho1 + rho2 * init_dist, sigma);
  y[2:] ~ normal(rho1 + rho2 * y[1:(N-1)], sigma);
}

generated quantities {
  vector[N] y_hat;
  y_hat[1] = normal_rng(rho1 + rho2 * init_dist, sigma);
  y_hat[2:N] = to_vector(normal_rng(rho1 + rho2 * y[1:(N - 1)], sigma));
}

Even though the estimated parameters are similar, the posterior predictive samples from Stan are much smoother. See the image below for illustration:

Can anyone help me to understand what’s happening here?

Probably the lower and upper bounds on rho2. I don’t see how PyMC would know about those.


Thanks for the hint. Interestingly, it’s not the bounds. But I found that I can control the smoothness by forcing rho1 in a certain direction. For example, with rho1 ~ normal(2, 0.5) the result is nearly identical to PyMC’s.

However, this raises some further questions. The data was generated with rho1 = 4. rho1 ~ normal(2, 0.5) produces an adequate result without any numerical issues, yet rho1 ~ normal(4, 0.1) produces a worse result, including numerical issues. I suspect this is because the first prior happens to fit this particular sample better.

But why is the Stan model so sensitive to the prior? I’d guess it’s the small sample size. However, in PyMC I can set very broad priors, and it always converges to the expected result without any numerical issues.
Any further thoughts on that?

I can imagine this being a difference in parameterisation – PyMC seems to do it via the initial value plus the innovations. The latter are probably more likely to be zero-mean / more like a non-centred parameterisation and perhaps “easier” to sample from?
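
As a rough sketch of what that reparameterisation means (assumed values; not PyMC’s actual internals): the zero-mean, unit-scale innovations and the states are related by a deterministic bijection, so both views describe the same model while presenting the sampler with differently shaped geometry:

```python
import numpy as np

rng = np.random.default_rng(1)
rho1, rho2, sigma, n = 4.0, 0.5, 1.0, 200

# Non-centred view: sample iid standard-normal innovations,
# then deterministically build the AR(1) path from them.
z = rng.normal(size=n)
y = np.empty(n)
y[0] = 4.0 + 1.0 * z[0]                    # init_dist = N(4, 1)
for t in range(1, n):
    y[t] = rho1 + rho2 * y[t - 1] + sigma * z[t]

# Centred view: the states y themselves are the latent quantities.
# The innovations can be recovered exactly from the states, so the
# two parameterisations are equivalent up to a change of variables.
z_back = (y[1:] - rho1 - rho2 * y[:-1]) / sigma
assert np.allclose(z_back, z[1:])
```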


I’m curious how you determined that, because adding bounds will force rho2 away from the boundaries and can bias estimates compared to the unconstrained solution. I’m pretty sure the PyMC code as written does not put bounds on the rho values. It is not typical to put bounds on these values in an AR(1) model.

I would suggest simplifying your model so that y[1] ~ normal(4, 1); if you think normal(4, 1) is a good distribution for the initial state. Alternatively, you can push all the normals through and keep the same behavior as you have now without the init_dist parameter. What you get is additional variance from rho2 * init_dist. I think it works out to the following, but you should check my algebra, taking

y[1] = rho2 * normal(4, 1) + normal(rho1, sigma)
     = normal(rho2 * 4, rho2) + normal(rho1, sigma)
     = rho2 * 4 + rho1 + normal(0, rho2) + normal(0, sigma)
     = rho2 * 4 + rho1 + normal(0, sqrt(rho2^2 + sigma^2))
     = normal(rho1 + rho2 * 4, sqrt(rho2^2 + sigma^2))

so that

y[1] ~ normal(rho1 + rho2 * 4, sqrt(square(rho2) + square(sigma)));
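
That algebra can be checked by simulation: for a sum of independent normals, the means add and the scales combine as a root-sum-square. A quick Monte Carlo check with assumed values rho1 = 4, rho2 = 0.5, sigma = 1:

```python
import numpy as np

rng = np.random.default_rng(2)
rho1, rho2, sigma = 4.0, 0.5, 1.0
n = 200_000

# y[1] = rho2 * N(4, 1) + N(rho1, sigma): marginalize the init state by simulation.
y1 = rho2 * rng.normal(4.0, 1.0, n) + rng.normal(rho1, sigma, n)

# Means add; scales combine as sqrt(rho2^2 + sigma^2).
assert abs(y1.mean() - (rho1 + rho2 * 4.0)) < 0.02
assert abs(y1.std() - np.sqrt(rho2**2 + sigma**2)) < 0.02
```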

In any case, I wouldn’t recommend calling a scalar value init_dist as it’s not a distribution.
