Probit and Correlated Errors

I am fitting a probit model (below) using the standard approach described in the Stan documentation.

I am fitting the model to time-series data, rather than cross-sectional. There are long periods where the dependent variable (y) is 0 and then short periods where it is 1. As a result, I noticed that when I calculate residuals (not shown), then there is a strong presence of autocorrelation (ignoring issues on calculating residuals on probit models).

I tried to figure some ways to mitigate the issue, but honestly it’s a bit tricky. For instance, the most common way to account for autocorrelation is to include the lag of the dependent variable. However, in this case, I might not be able to observe the value of the dependent variable after just one lag. I might not observe it for 6 lags, which doesn’t help so much. Even then, I would need to transform the variable to include it in the Phi() function. I was thinking that using a latent variable approach MIGHT work (i.e. what is in the Conceptual Framework here:, but I wasn’t sure how to do that in Stan. If I can’t get that to work, then I was also debating using hidden Markov Model.

Any suggestions?

data {
        int<lower=0> T;
	int<lower=0> N_X;
	int<lower=0,upper=1> y[T];
	matrix[T, N_X] X;
parameters {
	real A;
	vector[N_X] B;
model {
	A ~ normal(0, 1);
	B ~ normal(0, 10);
	y ~ bernoulli(Phi(A + X * B));
generated quantities {
	vector[T] y_sim;
	for (i in 1:T) {
		y_sim[i] = bernoulli_rng(Phi(A + X[i] * B));

I would try the latent variable approach, but check out the stuff in the Stan User manual on latent variables to implement a non-centered parameterization.

@bgoodri I figured I would start with a simple probit model using the latent variable approach and then adjust it to include autoregressive terms. The latent variable is easy, but I can’t figure out how to get the likelihood right.

The “likelihood” (although it is weird to think of it that way) is just independent standard normals on the latent errors, which get shifted by the linear predictor to form the latent utility for Y = 1.

@bgoodri I think I understand. I’ll try to code something up tomorrow and see if it works.