Probit and Correlated Errors

jmh530 · October 31, 2018, 9:09pm

I am fitting a probit model (below) using the standard approach described in the Stan documentation.

I am fitting the model to time-series data, rather than cross-sectional. There are long periods where the dependent variable (y) is 0 and then short periods where it is 1. As a result, I noticed that when I calculate residuals (not shown), then there is a strong presence of autocorrelation (ignoring issues on calculating residuals on probit models).

I tried to figure some ways to mitigate the issue, but honestly it’s a bit tricky. For instance, the most common way to account for autocorrelation is to include the lag of the dependent variable. However, in this case, I might not be able to observe the value of the dependent variable after just one lag. I might not observe it for 6 lags, which doesn’t help so much. Even then, I would need to transform the variable to include it in the Phi() function. I was thinking that using a latent variable approach MIGHT work (i.e. what is in the Conceptual Framework here: https://en.wikipedia.org/wiki/Probit_model), but I wasn’t sure how to do that in Stan. If I can’t get that to work, then I was also debating using hidden Markov Model.

Any suggestions?

data {
        int<lower=0> T;
	int<lower=0> N_X;
	
	int<lower=0,upper=1> y[T];
	matrix[T, N_X] X;
}
parameters {
	real A;
	vector[N_X] B;
}
model {
	A ~ normal(0, 1);
	B ~ normal(0, 10);
	y ~ bernoulli(Phi(A + X * B));
}
generated quantities {
	vector[T] y_sim;
	for (i in 1:T) {
		y_sim[i] = bernoulli_rng(Phi(A + X[i] * B));
	}
}

bgoodri · October 31, 2018, 9:19pm

I would try the latent variable approach, but check out the stuff in the Stan User manual on latent variables to implement a non-centered parameterization.

jmh530 · October 31, 2018, 9:27pm

@bgoodri I figured I would start with a simple probit model using the latent variable approach and then adjust it to include autoregressive terms. The latent variable is easy, but I can’t figure out how to get the likelihood right.

bgoodri · October 31, 2018, 9:30pm

The “likelihood” (although it is weird to think of it that way) is just independent standard normals on the latent errors, which get shifted by the linear predictor to form the latent utility for Y = 1.

jmh530 · October 31, 2018, 9:39pm

@bgoodri I think I understand. I’ll try to code something up tomorrow and see if it works.

Topic		Replies	Views
Mixed Process Model: Probit and Linear Regression Modeling	1	601	June 8, 2018
Computing autocorrelation/autocovariance in Stan Modeling	7	1010	November 17, 2020
Dynamic panel data models with Stan? Modeling	59	9548	November 20, 2023
Linear Regression with Highly Correlated Covariates Modeling	8	2346	July 12, 2019
Latent variable inference in regression model Modeling rstan	2	506	November 11, 2022

Probit and Correlated Errors

Related topics