Computing autocorrelation/autocovariance in Stan

Has anyone tackled coding the computation of the sample autocorrelation (or autocovariance) in Stan yet? I know there are a variety of methods for modelling structure in the autocorrelation, but I’m not finding anything for simply computing what the autocorrelation is. (I’ll be looking at the autocorrelation on a latent parameter, so I can’t just compute it from the observed data)

I see some neat fft-based code in the posterior package’s autocovariance() function, which presumably is a computationally efficient way to do it. Thought I’d check if anyone had coded it for Stan yet.

You mean in generated quantities?

Sure, though I’m also thinking of including the distribution of autocorrelations as part of the model likelihood to help constrain the latent parameter.

You mean autocorrelation not necessarily in an AR1 structure?

Yes. The context is periodic models where there are two classes of model configurations that I think are competing: High estimated signal-to-noise-ratio with accurate estimates of the frequency and phase of the periodic signal, versus low estimated signal-to-noise-ratio with arbitrary estimates of the frequency and phase. It occurred to me today that the residuals in the latter should retain high autocorrelation, so explicitly modelling the autocorrelation of the residuals as samples from a population with a true correlation of zero should change the geometry of the parameter space to eliminate this spurious mode.

The Math library has autocorrelation and autocovariance functions, but these look to be for internal use since they write the results into a provided output. It might be worth opening an issue in the Math library to provide versions of these that can be exposed to the Stan language.

1 Like

Below is code for computing them by hand. I haven’t had a chance to benchmark but I suspect it’s very slow. I also opened an issue over on stan-math as you suggested.

functions{
	// correlation
	real cor(vector x, vector y){
		real ex = mean(x);
		real ey = mean(y);
		real exy = mean(x .* y);
		real covxy = (exy - (ex*ey)) ;
		real sdx = sd(x);
		real sdy = sd(y);
		return( covxy / (sdx*sdy) ) ;
	}
	// normalized autocorrelations (for lags with >=4 observations)
	vector autocorz(vector x){
		int nx = num_elements(x);
		vector[nx-4] z ;
		for(i in 4:(nx-1)){
			z[i-3] = atanh(cor( x[i:] , x[:(nx-i+1)] ));
		}
		return(z);
	}
]

Edit: benchmarked and yikes it slows things down by a couple orders of magnitude (and I left off the bit implementing my idea to constain the likelihood, so that’s just the computation, not computation+change-of-posterior-geometry). Hopefully the stan-math implementation performs better.

Oops, didn’t code the autocorrelation loop properly. Here’s the fixed version:

	// normalized autocorrelations (for lags with >=4 observations)
	vector autocorz(vector x){
		int nx = num_elements(x);
		vector[nx-4] z ;
		for(i in 1:(nx-4)){
			z[i] = atanh( //atanh "normalizes" correlations (Fisher's r-to-z transform)
				cor( x[1:(nx-i)] , x[(1+i):nx] )
			);
		}
		return(z);
	}