Issue with model and transformed parameters

Hi - I’m fitting a model for an error structure for a structural vector autoregression, but I the issue I’m encountering is much more basic.

Essentially, for data y, my model has

y[,t]=H*diag_matrix(vol[,t])*z[,t],

that is, y is a mixture of white noise series z with stochastic volatilities. I can implement this easily enough by using an ugly loop with many inversions in my model block:
for (t in 1:T) y[,t]~multi_normal(zeros,H*diag_matrix(vol[,t].*vol[,t])*H')

This runs slowly. It seems to me that a more desirable approach would be to express (presumably in transformed parameters block?)
eps=inv(H)*y zhat=eps./vol
and then, in my model block, I have
zhat[i,]~normal(0,1) for each dimension i. Unsurprisingly, this runs much faster (like two orders of magnitude faster), as I’m avoiding both a loop and the multi_normal. However, the results aren’t “right”. By that I mean, for simulated data the first approach estimates the true parameters well, the latter does terribly. Mathematically, I’m doing the same thing - just standardizing a random normal before computing the likelihood. The rest of the code is identical, I just change the element whose value enters the likelihood in the model block. Any ideas what’s causing the dramatic difference in accuracy? Comparing to other things I’ve run, it looks to me that Stan is possibly ignoring the distribution of zhat specified in the model completely. I’m attaching sample code, in case that makes things clearer (or you catch typos!). Thanks so much!

Daniel

SV_code= {
'data {'
 ' int<lower=0> T;   // # time points (equally spaced)'
  'matrix[2,T] y;      
'}'
'transformed data {'
'vector[2] zeros; '
'zeros = rep_vector(0, 2);'
'}'
'parameters {'
 'vector<lower=-1,upper=1>[2] phi;'
  ' cov_matrix[2] Sigma;         // innovation variance'
 ' matrix[2,T] h_innov;// std log variance time t' 
  'real H21;'
 ' real H12;'
'}'
'transformed parameters {'
 ' matrix[2,T] h;                 // log variance at time t'
  'matrix[2,2] H;'
  'matrix[2,2] L;'
  'matrix[2,T] zhat;//comment out for other method'
'L=cholesky_decompose(Sigma);'
 'H[1, 1] = 1; H[1, 2] = H12; // Construct normalized H matrix'
 'H[2, 1] = H21; H[2, 2] = 1;'
 ' h = L*h_innov; // Assemble the path of volatility'
 ' h[1,1] = h[1,1] / sqrt(1 - phi[1] * phi[1]);' 
 ' h[1,2] = h[1,2] / sqrt(1 - phi[2] * phi[2]);' 
 ' for (t in 2:T){'
 '   h[,t] = h[,t] +diag_matrix(phi) * h[,t-1];'
'}'
'zhat=inv(H)*y; // standardize the data,y to something that is iid N(0,1) //comment out for other method'
'zhat=zhat ./ exp(h/2);//comment out for other method'
'}'
'model {'
'zhat[1,]~normal(0,1);//comment out for other method'
'zhat[2,]~normal(0,1);//comment out for other method'
 'h_innov[1,]~normal(0,1);'
 'h_innov[2,]~normal(0,1);'
  'for (t in 1:T){'
  '//y[,t]~multi_normal(zeros,H*diag_matrix(exp(h[,t]))*H'');// uncomment this line to run without standardization'
  '}'
'}'
};
rng(1)
T=500;
H=[1,-.5;.5,1];
innov=[0.2,-0.025;-0.025,0.2];
innovchol=chol(innov);
eta=NaN(2,T);
    X=sqrt(diag(innov))./sqrt(1-phi.^2).*randn(2,1);
    V=exp(X);
    z=sqrt(V).*randn([2 1]);
    eta(:,1)=H*z;
    for t=2:T
        X=phi.*X+innovchol'*randn([2,1]);
        V=exp(X);
        z=sqrt(V).*randn([2 1]);
        eta(:,t)=H*z;
    end

SV_data = struct('T',length(eta),'y',eta);

fitcomplile=stan('model_code',SV_code,'data',SV_data,'warmup',10,'iter',10,'chains',1);
fit=stan('fit',fitcompile,'data',SV_data,'warmup',100,'iter',1000,'chains',1,'seed',1);

%%% With standardization, H estimated to be     [1.0000   12.2478 ;-17.4181
%%% 1.0000]. With the multi_normal,     [1.0000    1.3922; -1.9250
%%% 1.0000] which is close to a column-reordering of H given the small
%%% run-length

Forgot to specify - code above is for matlab. I’d upload the data generated in the dump, but I’m new here and not sure if I can attach files.
Also, as a test I’ve tried something similar on a simple stochastic volatility model like the example here (https://github.com/stan-dev/example-models/blob/master/misc/moving-avg/stochastic-volatility-optimized.stan). I’ve had the same problem there. I’ve also tried a jacobian adjustment, but that doesn’t seem to be all that’s going on here.

I looked at this but the problem has alluded me as well :D.

If you end up having to go back to your original form then maybe you could write out the 2x2 Cholesky decomposition for H*diag_matrix(exp(h[,t]))*H' analytically and use multi_normal_cholesky instead?

Thanks, I’ll look into that. However, given that it needs to be computed separately for each t, I’m not sure if just doing it outside of the multinormal is necessarily going to speed things up?

In any case, I’ve been experimenting with a very simple 1-d model that may be easier to think through (see below), with a jacobian adjustment. Besides that, I’m just adding a couple of lines to replace the y ~ normal(0,exp(h/2)) part. I thought maybe defining zhat as a local in the model block might help - doesn’t seem to. I’m getting answers for phi and sigma that show something is still way off.

SV_code= {
'data {'
 ' int<lower=0> T;   // # time points (equally spaced)'
  'vector[T] y;      // mean corrected return at time t'
'}'

'parameters {'
 ' real<lower=-1,upper=1> phi;  // persistence of volatility'
 ' real<lower=0> sigma;         // white noise shock scale'
 ' vector[T] h_std;             // std log volatility time t'
'}'

'transformed parameters {'
 ' vector[T] h; // log volatility at time t'
 ' h <- h_std * sqrt(sigma);'
 ' h[1] <- h[1] / sqrt(1 - phi * phi);'
 ' for (t in 2:T)'
 '   h[t] <- h[t] + phi * h[t-1];'
'}'

'model {'
'vector[T] zhat;'
'zhat=y ./ exp(h/2);'
  'h_std ~ normal(0,1);'
  'zhat ~ normal(0,1);'
  'increment_log_prob(log(exp(h/2)));'
  '//y ~ normal(0,exp(h/2));'
  'phi ~ uniform(-1,1);' 
'}'
};

rng(1)
T=500;
innov=0.2;
innovchol=chol(innov);
phi=.97;
z=NaN(T,1);
    X=sqrt(diag(innov))./sqrt(1-phi.^2).*randn(1,1);
    V=exp(X);
    z(t,:)=sqrt(V).*randn([1 1]);
    for t=2:T
        X=phi.*X+innovchol'*randn([1,1]);
        V=exp(X);
        z(t,:)=sqrt(V).*randn([1 1]);
    end

SV_data = struct('T',length(eta),'y',eta);

fitcomplile=stan('model_code',SV_code,'data',SV_data,'warmup',10,'iter',10,'chains',1);
fit=stan('fit',fitcompile,'data',SV_data,'warmup',100,'iter',1000,'chains',4,'seed',1);

Mathematically, you are ignoring the logarithm of the absolute value of the determinant of the Jacobian matrix of the transformation from y to zhat. Stan’s parser should have given you a warning to that effect from these two lines:

zhat[1,]~normal(0,1);//comment out for other method
zhat[2,]~normal(0,1);//comment out for other method

Since getting that right is pretty much all multi_normal does (which is why there is the determinant term in the PDF), I don’t think there are many computational gains to be had by implementing it correctly. You could do the Cholesky thing, but the fundamental reason why these stochastic volatility models are slow to fit is because NUTS has to take a lot of small steps.

Ben

1 Like

Goodrich’s answer is the answer to listen to, but I went and worked through a simpler form of this just to make sure I can at least do it if I had to:

y ~ normal(a, b)

vs

yhat = (y - a) / b
yhat ~ normal(0, 1)

The matching models are here (https://gist.github.com/bbbales2/718f65ccfc8210783c59a8925327bc4d). From working through that, should this line:

increment_log_prob(log(exp(h/2)));

be

target += sum(log(exp(-h/2))); (I rudely changed your syntax, the only diff is -h/2 vs h/2)?

Thanks both - it turns out I had two separate problems. One, I had initially omitted the jacobian, as Goodrich pointed out. Two, having incorporated it (in the second block of code I posted), I still didn’t have the Stan syntax right, as bbbales demonstrated in his example. I’ve now got it working in a simple example. In my work, it turns out that it does save a LOT of time over multi_normal because it avoids the choleskydecomp/inversion of T matrices, and the loop to do that, so this is a real lifesaver! I’ll be back if I run into trouble in more complicated case.

Thanks so much!

@daniellewis, can you mark either of the Ben’s responses as an answer (see the icons at the bottom of each post) so the post will appear as resolved? Thanks!