Difference in parameter outcome with equivalent but different model section


#1

I am trying to model a stochastic volatility model as
v[t] = omega + phi*v[t-1] + e[t], |phi| < 1
e[t] = sigma[t]z[t], z[t]~student_t(nu, 0, 1)
sigma[t]^2 = eta + alpha
e[t-1]^2

The stan code is the following :
data {
int<lower=100> T;
real<lower=0> v[T];
real<lower=0> v0; // v[0]
real<lower=0> v_1; // v[-1]
}

	parameters {
			real<lower=0> omega;
			real<lower=0, upper=1> phi;
			real<lower=0> eta;
			real<lower=0, upper=1> alpha; 
			real<lower=2, upper=10> nu;
		}

	transformed parameters {
			real e[T];
			real<lower=0> sigma[T];
			real e0 = v0 - omega - phi*v_1;
			e[1] = v[1] - omega - phi*v0;
			for (t in 2:T) {
				e[t] = v[t] - omega - phi*v[t-1];
			} 
			sigma[1] = sqrt(eta + alpha*pow(e0,2));
			for (t in 2:T) {
				sigma[t] = sqrt(eta + alpha*pow(e[t-1],2));
			}
		}

	model {
			e ~ student_t(nu, 0, sigma);
		} 

This code runs quite fast, however if I change the model section to
model {
for (t in 2:T) {
v[t] ~ student_t(nu, omega + phi*v[t-1], sigma);
}
}
which I think is equivalent to the original, then it runs much slower. However the most important things is the outcome of the simulation or the posterior mean of the parameters are different by an none insignificant amount. This difference is reproducible across simulations and is very stable. Can anyone help to explain why is the difference in the posterior mean of the parameters?


#2

Is that actually the model section for your second model? If sigma is an array, I don’t see how v[t] ~ student_t(nu, omega + phi*v[t-1], sigma); would work. Is it v[t] ~ student_t(nu, omega + phi*v[t-1], sigma[t]); by chance?

Also, I’m probly missing something, but it seems like your second model is missing the likelihood for v[1].

Something like?
v[1] ~ student_t(nu, omega + phi * v_0, sigma[1])


#3

Hi Ben,
Very good point! I think I missed [t] in sigma[t] and add v[1]. However when stan compiles this model code to C++, there is not even a warning. So, if this is impossible what model does that code is modeling? Also if v[1] is missing from the model what actual consequence would it cause?

jicun


#4

However when stan compiles this model code to C++, there is not even a warning

I think it is a valid statement now. Should be equivalent to:

target += student_t_lpdf(y[t] | nu, omega + phi*v[t-1], sigma[1])  +
  student_t_lpdf(y[t] | nu, omega + phi*v[t-1], sigma[2]) +
  ... +
  student_t_lpdf(y[t] | nu, omega + phi*v[t-1], sigma[N]);

Which is not what you wanted I think. This could also explain why things were unusually slower. Your target log probability was a sum of N times too many things.

what actual consequence would it cause

What actual consequence did it cause :P? I dunno, but I think you modeled it in the first model so you probly want it here. I don’t see why you wouldn’t want to model it though, esp. since things at t + 1 depend on things at time t.

edit: I forgot to add "y[t] | "s to the lpdf statements originally


#5

I see. Thanks for the explanation.