I am trying to fit a vector autoregression (VAR(1)) model, but I am not sure how to define log_lik for use later in the p_loo calculation. Suppose the model looks something like this:

Semantic error in '/tmp/RtmpxBWcjx/model-7c9b391d61dc.stan', line 97, column 0 to column 58:
-------------------------------------------------
95:
96:
97: log_lik = multi_normal_lpdf(y[1:T] | mu, sigma);
^
98:
99: }
-------------------------------------------------
Ill-typed arguments supplied to assignment operator =: lhs has type array[] vector and rhs has type real
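The error says that multi_normal_lpdf returns a single real (the log density summed over everything passed to it), so it cannot be assigned to log_lik declared as array[] vector. For loo/p_loo you want one scalar term per time point anyway. A sketch of the generated quantities block, assuming mu is the array of per-time conditional means from the VAR(1) recursion and sigma the innovation covariance matrix (names other than y and sigma are hypothetical):

```stan
generated quantities {
  // One log-likelihood value per observation, as loo::extract_log_lik()
  // and loo::loo() expect. mu is assumed to be array[T] vector[K],
  // computed the same way as in the model block.
  vector[T] log_lik;
  for (t in 1:T) {
    log_lik[t] = multi_normal_lpdf(y[t] | mu[t], sigma);
  }
}
```

With this in place, the pointwise log_lik draws can be extracted from the fit and passed to the loo package.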

With VAR models in Stan, it is always helpful to check that the coefficients you are simulating produce a stationary model. Constraining them properly is harder than for an AR model. A paper presented at a StanCon a few years back implemented a parameterization that enforces stationarity and has priors that make some kind of sense.

Another thing to be aware of is that the LKJ prior on the correlation matrix is centered on the series being uncorrelated. If you have many time series in the VAR and they are strongly correlated with each other, then that prior may not be so good.
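For concreteness, the usual Cholesky-factored setup looks like this (a sketch; K would be the number of series, and eta = 1 makes the LKJ prior uniform over correlation matrices while eta > 1 concentrates mass near the identity, i.e. toward uncorrelated series):

```stan
parameters {
  cholesky_factor_corr[K] L_Omega;  // Cholesky factor of the correlation matrix
  vector<lower=0>[K] tau;           // per-series innovation scales
}
model {
  L_Omega ~ lkj_corr_cholesky(2);   // eta = 2: mild pull toward no correlation
  tau ~ cauchy(0, 1);
  // The Cholesky factor of the innovation covariance is then
  // diag_pre_multiply(tau, L_Omega), usable with multi_normal_cholesky.
}
```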

Thanks, yes I guess you are referring to Sarah Heaps's work. The model I am trying to fit is an extension of a VAR: I am trying to model heteroscedasticity by assuming an ARCH-like structure for the variance, so the model would be VAR + ARCH. At the moment I am mostly struggling with some of the parameters not converging, and I realized that's because of large chunks of missing data in my time series (if I impute the missing values before sampling, there is no convergence issue).

Not an easy problem! Best of luck. My usual approach in Stan is to start from as simple a model as possible and build it up. In this case, though, I'm not sure whether it is the VAR or the ARCH that is the more complicated part (i.e. which to add first).

Is the issue with missing data that the data have different start dates or are they missing for some other reason?

VAR and ARCH combined are problematic, whereas fitting each separately doesn't cause any sampling issues in my case, but the fit wouldn't look good.

The data are missing for unknown reasons, but I am not really interested in the imputation per se. I would simply remove the missing rows if I weren't dealing with time series data, but now I have to handle them in a way that lets the model converge properly without affecting the parameters of interest (the covariate effects), so I can't simply impute them with the mean. In other words, I want the learning process to pause during the missingness and resume afterwards.
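One standard Stan pattern for this is to declare the missing values as parameters and splice them into the full series, so the likelihood still runs through the gaps but the missing points contribute no information of their own. A sketch with hypothetical names (ii_obs/ii_mis would be index vectors prepared outside Stan):

```stan
data {
  int<lower=0> T;                              // length of the full series
  int<lower=0> N_obs;                          // number of observed points
  int<lower=0> N_mis;                          // number of missing points
  array[N_obs] int<lower=1, upper=T> ii_obs;   // indices of observed points
  array[N_mis] int<lower=1, upper=T> ii_mis;   // indices of missing points
  vector[N_obs] y_obs;
}
parameters {
  vector[N_mis] y_mis;   // missing values treated as unknowns to sample
}
transformed parameters {
  vector[T] y;           // full series used by the VAR/ARCH likelihood
  y[ii_obs] = y_obs;
  y[ii_mis] = y_mis;
}
```

The rest of the model is then written against y as if the series were complete; the missing entries are integrated over rather than fixed by an imputation rule.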

@Aminsn I've had some success fitting a modified ARCH model where I use some hyperparameters to control the coefficients after the first one (basically using a logistic function to send them to zero). I fit it to monthly S&P 500 returns since 1970 and it seemed to do well and didn't take too long to fit. I got some warnings about ESS being low, but that can be fixed by running it for more iterations.

Off hand, I would guess that an ARCH model would probably fit faster in Stan than GARCH since GARCH volatilities have correlation with each other.

data {
  int<lower=0> T;          // number of observations
  int<lower=0> p;          // ARCH order (number of extra lags)
  vector[T] Y;             // observed returns
}
parameters {
  real mu;
  real<lower=0> sigma_A;   // baseline (intercept) variance
  real<lower=0> sigma_B;   // coefficient on the most recent lag
  real<lower=0> L;         // scale of the logistic decay
  real<lower=0> B;         // rate of the logistic decay
}
transformed parameters {
  // ARCH coefficients: the most recent lag gets sigma_B; earlier lags
  // are sent toward zero by a logistic function of the lag index.
  vector[p + 1] sigma_Bs;
  for (i in 1:p) {
    sigma_Bs[p + 1 - i] = L * exp(-(i - 1) * B) / (1 + exp(-(i - 1) * B));
  }
  sigma_Bs[p + 1] = sigma_B;
}
model {
  mu ~ normal(0.001, 0.05);
  sigma_A ~ cauchy(0.0, 0.2);
  sigma_B ~ normal(0, 2);
  L ~ cauchy(0, 2);
  B ~ cauchy(0, 1);
  {
    vector[T - 1] sigmas;      // conditional variances for t = 2..T
    int j;
    for (i in 2:T) {
      j = min(p + 1, i - 1);   // fewer lags available near the start
      sigmas[i - 1] = dot_product(sigma_Bs[1:j], square(Y[(i - j):(i - 1)]));
    }
    sigmas = sigmas + sigma_A;
    Y[2:T] ~ normal(mu, sqrt(sigmas));   // normal() takes the sd, hence sqrt
  }
}