Log_lik for multivariate normal

I am trying to fit a Vector Autoregression (VAR(1)) model but I am not sure how to define the log_lik to be used later for P_loo calculation. Suppose the model looks something like this:

data {
vector[m] y[T];
parameters {
  cov_matrix[m] sigma; // Error variance, Sigma
  vector[m] mu[T];
model {
  y[1:T] ~  multi_normal(mu, sigma);
generated quantities {
  vector[m] log_lik[T];
log_lik = multi_normal_lpdf(y[1:T] | mu, sigma);

This specification gives the below error:

Semantic error in '/tmp/RtmpxBWcjx/model-7c9b391d61dc.stan', line 97, column 0 to column 58:
    97:  log_lik = multi_normal_lpdf(y[1:T] | mu, sigma);
9:  }

Ill-typed arguments supplied to assignment operator =: lhs has type array[] vector and rhs has type real

How should I define the log_lik for such a model?

Not sure if there is a quick vectorized way to do this, but looping over each T should suffice:

for (t in 1:T) {
    log_lik[t] = multi_normal_lpdf(y[t] | mu[t], sigma);

This is not valid, as m and T are not defined. It would be good to use the new array syntax which is more clear (and the old syntax will go away)

  array[T] vector[m] y;

_lpdf functions return sum of log densities

No, vectorization is not possible (at the moment), and a loop like you suggested is needed to get T terms.

For timeseries model, you may also be interested to look at Approximate leave-future-out cross-validation for Bayesian time series models • loo. In case of time series, LOO is also valid for certain purposes, e.g. focus in the conditional observation model, or task is to impute single missing values.

1 Like

With VAR models in Stan, it is always helpful to check that the coefficients you are simulating produce a stationary model. It is harder to constrain them properly than an AR model. There was a paper a few years back that was presented at a StanCon that had implemented it so that they were stationary and the priors made some kind of sense.

Another thing to be aware of is that is that the LKJ prior on the correlation matrices has a prior that the series are uncorrelated. If you have many time series in the VAR and they have lots of correlation with each other, then that prior may not be so good.


Thanks, yes I guess you are referring to Sarah Heap’s work. The model I am trying to fit is an extension of VAR. I am trying to model Heteroscedasticity by assuming an ARCH like structure for variance, so the model would be VAR + ARCH. At the moment I am mostly struggling with some of the parameters not converging and I realized that’s because of large chunks of missing data in my time series (if I impute the missing values before sampling there is no convergence issue).

Not an easy problem! Best of luck. My usual approach in Stan is to try to start from as simple a model as possible and build it up. In this case though, I’m not sure whether it is the VAR that is more complicated than the ARCH (i.e. which to add first).

Is the issue with missing data that the data have different start dates or are they missing for some other reason?

VAR and ARCH combined are problematic whereas fitting each separately doesn’t cause any sampling issues in my case but the fit wouldn’t look good.

Data is missing for some unknown reasons but I am not really interested in the imputation per se, I would simply remove them if I wasn’t dealing with time series data but now I have to somehow deal with them in a way that the model convergences properly and the parameters of interest (covariates effects) are not affected (so I can’t simply impute them with the mean). In other words, I want the learning process to be stopped during the missingness and resumed afterwards.

Might make for an interesting paper to investigate the issues with VAR and ARCH together.

@Aminsn I’ve had some success fitting a modified ARCH model where I use some hyperparameters to control the coefficients after the first one (basically using a logistic function to send them to zero). I fit it to monthly S&P 500 returns since 1970 and it seemed to do well and didn’t take too long to fit. I got some warnings about ESS being low, but that can be fixed by running it for more iterations.

Off hand, I would guess that an ARCH model would probably fit faster in Stan than GARCH since GARCH volatilities have correlation with each other.

data {
    int<lower=0> T;
    int<lower=0> p;

    vector[T] Y;
parameters {
    real mu;
    real<lower = 0> sigma_A;
    real<lower = 0> sigma_B;
    real<lower = 0> L;
    real<lower = 0> B;
transformed parameters {
    vector[p + 1] sigma_Bs;
    for (i in 1:p) {
        sigma_Bs[p + 1 - i] = L * exp(-(i - 1) * B) / (1 + exp(-(i - 1) * B));
    sigma_Bs[p + 1] = sigma_B;
model {
    mu ~ normal(0.001, 0.05);
    sigma_A ~ cauchy(0.0, 0.2);
    sigma_B ~ normal(0, 2);
    L ~ cauchy(0, 2);
    B ~ cauchy(0, 1);

        vector[T - 1] sigmas;
        int j;
        for (i in 2:T) {
            j = min(p + 1, i - 1);
            sigmas[i - 1] = dot_product(sigma_Bs[1:j], square(Y[(i - j):(i - 1)]));
        sigmas = sigmas + sigma_A;

        Y[2:T] ~ normal(mu, sqrt(sigmas));