# Example of VAR in Stan

Does anyone have a pointer to an example, or at least a potential approach, for a VAR(1) in Stan with `K` non-negative time series `matrix[T, K] y`? In particular, I’d like to make sure that for every timestep `t`, the estimates for `y[t]` sum to the row sum of the observed `y[t]`.

If there isn’t missingness in `y`, then you can just directly regress the elements of `y` on the relevant sets of lagged elements of `y`, right, presumably through a log-link to ensure positivity?

If I understand correctly, you want the row sums to be fixed in the generative model? And not only that, but you want the error terms within a row to sum to zero, such that the linear predictors for `y[t, ]` also sum to the fixed generative value? This seems like a pretty exotic assumption that’s a bit hard to get my head around. Would you mind sharing a bit more about the data you’re trying to model and the generative process you’re assuming?

One particular question: What do you want the behavior to be if the row sums of `y` contain a big jump? What assumption should we make about how that jump is distributed across the elements?

It’s usually good to start with a simple model and make it more complicated. So you can start with an AR(1) and then add the features that you want and then extend to a VAR(1) (or start with an AR(1), extend to VAR(1), then add the additional features you want).
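As a concrete starting point, a minimal AR(1) sketch for a single series might look like the following. It follows the general shape of the autoregressive example in the Stan User's Guide. The priors are assumptions added for illustration and should be adapted to the scale of the data.

```stan
data {
  int<lower=2> T;        // number of time points
  vector[T] y;           // one series, e.g. a single (possibly log-transformed) column
}
parameters {
  real alpha;            // intercept
  real beta;             // AR(1) coefficient
  real<lower=0> sigma;   // innovation standard deviation
}
model {
  // Assumed weakly informative priors, not taken from the thread.
  alpha ~ normal(0, 5);
  beta ~ normal(0, 1);
  sigma ~ exponential(1);

  // Each observation is regressed on its own first lag.
  y[2:T] ~ normal(alpha + beta * y[1:(T - 1)], sigma);
}
```

Once this fits and passes basic checks, the scalar `beta` can be replaced by a `K` by `K` coefficient matrix to get a VAR(1).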

The documentation provides an AR(p) example that is a useful starting point. There are a few more issues with a VAR since it is a little more annoying to impose stationarity (less of a problem if you have I(0) data and impose a strong prior on the coefficients) and you need to model the errors as multivariate normal, which can be annoying in Stan if they have some kind of factor structure.

To your question about non-negative time series, it is usually much easier to transform the data so that the non-negativity constraint stops being a problem. For instance, GDP and stock prices are non-negative time series. What we typically do is take the log of them and model that. When we make predictions, we can convert back to the original scale, so the forecasts will also be non-negative. If there is a way to make a transformation, then that will be much easier. If you aren’t able to do that, then your problem will get quite a bit messier. Depends on the data you are working with I guess.
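A rough sketch of a VAR(1) on the log scale with multivariate normal errors could look like this. It assumes every entry of `y` is strictly positive (a zero would give `log(0)`), and the names `c`, `A`, `tau`, `L_Omega`, and `y_rep`, as well as all priors, are illustrative choices rather than anything specified in the thread. Stationarity is not enforced here beyond the shrinkage prior on `A`.

```stan
data {
  int<lower=2> T;                  // number of time points
  int<lower=1> K;                  // number of series
  matrix<lower=0>[T, K] y;         // assumed strictly positive in practice
}
transformed data {
  matrix[T, K] z = log(y);         // model the logs, back-transform later
}
parameters {
  vector[K] c;                     // intercepts
  matrix[K, K] A;                  // VAR(1) coefficient matrix
  cholesky_factor_corr[K] L_Omega; // Cholesky factor of error correlations
  vector<lower=0>[K] tau;          // error scales
}
model {
  matrix[K, K] L_Sigma = diag_pre_multiply(tau, L_Omega);

  // Assumed priors; the tight prior on A shrinks toward weak dynamics.
  c ~ normal(0, 5);
  to_vector(A) ~ normal(0, 0.5);
  L_Omega ~ lkj_corr_cholesky(2);
  tau ~ exponential(1);

  for (t in 2:T) {
    z[t]' ~ multi_normal_cholesky(c + A * z[t - 1]', L_Sigma);
  }
}
generated quantities {
  // One-step-ahead replicates on the original scale; positive by construction.
  matrix[T - 1, K] y_rep;
  {
    matrix[K, K] L_Sigma = diag_pre_multiply(tau, L_Omega);
    for (t in 2:T) {
      y_rep[t - 1] = exp(multi_normal_cholesky_rng(c + A * z[t - 1]', L_Sigma))';
    }
  }
}
```

Note that the rows of `y_rep` are positive but are not constrained to match the observed row sums.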

I don’t entirely understand your point about the estimates summing up to the values in the rows, but one way to accomplish something similar would be to drop one of the variables from the left side of the VAR equations while keeping all of them on the right. You can then solve for the dropped variable using the row sum.

Hi,
I just have `K` time series of positive values. I’d like the posterior predictive checks to sum up to the row sums in the original data. For example, let’s say we are predicting the population over time of three countries A, B, and C, and `y` is a matrix with `T` rows (one per time stamp) and 3 columns.

For each time stamp, you would want to predict three positive numbers that sum up to A[t] + B[t] + C[t]. Alternatively, one may just model the proportions over time, but I’m not an expert and not sure which approach to take.
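One hedged way to sketch the proportions idea is to model log-ratios against a reference column (an additive log-ratio transform) as a VAR(1), then map predictions back through `softmax` and rescale by the observed row total. This is only an illustration under strong assumptions: all entries of `y` are strictly positive, the row totals are treated as known, and the names `eta`, `total`, and `y_rep` plus all priors are made up for this example.

```stan
data {
  int<lower=2> T;                  // number of time points
  int<lower=2> K;                  // number of series
  matrix<lower=0>[T, K] y;         // assumed strictly positive
}
transformed data {
  vector[T] total = y * rep_vector(1, K);   // observed row sums
  matrix[T, K - 1] eta;                     // log-ratios against column K
  for (t in 1:T) {
    for (k in 1:(K - 1)) {
      eta[t, k] = log(y[t, k] / y[t, K]);
    }
  }
}
parameters {
  vector[K - 1] c;
  matrix[K - 1, K - 1] A;
  cholesky_factor_corr[K - 1] L_Omega;
  vector<lower=0>[K - 1] tau;
}
model {
  matrix[K - 1, K - 1] L_Sigma = diag_pre_multiply(tau, L_Omega);

  // Assumed priors, for illustration only.
  c ~ normal(0, 5);
  to_vector(A) ~ normal(0, 0.5);
  L_Omega ~ lkj_corr_cholesky(2);
  tau ~ exponential(1);

  for (t in 2:T) {
    eta[t]' ~ multi_normal_cholesky(c + A * eta[t - 1]', L_Sigma);
  }
}
generated quantities {
  // Each replicated row is positive and sums to the observed total[t].
  matrix[T - 1, K] y_rep;
  {
    matrix[K - 1, K - 1] L_Sigma = diag_pre_multiply(tau, L_Omega);
    for (t in 2:T) {
      vector[K - 1] eta_rep
        = multi_normal_cholesky_rng(c + A * eta[t - 1]', L_Sigma);
      y_rep[t - 1] = (total[t] * softmax(append_row(eta_rep, 0)))';
    }
  }
}
```

Because the totals enter as fixed data, this says nothing about how the overall sum evolves; it only describes how a given total is split across the `K` series.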

Why? Why do you want your model for the populations of three countries to be equipped with absolutely certain information about the sum of the populations of the three countries?

I really think this is a case where we will make faster progress discussing your actual modeling problem rather than a hypothetical one.