Missing data

Hi,

I have observations y at different stations over time, so y[s, t], with s = 1, ..., 426 and t = 1, ..., 65.
But some stations have missing data at particular times, say y[10, 12] = NA.
How do I represent such data in Stan in the clearest way?
Thanks in advance.

Best
Munir

The simplest way is as follows.

In R (assuming that is what you are using):

  1. Create vectors with the row and column indices of the missing data.
  2. Replace the missing values in y with some placeholder number before including it in the standata list, since Stan does not accept NA in its data.

In Stan:

  3. Declare a parameter vector “imputed_data” with as many elements as you have missing values.
  4. “Sample” these parameters from an appropriate prior (you won’t be able to generate integer-valued missing data with this simple approach, because Stan parameters must be continuous).
  5. In the model block, define a new variable y_imputed in which all non-missing values are set to the original values and the missing values are filled in from imputed_data.
  6. Evaluate the log posterior using y_imputed.
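
Here is a rough sketch of what steps 3 to 6 could look like in Stan. The data names (S, T, N_mis, mis_row, mis_col) are just names I made up for illustration, and the normal(mu, sigma) model with its priors is only a placeholder for whatever model you actually have for y. It uses the newer `array[...]` declaration syntax, so older Stan versions would need the old-style array declarations instead.

```stan
data {
  int<lower=1> S;                              // number of stations (426 in the question)
  int<lower=1> T;                              // number of time points (65 in the question)
  int<lower=0> N_mis;                          // number of missing cells in y
  array[N_mis] int<lower=1, upper=S> mis_row;  // station index of each missing cell (step 1)
  array[N_mis] int<lower=1, upper=T> mis_col;  // time index of each missing cell (step 1)
  matrix[S, T] y;                              // observations, NA replaced by a placeholder (step 2)
}
parameters {
  real mu;                   // placeholder model parameters (assumption)
  real<lower=0> sigma;
  vector[N_mis] imputed_data;  // one parameter per missing cell (step 3)
}
model {
  // Step 5: copy the data, then overwrite the placeholder cells
  // with the corresponding imputed parameters.
  matrix[S, T] y_imputed = y;
  for (n in 1:N_mis) {
    y_imputed[mis_row[n], mis_col[n]] = imputed_data[n];
  }

  // Placeholder priors (assumption, adjust to the scale of your data).
  mu ~ normal(0, 10);
  sigma ~ normal(0, 5);

  // Steps 4 and 6: in this toy model the statement below is both the
  // likelihood for the observed cells and the distribution the imputed
  // cells are drawn from. If y_imputed were used as a predictor instead,
  // imputed_data would need its own explicit prior.
  to_vector(y_imputed) ~ normal(mu, sigma);
}
```

On the R side, mis_row and mis_col would be the row and column indices from step 1, and the value you put in the missing cells of y in step 2 does not matter, because those cells are overwritten before the likelihood is evaluated.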

This is the most basic approach. There are more sophisticated ways to generate the imputed values, for example using a multivariate normal distribution or a regression on other variables.
