How to handle missing values in Stan

Hi,
I developed a model that works well, but my data now contain missing values and I'm not sure how to handle them in rstan. I get an error message that says "Stan does not support NA (in y) in data". Can anyone help me handle the missing values? The data and code are attached below.
Antigen_Decay_Nan.csv (8.0 KB)

Stan code

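data {
  int N;        // number of individuals
  int K;        // number of measurement times
  real x[K];    // covariate (log-transformed age) at each measurement time
  real y[N,K];  // response for individual n at measurement time k
}
parameters {
  real beta;             // slope on x
  real<lower=1> sigma;   // residual standard deviation
  real<lower=1> muu;     // mean of the individual intercepts
  real<lower=0> sigmau;  // standard deviation of the individual intercepts
  vector[N] U_raw;       // standardized individual intercepts
}
transformed parameters {
  real nu[N,K];                        // linear predictor
  vector[N] U = U_raw * sigmau + muu;  // non-centered individual intercepts
  for (n in 1:N) {
    for (k in 1:K) {
      nu[n,k] = U[n] + beta * x[k];
    }
  }
}
model {
  U_raw ~ std_normal();
  for (n in 1:N) {
    for (k in 1:K) {
      y[n,k] ~ normal(nu[n,k], sigma);
    }
  }
}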

R Code

library(rstan)

# Wide-format data: one row per individual, one column per measurement time
datay <- read.csv("Antigen_Decay_Nan.csv")
N <- 100
age <- c(6.5, 7.5, 10.5, 13.5, 16.5, 21.5)
n <- length(age)
X <- log(age - 5.5)
antigen_dat <- list(N = N,
                    K = n,
                    x = X,
                    y = datay)
fit <- stan(file = 'measles_stan.stan', data = antigen_dat)
print(fit, pars = c("beta", "muu", "sigmau", "sigma"))
traceplot(fit, pars = c("beta", "muu", "sigmau", "sigma"), inc_warmup = FALSE, nrow = 2)

edit: code formatted by @Max_Mantei

See this lecture and the missing data section of the Stan User's Guide (SUG).


Thanks for the links. I have now changed the format of the data so that each measurement has an id for its individual. Since we are not interested in estimating the missing values, I deleted all the missing data points. I'm left with observed data that has an unequal number of measurements per individual (i.e. the ids do not all appear the same number of times). Do you have an idea how I can handle such a case in Stan?
Antigen_Decay_Nan3_id.csv (9.0 KB)
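
One common way to handle an unequal number of measurements per individual is to pass the data in long format, with one entry per observed measurement, and to look up each measurement's individual effect through an index. The following is only a minimal sketch of that idea applied to the model above, not a tested solution. The names N_obs and id are assumptions introduced here for illustration, and x is assumed to hold the covariate value for each measurement rather than one value per time point. The parameter bounds are copied unchanged from the original model.

```stan
data {
  int<lower=1> N;                    // number of individuals
  int<lower=1> N_obs;                // number of observed (non-missing) measurements
  int<lower=1, upper=N> id[N_obs];   // individual index for each measurement (assumed name)
  vector[N_obs] x;                   // covariate value for each measurement
  vector[N_obs] y;                   // observed response for each measurement
}
parameters {
  real beta;
  real<lower=1> sigma;               // bounds copied from the original model
  real<lower=1> muu;
  real<lower=0> sigmau;
  vector[N] U_raw;
}
transformed parameters {
  vector[N] U = U_raw * sigmau + muu;  // non-centered individual intercepts
}
model {
  U_raw ~ std_normal();
  // U[id] picks the intercept of the individual each measurement belongs to
  y ~ normal(U[id] + beta * x, sigma);
}
```

On the R side, this assumes the data list supplies N_obs as the number of remaining rows, id as integers running from 1 to N, and x and y as columns of the same length. The declaration int id[N_obs] uses the same array syntax as the model above; Stan 2.33 and later require array[N_obs] int id instead.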