How to treat missing data "NA"?


#1

I want to simulate missing data and then fit the model with NUTS. My code is as follows:
for (k in 1:iter){
pz<-rnorm(pnum,0,(var[1])^0.5)
iz<-rnorm(inum,0,(var[2])^0.5)
piz<-array(rnorm((pnum*inum),0,(var[3])^0.5),dim=c(pnum,inum))
x<-array(0,dim=c(pnum,inum))
for(p in 1:pnum){
for(i in 1:inum){
x[p,i]<-0+pz[p]+iz[i]+piz[p,i]
}
}
file_path<-"C:/Users/Administrator/Desktop/t/data_"
write.csv(x,file=paste(file_path,k,".csv",sep = ""),row.names = F )
library(simFrame)
a<-read.csv(file=paste(file_path,k,".csv",sep = ""),header = TRUE )
n<-1
for(n in 1:1000){
set.seed(n)
nc<-NAControl(NArate=0.1) ##MCAR missing rate
x<-setNA(a,nc) ###insert into missing data from x
x[is.na(x)] <- 0
file_path<-"C:/Users/Administrator/Desktop/t/datamiss_"
write.csv(x,file=paste(file_path,k,".csv",sep = ""),row.names = F )
}
##MCMC
gtdata<-list(np=pnum,ni=inum,vc=var,x=x)
mcmc.model<-'data{
int<lower=1> np;
int<lower=1> ni;
real x[np,ni];
real vc[3];
}
parameters{
real<lower=0> sigmap;
real<lower=0> sigmai;
real<lower=0> sigmae;
real u;
vector[np] P;
vector[ni] I;
}
transformed parameters{
real sigmap2;
real sigmai2;
real sigmae2;
sigmap2=sigmap^2;
sigmai2=sigmai^2;
sigmae2=sigmae^2;
}
model{
//define variable
real mu;
real sdvarp;
real sdvari;
real sdvare;
//likelihood
for (p in 1:np){
P[p]~normal(0,sigmap);
}
for (i in 1:ni){
I[i]~normal(0,sigmai);
}
for (p in 1:np){
for (i in 1:ni){
mu=u+P[p]+I[i];
x[p,i]~normal(mu,sigmae);
}
}
//prior
u~normal(0,sqrt(1000));
sdvarp=1.5*sqrt(vc[1]);
sdvari=1.5*sqrt(vc[2]);
sdvare=1.5*sqrt(vc[3]);
sigmap~uniform(0,sdvarp);
sigmai~uniform(0,sdvari);
sigmae~uniform(0,sdvare);
}'
require(rstan)
require(reshape)
library(ggplot2)
mcmc.fit<-stan(model_code=mcmc.model,data=gtdata,chains=3,iter=1000,warmup=500,algorithm="NUTS", pars=c("sigmap2","sigmai2","sigmae2"))

But I get an error when I run it, and I would like to ask for help.

The output is as follows:

Error in FUN(X[[i]], …) : Stan does not support NA (in x) in data
failed to preprocess the data; sampling not done
Error in dimnames_sims[[3]] : subscript out of bounds


#2

You can either change the NA values to some magic number like Inf and then check in the Stan code whether a value equals the magic number, or pass in only the non-missing values along with a structure that records their indices in the original data. You should read the chapter on missing values in the Stan User Manual.
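
As a rough illustration of the second option, here is a minimal sketch in R. It assumes the data sit in a numeric matrix `x` with `NA` in the missing cells (a data frame read from CSV would first need `as.matrix()`). The names `n_obs`, `pp`, `ii` and `x_obs` are made up for this example, the toy values are not from the original post, and the Stan program is only one possible rewrite of the likelihood, not the original poster's model. It uses the current `array[...]` syntax; older Stan versions would declare `int pp[n_obs];` instead.

```r
# Toy 4 x 3 score matrix with two missing cells (illustrative values only)
x <- matrix(c(1.2, NA,  0.4,
              0.3, 0.8, 1.1,
              NA,  0.5, 0.9,
              0.7, 1.0, 0.2),
            nrow = 4, byrow = TRUE)

# Row/column positions of the observed (non-NA) cells
obs <- which(!is.na(x), arr.ind = TRUE)

# Data list for Stan: only the observed values plus their indices,
# so no NA is ever passed to Stan
gtdata <- list(
  np    = nrow(x),
  ni    = ncol(x),
  n_obs = nrow(obs),
  pp    = as.vector(obs[, "row"]),  # person index of each observed cell
  ii    = as.vector(obs[, "col"]),  # item index of each observed cell
  x_obs = x[obs]                    # observed values, same order as pp/ii
)

# Hypothetical Stan program that loops only over observed cells
# (priors on u and the sigmas are left out of this sketch)
mcmc.model <- "
data {
  int<lower=1> np;
  int<lower=1> ni;
  int<lower=1> n_obs;
  array[n_obs] int<lower=1, upper=np> pp;
  array[n_obs] int<lower=1, upper=ni> ii;
  vector[n_obs] x_obs;
}
parameters {
  real u;
  real<lower=0> sigmap;
  real<lower=0> sigmai;
  real<lower=0> sigmae;
  vector[np] P;
  vector[ni] I;
}
model {
  P ~ normal(0, sigmap);
  I ~ normal(0, sigmai);
  x_obs ~ normal(u + P[pp] + I[ii], sigmae);
}
"
```

With rstan installed, `gtdata` and `mcmc.model` could then be passed to `stan(model_code = mcmc.model, data = gtdata, ...)` in the same way as in the first post. Under MCAR, dropping the missing cells from the likelihood like this should be enough for estimating the variance components; the missing values themselves only need to be declared as parameters if you want them imputed.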


#3

Thank you! I will read the manual again and try it.


#4

I also recommend reading @paul.buerkner's vignette "Handle Missing Values with brms": https://cran.r-project.org/web/packages/brms/vignettes/brms_missings.html

Even if you use “pure” Stan/RStan, it will give you some ideas.


#5

I also have a video on missing data imputation in Stan/RStan here


#6

Thank you!
But I cannot open this link, and I don’t know why.


#7

It just goes straight to YouTube. Possibly you’re behind a firewall that blocks YouTube?


#8

You’re right.
I can’t get past the firewall at the moment, so I cannot watch the YouTube video.