How to treat missing data "NA"?

Patricia · June 12, 2018, 11:18am

I want to simulate missing data and then fit a model with NUTS. My code is as follows:
for (k in 1:iter){
pz<-rnorm(pnum,0,(var[1])^0.5)
iz<-rnorm(inum,0,(var[2])^0.5)
piz<-array(rnorm((pnum*inum),0,(var[3])^0.5),dim=c(pnum,inum))
x<-array(0,dim=c(pnum,inum))
for(p in 1:pnum){
for(i in 1:inum){
x[p,i]<-0+pz[p]+iz[i]+piz[p,i]
}
}
file_path<-"C:/Users/Administrator/Desktop/t/data_"
write.csv(x,file=paste(file_path,k,".csv",sep = ""),row.names = F )
library(simFrame)
a<-read.csv(file=paste(file_path,k,".csv",sep = ""),header = TRUE )
n<-1
for(n in 1:1000){
set.seed(n)
nc<-NAControl(NArate=0.1) ##MCAR missing rate
x<-setNA(a,nc) ###insert into missing data from x
x[is.na(x)] <- 0
file_path<-"C:/Users/Administrator/Desktop/t/datamiss_"
write.csv(x,file=paste(file_path,k,".csv",sep = ""),row.names = F )
}
##MCMC
gtdata<-list(np=pnum,ni=inum,vc=var,x=x)
mcmc.model<-'data{
int<lower=1> np;
int<lower=1> ni;
real x[np,ni];
real vc[3];
}
parameters{
real<lower=0> sigmap;
real<lower=0> sigmai;
real<lower=0> sigmae;
real u;
vector[np] P;
vector[ni] I;
}
transformed parameters{
real sigmap2;
real sigmai2;
real sigmae2;
sigmap2=sigmap^2;
sigmai2=sigmai^2;
sigmae2=sigmae^2;
}
model{
//define variable
real mu;
real sdvarp;
real sdvari;
real sdvare;
//likelihood
for (p in 1:np){
P[p]~normal(0,sigmap);
}
for (i in 1:ni){
I[i]~normal(0,sigmai);
}
for (p in 1:np){
for (i in 1:ni){
mu=u+P[p]+I[i];
x[p,i]~normal(mu,sigmae);
}
}
//prior
u~normal(0,sqrt(1000));
sdvarp=1.5*sqrt(vc[1]);
sdvari=1.5*sqrt(vc[2]);
sdvare=1.5*sqrt(vc[3]);
sigmap~uniform(0,sdvarp);
sigmai~uniform(0,sdvari);
sigmae~uniform(0,sdvare);
}'
require(rstan)
require(reshape)
library(ggplot2)
mcmc.fit<-stan(model_code=mcmc.model,data=gtdata,chains=3,iter=1000,warmup=500,algorithm="NUTS", pars=c("sigmap2","sigmai2","sigmae2"))

But it fails with an error when I run it, and I would like to ask for help.

The output is as follows:

Error in FUN(X[[i]], ...) : Stan does not support NA (in x) in data
failed to preprocess the data; sampling not done
Error in dimnames_sims[[3]] : subscript out of bounds

bgoodri · June 12, 2018, 12:31pm

You can either change the NA values to some magic number like Inf and then check in the Stan code whether a value is equal to the magic number, or pass in only the non-missing values along with a structure that tells you their indices in the original data. You should read the chapter on missing values in the Stan User Manual.
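For what it's worth, here is a minimal sketch of the second option (passing only the observed values plus their indices). It is not the original poster's model verbatim: the names n_obs, pp, ii and x_obs are made up for illustration, the toy matrix stands in for the simulated data, and the priors on the scales are left out. It also assumes a Stan version that accepts the array[...] declaration syntax; with the older syntax used earlier in this thread the index arrays would be declared as int pp[n_obs]; instead.

```r
# Toy stand-in for the simulated person-by-item matrix (hypothetical values).
set.seed(1)
np <- 5
ni <- 4
x <- matrix(rnorm(np * ni), nrow = np, ncol = ni)
x[sample(length(x), 3)] <- NA  # knock out three cells at random

# Keep only the observed cells and remember which person/item each came from.
obs <- which(!is.na(x), arr.ind = TRUE)  # two columns: "row" and "col"
stan_data <- list(
  np    = np,
  ni    = ni,
  n_obs = nrow(obs),
  pp    = as.integer(obs[, "row"]),  # person index of each observed value
  ii    = as.integer(obs[, "col"]),  # item index of each observed value
  x_obs = x[obs]                     # observed values, no NA left
)

# Stan program that only ever sees the observed cells (illustrative names).
model_code <- '
data {
  int<lower=1> np;
  int<lower=1> ni;
  int<lower=0> n_obs;
  array[n_obs] int<lower=1, upper=np> pp;
  array[n_obs] int<lower=1, upper=ni> ii;
  vector[n_obs] x_obs;
}
parameters {
  real u;
  real<lower=0> sigmap;
  real<lower=0> sigmai;
  real<lower=0> sigmae;
  vector[np] P;
  vector[ni] I;
}
model {
  // priors on sigmap, sigmai, sigmae omitted in this sketch
  u ~ normal(0, sqrt(1000));
  P ~ normal(0, sigmap);
  I ~ normal(0, sigmai);
  // likelihood over the observed cells only
  x_obs ~ normal(u + P[pp] + I[ii], sigmae);
}
'
```

If rstan is installed, the list can then be passed along the lines of stan(model_code = model_code, data = stan_data). Because the missing cells never enter the likelihood, no NA reaches Stan. If you also want draws for the missing cells themselves, they would have to be declared as parameters, which is what the manual's missing-data chapter covers.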

Patricia · June 13, 2018, 2:34am

Thank you! I will read manual again and try it.

torkar · June 13, 2018, 6:22am

I also recommend reading @paul.buerkner's Handle Missing Values with BRMS: https://cran.r-project.org/web/packages/brms/vignettes/brms_missings.html

Even if you use “pure” stan/rstan, it will give you some ideas.

mike-lawrence · June 13, 2018, 2:23pm

I also have a video on missing data imputation in Stan/RStan here

Patricia · June 13, 2018, 4:30pm

Thank you!
But I can't open this link, and I don't know why.

mike-lawrence · June 13, 2018, 4:31pm

It just goes straight to YouTube. Possibly you’re behind a firewall that blocks YouTube?

Patricia · June 14, 2018, 9:31am

You're right.
I can't get past the firewall at the moment, so I can't watch the YouTube video.
