I am trying to apply regularized horseshoe prior to a simulation setting with a binary outcome.
I am using the R stan code for logistic regression from Piironen &Vehtari (2017) paper.
The posterior credible interval always includes zero whatever the simulation setting is. The first five "betas " are high so they should be realized. Later on, my plan is to apply to gene-expression data with at least 1000 genes (covariates), The paper mentions that the method works for p>n cases. I couldn’t understand how the “betas” are chosen to be significant. Kindly help.
My rcode follows:
################ simulated data#########################
beta<-c(rep(2,5),rep(0,95))
n<-20
p<-100
x1<-matrix(rnorm(n*p),nrow=n,ncol=p,byrow=F)
dim(x1)
cor=0.3; #%correlation
for(i in 2:5)
{
x1[,i]<-cor%%x1[,1]+(1-cor)%%x1[,i]
}
l=(x1%%beta+0.3rnorm(n,0, 1));
prob=exp(l)/(1 + exp(l));
prob[which(prob>=0.5)] =1;
prob[which(prob<0.5)]=0;
y1=prob;
y1
data<-cbind(y1,x1)
dim(data)
colnames(data)<-c(“y”,paste0(‘x’,1:((dim(data)[2]-1))))
head(data)
data[1:10,1:10]
dim(data)
#####################################################
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
Sys.setenv(LOCAL_CPPFLAGS = ‘-march=native’)
path<-path
sigma<-1; p0<-4; p<-p; n<-n;
scale_global<- (p0/(p-p0))*sigma/sqrt(n)
fit<-stan(file=paste(path,"/sta.stan",sep=""),
data=list(n=n,d=p,y=data[,1],x=data[,-1],
scale_icept=10,scale_global=scale_global,nu_global=1,nu_local=1,
warmup=1000,iter=5000,chains=4,
algorithm = “NUTS”,refresh=0,adapt_delta=0.999,seed=1234))
##Here, nu_global=1, and nu_local=1, corresponds to horseshoe prior for both global and local ##parameters.
######################################################################
############posterior mean and credible interval###########################
post_beta <-extract(fit)$beta
m_beta<-apply(post_beta,2,mean)
ci<-t(apply(post_beta,2,quantile,probs = c(0.05,0.5,0.95)))
ci
#############################################################