Regularized Horseshoe prior posterior credible intervals

I am trying to apply the regularized horseshoe prior in a simulation setting with a binary outcome.
I am using the R/Stan code for logistic regression from the Piironen & Vehtari (2017) paper.
The posterior credible intervals always include zero, whatever the simulation setting is. The first five betas are large, so they should be detected. Later on, my plan is to apply this to gene-expression data with at least 1000 genes (covariates); the paper mentions that the method works in p > n cases. I don't understand how the betas are chosen to be significant. Kindly help.
My R code follows:
################ simulated data#########################
beta<-c(rep(2,5),rep(0,95))

n<-20
p<-100

x1<-matrix(rnorm(n*p),nrow=n,ncol=p,byrow=F)
dim(x1)

cor<-0.3  # desired correlation with x1[,1]

for(i in 2:5)
{
x1[,i]<-cor*x1[,1]+(1-cor)*x1[,i]  # mix in x1[,1] to induce correlation
}

l<-x1%*%beta+0.3*rnorm(n,0,1)  # linear predictor with a little noise

prob=exp(l)/(1 + exp(l));
prob[which(prob>=0.5)] =1;
prob[which(prob<0.5)]=0;
y1=prob;
y1
data<-cbind(y1,x1)
dim(data)
colnames(data)<-c("y",paste0("x",1:(dim(data)[2]-1)))
head(data)
data[1:10,1:10]
dim(data)
#####################################################
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
Sys.setenv(LOCAL_CPPFLAGS = '-march=native')
path<-path  # directory that contains the Stan file sta.stan
sigma<-1; p0<-4; p<-p; n<-n;
scale_global<- (p0/(p-p0))*sigma/sqrt(n)
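# for p0 = 4, p = 100, n = 20, sigma = 1 the formula above works out to roughly
(4/(100-4))*(1/sqrt(20))   # ~ 0.0093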

fit<-stan(file=paste(path,"/sta.stan",sep=""),
          data=list(n=n,d=p,y=data[,1],x=data[,-1],
                    scale_icept=10,scale_global=scale_global,
                    nu_global=1,nu_local=1),
          warmup=1000,iter=5000,chains=4,
          algorithm="NUTS",refresh=0,
          control=list(adapt_delta=0.999),seed=1234)
## Here, nu_global=1 and nu_local=1 correspond to the horseshoe prior for both the global and local parameters.
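## If sta.stan is the regularized-horseshoe listing from the appendix of
## Piironen & Vehtari (2017), it also expects the slab parameters in the data
## list; slab_scale and slab_df values below are illustrative placeholders only.
standata_rhs<-list(n=n,d=p,y=data[,1],x=data[,-1],
                   scale_icept=10,scale_global=scale_global,
                   nu_global=1,nu_local=1,
                   slab_scale=2,slab_df=4)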
######################################################################
############posterior mean and credible interval###########################
post_beta <-extract(fit)$beta
m_beta<-apply(post_beta,2,mean)
ci<-t(apply(post_beta,2,quantile,probs = c(0.05,0.5,0.95)))
ci
#############################################################
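For reference, my current rule for calling a beta “significant” is simply that its 90% credible interval excludes zero (the object name selected below is just for illustration):
selected<-which(ci[,"5%"] > 0 | ci[,"95%"] < 0)   # betas whose 90% CI excludes zero
selected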

Hi AB25,

It seems you are simulating correlated covariates. In that case the posterior credible intervals can include zero for non-zero betas. See, e.g., the linked simulation experiments.
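As a quick check (a sketch using the x1 matrix from your first simulation), you can look at the pairwise correlations of the columns you modified:
round(cor(x1[,1:5]),2)   # pairwise correlations among x1[,1]...x1[,5]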


Thanks for your reply.
Here is a simulation where there is no correlation:
sim<-function(p=p,n=n,s=s,m1=m1,s1=s1,m2=m2,s2=s2){
x<-matrix(rnorm(n*p),nrow=n,ncol=p) # data matrix
gama<-rbinom(p,1,s) # gama indicates which genes are significant (1 = gene gets a large beta from d1)
d1<-rnorm(p,m1,s1) # two distributions for betas
d2<-rnorm(p,m2,s2)

beta<-d1*gama+(1-gama)*d2 # 20% significant genes

ind<-which(gama==1)

pr<-1/(1+exp(-(x%*%beta)))  # logistic link, written this way to avoid overflow of exp() for large x%*%beta
y<-rbinom(n,1,pr)

data<-cbind(y,x)
nn<-ncol(x)
colnames(data) <- c("Y",paste0("X",1:(dim(x)[2])))
return(list(ind=ind,data=data,beta=beta))
}
##########Simulation truth#################################################################################
p<-100
n=400
s<-0.2
m1<-100
s1<-1
m2<-0
s2<-0.3

set.seed(05042019)
data_m<-sim(p=p,n=n,s=s,m1=m1,s1=s1,m2=m2,s2=s2)
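A quick look at the simulation truth returned by sim() shows which betas are truly large:
length(data_m$ind)               # number of genes with a large beta
head(data_m$beta[data_m$ind])    # a few of the large coefficients (drawn around m1 = 100)
table(data_m$data[,"Y"])         # class balance of the simulated outcome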
However, the result is the same.

With n = 20 << p = 100, it is likely that the posterior is multimodal, with a narrow spike centered at zero. Here's an example figure of how your marginals probably look:
[figure: horseshoe_bimodal — an example of a bimodal marginal posterior]

The mean and median are away from zero, but 20% of the draws are also larger than 0. This is just due to the data being weakly informative in the n << p case.

Note that you need to use a lot of bins in your histogram to see how narrow the spike is (and kernel density estimates are likely to smooth it out).
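For example (post_beta as extracted in your earlier code; the column index 1 and the 0.1 band are just illustrative):
hist(post_beta[,1],breaks=200)    # many bins reveal the narrow spike at zero
mean(abs(post_beta[,1])<0.1)      # fraction of draws in a small band around zero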

Can you guide me on a simulation setting for which this method will work?

What do you mean by “this method” and “will work”? The links I provided point to simulations in which the regularized horseshoe and posterior inference work.