Regularized Horseshoe prior posterior credible intervals

AB25 · May 22, 2019, 4:26pm

I am trying to apply regularized horseshoe prior to a simulation setting with a binary outcome.
I am using the R stan code for logistic regression from Piironen &Vehtari (2017) paper.
The posterior credible interval always includes zero whatever the simulation setting is. The first five "betas " are high so they should be realized. Later on, my plan is to apply to gene-expression data with at least 1000 genes (covariates), The paper mentions that the method works for p>n cases. I couldn’t understand how the “betas” are chosen to be significant. Kindly help.
My rcode follows:
################ simulated data#########################
beta<-c(rep(2,5),rep(0,95))

n<-20
p<-100

x1<-matrix(rnorm(n*p),nrow=n,ncol=p,byrow=F)
dim(x1)

cor=0.3; #%correlation

for(i in 2:5)
{
x1[,i]<-cor%%x1[,1]+(1-cor)%%x1[,i]
}

l=(x1%%beta+0.3rnorm(n,0, 1));

prob=exp(l)/(1 + exp(l));
prob[which(prob>=0.5)] =1;
prob[which(prob<0.5)]=0;
y1=prob;
y1
data<-cbind(y1,x1)
dim(data)
colnames(data)<-c(“y”,paste0(‘x’,1:((dim(data)[2]-1))))
head(data)
data[1:10,1:10]
dim(data)
#####################################################
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
Sys.setenv(LOCAL_CPPFLAGS = ‘-march=native’)
path<-path
sigma<-1; p0<-4; p<-p; n<-n;
scale_global<- (p0/(p-p0))*sigma/sqrt(n)

fit<-stan(file=paste(path,"/sta.stan",sep=""),
data=list(n=n,d=p,y=data[,1],x=data[,-1],
scale_icept=10,scale_global=scale_global,nu_global=1,nu_local=1,
warmup=1000,iter=5000,chains=4,
algorithm = “NUTS”,refresh=0,adapt_delta=0.999,seed=1234))
##Here, nu_global=1, and nu_local=1, corresponds to horseshoe prior for both global and local ##parameters.
######################################################################
############posterior mean and credible interval###########################
post_beta <-extract(fit)$beta
m_beta<-apply(post_beta,2,mean)
ci<-t(apply(post_beta,2,quantile,probs = c(0.05,0.5,0.95)))
ci
#############################################################

avehtari · May 22, 2019, 4:38pm

Hi AB25,

It seems you are simulating correlating covariates. In that case the posterior credible intervals can includes zero for non-zero betas. See, e.g.

Collinearity case study https://avehtari.github.io/modelselection/collinear.html
Collinearity and horseshoe (end of section 5) https://avehtari.github.io/modelselection/diabetes.html

AB25 · May 22, 2019, 4:43pm

Thanks for your reply.
This is a data where there is no correlation.
sim<-function(p=p,n=n,s=s,m1=m1,s1=s1,m2=m2,s2=s2){
x<-matrix(rnorm(n*p),nrow=n,ncol=p) # data matrix
gama<-rbinom(p,1,s) # gama telling which genes are significant
d1<-rnorm(p,m1,s1) # two distributions for betas
d2<-rnorm(p,m2,s2)

beta<-d1*gama+(1-gama)*d2 # 20% significant genes

ind<-which(gama==1)

pr<-exp(x%%beta)/(1+exp(x%%beta))
y<-rbinom(n,1,pr)

data<-cbind(y,x)
nn<-ncol(x)
colnames(data) <- c(“Y”,paste0(‘X’,1:(dim(x)[2])))
return(list(ind=ind,data=data,beta=beta))
}
##########Simulation truth#################################################################################
p<-100
n=400
s<-0.2
m1<-100
s1<-1
m2<-0
s2<-0.3

set.seed(05042019)
data_m<-sim(p=p,n=n,s=s,m1=m1,s1=s1,m2=m2,s2=s2)
Howeve, it is the same case.

avehtari · May 22, 2019, 4:59pm

With n=20 << p=100, it is likely that the posterior is multimodal with narrow spike centered at zero. Here’s an example figure how your marginals probably look like
horseshoe_bimodal

mean and median are away from zero, but there is also 20% of draws larger than 0. This just due to data being weakly informative in n<<p case.

Note that you need to use lot of bins in your histogram to see how narrow the spike is (and kernel density estimates are likely to smooth a lot)

AB25 · May 22, 2019, 8:05pm

Can you guide me on a simulation setting for which this method will work?

avehtari · May 23, 2019, 10:35am

What do you mean with “this method” and “will work”? The links I provided are pointing to simulations in which regularized horseshoe and posterior inference works.

Topic		Replies	Views
Horseshoe prior with diabetes dataset Modeling fitting-issues	4	3279	May 12, 2017
Can someone double check to make sure my Stan code is correct for an ordered logistic regression with a horseshoe prior? Modeling rstan , specification	2	382	December 1, 2023
Horseshoe prior for logistic regression in brms brms	0	1014	July 12, 2018
How to extent (regularized) horseshoe prior to multiple outcome setting? Modeling regularization	28	3512	May 8, 2020
Horseshoe prior on subset of predictors brms bioinformatics	14	4045	August 7, 2019

Regularized Horseshoe prior posterior credible intervals

Related topics