# Probit model can't recover the true value

I’m using probit model to do regression.
My model are as following,
y_{i} = \begin{equation} \left\{ \begin{array}{**lr**} 1, z_{i} > 0 & \\ 0, otherwise & \end{array} \right. \end{equation}
z_{i} = x_{i}*beta + \epsilon_{i} where \epsilon_{i} \sim N(0,1)
In order to make selection on \beta 's element, I place bayesian lasso prior on \beta, which is as following,
\pi(\beta \mid \lambda) = \prod_{j=1}^{p} \frac{\lambda}{2}e^{-\lambda \mid \beta_{j} \mid}
\pi(\lambda^{2}) \sim gamma(a,b).

My code is as following,

data{
int<lower=0> N; // number of obs
int<lower=0> p; // number of predictors
int<lower=0,upper=1> y[N];
matrix[N, p] X; // predictor variables
real<lower=0> a;
real<lower=0> b;
}
parameters{
vector[p] beta;
real<lower=0> lambda;
}
transformed parameters{
vector[N] mu;
mu = X*beta;
}
model{
for(n in 1:N){
y[n] ~ bernoulli(Phi(mu[n]));
}

for(j in 1:p){
beta[j] ~ double_exponential( 0 , inv(sqrt(lambda)));
}

lambda ~ gamma(a,b);

}


My problem is that I can recover the true beta.
Here is how I generate data and my result,

N <- 400
p <- 4

rho = .35
Sig_beta <- matrix(rho, p, p)
Sig_beta <- Sig_beta^(toeplitz(1:p)-1)
diag(Sig_beta) <- 1
X <- array(0,dim=c(N,p))
X <- mvrnorm(N, rep(0,p), Sig_beta)

beta <- numeric(p)
index <- c(1,0,1,1)
beta[which(index==1)] <-  rnorm(length(which(index==1)),3,10)

z <- numeric(N)
for(n in 1:N){
z <- rnorm(N,beta*X[n,],1)
}

y <- ifelse(z>0,1,0)

dat <- list("N" = N , "p" = p, "X" = X , "y" = y , "a" = 1 , "b" = 1)


Rplot01.pdf (17.1 KB)

and

beta
 -1.978069 0.000000 -5.755343 -1.668030

It is clear that all components of beta are concentrates near 0.

It’s very strange strange and I don’t know where is wrong.

The main issue is that your Stan model is not consistent with you generative model.

for(n in 1:N){
z <- rnorm(N,beta*X[n,],1)
}


does not work, because * does matrix multiplication in Stan, but not in R, where you need to use %*%. You also forgot the index for z.
I assume, you wanted to do this:

for(n in 1:N){
z[n] <- rnorm(1,beta %*% X[n,],1)
}


or shorter: z <- X %*% beta.

But the main point is that this section from your R code

y <- ifelse(z>0,1,0)


is not consistent with this sections of your Stan model:

y[n] ~ bernoulli(Phi(mu[n]));


rather use the following in your R code to generate y values:

y <- rbinom(1,1,pnorm(z))


This is not the model you described at the top of your post, but it is a probit model. If I am not mistaken, the model you describe at the top of your post will be hard to fit in Stan because the function from z to y is not smooth [I am not claiming to know any more details ;-)].

As an aside: Here you find recommendations for sparsity inducing priors in Stan : https://arxiv.org/abs/1707.01694

PS: It would help to use the same variable names in R and Stan.

Thanks for your answer and point out my faults in my R code.
Porbit model is hard to fit in stan, but in stan-reference, they provide a useful way to achieve this.
y[n] ~ bernoulli(Phi(alpha + beta * x[n]));
So I think my code of this part is not wrong.

I see that I had misunderstood your notation for the Probit model.