Probit model can't recover the true value

Lehostal · August 29, 2018, 2:01am

I’m using probit model to do regression.
My model are as following,
y_{i} = \begin{equation} \left\{ \begin{array}{**lr**} 1, z_{i} > 0 & \\ 0, otherwise & \end{array} \right. \end{equation}
z_{i} = x_{i}*beta + \epsilon_{i} where \epsilon_{i} \sim N(0,1)
In order to make selection on \beta 's element, I place bayesian lasso prior on \beta, which is as following,
\pi(\beta \mid \lambda) = \prod_{j=1}^{p} \frac{\lambda}{2}e^{-\lambda \mid \beta_{j} \mid}
\pi(\lambda^{2}) \sim gamma(a,b).

My code is as following,

data{
  int<lower=0> N; // number of obs
  int<lower=0> p; // number of predictors
  int<lower=0,upper=1> y[N];
  matrix[N, p] X; // predictor variables
  real<lower=0> a;
  real<lower=0> b;
}
parameters{
  vector[p] beta;
  real<lower=0> lambda;
}
transformed parameters{
  vector[N] mu;
  mu = X*beta;
}
model{
  for(n in 1:N){
    y[n] ~ bernoulli(Phi(mu[n]));
  }
  
  for(j in 1:p){
    beta[j] ~ double_exponential( 0 , inv(sqrt(lambda)));
  }
  
  lambda ~ gamma(a,b);
  
}

My problem is that I can recover the true beta.
Here is how I generate data and my result,

N <- 400
p <- 4

rho = .35
Sig_beta <- matrix(rho, p, p)
Sig_beta <- Sig_beta^(toeplitz(1:p)-1)
diag(Sig_beta) <- 1
X <- array(0,dim=c(N,p))
X <- mvrnorm(N, rep(0,p), Sig_beta)

beta <- numeric(p)
index <- c(1,0,1,1)
beta[which(index==1)] <-  rnorm(length(which(index==1)),3,10)

z <- numeric(N)
for(n in 1:N){
  z <- rnorm(N,beta*X[n,],1)
}

y <- ifelse(z>0,1,0)



dat <- list("N" = N , "p" = p, "X" = X , "y" = y , "a" = 1 , "b" = 1)

Rplot01.pdf (17.1 KB)

and

beta
[1] -1.978069 0.000000 -5.755343 -1.668030

It is clear that all components of beta are concentrates near 0.

It’s very strange strange and I don’t know where is wrong.

Guido_Biele · August 29, 2018, 8:20am

The main issue is that your Stan model is not consistent with you generative model.

But there are additional problems:

for(n in 1:N){
  z <- rnorm(N,beta*X[n,],1)
}

does not work, because * does matrix multiplication in Stan, but not in R, where you need to use %*%. You also forgot the index for z.
I assume, you wanted to do this:

for(n in 1:N){
  z[n] <- rnorm(1,beta %*% X[n,],1)
}

or shorter: z <- X %*% beta.

But the main point is that this section from your R code

y <- ifelse(z>0,1,0)

is not consistent with this sections of your Stan model:

y[n] ~ bernoulli(Phi(mu[n]));

rather use the following in your R code to generate y values:

y <- rbinom(1,1,pnorm(z))

This is not the model you described at the top of your post, but it is a probit model. If I am not mistaken, the model you describe at the top of your post will be hard to fit in Stan because the function from z to y is not smooth [I am not claiming to know any more details ;-)].

As an aside: Here you find recommendations for sparsity inducing priors in Stan : https://arxiv.org/abs/1707.01694

PS: It would help to use the same variable names in R and Stan.

Lehostal · August 29, 2018, 1:03pm

Thanks for your answer and point out my faults in my R code.
Porbit model is hard to fit in stan, but in stan-reference, they provide a useful way to achieve this.
y[n] ~ bernoulli(Phi(alpha + beta * x[n]));
So I think my code of this part is not wrong.

Guido_Biele · August 29, 2018, 2:13pm

I see that I had misunderstood your notation for the Probit model.

Topic		Replies	Views
The parameters of model can not be recovered Modeling fitting-issues	2	487	February 10, 2023
CFA model cannot recover simulated values Modeling fitting-issues	5	803	June 23, 2022
Unable to recover latent covariance in multivariate multinomial model Modeling	5	747	July 12, 2018
Extracting predicted data from multivariate probit model Modeling	4	610	September 25, 2020
Multivariate probit with missing outcomes Modeling rstan , specification	9	660	September 16, 2022

Probit model can't recover the true value

Related Topics