Problem with modelling Logistic Regression in Rstan

Hi everyone,
I am trying to adapt a logistic regression model taking elastic net priors to deal with binary outcome (y). I tried different functions like bernoulli_logit but it didn’t work for me. so I tried using logistic and logistic_rng but it isn’t correct. Can someone help with fixing this issue?

first model:

data{
  int N_train; //number of observations training and validation set
  int p; //number of predictors
  real y_train[N_train]; //response vector
  matrix[N_train, p] X_train; //model matrix
  //test set
  int N_test; //number of observations test set
  matrix[N_test, p] X_test; //model matrix test set
}
parameters{
  real alpha; //intercept
  real<lower = 0> sigma2; //error variance
  vector[p] beta_raw; // regression parameters
  //hyperparameters prior
  vector<lower = 1>[p] tau;
  real<lower = 0> lambda1;
  real<lower = 0> lambda2;
}
transformed parameters{
  vector[p] beta;
  real<lower = 0> sigma; //error sd
  vector[N_train] linpred; //mean normal model
  for(j in 1:p){
    beta[j] = sqrt(((sigma2*(tau[j]-1))/(lambda2*tau[j]))) * beta_raw[j];
  }
  sigma = sqrt(sigma2);
  linpred = alpha + X_train*beta;
}
model{
 //prior regression coefficients: elastic net
	beta_raw ~ normal(0, 1); //implies beta ~ normal(0, sqrt(1/(lambda2/sigma2 * (tau[j]/(tau[j]-1)))))
	tau ~ gamma(0.5, (8*lambda2*sigma2)/(lambda1^2));
	lambda1 ~ cauchy(0, 1);
	lambda2 ~ cauchy(0, 1);
	
 //priors nuisance parameters: uniform on log(sigma^2) & alpha
	target += -2 * log(sigma); 
	
 //likelihood
	y_train ~ bernoulli_logit(linpred);
}
generated quantities{ //predict responses test set
	real y_test[N_test]; //predicted responses
	for(i in 1:N_test){
		y_test[i] = bernoulli_logit_rng(alpha + X_test[i,] * beta);
	}
}

In the second time; I changed the model, generated quantities as follows:

model{
 //prior regression coefficients: elastic net
	beta_raw ~ normal(0, 1); //implies beta ~ normal(0, sqrt(1/(lambda2/sigma2 * (tau[j]/(tau[j]-1)))))
	tau ~ gamma(0.5, (8*lambda2*sigma2)/(lambda1^2));
	lambda1 ~ cauchy(0, 1);
	lambda2 ~ cauchy(0, 1);
	
 //priors nuisance parameters: uniform on log(sigma^2) & alpha
	target += -2 * log(sigma); 
	
 //likelihood
	y_train ~ logistic(linpred,sigma);
}
generated quantities{ //predict responses test set
	real y_test[N_test]; //predicted responses
	for(i in 1:N_test){
		y_test[i] = logistic_rng((alpha + X_test[i,] * beta), sigma);
	}
}

Edited by @Max_Mantei: code formatting and highlighting via ```stan ```

Moin Fathemah! Welcome to the Stan forum! :)

I didn’t dig into the details, but for logistic regression bernoulli_logit and bernoulli_logit_rng should be correct (using logistic won’t be correct). Could you please specify what exactly did not work for you?

Cheers,
Max

Moin Max,
using Bernoulli_logit throws the following syntax error:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for:

real[ ] ~ bernoulli_logit(vector)

Available argument signatures for bernoulli_logit:

int ~ bernoulli_logit(real)
int ~ bernoulli_logit(real[ ])
int ~ bernoulli_logit(vector)
int ~ bernoulli_logit(row_vector)
int[ ] ~ bernoulli_logit(real)
int[ ] ~ bernoulli_logit(real[ ])
int[ ] ~ bernoulli_logit(vector)
int[ ] ~ bernoulli_logit(row_vector)

Real return type required for probability function.
error in ‘model_bayes_en_fit’ at line 41, column 39

39: 	
40:  //likelihood
41: 	y_train ~ bernoulli_logit(linpred);
                                          ^
42: }

Ahh! Okay, yes.

In your data block declare the outcome y_train as an array of ints like so:

data{
  ...
  int y_train[N_train];
  ...
}

Similarly in the generated quantities block declare

generated quantities{
  int y_test[N_test];
  ...
}

Cheers,
Max

Thanks a lot.

Cheers,
Fatemah

1 Like