Log likelihood for hurdle model

Luca_Campanelli · May 23, 2017, 9:17pm

Dear Stan users,
how do I compute the log likelihood (to be used by the loo package) for a hurdle model like the one illustrated below? (example adapted from https://groups.google.com/forum/#!topic/stan-users/X1dBkLNel4s )

Using the normal_lpdf and bernoulli_logit_lpmf functions I think I could compute two different log likelihood for the two components of the model. Is there any way to compute a single log likelihood matrix that accounts for both components of the model (normal and logistic) at the same time?

Thank you for any suggestion you may have.
Luca

# data 
set.seed(173)
RawDat0 <- double(2500)
RawDatP <- rnorm(7500, 5, 2)
RawD <- c(RawDat0, RawDatP)
RD0 <- RawD == 0

hist(c(RawDat0, RawDatP))

DT2 <- list(is_zero = RD0, y = RawDatP, N = 10000, NP = 7500)


# models 
smodel <- " 
data {
  int<lower=0> N;
  int<lower=0> NP;
  int<lower=0,upper=1> is_zero[N];
  vector[NP] y;
}
parameters {
  real<lower=0> sigma;
  real logit_theta;
  real mu;
}
model {
  is_zero ~ bernoulli_logit(logit_theta);
  y ~ normal(mu, sigma);
}
generated quantities {
  #
} 
"

mod1 <- stan(model_code = smodel, data = DT2, iter = 1500, warmup = 500, 
             chains = 4, seed = 1234)

bgoodri · May 23, 2017, 11:11pm

Add the two matrices together

Luca_Campanelli · May 24, 2017, 12:29am

thank you, Ben.
Would that still work if different predictors or a different number of predictors are used to model ‘logit_theta’ and ‘mu’?

Luca_Campanelli · May 24, 2017, 1:15am

in addition, the number of data point is different. The ‘normal’ model is based on positive numbers only. The bernoulli model uses all data.

bgoodri · May 24, 2017, 1:47am

I suppose really there is just one matrix that has two terms in the case that the outcome is positive. The respective number of predictors for the two parts of the model don’t matter; only the likelihood.

avehtari · May 24, 2017, 7:28am

Maybe the better term would be “concatenate” instead of add, and in Stan language you would define log_lik as a vector (and you get out a matrix because of getting several draws of a vector)

vector[N+NP] log_lik;
  for (n in 1:N) {
    log_lik[n] = bernoulli_logit_lpmf(logit_theta);
  }
  for (n in (N+1):(N+P) {
    log_lik[n] = normal_lpdf(mu, sigma);
  }

Luca_Campanelli · May 24, 2017, 4:39pm

Thank you very much Aki, that solves the problem of the different number of observations in the two equations of the model. For the record, below is the final code.
-Luca

generated quantities {
  vector[N+NP] log_lik;

  for (n in 1:N) {
    log_lik[n] = bernoulli_logit_lpmf(is_zero[n] | logit_theta);
  }
  for (n in (N+1):(N+NP) ) {
    log_lik[n] = normal_lpdf(y[(n-N)] | mu, sigma);
  }      
}

Topic		Replies	Views
Speeding up hurdle model with latent factors and covariates? Modeling specification , ecology	5	971	January 28, 2021
Comparing Hurdle Model with Non-Hurdle Model Modeling loo	5	472	November 21, 2023
Computing pointwise log-likelihood for a binomial normal hierarchical model Modeling loo	6	2038	June 11, 2017
Fit statistics — troubles calculating log likelihood? Modeling specification , loo	6	2054	October 16, 2017
Log-Likelihood with different parts Modeling	4	493	July 31, 2020

Log likelihood for hurdle model

Related topics