# Log likelihood for hurdle model

Dear Stan users,
how do I compute the log likelihood (to be used by the loo package) for a hurdle model like the one illustrated below? (example adapted from https://groups.google.com/forum/#!topic/stan-users/X1dBkLNel4s )

Using the normal_lpdf and bernoulli_logit_lpmf functions I think I could compute two different log likelihood for the two components of the model. Is there any way to compute a single log likelihood matrix that accounts for both components of the model (normal and logistic) at the same time?

Thank you for any suggestion you may have.
Luca

``````# data
set.seed(173)
RawDat0 <- double(2500)
RawDatP <- rnorm(7500, 5, 2)
RawD <- c(RawDat0, RawDatP)
RD0 <- RawD == 0

hist(c(RawDat0, RawDatP))

DT2 <- list(is_zero = RD0, y = RawDatP, N = 10000, NP = 7500)

# models
smodel <- "
data {
int<lower=0> N;
int<lower=0> NP;
int<lower=0,upper=1> is_zero[N];
vector[NP] y;
}
parameters {
real<lower=0> sigma;
real logit_theta;
real mu;
}
model {
is_zero ~ bernoulli_logit(logit_theta);
y ~ normal(mu, sigma);
}
generated quantities {
#
}
"

mod1 <- stan(model_code = smodel, data = DT2, iter = 1500, warmup = 500,
chains = 4, seed = 1234)``````

thank you, Ben.
Would that still work if different predictors or a different number of predictors are used to model ‘logit_theta’ and ‘mu’?

in addition, the number of data point is different. The ‘normal’ model is based on positive numbers only. The bernoulli model uses all data.

I suppose really there is just one matrix that has two terms in the case that the outcome is positive. The respective number of predictors for the two parts of the model don’t matter; only the likelihood.

Maybe the better term would be “concatenate” instead of add, and in Stan language you would define log_lik as a vector (and you get out a matrix because of getting several draws of a vector)

``````vector[N+NP] log_lik;
for (n in 1:N) {
log_lik[n] = bernoulli_logit_lpmf(logit_theta);
}
for (n in (N+1):(N+P) {
log_lik[n] = normal_lpdf(mu, sigma);
}``````

Thank you very much Aki, that solves the problem of the different number of observations in the two equations of the model. For the record, below is the final code.
-Luca

``````generated quantities {
vector[N+NP] log_lik;

for (n in 1:N) {
log_lik[n] = bernoulli_logit_lpmf(is_zero[n] | logit_theta);
}
for (n in (N+1):(N+NP) ) {
log_lik[n] = normal_lpdf(y[(n-N)] | mu, sigma);
}
}``````