Using the normal_lpdf and bernoulli_logit_lpmf functions I think I could compute two different log likelihoods for the two components of the model. Is there any way to compute a single log-likelihood matrix that accounts for both components of the model (normal and logistic) at the same time?
Thank you for any suggestion you may have.
Luca
# data
set.seed(173)
RawDat0 <- double(2500)        # 2500 exact zeros
RawDatP <- rnorm(7500, 5, 2)   # 7500 draws from the continuous component
RawD <- c(RawDat0, RawDatP)
RD0 <- RawD == 0
hist(RawD)
DT2 <- list(is_zero = as.integer(RD0), y = RawDatP, N = 10000, NP = 7500)
# models
smodel <- "
data {
int<lower=0> N;
int<lower=0> NP;
int<lower=0,upper=1> is_zero[N];
vector[NP] y;
}
parameters {
real<lower=0> sigma;
real logit_theta;
real mu;
}
model {
is_zero ~ bernoulli_logit(logit_theta);
y ~ normal(mu, sigma);
}
generated quantities {
// pointwise log_lik would go here (added in the final version below)
}
"
library(rstan)
mod1 <- stan(model_code = smodel, data = DT2, iter = 1500, warmup = 500,
             chains = 4, seed = 1234)
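As a plain-R sanity check (using illustrative parameter values, not posterior draws), the two component log likelihoods the question asks about can be computed pointwise and concatenated:

```r
# Plain-R analogues of bernoulli_logit_lpmf and normal_lpdf, evaluated
# pointwise at illustrative (not estimated) parameter values.
logit_theta <- qlogis(0.25)   # true P(zero) = 2500/10000
mu <- 5; sigma <- 2
lp_zero <- dbinom(as.integer(RD0), 1, plogis(logit_theta), log = TRUE)  # length N
lp_pos  <- dnorm(RawDatP, mu, sigma, log = TRUE)                        # length NP
length(c(lp_zero, lp_pos))    # 17500 = N + NP, one entry per observation
```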
I suppose there is really just one matrix; it simply has two terms in the case that the outcome is positive. The respective numbers of predictors in the two parts of the model don't matter; only the likelihood does.
Maybe the better term would be "concatenate" instead of "add". In the Stan language you would define log_lik as a vector (you get a matrix out because you obtain several draws of that vector):
vector[N+NP] log_lik;
for (n in 1:N) {
log_lik[n] = bernoulli_logit_lpmf(is_zero[n] | logit_theta);
}
for (n in (N+1):(N+NP)) {
log_lik[n] = normal_lpdf(y[n-N] | mu, sigma);
}
Thank you very much Aki, that solves the problem of the different numbers of observations in the two parts of the model. For the record, the final code is below.
-Luca
generated quantities {
vector[N+NP] log_lik;
for (n in 1:N) {
log_lik[n] = bernoulli_logit_lpmf(is_zero[n] | logit_theta);
}
for (n in (N+1):(N+NP) ) {
log_lik[n] = normal_lpdf(y[(n-N)] | mu, sigma);
}
}
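Once the model is refit with this generated quantities block, the log_lik matrix can be extracted and passed to the loo package. A usage sketch, assuming the loo package is installed:

```r
library(loo)
# ll has one row per posterior draw and one column per observation (N + NP).
ll <- extract_log_lik(mod1, parameter_name = "log_lik", merge_chains = FALSE)
r_eff <- relative_eff(exp(ll))   # relative efficiencies for PSIS
print(loo(ll, r_eff = r_eff))
```

Because the bernoulli and normal entries are concatenated rather than added, loo treats each of the N + NP observations as its own unit when computing pointwise ELPD.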