# Fitting a hurdle model

I am encountering a problem when fitting a hurdle model. I use simulated data and the sign of the coefficient are switched. Any hints about what I am doing wrong are appreciated.

I use the following R code to simulate the data:

``````set.seed(235)

# Truncated rpois
rtpois <- function(n, l) {
qpois(runif(n, dpois(0, l), 1), l)
}

n <- 100
beta <- c(0.5, 0.5, 0.7) # count
gamma <- c(0, 3) # zero

X <- cbind(rep(1,n), scale(runif(n,0,100)), scale(runif(n,0,10)))
Xb <- X %*% beta
Zg <- X[, 1:2] %*% gamma

theta <- 1/(1 + exp(-Zg))
lambda <- exp(Xb) # poisson means for modeling counts

# zeros
y <- rbinom(n, 1, theta) # generate 0/1 binomial data
which_counts <- which(y == 1)

# count
y[which_counts] <- rtpois(length(which_counts), lambda[which_counts])

dat_hurdle <- data.frame(cbind(y, X))
colnames(dat_hurdle) <- c("y", "x1", "x2", "x3")
``````

One simulated data set is available here: test-dat.txt (3.9 KB).

Estimating parameters with the `pscl` package in R gives expected parameter estimates:

``````m1 <- hurdle(y ~ x2 + x3 | x2, data = dat_hurdle,
dist = "poisson", zero.dist = "binomial")
coef(m1)
## count_(Intercept)          count_x2          count_x3  zero_(Intercept)           zero_x2
##        0.4319570         0.5509842         0.5477326        -0.2813459         2.7722614
``````

I then tried to implement a hurdle model with Stan (hurdle.stan (388 Bytes)).

``````data {
int<lower=0> N;     // number of observations
int<lower=0> y[N];  // response
matrix[N, 2] X;     // covariates
}

parameters {
vector beta;
vector gamma;
}

model {
for (n in 1:N) {
(y[n] == 0) ~ bernoulli_logit(gamma + gamma * X[n, 1]);
if (y[n] > 0) {
y[n] ~ poisson(exp(beta + beta * X[n, 1] + beta * X[n, 2])) T[1,];
}
}
}
``````

Fitting the model using `rstan` I get

``````library(rstan)
fit1 <- sampling(stan_m1, chains = 2, iter = 1.5e3,
data = list(y = y, X = X[, -1], N = n))
get_posterior_mean(fit1)
##            mean-chain:1 mean-chain:2 mean-all chains
## beta     0.3876424    0.3987130       0.3931777
## beta     0.5759098    0.5639956       0.5699527
## beta     0.5559836    0.5543759       0.5551798
## gamma    0.3063394    0.3024723       0.3044059
## gamma   -2.9257496   -2.9057399      -2.9157447
## lp__       14.6313376   14.9631837      14.7972606
``````

Estimates for beta to beta (the count part of the models) appear reasonable. For the zero part of the model parameter estimates also seem reasonable, but with switched signs (most obviously for gamma, i.e. the estimate fro gamma is negative when it should be positive). My guess is, that I misspecified the Stan model.

Depending on the parameterization, the hurdle parameters either indicate the probability of getting a zero or of not getting a zero. Accordingly, the signs may be switched if two packages use different parameterizations. You may also fit bayesian hurdle models with the brms R package, which uses Stan as the backend.

2 Likes

Hi Paul,

many thanks for your answer and pointing me towards the `brms`-package, which does everything I need.

All the best,
Johannes