Modeling a hierarchical poisson model

i want to fitting a model developed by Zainab Al-kaabawi , Yinghui Wei & Rana Moyeed(Bayesian hierarchical models for linear networks,Journal of Applied Statistics).
Write a model: counts y1,…,yn distributed according to a Poisson distribution with mean Lλ, in terms of the logarithm of the mean,θ
= logλ, Complete the model by assigning a
N(μ,σ) prior to the log mean parameter θ. Now, i have two variables:death denoting death cases due to accident in a province, and length denoting the length of roads in a province.
my Stan is as follows,

data {
  int<lower=0> N;
  vector[N] length;
  vector[N] death;
}

parameters {
  vector[N] lambda; 
}
model {
  lambda ~ normal(0,100);
   death ~ poisson(exp(lambda)*length);
}
"

the result show that death ~ poisson(exp(lambda)*length) is wrong.
The message is “Ill-typed arguments supplied to infix operator *”

how to resolve the issue? thank you in advance.

Make it a loop:

For (n in 1:N) death[n] ~ poisson(exp(lambda[i]*length[i]))

It is a pity that your advice does not work.
my codes are as follows,


death <- c(1001,  927, 2181, 2226,  784, 1805,
           1482,  819,  811, 3645, 2678, 2356,
           1594, 1474, 3009, 2652, 4287, 3270,
           7246, 2677,  604,  936, 2273, 2655,
           1908,  278,  917, 1307,  463,  447,1316)
length <- c(22363,  15230, 209209, 145469,
            216176, 131065, 109761, 168958,
            13005, 158000, 122918, 237967,
            112878, 210711, 291759, 277482,
            302178, 242420, 223081, 172391,
            41687, 186137, 405390, 209617,
            316091, 120852, 185607, 157243,
            87726,  38347, 223118)

library(rstan)

accident <- "
data {
  int<lower=0> N;
  vector[N] length;
  vector[N] death;
}

parameters {
  vector[N] lambda; 
}
model {
  vector[N] theta;
  lambda ~ normal(0,100);
  for (n in 1:N) {
  theta[n] = exp(lambda[n])*length[n];
  death[n] ~ poisson(theta[n]);
  }
}
"
write(accident,file = 'accident.stan')

data_for_Stan <- list(N = length(death),
                      length = length, 
                      death = death)

fit <- stan(file = "accident.stan", data = data_for_Stan)
summary(fit)$summary

can you run my codes? thank you in advance.

What do you mean by the code doesn’t work? Did you see another error message from compiling?

On a different note: I don’t know anything about your model or the reference you gave, but it is over-parameterized in my opinion. I don’t think there is enough data in this setup to learn about each lambda.

thank your reply.
The two-level hierarchical model for traffic accidents is given below,
death_i ~ poisson(length_i * theta_i)
log(theta_i) ~ N(alpha,tau^2)
i think two columns,death and length, is enough to estimate the model. Is right my understanding?

I think the problem is with the product operator… * is the operator for a matrix multiplication, and .* is the operator for element-wise multiplication. See this section of the manual.

Technically no, because the input for poisson should be reals.

On the modeling front, the data you supplied are sufficient only for a single lambda to be estimated.

You need more data per each subject to allow differentiated lambdas. You are not providing data or different priors for the differentiation.

reals is a pseudotype that includes a vector of reals or a scalar real, so I’m not sure that I understand your comment. See this section of the user manual.

On the modeling front, it was my understanding that there is no minimum sample size in Bayesian computation. There are perfectly valid Bayesian models where there are more parameters than observations.

The posterior distributions for each lambda may be different or very similar depending on the values of death and length, If every value of the length vector is identical, and the values of the death vector differ by 5 orders of magnitude, you will probably learn something interesting about the lambdas. Even if the lambda posteriors turned out to be identical, you’re still getting valid information. Since the title of the post mentions a hieracrchical model, I’m assuming that eventually, yanmin will make each lambda be a draw from some lambda distribution:

model {
   // insert priors for lambda_mean, sigma
   lambda ~ normal(lambda_mean, sigma);
   death ~ poisson(exp(lambda)*length);
}

In which case, if all the lambdas are very close together, that will show up as a small sigma parameter, and we might come to the conclusion that length of roads is the most important variable explaining the variance between province accident rates.

1 Like
data {
  int<lower=0> N;
  vector[N] length;
  array[N] int death;
}

parameters {
  vector[N] lambda; 
}
model {
  lambda ~ normal(0,100);
   death ~ poisson(exp(lambda) .* length);
}

As others mentioned, you need element-wise multiplication .* between two vectors here, and the signature for a Poisson requires ints: Unbounded Discrete Distributions so: array[N] int death;

I don’t know what your intentions of the model are, but this will address your syntax errors.

2 Likes

thank you very much. your codes run well.

I found other solution to my question. The code is as follows,

data {
  int<lower=0> N;
  vector[N] length;
  array[N] int death;
}

parameters {
  vector[N] lambda; 
}
model {
  lambda ~ normal(0,100);
  for (n in 1:N) {
   death[n] ~ poisson(exp(lambda)[n] * length[n]);
}
}

My mistake in my initial response. I was confused by the use of the word hierarchical. I am more familiar when the model being used here is called an over-dispersed Poisson, where the individual Poisson rate is distributed according to the log-normal distribution. In that case, yes, the model is well-defined and there is no issue with sample size whatsoever.

This is a blog post that demonstrates this model well: The right way to do predictive checks with observation-level random effects – Erik J. Ringen, PhD

1 Like

I’d recommend this change to avoid the danger of overflow or underflow. The _log says that it wants its argument on the log scale.

death ~ poission_log(lambda + log(length));

You can actually take in log_length as data and avoid the run-time log, though it’s going to be a minor cost when the argument is just data rather than a parameter.

This just does the same thing explicitly. If you want to do it your way, it’ll be much more efficient to do exp(lambda[n]), because otherwise you’re running exp(lambda) on a whole vector for N iterations, each time only selecting the n-th element.

I think the original post here was asking for a hierarchical model, as the citaiton clearly says “hierarchial models”, but the original code wasn’t hierarchical. To make this model hierarchical, you can do this (the offset and multiplier give it a non-centered parameterization, which is what you want for the size of data you’re indicating):

parameters {
  real mu;
  real<lower=0> sigma;
  vector<offset=mu, multiplier=sigma>[N] lambda;
}
model {
  mu ~ normal(0, 10);
  sigma ~ exponential(0.1);
  lambda ~ normal(mu, sigma);
  death ~ poisson_log(lambda + log(length));
}

This is also assuming that lambda values are going to be roughly unit scale. Here, the lambda values will be roughly exp(lambda) * length =approx= death, so lambda will be around log(death / length) for this data, or about -3 or -4 or even lower—but they won’t be 100.

I’d strongly recommend putting your Stan code in its own files. Then you can print with quotes and you can also more easily diagnose problems with line numbers.

2 Likes

i think hierarchical model is every group has their own parameter, and all parameters share a common distribution.

good idea!

i use your code and the message shows that death ~ poisson_log(lambda + log(length)) is wrong. The error is as follows,
Ill-typed arguments supplied to function ‘poisson_log’:
(vector, vector)
Available signatures:
(array int, row_vector) => real
The first argument must be array int but got vector
(array int, vector) => real
The first argument must be array int but got vector
(array int, array real) => real
The first argument must be array int but got vector
(array int, real) => real
The first argument must be array int but got vector
(int, row_vector) => real
The first argument must be int but got vector
(Additional signatures omitted)