Specifying a rather complicated hierarchical model in Stan

I don’t have much experience with Stan, so this question might be trivial instead of rather complicated, but I unfortunately couldn’t figure out what to do.

The model I want to specify has the following ingredients:

  • Several years (y = 1, 2, \ldots, Y) of daily (d = 1, 2, \ldots, 365) data. (That is, for simplicity, we assume each year has 365 days.)
  • The outcome at day t (t = 1, 2, \ldots, 365\cdot Y) is Poisson distributed. Let’s call y\left[t\right] the year and d\left[t\right] the day of t (i.e., t = 365 \cdot y\left[t\right] + d\left[t\right]-365).
  • The parameter of the Poisson distribution at day t is the following: \alpha + \beta t + \sum_{y=1}^Y I_{y} \cdot \gamma_{y} \cdot f_{\mu_{y}, \sigma^2}\left(t\right), where f_{\mu, \sigma^2} is the normal density with expected value \mu and variance \sigma^2 (so the sum runs over the years, and \mu_y is measured on the t scale).
  • The indicators I_y are iid Bernoulli with parameter (or should I call it a hyperparameter?) p across years, and the \gamma_y are iid uniformly distributed from \gamma_{min} to \gamma_{max} (also (hyper)parameters) across years.

In other words: we have a linear trend throughout the entire period, and in each year there is possibly – but not necessarily – an additional peak (with constant probability p, independently across years). The shape of the peak is the normal density centered at \mu_y, where \mu_y is uniform over the days of that year (there are no further parameters governing \mu), with a constant \sigma^2 that is a parameter to be estimated. (So f here doesn’t denote a normal random variable, but the density itself.) With \alpha>0 and \beta>0 the positivity of the Poisson parameter is ensured.

One could generate such data with the following code:

library(ggplot2)

simdat <- function(alpha = 2, beta = 0.1, gammamin = 10000, gammamax = 50000,
                   p = 0.4, sigm = 20) {
  SimData <- expand.grid(d = 1:365, y = 1:20)    # 20 years of 365 days
  SimData$t <- 365*SimData$y + SimData$d - 365   # running day index
  gamma <- runif(20, gammamin, gammamax)         # peak heights, one per year
  i <- rbinom(20, 1, p)                          # peak indicators, one per year
  mu <- runif(20, 1, 365) + (1:20)*365 - 365     # peak locations on the t scale
  SimData$lambda <- alpha + beta*SimData$t +
    rowSums(sapply(1:20, function(y) i[y]*gamma[y]*dnorm(1:(20*365), mu[y], sigm)))
  SimData$x <- rpois(20*365, SimData$lambda)
  SimData
}

SimData <- simdat()

ggplot(SimData, aes(x = t, y = x)) + geom_line()

ggplot(SimData, aes(x = d, y = x)) + geom_line() + facet_wrap(~y)

The parameters are: \alpha, \beta, p, \gamma_{min}, \gamma_{max} and \sigma^2. These should be estimated from the empirical data. (We could assume any meaningful default prior for them.)

Is it possible with Stan…?

Thank you in advance!

I don’t see any reason why you wouldn’t be able to specify this kind of model in Stan; you just have to specify the hierarchy of the levels involved – it seems to me that it would be years Y and days d, is that right? It is not clear to me what you would use t for, in a hierarchical model; in that case it would just be a flat index over the total number of days regardless of year, and the time dependence on t of the functions inside the sum is not clear (e.g. if you are trying to estimate several sets of D = 365 days it may become messy) – but that may just be me not understanding what some of those R functions are doing.

Your parameters I, \gamma, \mu, and \sigma have the year index and you mention hyperparameters, so you want to have hyperparameters (i.e. group-level or “timeless” parameters) associated with these (individual or within-year-level) parameters.

Have you tried (or do you want) to set up a hierarchical model like this (for each parameter)?

mu_gamma ~ normal(0, 1); // taking into consideration your prior information or assumptions about what gamma should be
sd_gamma ~ cauchy(0, 5); // a half-Cauchy, or whatever you think is suitable for a variable constrained to be positive using <lower=0>

gamma_1 ~ normal(mu_gamma, sd_gamma);
gamma_2 ~ normal(mu_gamma, sd_gamma); // for two years, or an array for Y years
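
For Y years, the array version mentioned in the comment could look something like this (just a sketch, with gamma declared as vector[Y] gamma; in the parameters block):

gamma ~ normal(mu_gamma, sd_gamma);  // vectorized over all Y years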

It also seems like you want to set up something like a mixture model depending on the value that the Bernoulli variable I takes, with the parameter p estimated jointly.
Identifiability of a mixture hierarchical model of this kind may be very tricky (that also depends on how each of the functions depends on t); since the summation term is a product, it may yield broadly equivalent solutions that are not identifiable.

If you can specify what hierarchies you are trying to set up (e.g. “gamma is the same within a year but changes between years”), it will be easier to write the Stan code for it.

Yes, I think you can code that model in Stan. My only question is what the f_{\mu, \sigma} is doing, because it’s a density function and I’ve never seen a density function used as a predictor in a regression. Is this supposed to be a random effect that has a \textrm{normal}(\mu, \sigma) distribution? It doesn’t look like it from the simulation code with dnorm. Why is there a density as part of the Poisson rate equation?

But you have to marginalize out the variable I_y. The user’s guide chapter on latent discrete parameters has a bunch of examples.

Numerically, it can be more stable to model the rate on the log scale, in which case the sums get replaced with log_sum_exp.

I would also recommend against making a parameter like \gamma_y bounded in (\gamma_{\textrm{min}}, \gamma_{\textrm{max}}) unless it’s physically constrained to that interval. The problem is that the boundaries have unexpected effects on how probability mass gets pushed around and don’t work like one might expect from intuitions based in maximum likelihood. If it’s not physically constrained, I’d suggest using an unconstrained parameter with a weakly informative prior that puts most of the probability mass in that interval.
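
To make the marginalization concrete, here is a minimal sketch of what a Stan program for this model could look like, under the simplifying assumption that each year’s peak only affects that year’s own 365 days (so the likelihood factorizes by year), and with placeholder priors (including a weakly informative lognormal for \gamma instead of hard bounds, as suggested above):

data {
  int<lower=1> Y;                  // number of years
  int<lower=1> D;                  // days per year (365)
  array[Y, D] int<lower=0> x;      // daily counts, by year and day
}
parameters {
  real<lower=0> alpha;
  real<lower=0> beta;
  real<lower=0, upper=1> p;        // probability of a peak in a given year
  vector<lower=0>[Y] gamma;        // peak heights
  vector<lower=1, upper=D>[Y] mu;  // peak locations within the year
  real<lower=0> sigma;             // common peak width (in days)
}
model {
  // placeholder priors: adjust to whatever is meaningful for the data
  alpha ~ normal(0, 10);
  beta ~ normal(0, 1);
  sigma ~ normal(20, 10);
  gamma ~ lognormal(log(30000), 1);  // weakly informative instead of hard bounds

  for (y in 1:Y) {
    real lp_peak = 0;     // log likelihood of year y's data given I_y = 1
    real lp_no_peak = 0;  // log likelihood of year y's data given I_y = 0
    for (d in 1:D) {
      int t = (y - 1) * D + d;     // global day index for the trend
      real base = alpha + beta * t;
      real with_peak = base + gamma[y] * exp(normal_lpdf(d | mu[y], sigma));
      lp_no_peak += poisson_lpmf(x[y, d] | base);
      lp_peak += poisson_lpmf(x[y, d] | with_peak);
    }
    // marginalize out the Bernoulli indicator I_y:
    // log(p * exp(lp_peak) + (1 - p) * exp(lp_no_peak))
    target += log_mix(p, lp_peak, lp_no_peak);
  }
}

Two caveats: the simulation above lets a peak near a year boundary spill into the neighbouring year, which this per-year factorization ignores; and in years where the no-peak component dominates, gamma[y] and mu[y] are only weakly identified and will mostly follow their priors.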

First of all, thank you very much!

I don’t see any reason why you wouldn’t be able to specify this kind of model in Stan; you just have to specify the hierarchy of the levels involved

Perhaps I used the word “hierarchical” incorrectly: what I meant by this is that parameters such as I_y are themselves random variables from a distribution governed by further (hyper)parameters.

it seems to me that it would be years Y and days d, is that right?

I am not sure, and for the reason you mentioned: there is a long-term trend component (meaning that even the same day behaves differently in different years).

It is not clear to me what you would use t for,

That’s the long-term trend: t=1 for the first day of the first year, t=365 for the last day of the first year, t=366 for the first day of the second year, t=730 for the last day of the second year, and so on.

Your parameters I, \gamma, \mu, and \sigma have the year index and you mention hyperparameters, so you want to have hyperparameters (i.e. group-level or “timeless” parameters) associated with these (individual or within-year-level) parameters.

The index here means that for each year, we randomly generate an I, a \gamma, etc., iid from the same distribution (iid Bernoulli, iid uniform etc.). The parameters to be estimated are the parameters of these distributions (p, \gamma_{min} and \gamma_{max} etc.).

Identifiability of a mixture hierarchical model of this kind may be very tricky

Yes, that’s my fear here; I am not even sure whether it is solvable… I am particularly worried about I (that is, the estimation of p). As an illustration: OLS estimation – averaging over repeated simulations to handle the stochastic minimization – can’t recover this parameter:

# MSE between the observed data and one fresh simulation under parameters par
msefit <- function(par) mean((SimData$x - do.call(simdat, as.list(par))$x)^2)

cl <- parallel::makeCluster(parallel::detectCores()-1)
parallel::clusterExport(cl, c("msefit", "simdat", "SimData"))

res <- nloptr::nloptr(
  c(alpha = 1, beta = 1, gammamin = 5000, gammamax = 100000,
    p = 0.5, sigm = 10),
  function(par) mean(parallel::parSapply(cl, 1:1000, function(i) msefit(par))),
  lb = c(0,   0,     0,  0,      0,  0),
  ub = c(10, 10, 100000, 100000, 1, 20),
  opts = list(algorithm = "NLOPT_GN_CRS2_LM", print_level = 3, maxeval = 1000))

The reason is logical: while it might be surprising at first glance, it is really better to estimate p=1 than its true value! Why? Even if we set p=0.4, i.e., its true value, the error will be large, because the 40% of the years in which we simulate a peak will differ from the 40% in which there actually is a peak (they’ll overlap only by chance), and both simulating a peak where in reality there is none, and the other way around, result in a huge error. Compared to that, it is really better to assume a peak in every year (p=1) and estimate its height to be half of the true height: then every year has an error, but only a small one.
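
(A rough back-of-envelope, pretending that when both the simulated and the true indicator are 1 the peaks coincide: if a full-sized missed or spurious peak contributes a squared error on the order of A^2, then simulating the indicator independently with p=0.4 disagrees with the truth with probability 2 \cdot 0.4 \cdot 0.6 = 0.48, i.e. an expected error of roughly 0.48 A^2 per year, whereas always simulating a half-height peak costs only about (A/2)^2 = 0.25 A^2 per year.)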

And I’m afraid that it could be a problem for any estimation method… but I might be wrong of course.

Thank you very much!

My only question is what the f_{\mu, \sigma} is doing, because it’s a density function […] Why is there a density as part of the Poisson rate equation?

Yes, you’re absolutely correct: it is really just the density and otherwise has nothing to do with any normal distribution. It is part of the equation simply because the random peaks that appear in the curve have a bell-curve-like shape.

But you have to marginalize out the variable I_y. The user’s guide chapter on latent discrete parameters has a bunch of examples.

Thanks, I’ll check it! At first glance it seems promising.

Numerically, it can be more stable to model the rate on the log scale, in which case the sums get replaced with log_sum_exp.

That’s entirely possible.
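
If I understand correctly, the rate for, say, day d of year y would then be built as something like this (reusing the names from the sketch above):

real log_lambda = log_sum_exp(log(alpha + beta * t),
                              log(gamma[y]) + normal_lpdf(d | mu[y], sigma));
target += poisson_log_lpmf(x[y, d] | log_lambda);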

I would also recommend against making a parameter like \gamma bounded in (\gamma_{min}, \gamma_{max}) unless it’s physically constrained to that interval. […] If it’s not physically constrained, I’d suggest using an unconstrained parameter with a weakly informative prior that puts most of the probability mass in that interval.

Well, \gamma governs the height of the peak, so we definitely have a lower bound that is absolutely physical, namely zero. The upper bound is also somewhat physically constrained, because realistically peaks can’t be arbitrarily high (even with small probability), but this is a softer constraint.
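
One way to encode exactly that (with placeholder numbers) would be to keep the hard zero lower bound as a constraint and express the soft upper bound through the prior rather than an upper= bound (the declaration goes in parameters, the prior in model):

vector<lower=0>[Y] gamma;            // hard physical lower bound at zero
gamma ~ lognormal(log(30000), 0.5);  // soft upper bound: little prior mass above realistic peak heights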