Yes but it doesn’t need to be fully hierarchical on every variable! Given the huge differences in A between countries I think it would be borrowing too much information from the few countries with really large case counts.
You can check the picture I uploaded before about A. I still don’t get how those posteriors are generated but you get very different HDIs.
In the other hand, it would be reasonable to put a dependence structure between c and A to capture the reaction from governments to flatten the curve. I don’t know how to do this.
Martin  how did you come up with the substitution/ transformation ? I had pen and paper out to try to work out an analogous transformation for the following model but i got hopelessly stuck:
Identifiability issues occur when (t  d) = 0
(I believe)
Edit: My code for the model is in this thread + post: Hierarchical Gompertz model
Edit 2: note t is data = time in days
@jroon I think in your case of the threeparameter Gompertz function
there are two potential nonidentifiability “problem areas”
 As you mentioned, the case t = d, which corresponds to f(d) = A e^{1}, and
 the asymptotic limit for large values of t, \lim_{t\to\infty} f(t) = A.
Both areas leave k and d unidentifiable.
Regards 2  sorry I should have said  t is data (time). In my use its number of days the Covid outbreakis happening so its nowhere close to infinity thankfully!
No magic trick I fear  I just thought hard about what can be learned from the data and what cannot, did quite a bit of math and then tested a bunch of things until I found the one that worked. Also simulating data and seeing how changes in the parameters change the shape of the curve helped me get some insight.
If I get it right, t is known while A,k,d are paraemters, right? So treating Y = f(A, k, d, t) One way to go about this might be to take take t_{min}, t_{max} as the min/max t you observe and try to use y_{min} = f(A, k, d, t_{min}) and y_{max} = f(A, k, d, t_{max}) as new parameters  if you can solve for the other parameters given y_{min}, y_{max}  but I am really just guessing here. The value at midpoint of the observed t range might also be a good parameter. The point is that such values are by construction well constrained by data (but might be impractical because it is hard to derive the parameters you need from them).
EDIT: To be a bit more specific, since Gompertz is also a sigmoid curve. Here are some specific ideas I used with the logistic sigmoid:
 If I observe only the start of the curve, the upper plateau (A) is not determined.
 So instead we use the value at midpoint of observed data as a parameter. When I observe the upper plateau, this would roughly correspond to A/2, but it is constrained by data even when I only see the lower plateau.
 If the inflection point of the curve is far from the observed data range, I only see the lower or the upper plateau, i.e. almost constant function. In other words if the inflection point is far from observed data, it has little influence on the actual shape of the curve. In this case the “slope” of the curve is also not informed by the data. To overcome this, we used the location of the inflection point on the xaxis as another parameter AND we put an informative prior on it to constrain it “close” to the range of observed data. This makes the slope somewhat identified as well.
 This changes the interpretation of the model fit! If the posterior for the inflection point has notable mass outside the observed data range, we have to be aware that this part of the posterior is influenced almost exclusively by the prior and the actual inflection point can be much further from the observed data range than what the posterior might suggest. In this case the fitted slope is also just a consequence of the prior.
For the logistic sigmoid those resulted in reasonably neat formulae, not sure if that would be the case for Gompertz, so other tricks might be necessary.
Great answer thanks  lots to think on!!
Ok just to check my thinking here before I model this:
Given my gompertz function, where t = time in days:
I said let:
Applied some algebra and got:
where y_max, t and t_max are all taken from data meaning I now have a 2 parameter model.
Am I in thinking along the right lines here? Presumably then after I would fit this I then calculate A in the generated data section using the middle equation here ?
@mevers  I hope you don’t mind me partinvading your thread!
Edit: test for single country fits  i.e. no hierarchy look good so far! Will update as I progress!
Edit2: Algebra incorrect deriving last equation  fixed now.
Hi,
it looks roughly good (didn’t check the algebra), but y_{max} will be another parameter, not taken from data  if you take y_{max} from data, you are ignoring the observation noise for this value.
Does that make sense?
Hi @mevers  I dont’ mind that, but i’m basically done with it now because nonmodel problems came up.
@martinmodrak Yes I discovered that ymax needed to be a parameter and I did that using a missing value approach with success! Thanks to you both for your help! Good luck @mevers with your project!
Hello,
I’ve been following this thread for a while because I’m trying to build the same model for spanish regions but for cumulative deaths. I’m a beginner in bayesian stuff but work myself pretty well in R language.
Following @jroon 's code I arrived to this model for spanish data:
library(tidyverse)
library(brms)
raw < read_csv("spain_data.csv")
formula < casos ~ A * exp( exp( (k * (days  delay) ) ) )
form_mult < bf(formula,
A ~ 1 + (1  ccaa),
k ~ 1 + (1  ccaa),
delay ~ 1 + (1  ccaa),
nl = TRUE)
priors < c(
prior(normal(0, 13000), nlpar = "A", lb=200),
prior(normal(.1, .05), nlpar = "k", lb=0),
prior(normal(5, 20), nlpar = "delay", lb=0),
prior(student_t(3, 0, 5000), class = "sigma"),
prior(student_t(4, 3000, 5000), class = "sd", group = "ccaa", nlpar = "A"),
prior(student_t(4, 0, 0.03), class = "sd", group = "ccaa", nlpar = "k")
)
mod < brm(form_mult,
data = raw,
prior = priors,
seed = 1234,
family = gaussian("identity"),
iter = 4000,
chains = 4,
cores=4,
sample_prior = "no",
control = list(adapt_delta = 0.99,
max_treedepth = 12))
This code gave me this results:
It works well I guess, but the CI seems to me too optimistic for the bigger regions. It looks like the CI are more or less the same size in all regions and that doesn’t seem reasonable to me. Is there a way to correct this and let de CI in each region be different? I thought that the key was in this line of code but I’m not sure I understood what this prior means: prior(student_t(4, 3000, 5000), class = "sd", group = "ccaa", nlpar = "A")
Of course, if you see something wrong with the code (probably there is) please tell me.
Thank you all in advance.
spain_data.csv (21.0 KB)
Arhg… I had a long answer typed up and hit a wrong button and lost it. Ok bullet point version:

I tried to run your model as you posted it using data you posted and it took a long time and there were many divergences and my fit was very different. Is your graph from the exact model you posted? (wondering if I need to update something on my system here).

The model doesn’t include any term for the size of any of the areas and so the CI are mostly governed be the number of datapoints for each area and since your data has 46 points for every area this is why CI similar by area.

Another problem with this model is that it doesn’t allow for the fact that different areas could be at different points in the S curve. As it happens all the Spanish areas are approaching the asymptote so in your plot this doesn’t manifest any problem, but imagine one area was still in the exponential growth phase near the bottom of the S  this model would force and S shape on it even though that area is in effect yet only an exponential curve.
For these reasons and others I moved to a reparameterised model Hierarchical Gompertz model However I gave up on the whole idea before perfecting the model. For some of the issues raised in the thread here, but also because I became more wary about the data that is available for various countries for various reasons I came to the conclusion that do do any useful models would need better data than count data.
Thanks for your reply.
About 1) yes the plot was made with that same model but with the values gotten from the predict()
function. Is this right or should I use the conditional_effects()
function? I didn’t get correctly the way you control the ‘scales’ argument for the facets, so I did it the long way:
# Fitted
pred < as.data.frame(predict(mod, probs = c(0.01, 0.99), re_formula = NULL))
colnames(pred)[3] < "low"
colnames(pred)[4] < "upp"
pred$ccaa < raw$ccaa
pred$days < raw$days
pred$observed < raw$casos
pred$pob < raw$pob
pred$low[pred$low < 0] < 0
pred %>%
ggplot(aes(x = days, y = observed)) +
geom_point(aes(color = dentro), size = 0.4) +
geom_ribbon(aes(ymin = low, ymax = upp ), alpha = 0.3) +
facet_wrap(~ccaa, scales = "free_y")
About the time taken for the model to fit, well… I thought it was normal. The divergences had to do with Maximum treedepth exceeding? I did get those but I read the Stan documentation and read that those weren’t so worrying and got to do with sampling efficiency.

How would I include this term for the size of each are? as a prior or do I have to change the formula? Can you elaborate a bit?

When I first tried this model with data for Madrid (so just one area) and left 15 observations out of the sample, I got this which I thought at the moment was a pretty great prediction:
The fitted values for the in sample data points don’t seem like been forced as an S shape. The dashed line (the out of sample prediction) does look like an S, but because it looks like it got right the prediction.
When I fit this model just for one area tweaking the priors (maybe not the way their supposed, I guess, I’m just starting with the bayesian framework and probably doing things incorrectly) I get really impressive results. But again I’m probably doing something wrong.
I’ll try with the reparameterised model you present there. I read that a few days ago but didn’t understand exactly what a “missing value approach” is. I’ll look that up. Just to be clear the formula for that new equation should be wrote in R this way? isn’t Ymax the same as A (the upper asymptote)?:
formula < casos ~ ymax * exp(exp( (k * (tmax  d)  exp(k*(td)) ) ) )
Thank you very much for taken your time answering a bayes noob xD
Oh yes that produced the same plot for me. Strange!
Don’t think you can. I didn’t think of a way anyhow. What I came to realise is the Gompertz model is fundamentally a bad model for the task at hand for the reasons we are talking about and probably others.
Yes but that curve was already fairly well advanced. Also because its not hierarchical its borrowing information from other areas and so this one area is less constrained that it would be in a hierarchical model. If you look at some of my monstrosityplots earlier in the thread you can see this effect happening. The fact that your current model is so slow and gets many divergences is a warning that the model is badly specified  even if the plots look nice.
I could not get any model to work well on the hierarchical level. No you are probably not doing something wrong  its just a badly posed problem I concluded. Ultimately also, if we did get this approach to work, its not really very different to the IHME model which has received alot of criticism: https://covid19.healthdata.org/spain
Behind these predictions is a generalized loistic model that is quite similar really: https://ihmeuwmsca.github.io/CurveFit/methods/ Part of the reason I stopped working on this is I realisd I could not make a model as good as IHME did, and they got alot of negetive attention over this model.
Sorry  this reflects my confusion at various points. Ymax is the last observed Y which is different to A  its a bad choice of name but my understanding was shifting as I was doing it. By “missing value approach” I meant I treated Ymax as a missing value of Y, which was necessary for the reasons martin pointed out. This was tricky to get work and looking at code now its a real mess, if it was clean i’d share it, but I’d need a couple of hours to sort it out and I dont’ have time right now
Also a noob here please remember!!
Here, for a curve considered, t is varying, while d is a fixed unknown one.
Also, we usually say x and y are nonidentifiable when both x and y are parameters.
t is not parameter, while d is a parameter.
How identifability is an issue?
I share with you the model we were building a couple of weeks ago. The last thing we tried to do was to fit a function on the c parameter to add more flexibility. Although predictions improved for some countries (we’ve been testing for the last 2 weeks), for others are pretty bad. Also, interpretability is almost gone by this timebased function on c. This is due to fact that c is a relative measure with respect to A, and it will increase when the curve is close to the flattening part.
How do you know it is not a model problem? I am not sure which model you are referring to, but I don’t think the models presented here are flexible enough to evaluate the performance of the HMC. For the Gibbs sampler we need the full conditional distributions, right?
Yes, I agree with you that Gompertz is not flexible enough. Clearly the measures taken by governments and other factors influenced the curve for the past weeks. But I can’t see how we can evaluate HMC over this.
Please read our paper:
It seems that our problem is similar
We have used the Richard’s curve: please read our paper if interested. Richard’s curve is actually more frequently used in describing infection trajectory in the literature.