Would it be easier only to model an effect on one parameter?
Are you having any divergences in the model?
a represents the upper asymptote if I’m not mistaken? Shouldn’t it be different by country?
There is an issue with identifiability in these models - see section on sigmoid curves here: Identifying non-identifiability
I don’t see the hierarchy. You are not pooling the A_i's since they come from different priors N(0,10000). A complete pooling model is the one that sets A∼N(0,10000), a no-pooling model A_i∼N(0,10000) and a partial-pooling model or hierarchical model A_i∼N(\gamma,\sigma_A) and you have to define hyperpriors on \gamma and \sigma_A.
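To spell the three options out side by side (only a sketch: the hyperpriors on \gamma and \sigma_A in the last line are placeholders I made up to show where they go, not a recommendation):

```latex
\begin{align*}
\text{complete pooling:} \quad & A \sim N(0, 10000)
  \quad \text{(one shared } A \text{ for all countries)} \\
\text{no pooling:} \quad & A_i \sim N(0, 10000)
  \quad \text{(independently for each country } i\text{)} \\
\text{partial pooling:} \quad & A_i \sim N(\gamma, \sigma_A), \quad
  \gamma \sim N(0, 10000), \quad
  \sigma_A \sim \text{Half-}N(0, 1000)
\end{align*}
```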
So, in essence, I wanted each A to have its own mean, but no, I don’t want to share information across levels of A. I’m not convinced that it should be pooled a priori. In this parameterisation, A represents the upper asymptote, which differs greatly by country and is also multifactorial, with many of the factors unknown to us. I don’t think it makes sense to share information across this parameter. I do think it makes sense to pool on the k parameter of my model, as this relates more closely to the properties of the virus itself.
Yes but it doesn’t need to be fully hierarchical on every variable! Given the huge differences in A between countries I think it would be borrowing too much information from the few countries with really large case counts.
You can check the picture I uploaded before about A. I still don’t get how those posteriors are generated but you get very different HDIs.
On the other hand, it would be reasonable to put a dependence structure between c and A to capture the reaction from governments to flatten the curve. I don’t know how to do this.
Martin - how did you come up with the substitution/transformation? I had pen and paper out to try to work out an analogous transformation for the following model, but I got hopelessly stuck:
Y = A e^{-\exp(-k(t - d))}
Identifiability issues occur when (t - d) = 0 (I believe)
Regarding 2 - sorry, I should have said - t is data (time). In my case it’s the number of days the Covid outbreak has been going on, so it’s nowhere close to infinity, thankfully!
No magic trick I fear - I just thought hard about what can be learned from the data and what cannot, did quite a bit of math and then tested a bunch of things until I found the one that worked. Also simulating data and seeing how changes in the parameters change the shape of the curve helped me get some insight.
If I get it right, t is known while A, k, d are parameters, right? So, treating Y = f(A, k, d, t), one way to go about this might be to take t_{min}, t_{max} as the min/max t you observe and try to use y_{min} = f(A, k, d, t_{min}) and y_{max} = f(A, k, d, t_{max}) as new parameters - if you can solve for the other parameters given y_{min}, y_{max} - but I am really just guessing here. The value at the midpoint of the observed t range might also be a good parameter. The point is that such values are by construction well constrained by data (but might be impractical because it is hard to derive the parameters you need from them).
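To show that the solving step is at least possible for the Gompertz curve, here is a rough Python sketch (not Stan, and not something I have tested in a real fit). Two values cannot determine three parameters, so I am assuming here that A stays a parameter and only (k, d) get swapped for (y_{min}, y_{max}); the function names are made up for illustration.

```python
import math


def gompertz(t, A, k, d):
    """Gompertz curve Y = A * exp(-exp(-k * (t - d)))."""
    return A * math.exp(-math.exp(-k * (t - d)))


def solve_k_d(A, y_min, y_max, t_min, t_max):
    """Recover (k, d) from A and the curve values at t_min and t_max.

    Uses log(-log(y / A)) = -k * (t - d), which is linear in t.
    Requires 0 < y_min < y_max < A and t_min < t_max.
    """
    u_min = math.log(-math.log(y_min / A))
    u_max = math.log(-math.log(y_max / A))
    k = (u_min - u_max) / (t_max - t_min)
    d = t_min + u_min / k
    return k, d


# Round trip: start from known (A, k, d), compute the end-point values,
# then recover (k, d) from them.
A, k, d = 5000.0, 0.1, 30.0
t_min, t_max = 0.0, 40.0
y_min = gompertz(t_min, A, k, d)
y_max = gompertz(t_max, A, k, d)
k_rec, d_rec = solve_k_d(A, y_min, y_max, t_min, t_max)
```

The constraint 0 < y_{min} < y_{max} < A would have to be enforced on the new parameters, which is one of the ways this can get impractical.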
EDIT: To be a bit more specific, since Gompertz is also a sigmoid curve. Here are some specific ideas I used with the logistic sigmoid:
If I observe only the start of the curve, the upper plateau (A) is not determined.
So instead we use the value at midpoint of observed data as a parameter. When I observe the upper plateau, this would roughly correspond to A/2, but it is constrained by data even when I only see the lower plateau.
If the inflection point of the curve is far from the observed data range, I only see the lower or the upper plateau, i.e. almost constant function. In other words if the inflection point is far from observed data, it has little influence on the actual shape of the curve. In this case the “slope” of the curve is also not informed by the data. To overcome this, we used the location of the inflection point on the x-axis as another parameter AND we put an informative prior on it to constrain it “close” to the range of observed data. This makes the slope somewhat identified as well.
This changes the interpretation of the model fit! If the posterior for the inflection point has notable mass outside the observed data range, we have to be aware that this part of the posterior is influenced almost exclusively by the prior and the actual inflection point can be much further from the observed data range than what the posterior might suggest. In this case the fitted slope is also just a consequence of the prior.
For the logistic sigmoid those resulted in reasonably neat formulae, not sure if that would be the case for Gompertz, so other tricks might be necessary.
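A small Python sketch of the first two ideas, assuming the logistic form y = A / (1 + exp(-k (t - t0))) with t0 the inflection point. The names and numbers are made up for illustration, and this is not the code we actually used. It shows two curves whose A differs by a factor of two yet are nearly identical over an early observation window, while the value at the midpoint of that window is almost the same for both.

```python
import math


def logistic(t, A, k, t0):
    """Logistic sigmoid with upper plateau A, slope k, inflection point t0."""
    return A / (1.0 + math.exp(-k * (t - t0)))


def A_from_midpoint(y_mid, k, t0, t_mid):
    """Recover A when the curve value at t_mid is used as the parameter."""
    return y_mid * (1.0 + math.exp(-k * (t_mid - t0)))


k = 0.2
ts = range(0, 21)  # we only observe the start of the curve

# Doubling A and shifting the inflection point by log(2) / k gives
# almost the same curve as long as t is well below the inflection point.
t0_a, t0_b = 60.0, 60.0 + math.log(2.0) / k
curve_a = [logistic(t, 1000.0, k, t0_a) for t in ts]
curve_b = [logistic(t, 2000.0, k, t0_b) for t in ts]
max_rel_diff = max(abs(b - a) / a for a, b in zip(curve_a, curve_b))

# The value at the midpoint of the observed range barely differs,
# so it is constrained by the data even though A is not.
t_mid = 10.0
y_mid_a = logistic(t_mid, 1000.0, k, t0_a)
y_mid_b = logistic(t_mid, 2000.0, k, t0_b)

# Given y_mid, k and t0, A is a derived quantity.
A_recovered = A_from_midpoint(y_mid_a, k, t0_a, t_mid)
```

With these numbers the two curves differ by well under 0.1% everywhere in the observed window, which is far less than any realistic observation noise.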
where y_max, t and t_max are all taken from data, meaning I now have a 2-parameter model.
Am I thinking along the right lines here? Presumably, after fitting this, I would then calculate A in the generated data section using the middle equation here?
@mevers - I hope you don’t mind me part-invading your thread!
Edit: tests for single-country fits (i.e. no hierarchy) look good so far! Will update as I progress!
Edit2: Algebra incorrect deriving last equation - fixed now.
Hi,
it looks roughly good (didn’t check the algebra), but y_{max} will be another parameter, not taken from data - if you take y_{max} from data, you are ignoring the observation noise for this value.
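To illustrate why this matters, here is a quick Python sketch. Since I didn’t check your algebra, the inversion below is my own rearrangement of the Gompertz curve and may not match your equations exactly; the 10% error is a made-up number.

```python
import math


def gompertz(t, A, k, d):
    """Gompertz curve Y = A * exp(-exp(-k * (t - d)))."""
    return A * math.exp(-math.exp(-k * (t - d)))


def A_from_y_max(y_max, k, d, t_max):
    """Invert y_max = A * exp(-exp(-k * (t_max - d))) for A."""
    return y_max * math.exp(math.exp(-k * (t_max - d)))


A_true, k, d, t_max = 5000.0, 0.1, 30.0, 40.0

# Latent (noise-free) curve value at the last observed time point.
y_latent = gompertz(t_max, A_true, k, d)
# Hypothetical observation that is 10% too high due to noise.
y_observed = 1.1 * y_latent

# Treating y_max as a parameter lets the model recover the latent value;
# plugging in the observation passes its error straight through to A.
A_latent = A_from_y_max(y_latent, k, d, t_max)
A_plugged = A_from_y_max(y_observed, k, d, t_max)
```

Because A is proportional to y_{max} for fixed k and d, any relative error in the plugged-in observation becomes the same relative error in A.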