The short question is: Does anyone have experience modeling sums of positive latent random variables? What are the tried and true techniques? I am having trouble.

The long question is: I have some *positive* latent functions Y_i(x), where i indexes functions and x is an integer (or real). For example, we might say the Y_i(x) are log-linear regressions: log Y_i(x) = \alpha + \beta x + \epsilon_{i}(x), where \epsilon_{i}(x) \sim Normal(0, \sigma) i.i.d., which might look like this for \alpha = 0.5, \beta = -0.025, \sigma = 0.25. Note that this is a simplification for this initial model; I hope to eventually use a Gaussian process.
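To make the setup concrete, here is a minimal simulation sketch of those latent functions, assuming integer x and the parameter values quoted above (the number of functions and the x-range are arbitrary choices for illustration):

```python
import numpy as np

# Simulate the latent log-linear functions Y_i(x) from the post:
# log Y_i(x) = alpha + beta * x + eps_i(x), eps_i(x) ~ Normal(0, sigma) i.i.d.
rng = np.random.default_rng(0)

alpha, beta, sigma = 0.5, -0.025, 0.25
n_funcs = 10             # number of latent functions i (arbitrary)
xs = np.arange(1, 101)   # integer inputs x = 1, ..., 100 (arbitrary range)

eps = rng.normal(0.0, sigma, size=(n_funcs, xs.size))
Y = np.exp(alpha + beta * xs + eps)   # shape (n_funcs, 100); strictly positive
```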

But I observe sums (or integrals) of Y_i(x) over (disjoint) intervals I_k. For example, I_1 = (0,5], I_2 = (5,15], I_3 = (15,30], I_4 = (30,60], I_5 = (60,100]. Call my observations Z_{ik}:

Z_{ik} = \sum_{x\in I_k} Y_i(x)

In the example we would have observations that look like this (the height is such that the area of the bar equals Z_{ik}):
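A quick sketch of how such observations could be generated from the simulated latent functions, assuming integer x, the interval endpoints listed above, and the same hypothetical parameter values:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, sigma = 0.5, -0.025, 0.25
n_funcs = 10
xs = np.arange(1, 101)
Y = np.exp(alpha + beta * xs + rng.normal(0.0, sigma, size=(n_funcs, xs.size)))

# Interval endpoints from the post: I_1 = (0,5], ..., I_5 = (60,100].
edges = [0, 5, 15, 30, 60, 100]

# Z_{ik} = sum of Y_i(x) over x in I_k = (edges[k-1], edges[k]]
Z = np.stack(
    [Y[:, (xs > lo) & (xs <= hi)].sum(axis=1)
     for lo, hi in zip(edges[:-1], edges[1:])],
    axis=1,
)
# Z has shape (n_funcs, 5); because the intervals partition (0, 100],
# each row of Z sums to the corresponding row sum of Y.
```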

(Note that in my real problem the intervals are not the same across all i, so the real data might look like this instead.)

I’m having trouble modeling this in Stan because (I believe, correct me if I’m wrong!) there is no closed-form likelihood for a sum of lognormal random variables.

I tried marginalizing as follows. In the above example, Z_{i, (L,U]} - \sum_{x\in (L, U-1]} Y_i(x) = Y_i(U), so Z_{i, (L,U]} - \sum_{x\in (L, U-1]} Y_i(x) \sim Lognormal(\alpha + \beta U, \sigma). The problem is that when I implement this in Stan, with Y_i(x) latent for x \in (L, U-1], there are a *lot* of rejected samples, because the left-hand side can be negative and then its log doesn’t exist. Perhaps there is a way to constrain things? But note that in my real problem the observation intervals vary by i, so that could be programmatically hard.
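The rejection problem can be illustrated outside Stan with a quick simulation: draw the latent Y_i(x) for x \in (L, U-1] from the prior, as a naive sampler effectively does early on, and count how often the residual Z - \sum Y_i(x) goes negative (one interval, same hypothetical parameters as above):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, sigma = 0.5, -0.025, 0.25

# One interval (L, U] = (5, 15]; simulate "true" Z, then draw the latent
# Y_i(x) for x in (L, U-1] from the prior and check the residual's sign.
L, U = 5, 15
xs = np.arange(L + 1, U + 1)    # x = 6, ..., 15
n_sims = 10_000

Y_true = np.exp(alpha + beta * xs + rng.normal(0, sigma, size=(n_sims, xs.size)))
Z = Y_true.sum(axis=1)          # the observed interval sums

Y_latent = np.exp(alpha + beta * xs[:-1]
                  + rng.normal(0, sigma, size=(n_sims, xs.size - 1)))
residual = Z - Y_latent.sum(axis=1)   # should equal Y_i(U) under the model
frac_negative = (residual < 0).mean() # fraction of prior draws that get rejected
```

Even with the latent values drawn from the correct prior, a nontrivial fraction of draws makes the residual negative, which is consistent with the many rejections seen in Stan.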

I also tried adding a measurement error term to “grease the wheels of inference”, e.g. changing the model to

Z_{ik} = \sum_{x\in I_k} Y_i(x) + \epsilon_{ik}, \quad \epsilon_{ik} \sim Normal(0, \tau)

where \tau is my measurement standard deviation. I also had issues fitting this model: it seems very sensitive to the parameters and the priors, and I got a lot of low E-BFMI warnings. Nevertheless, this seems to me right now to be the most promising approach, even though it doesn’t guarantee Z_{ik} > 0, so it could be unrealistic.

Edited to add: besides marginalization and measurement error, another possible way forward is the Fenton-Wilkinson approximation, which replaces a sum of lognormals with a single lognormal by matching the first two moments; see here. Maybe this is the most promising option. For sums of 1-100 lognormals, a few quick simulations suggest the approximation works OK, though of course the fewer approximations in the model at this early stage, the better…
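For reference, the moment matching itself is only a few lines; here is a sketch for independent summands, applied to one of the example intervals (the function name and interval choice are just for illustration):

```python
import numpy as np

def fenton_wilkinson(mus, sigmas):
    """Fenton-Wilkinson: approximate a sum of independent Lognormal(mu_j, sigma_j)
    variables with a single Lognormal(mu_S, sigma_S) matching the first two
    moments of the sum. Returns (mu_S, sigma_S)."""
    mus = np.asarray(mus, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    means = np.exp(mus + sigmas**2 / 2)                       # E[X_j]
    varis = (np.exp(sigmas**2) - 1) * np.exp(2 * mus + sigmas**2)  # Var[X_j]
    m, v = means.sum(), varis.sum()     # mean and variance of the sum
    sigma_S2 = np.log1p(v / m**2)       # solve the lognormal moment equations
    mu_S = np.log(m) - sigma_S2 / 2
    return mu_S, np.sqrt(sigma_S2)

# Example: the interval I_2 = (5, 15], i.e. x = 6, ..., 15
alpha, beta, sigma = 0.5, -0.025, 0.25
xs = np.arange(6, 16)
mu_S, sigma_S = fenton_wilkinson(alpha + beta * xs, np.full(xs.size, sigma))
```

By construction the fitted lognormal reproduces the sum's mean and variance exactly; how well it matches the full shape of the distribution (tails in particular) is what the quick simulations above were checking.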

Thought I’d stop and check whether anyone has experience or recommendations on how to proceed? Thanks!