Hi everyone,
I’m working on a model to infer the parameters of an underlying distribution given only per-group summary statistics (the sample minimum, mean, and maximum), without access to the raw observations. The number of observations n per group is also known.
My approach is to factor the likelihood:
f(y_{\text{min}}, y_{\text{avg}}, y_{\text{max}}) = f(y_{\text{min}}, y_{\text{max}}) \cdot f(y_{\text{avg}} | y_{\text{min}}, y_{\text{max}})
The f(y_{\text{min}}, y_{\text{max}}) term is the joint density of the min and max order statistics, which is straightforward to implement.
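For reference, this is the standard order-statistics result for n \ge 2 i.i.d. draws from a continuous parent. Here f_Y and F_Y are notation introduced only for this formula, denoting the parent density and CDF:
f(y_{\text{min}}, y_{\text{max}}) = n (n-1) \, f_Y(y_{\text{min}}) \, f_Y(y_{\text{max}}) \, \left[ F_Y(y_{\text{max}}) - F_Y(y_{\text{min}}) \right]^{n-2}, \quad y_{\text{min}} \le y_{\text{max}}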
For the conditional term f(y_{\text{avg}} | y_{\text{min}}, y_{\text{max}}), it is probably easier to reason in terms of y_{\text{sum}} = n y_{\text{avg}} - y_{\text{min}} - y_{\text{max}}, so that the conditional density of the mean can be replaced by f(y_{\text{sum}} | y_{\text{min}}, y_{\text{max}}). This is the density of the sum of the n-2 “inner” observations, which, given the minimum and maximum, I treat as i.i.d. draws from the parent distribution truncated to [y_{\text{min}}, y_{\text{max}}].
So far I’ve tried a normal approximation for y_{\text{sum}}, with mean and variance equal to n-2 times the mean and variance of the truncated parent distribution. This seems to work, but it relies on a CLT approximation, and coding the mean and standard deviation of the truncated parent distribution can be burdensome.
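For concreteness, here is a minimal sketch of this approximation in Python. It assumes a normal parent with parameters mu and sigma, which is an assumption for illustration only since the parent family is left open above. All function names are hypothetical. The change of variables from y_{\text{avg}} to y_{\text{sum}} is linear, so its Jacobian is a constant that does not depend on the parameters and is dropped.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def truncated_normal_moments(mu, sigma, lo, hi):
    """Mean and variance of Normal(mu, sigma) truncated to [lo, hi]."""
    a = (lo - mu) / sigma
    b = (hi - mu) / sigma
    z = norm_cdf(b) - norm_cdf(a)          # probability mass inside [lo, hi]
    r = (norm_pdf(a) - norm_pdf(b)) / z
    mean = mu + sigma * r
    var = sigma ** 2 * (1.0 + (a * norm_pdf(a) - b * norm_pdf(b)) / z - r ** 2)
    return mean, var


def inner_sum_logpdf_clt(y_min, y_avg, y_max, n, mu, sigma):
    """CLT approximation to log f(y_sum | y_min, y_max); requires n >= 3.

    Assumes a normal parent (illustrative choice, not part of the model above).
    """
    k = n - 2                               # number of "inner" observations
    y_sum = n * y_avg - y_min - y_max
    m, v = truncated_normal_moments(mu, sigma, y_min, y_max)
    mean_sum = k * m
    var_sum = k * v
    return (-0.5 * math.log(2.0 * math.pi * var_sum)
            - 0.5 * (y_sum - mean_sum) ** 2 / var_sum)
```

The per-group log-likelihood would then be this term plus the log of the joint min/max density. For other parent families, only `truncated_normal_moments` would need replacing, which is exactly the part that is tedious to derive.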
Is there a better or more exact approach?
What about modeling the n-2 latent variables for each group as parameters? How would one sample parameters in Stan that satisfy a hard sum constraint and boundary constraints?