brms modeling with aggregated data/weights

I want to see the effect of partial pooling in the following data. However, each row corresponds to a sum over multiple rows for each state, so states with less exposure represent fewer original rows (less weight). The last column, “PurePremium”, is the ratio I want to model (Cost/Exposure). I was considering two options: 1) model Cost with Exposure as an offset and as weights; 2) model the ratio (Cost/Exposure) with Exposure as a weight.
My problem is that the model does not converge, whether I use the weights directly or normalize them first. Summing up:

  1. I don’t know what role weights play in partial pooling; I guess states with more exposure contribute more to the overall mean.

  2. In which cases do we have to normalize the weights, and how does that affect the estimation algorithm? The exposures are very high in this case, so if the variance is divided by the weights, should we model costs directly to keep things on a sensible scale?

    state Cost Exposure PurePremium
    398 AZ 427243576 1352274 315.9445
    399 CA 209261012 429744 486.9434
    400 CO 343279991 856492 400.7977
    401 CT 164581024 335020 491.2573
    402 FL 5244800741 13094660 400.5297
    403 GA 4231860437 7924618 534.0144
    404 IL 332640238 833802 398.9439
    405 IN 657799052 1083470 607.1225
    406 MD 232967856 580956 401.0077
    407 MI 308049813 1153222 267.1210
    408 MN 46624519 117468 396.9125
    409 MO 601465340 1619952 371.2859
    410 NV 116394232 341410 340.9222
    411 NY 597147136 1044810 571.5366
    412 OH 580629529 1479476 392.4562
    413 OK 76540657 213862 357.8974
    414 PA 690546185 2131986 323.8981
    415 TX 3834934060 9140948 419.5335
    416 UT 192174885 931374 206.3348
    417 VA 218375023 458498 476.2835
    418 WI 379522917 745888 508.8202

    model_nopool <- brms::brm(
      Cost | weights(Exposure) ~ state,
      data = SplitStates_2015,
      chains = 2,
      iter = 20000,
      family = Gamma(link = "log"),
      prior = model_prior,
      control = list(max_treedepth = 15)
    )

    model_partialpool <- brms::brm(
      PurePremium | weights(Exposure_norm) ~ (1 | state),
      data = SplitStates_2015,
      chains = 2,
      iter = 40000,
      family = Gamma(link = "log"),
      prior = model_prior,
      control = list(adapt_delta = 0.95, max_treedepth = 15)
    )

I don’t know how weights work in brms, but what does your raw data look like?

Is it that, for every state, you have a set of costs and exposures, like:

state cost exposure
A 1 2
A 2 1
A 2 2
B 1 1

And you’re trying to get estimates of the cost given the state and exposure?

Yes, my raw data looks like that. The first problem is that each row of my raw data is already an average for the state under a specific condition. So first I’ll fit a simple model of cost/exposure given the state: cost/exp ~ (1 | state).
Since I don’t have individual data, I am losing information about the variation within each condition in each of the states. I only know how much exposure I have for each condition, but that’s not enough information to model the variation.
I will just model the averages coming from the states as I mentioned above, without using the exposure as weights. The only problem with this is that the number of conditions inside a state is fixed, even though the average cost for the state is a random variable (because it comes from a fixed sum of random variables).
What do you think would be the best approach?
PS: Weights are simply constants that multiply each observation’s contribution to the log-likelihood. If I want to model individual outcomes, I’d need to include the weights as a proxy, which I already tried, but it was a mess for the sampling algorithm (I couldn’t get a sample at all).
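To illustrate that last point, here is a minimal sketch (with made-up numbers, not the brms internals themselves) of how weights enter the likelihood: each observation’s log-density is multiplied by its weight, which is why very large raw exposures make the likelihood extremely sharp.

```r
# Sketch with made-up numbers: a weighted Gamma log-likelihood,
# parameterized by mean mu and shape (so rate = shape / mu).
y  <- c(300, 450, 500)   # hypothetical pure premiums
w  <- c(2, 1, 3)         # hypothetical exposure weights
mu <- 400; shape <- 10
loglik <- sum(w * dgamma(y, shape = shape, rate = shape / mu, log = TRUE))
# Rescaling all weights by a constant rescales the whole log-likelihood,
# so exposures in the millions make the posterior very narrow.
```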

So would it be possible to do the regression?:

cost/exp ~ (1 | condition) + (1 | state) + (1 | condition:state)

If you only have the means for each state/condition (and don’t have any repetitions of this) I don’t know of a way to extract information about the variance you’d get on an individual level (like what you’d get if you were able to add a (1 | individual) term to the regression).

Aahh, I see, thanks.

I was thinking something like this:
cost/exp ~ (condition | state)

Do you know the differences with your proposal?

Is condition a factor (discrete) variable or a continuous variable? If it’s on the left-hand side of the bar, it’ll show up as part of the linear predictor. If it’s on the right-hand side, it’s a grouping term in a hierarchical model.

If it’s continuous, it probably doesn’t make sense for it to be on the right side (infinite groups).

If it’s discrete and has more than 2 levels, it might make sense to use the (1 | condition) formulation. You get the partial-pooling regularization effect that way.

Putting it on the left-hand side corresponds to a non-pooled estimate of the effects of condition in each state. I think it’d show up in the design matrix as something like a one-hot encoding, if that helps at all. You can generate the Stan code/data for brms models with make_stancode/make_standata and try to reverse-engineer the differences if things get confusing.
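For example (hypothetical data frame `df` and column name `ratio`), the two formulations can be compared directly:

```r
# Generate the Stan code for both formulations; diffing the two outputs
# shows how the grouping terms differ under the hood.
code_a <- brms::make_stancode(ratio ~ (1 | condition) + (1 | state),
                              data = df, family = Gamma(link = "log"))
code_b <- brms::make_stancode(ratio ~ (condition | state),
                              data = df, family = Gamma(link = "log"))
```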

How many observations do you have per state? It could be that the model has problems because you use a distribution with overdispersion (Gamma) but have only a single observation per state, which then clashes with the overdispersion parameter. Not sure if this is a helpful suggestion at all, though.

With regard to your original question: if you want to model the ratio, adding offset(log(Exposure)) seems like the canonical solution for log-link models.
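A sketch of that offset formulation with the data above (prior, chains, and control settings omitted; adjust as needed):

```r
# Model total Cost with log(Exposure) as an offset; with a log link the
# linear predictor then effectively describes Cost / Exposure
# (the pure premium) on the log scale.
model_offset <- brms::brm(
  Cost ~ offset(log(Exposure)) + (1 | state),
  data = SplitStates_2015,
  family = Gamma(link = "log")
)
```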

Hello Paul,
I am currently working now with many observations per state, which are provided services (for example State FL, Maternity service), as bbbales mentioned. The problem is that the list of services varies across the states, cause people who treated the data merged groups that presented low exposure into “Other services”. I am thinking the problem similarly to the study of test scores and schools in US, where students are provided services and school are the states: cost/exp ~ (1 | state) + (1 | condition:state).
The only problem is that the shrinkage effects are given by the number of provided services and not the number of exposure. It makes sense because they are related but it is not proper from the data but the merging criteria (less exposure, more merges).
I can’t do miracles with this dataset but maybe I could think a way to smooth the parameters with the exposure information, and not just with the number of provided services, i.e., states with 40.000 exposure (equivalent to 4000 patients more or less) will not be representative of the state outcome.
Moreover, using exposure as weights results in very narrow likelihood functions which causes some problems in sampling. In this type of data having 10.000.000 of exposure may not represent a robust number, so It is not very clear how the certainty scales with the number of observations (without having data at the lowest level of information).

I am sorry I didn’t respond earlier but I really don’t have much to suggest right now as I am lacking the subject matter knowledge.

I need to research multiple imputation, or maybe a mixture model. I have unobserved service values for some states, but I have the sum of the unobserved values. Do you have any guide on where to look?

I am not sure in which direction I can reasonably point you to, but perhaps @lauren has some suggestions?