Constraining predictions of matrices to sums of categories of rows and columns

mts24 · November 5, 2018, 7:13pm

Hello everyone,

I am completely new to the community and a novice in running Bayesian models. My question pertains to a set of matrices that I have collected and standardized from across many countries and years. Each matrix represents the total health spending in a given country year and is broken-down by the two dimensions - health spending by types of providers (~20 columns) and health spending by types of services (~20 rows). The sum of the rows must equal the sum of the columns. These matrices have varying levels of completeness (ranging form only totals - where cross-tabulations are completely empty - to nearly complete cross-tabulations).

Using the rstanarm package, I am currently able to predict cell-specific estimates across the countries and years, however, I am not sure how to incorporate the constraints that the sum of the rows must equal the sum of the columns. Has anyone added similar constraints to their models?

Thanks so much,
Matt

bgoodri · November 5, 2018, 7:15pm

You would have to write your own Stan code to enforce a constraint such as that in the log-likelihood.

mts24 · November 5, 2018, 11:21pm

Thanks for the response, Ben. If I wanted to write my own Stan code that would allow for this constraint as well as the functionality from stan_glmer would you have some recommendations for starting places for me (example code, etc)?

In the end I will want to have country and time effects (possibly random effects) in addition to covariates such as gross domestic product.

bgoodri · November 5, 2018, 11:55pm

I would start with brms::make_stancode(formula, data). I am not totally sure how you want to do your model, but you could have the conditional mean of the rows be a function of the covariates and the conditional mean of the columns be a scaled simplex where the scaling factor is the sum of the conditional means for each row.

mts24 · November 7, 2018, 8:48pm

Thanks, Ben. This sounds really promising. I’ve been reviewing brms::make_stancode(formula, data) and the simplex {boot} function in R. However, not having a strong CS or math background (mostly biostats) this leaves me with many questions. I wonder with the below example of my previous model and data could you add a bit more detail?

Variable descriptions:
country = country name
c_num = numeric country value
year = year of data
sha_hp = health provider category
sha_hp_num = numeric health provider category
sha_hc = health function (services) category
sha_hc_num = numeric health function (services) category
ln_gdp_pc = logged GDP per capita (country-year specific)
value_pc = health expenditures (USD) in per captia space of specific country-year provider-function

My previous rstanarm model used was:

model_1 <- ln_value_pc ~ ln_gdp_pc + (1|c_num) + (1|sha_hc_num) + (1|sha_hp_num)

stan_glmer(formula = model_1, data)

The ppc distributions looked like this:

ppc

bgoodri · November 7, 2018, 8:54pm

The images are missing

mts24 · November 7, 2018, 10:12pm

Sorry about that Ben. I’m trying to figure out why they didn’t and how I can get them to upload.

mts24 · November 7, 2018, 10:53pm

Okay, let me know if you can see them now.

bgoodri · November 8, 2018, 12:56am

OK, I can see them now. With these variables, what is supposed to equal the sum of something else?

mts24 · November 8, 2018, 5:26pm

The data head I showed is in the long format. So for a given country-year (Afghanistan 2008), there will be expenditures reported by health providers (sha_hp) which would sum across all variants of health functions (sha_hc). So the category sha_hp = 1 would have expenditure values across sha_hc = 1 to 9; the sum of these would equal the total for sha_hp = 1. The same would hold for any category of sha_hp, summing across the categories of sha_hc. Then the sum of the totals of sha_hp’s must equal the sum of total sha_hc’s, which is the total health expenditures in that given country-year.

I hope this makes sense, but if not, the original data that was compiled is a table (or matrix) for a specific country-year where rows are the health functions, the columns are the health providers, and the margins are the totals, which sum to the total health expenditure.

mts24 · November 13, 2018, 8:11pm

Hi Ben,

Just wanted to follow up for any thoughts you might have with the information I’ve provided about my data? Particularly with your point below:

start with brms::make_stancode(formula, data) . I am not totally sure how you want to do your model, but you could have the conditional mean of the rows be a function of the covariates and the conditional mean of the columns be a scaled simplex where the scaling factor is the sum of the conditional means for each row.

bgoodri · November 15, 2018, 3:14am

So, in the .png you put up above, you have some rows of data in the long format that are sums of the other rows on health expenditures?

mts24 · November 15, 2018, 7:03pm

Currently, yes. However, I can reformat the data in whatever way works. Perhaps showing you an example of one of the raw matrices will help. Below you will see that the columns are health care providers and rows are health care functions, please ignore the sub-components of the categories (i.e. HC 1.1, HP 1.1) as I am only worried about making estimates are the first level (i.e. HC 1 and HP 1). The sum of the columns, equal the sum of the rows (1,277,281 million). I have, roughly 800 country-years worth of data with matrices with similar detail and roughly 600 where we only have the sums.

Topic		Replies	Views
Specifying priors for a matrix Modeling	2	2054	June 6, 2019
Reduce_sum with padded arrays Modeling rstan	7	264	February 23, 2024
Help understanding code and generating estimates General	9	945	April 25, 2020
Regression with parameter constraints Modeling	3	1617	November 2, 2017
Log binomial random intercept model Modeling fitting-issues	8	2701	March 4, 2018

Constraining predictions of matrices to sums of categories of rows and columns

Related topics