Empirical Bayesian ordinal regression model

fit <- stan_polr (FWS1 ~ Ethnic1 + Fam1 + Eco1+ Health1 + Safety1
+ Community1 + Religios1 + Housing1, data = as.data.frame (BayesOrdinal1),
method = “logistic”, prior = R2(0.2, “mean”),
prior_counts = dirichlet(1), init_r = 0.1, seed = 12345,
algorithm = “sampling”)

Error: ‘1’ is not a supported link for family ‘dirichlet’.
Supported links are: ‘logit’

Dear rstanarm and brms users
How pass this error? what is the reason?
How to specify the prior? How to calculate AIC?

Please need your help.
Thank you

Hi Yousif!

Hm, I don’t see where this is coming from. Could you post a subset of your data set and the exact script that produced this error?

Did you already read through the vignette for rstanarm::stan_polr?

Btw, AIC is not a Baysian qunatity. It is based on the Maximum Likelihood Estimator, which Stan doesn’t compute (at least not with sampling). See, for example, this thread and numerous other threats in this forum for alternatives (hint: loo package).

Cheers!
Max

FWS1 Strata1 Ethnic1 FamType1 Edu1 Income1 Fam1 Eco1 Health1 Safety1 Community1 Religios1 Housing1
High Urban Indian Nuclear Tertiary <=RM2000 High High Low Moderate High Moderate High
High Urban Indian Extended Primary <=RM2000 High High High High High High High
High Urban Chinese Nuclear Seondary <=RM2000 High Low High High High High Moderate
High Urban Chinese Nuclear Seondary <=RM2000 High Moderate High High High High Moderate
High Urban Chinese Nuclear Seondary RM2001-RM4000 High Moderate High High High High High
High Urban Chinese Nuclear Seondary <=RM2000 High High High High High High High
High Urban Chinese Nuclear Seondary <=RM2000 High Low High High High Moderate High
High Urban Bumiputra Nuclear Seondary RM2001-RM4000 High High High High High High High
High Urban Bumiputra Nuclear No Formal Edu <=RM2000 High Moderate Low High High High High
High Urban Bumiputra Nuclear No Formal Edu <=RM2000 High Low High High High High Low
High Urban Bumiputra Nuclear Seondary <=RM2000 High High High High High High High
High Urban Bumiputra Nuclear Tertiary RM4001-RM7000 High High High High High High Moderate
High Urban Bumiputra Nuclear Seondary RM2001-RM4000 High Moderate High High High High High
High Rural Bumiputra Extended Seondary RM2001-RM4000 High High High High High High High
High Rural Bumiputra Nuclear Seondary RM2001-RM4000 Low Low Moderate High High Moderate High
Moderate Rural Bumiputra Nuclear Primary <=RM2000 Moderate Low Low Moderate Low Low Low
High Rural Bumiputra Nuclear Seondary RM2001-RM4000 High High High High High High Low
High Rural Bumiputra Nuclear Primary <=RM2000 High High High High High High High
High Rural Bumiputra Nuclear Seondary RM2001-RM4000 High High High High High High High
Low Rural Bumiputra Nuclear Primary <=RM2000 Low Low Low Low Low Low Low
High Rural Bumiputra Single Seondary <=RM2000 High High High High High High High
High Rural Bumiputra Nuclear Primary <=RM2000 High Moderate Moderate High High High High
High Rural Bumiputra Nuclear Seondary <=RM2000 Moderate Moderate Moderate Moderate Moderate Moderate Moderate
High Rural Bumiputra Nuclear Primary <=RM2000 High High High High High High High
Low Rural Bumiputra Nuclear Seondary <=RM2000 High Moderate Moderate Moderate Moderate Moderate Moderate
High Rural Bumiputra Nuclear Tertiary <=RM2000 High High Moderate High High High Moderate
High Rural Bumiputra Single Seondary <=RM2000 High High High High High High High
High Rural Bumiputra Nuclear Primary <=RM2000 High High Low High High High High
High Rural Bumiputra Single Seondary RM2001-RM4000 High High Moderate High High High High
High Rural Bumiputra Nuclear Seondary <=RM2000 High High High High High High High
High Rural Bumiputra Nuclear Primary <=RM2000 High High High High High High High
High Rural Bumiputra Nuclear Seondary <=RM2000 High High High High High High High
Moderate Urban Bumiputra Nuclear Seondary <=RM2000 High Moderate Moderate Moderate High Low Low
High Urban Bumiputra Extended Tertiary RM4001-RM7000 High Low Moderate Moderate Moderate High Low
Moderate Urban Bumiputra Nuclear Tertiary >RM7000 High Moderate High High High High Moderate
High Urban Bumiputra Nuclear Tertiary >RM7000 High High High High High High High
High Urban Bumiputra Nuclear Seondary <=RM2000 High Low High High High Low High
High Urban Bumiputra Nuclear Tertiary RM4001-RM7000 High High High High High High Moderate
High Urban Bumiputra Nuclear Tertiary RM2001-RM4000 High Moderate High High High High High
High Urban Bumiputra Nuclear Seondary <=RM2000 Moderate Low Moderate Moderate Moderate High Low
High Urban Bumiputra Nuclear Seondary RM2001-RM4000 High High High High High Low Low
High Urban Bumiputra Extended Seondary <=RM2000 High High High High High High High
Low Urban Bumiputra Nuclear Seondary <=RM2000 Low Low Low Low Low Low Low
High Urban Bumiputra Single Seondary <=RM2000 High Low Moderate High Moderate High Low
High Urban Bumiputra Nuclear Seondary RM4001-RM7000 High High High High High High High
High Urban Bumiputra Nuclear Seondary RM2001-RM4000 High Moderate High High Low Low Low
Low Urban Bumiputra Nuclear Seondary <=RM2000 Moderate Low Low High Moderate Low Low
High Urban Indian Nuclear Seondary <=RM2000 High Low Low High High High High
High Urban Chinese Extended Seondary <=RM2000 High High High Moderate Low Low Moderate

Daer Max

Please find the attached data set and try to check the reason of the error.
How can we used the past data as a prior to estimate the parameters of the Bayesian ordinal regression of these data?

Hi Yousif!

I used the clipr::read_clip_tbl() function to get in the data, but next time it’d probably be easier if you could just upload the csv file using the Upload button in the menu. ;)

Anyway… the model actually ran fine for me:

df <- clipr::read_clip_tbl()

library(tidyverse)

df <- df %>% mutate(FWS1 = ordered(FWS1, levels = c("Low", "Moderate", "High")))

library(rstanarm)
options(mc.cores = parallel::detectCores())

fit <- stan_polr(
  FWS1 ~ Ethnic1 + Fam1 + Eco1 + Health1 + Safety1 + Community1 + Religios1 + Housing1, 
  data = df,
  method = "logistic", 
  prior = R2(0.2, "mean"),
  prior_counts = dirichlet(1), 
  init_r = 0.1, 
  seed = 12345,
  algorithm = "sampling"
)

summary(fit)

Fit summary:

Model Info:
 function:     stan_polr
 family:       ordered [logistic]
 formula:      FWS1 ~ Ethnic1 + Fam1 + Eco1 + Health1 + Safety1 + Community1 + 
	   Religios1 + Housing1
 algorithm:    sampling
 sample:       4000 (posterior sample size)
 priors:       see help('prior_summary')
 observations: 49

Estimates:
                     mean   sd   10%   50%   90%
Ethnic1Chinese      0.1    0.5 -0.5   0.1   0.7 
Ethnic1Indian       0.1    0.6 -0.7   0.1   1.0 
Fam1Low            -0.1    1.2 -1.6  -0.1   1.5 
Fam1Moderate        0.0    0.7 -0.9   0.0   0.8 
Eco1Low             0.0    0.5 -0.6   0.0   0.6 
Eco1Moderate       -0.1    0.3 -0.6  -0.1   0.3 
Health1Low         -0.2    0.5 -0.8  -0.2   0.5 
Health1Moderate     0.1    0.5 -0.6   0.1   0.7 
Safety1Low         -0.5    1.4 -2.4  -0.5   1.3 
Safety1Moderate    -0.1    0.6 -0.8  -0.1   0.6 
Community1Low       0.2    0.7 -0.7   0.2   1.2 
Community1Moderate -0.1    0.7 -0.9  -0.1   0.8 
Religios1Low       -0.2    0.5 -0.9  -0.2   0.4 
Religios1Moderate   0.0    0.6 -0.8   0.0   0.8 
Housing1Low         0.0    0.5 -0.7   0.0   0.6 
Housing1Moderate   -0.1    0.4 -0.6  -0.1   0.4 
Low|Moderate       -2.6    0.5 -3.3  -2.6  -1.9 
Moderate|High      -1.8    0.4 -2.4  -1.8  -1.3 

Fit Diagnostics:
                    mean   sd   10%   50%   90%
mean_PPD:Low      0.1    0.1  0.0   0.1   0.2  
mean_PPD:Moderate 0.1    0.1  0.0   0.1   0.1  
mean_PPD:High     0.8    0.1  0.7   0.8   0.9  

The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).

MCMC diagnostics
                   mcse Rhat n_eff
Ethnic1Chinese     0.0  1.0  5056 
Ethnic1Indian      0.0  1.0  5190 
Fam1Low            0.0  1.0  4929 
Fam1Moderate       0.0  1.0  4690 
Eco1Low            0.0  1.0  4629 
Eco1Moderate       0.0  1.0  5441 
Health1Low         0.0  1.0  4424 
Health1Moderate    0.0  1.0  5796 
Safety1Low         0.0  1.0  4895 
Safety1Moderate    0.0  1.0  4707 
Community1Low      0.0  1.0  4160 
Community1Moderate 0.0  1.0  4910 
Religios1Low       0.0  1.0  5215 
Religios1Moderate  0.0  1.0  5383 
Housing1Low        0.0  1.0  5299 
Housing1Moderate   0.0  1.0  4977 
Low|Moderate       0.0  1.0  5183 
Moderate|High      0.0  1.0  5239 
mean_PPD:Low       0.0  1.0  4623 
mean_PPD:Moderate  0.0  1.0  4124 
mean_PPD:High      0.0  1.0  4300 
log-posterior      0.1  1.0  1474 

For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

I don’t know where your error comes from… hm…

I am so sorry I though you want to see only how the data looks like.

[quote=“Max_Mantei, post:5, topic:13511”]
Anyway… the model actually ran fine for me:

Thank you for your interest.
I am also got the same result when I did the first run. After that I got an error as I mentioned. However, when I removed " prior_count=dirichlet(1) " it works.

What is the difference if we use " prior_count=dirichlet(1) " or remove it from stan_polr() ?
How to use past data as a prior ? which prior we can use instead of R2(0.2,“mean”) for this data?

Highly appreciated

No worries! I should have been more clear. :)

Did you include the prior specification in your first run?

This is all very well documented in the rstanarm vignettes. If you remove prior_count=dirichlet(1), it would not really change anything, since the default prior is dirichlet(1). Also, you can can obviously change the R^2 prior location: The more explanatory power you think your predictors have (jointly), the higher should be your prior R^2. You can also specify prior=NULL which will result in flat priors (uniform on the real number line) for the regression coefficients. This is generally not advisable.

You can verify that the default prior for the counts is indeed \text{Dirichlet(1)} and that prior=NULL yields flat priors:

fit_default <- stan_polr(
  FWS1 ~ Ethnic1 + Fam1 + Eco1 + Health1 + Safety1 + Community1 + Religios1 + Housing1, 
  data = df,
  method = "logistic",
  prior = NULL,
  init_r = 0.1, 
  seed = 12345,
  algorithm = "sampling"
)

Now call prior_summary on the fit_default object:

> prior_summary(fit_default)
Priors for model 'fit_default' 
------

Coefficients
 ~ flat

Counts
 ~ dirichlet(concentration = [1,1,1])
------
See help('prior_summary.stanreg') for more details

You should really check out the rstanarm vignettes. They are awesome!

Dear Max
Thank you so much
I have read rstanarm` vignettes and I have check it.
Now my question is how to used the past data (data of 2011) as a prior to estimate the parameters of data of 2016.

How to use covariance matrix as a prior?

Thank you

I can only recommend what @bbbales2 said in the other thread. I hope I don’t sound harsh here, but I think the problems are a bit more conceptual. I know that Bayesian reasoning is often sold as updating prior beliefs after seeing data, then looking at new data with the new prior and update again and so on. But this is not what is typically done Bayesian Statistics. At least I don’t think it is helpful to think about it that way. I guess that is what @bbbales2 was trying to convey (just estimate your data jointly!).

I’d recommend you pick up a good textbook and familiarize yourself a bit more with this approach to Bayesian modeling. I always recommend the Gelman and Hill textbook or Statistical Rethinking by Richard McElreath.

1 Like

s

Thank you

I will try to use this.
What about using prior of covariance matrix that mentioned in rstanarm?
How to use it as a prior?

greatly appreciated