Empirical Bayesian ordinal regression model

yousifi · March 4, 2020, 6:10am

fit ← stan_polr (FWS1 ~ Ethnic1 + Fam1 + Eco1+ Health1 + Safety1
+ Community1 + Religios1 + Housing1, data = as.data.frame (BayesOrdinal1),
method = “logistic”, prior = R2(0.2, “mean”),
prior_counts = dirichlet(1), init_r = 0.1, seed = 12345,
algorithm = “sampling”)

Error: ‘1’ is not a supported link for family ‘dirichlet’.
Supported links are: ‘logit’

Dear rstanarm and brms users
How pass this error? what is the reason?
How to specify the prior? How to calculate AIC?

Please need your help.
Thank you

Max_Mantei · March 4, 2020, 9:50pm

Hi Yousif!

Hm, I don’t see where this is coming from. Could you post a subset of your data set and the exact script that produced this error?

Did you already read through the vignette for rstanarm::stan_polr?

Btw, AIC is not a Baysian qunatity. It is based on the Maximum Likelihood Estimator, which Stan doesn’t compute (at least not with sampling). See, for example, this thread and numerous other threats in this forum for alternatives (hint: loo package).

Cheers!
Max

yousifi · March 6, 2020, 4:11pm

FWS1	Strata1	Ethnic1	FamType1	Edu1	Income1	Fam1	Eco1	Health1	Safety1	Community1	Religios1	Housing1
High	Urban	Indian	Nuclear	Tertiary	<=RM2000	High	High	Low	Moderate	High	Moderate	High
High	Urban	Indian	Extended	Primary	<=RM2000	High	High	High	High	High	High	High
High	Urban	Chinese	Nuclear	Seondary	<=RM2000	High	Low	High	High	High	High	Moderate
High	Urban	Chinese	Nuclear	Seondary	<=RM2000	High	Moderate	High	High	High	High	Moderate
High	Urban	Chinese	Nuclear	Seondary	RM2001-RM4000	High	Moderate	High	High	High	High	High
High	Urban	Chinese	Nuclear	Seondary	<=RM2000	High	High	High	High	High	High	High
High	Urban	Chinese	Nuclear	Seondary	<=RM2000	High	Low	High	High	High	Moderate	High
High	Urban	Bumiputra	Nuclear	Seondary	RM2001-RM4000	High	High	High	High	High	High	High
High	Urban	Bumiputra	Nuclear	No Formal Edu	<=RM2000	High	Moderate	Low	High	High	High	High
High	Urban	Bumiputra	Nuclear	No Formal Edu	<=RM2000	High	Low	High	High	High	High	Low
High	Urban	Bumiputra	Nuclear	Seondary	<=RM2000	High	High	High	High	High	High	High
High	Urban	Bumiputra	Nuclear	Tertiary	RM4001-RM7000	High	High	High	High	High	High	Moderate
High	Urban	Bumiputra	Nuclear	Seondary	RM2001-RM4000	High	Moderate	High	High	High	High	High
High	Rural	Bumiputra	Extended	Seondary	RM2001-RM4000	High	High	High	High	High	High	High
High	Rural	Bumiputra	Nuclear	Seondary	RM2001-RM4000	Low	Low	Moderate	High	High	Moderate	High
Moderate	Rural	Bumiputra	Nuclear	Primary	<=RM2000	Moderate	Low	Low	Moderate	Low	Low	Low
High	Rural	Bumiputra	Nuclear	Seondary	RM2001-RM4000	High	High	High	High	High	High	Low
High	Rural	Bumiputra	Nuclear	Primary	<=RM2000	High	High	High	High	High	High	High
High	Rural	Bumiputra	Nuclear	Seondary	RM2001-RM4000	High	High	High	High	High	High	High
Low	Rural	Bumiputra	Nuclear	Primary	<=RM2000	Low	Low	Low	Low	Low	Low	Low
High	Rural	Bumiputra	Single	Seondary	<=RM2000	High	High	High	High	High	High	High
High	Rural	Bumiputra	Nuclear	Primary	<=RM2000	High	Moderate	Moderate	High	High	High	High
High	Rural	Bumiputra	Nuclear	Seondary	<=RM2000	Moderate	Moderate	Moderate	Moderate	Moderate	Moderate	Moderate
High	Rural	Bumiputra	Nuclear	Primary	<=RM2000	High	High	High	High	High	High	High
Low	Rural	Bumiputra	Nuclear	Seondary	<=RM2000	High	Moderate	Moderate	Moderate	Moderate	Moderate	Moderate
High	Rural	Bumiputra	Nuclear	Tertiary	<=RM2000	High	High	Moderate	High	High	High	Moderate
High	Rural	Bumiputra	Single	Seondary	<=RM2000	High	High	High	High	High	High	High
High	Rural	Bumiputra	Nuclear	Primary	<=RM2000	High	High	Low	High	High	High	High
High	Rural	Bumiputra	Single	Seondary	RM2001-RM4000	High	High	Moderate	High	High	High	High
High	Rural	Bumiputra	Nuclear	Seondary	<=RM2000	High	High	High	High	High	High	High
High	Rural	Bumiputra	Nuclear	Primary	<=RM2000	High	High	High	High	High	High	High
High	Rural	Bumiputra	Nuclear	Seondary	<=RM2000	High	High	High	High	High	High	High
Moderate	Urban	Bumiputra	Nuclear	Seondary	<=RM2000	High	Moderate	Moderate	Moderate	High	Low	Low
High	Urban	Bumiputra	Extended	Tertiary	RM4001-RM7000	High	Low	Moderate	Moderate	Moderate	High	Low
Moderate	Urban	Bumiputra	Nuclear	Tertiary	>RM7000	High	Moderate	High	High	High	High	Moderate
High	Urban	Bumiputra	Nuclear	Tertiary	>RM7000	High	High	High	High	High	High	High
High	Urban	Bumiputra	Nuclear	Seondary	<=RM2000	High	Low	High	High	High	Low	High
High	Urban	Bumiputra	Nuclear	Tertiary	RM4001-RM7000	High	High	High	High	High	High	Moderate
High	Urban	Bumiputra	Nuclear	Tertiary	RM2001-RM4000	High	Moderate	High	High	High	High	High
High	Urban	Bumiputra	Nuclear	Seondary	<=RM2000	Moderate	Low	Moderate	Moderate	Moderate	High	Low
High	Urban	Bumiputra	Nuclear	Seondary	RM2001-RM4000	High	High	High	High	High	Low	Low
High	Urban	Bumiputra	Extended	Seondary	<=RM2000	High	High	High	High	High	High	High
Low	Urban	Bumiputra	Nuclear	Seondary	<=RM2000	Low	Low	Low	Low	Low	Low	Low
High	Urban	Bumiputra	Single	Seondary	<=RM2000	High	Low	Moderate	High	Moderate	High	Low
High	Urban	Bumiputra	Nuclear	Seondary	RM4001-RM7000	High	High	High	High	High	High	High
High	Urban	Bumiputra	Nuclear	Seondary	RM2001-RM4000	High	Moderate	High	High	Low	Low	Low
Low	Urban	Bumiputra	Nuclear	Seondary	<=RM2000	Moderate	Low	Low	High	Moderate	Low	Low
High	Urban	Indian	Nuclear	Seondary	<=RM2000	High	Low	Low	High	High	High	High
High	Urban	Chinese	Extended	Seondary	<=RM2000	High	High	High	Moderate	Low	Low	Moderate

yousifi · March 6, 2020, 4:16pm

Daer Max

Please find the attached data set and try to check the reason of the error.
How can we used the past data as a prior to estimate the parameters of the Bayesian ordinal regression of these data?

Max_Mantei · March 6, 2020, 11:19pm

Hi Yousif!

I used the clipr::read_clip_tbl() function to get in the data, but next time it’d probably be easier if you could just upload the csv file using the Upload button in the menu. ;)

Anyway… the model actually ran fine for me:

df <- clipr::read_clip_tbl()

library(tidyverse)

df <- df %>% mutate(FWS1 = ordered(FWS1, levels = c("Low", "Moderate", "High")))

library(rstanarm)
options(mc.cores = parallel::detectCores())

fit <- stan_polr(
  FWS1 ~ Ethnic1 + Fam1 + Eco1 + Health1 + Safety1 + Community1 + Religios1 + Housing1, 
  data = df,
  method = "logistic", 
  prior = R2(0.2, "mean"),
  prior_counts = dirichlet(1), 
  init_r = 0.1, 
  seed = 12345,
  algorithm = "sampling"
)

summary(fit)

Fit summary:

Model Info:
 function:     stan_polr
 family:       ordered [logistic]
 formula:      FWS1 ~ Ethnic1 + Fam1 + Eco1 + Health1 + Safety1 + Community1 + 
	   Religios1 + Housing1
 algorithm:    sampling
 sample:       4000 (posterior sample size)
 priors:       see help('prior_summary')
 observations: 49

Estimates:
                     mean   sd   10%   50%   90%
Ethnic1Chinese      0.1    0.5 -0.5   0.1   0.7 
Ethnic1Indian       0.1    0.6 -0.7   0.1   1.0 
Fam1Low            -0.1    1.2 -1.6  -0.1   1.5 
Fam1Moderate        0.0    0.7 -0.9   0.0   0.8 
Eco1Low             0.0    0.5 -0.6   0.0   0.6 
Eco1Moderate       -0.1    0.3 -0.6  -0.1   0.3 
Health1Low         -0.2    0.5 -0.8  -0.2   0.5 
Health1Moderate     0.1    0.5 -0.6   0.1   0.7 
Safety1Low         -0.5    1.4 -2.4  -0.5   1.3 
Safety1Moderate    -0.1    0.6 -0.8  -0.1   0.6 
Community1Low       0.2    0.7 -0.7   0.2   1.2 
Community1Moderate -0.1    0.7 -0.9  -0.1   0.8 
Religios1Low       -0.2    0.5 -0.9  -0.2   0.4 
Religios1Moderate   0.0    0.6 -0.8   0.0   0.8 
Housing1Low         0.0    0.5 -0.7   0.0   0.6 
Housing1Moderate   -0.1    0.4 -0.6  -0.1   0.4 
Low|Moderate       -2.6    0.5 -3.3  -2.6  -1.9 
Moderate|High      -1.8    0.4 -2.4  -1.8  -1.3 

Fit Diagnostics:
                    mean   sd   10%   50%   90%
mean_PPD:Low      0.1    0.1  0.0   0.1   0.2  
mean_PPD:Moderate 0.1    0.1  0.0   0.1   0.1  
mean_PPD:High     0.8    0.1  0.7   0.8   0.9  

The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).

MCMC diagnostics
                   mcse Rhat n_eff
Ethnic1Chinese     0.0  1.0  5056 
Ethnic1Indian      0.0  1.0  5190 
Fam1Low            0.0  1.0  4929 
Fam1Moderate       0.0  1.0  4690 
Eco1Low            0.0  1.0  4629 
Eco1Moderate       0.0  1.0  5441 
Health1Low         0.0  1.0  4424 
Health1Moderate    0.0  1.0  5796 
Safety1Low         0.0  1.0  4895 
Safety1Moderate    0.0  1.0  4707 
Community1Low      0.0  1.0  4160 
Community1Moderate 0.0  1.0  4910 
Religios1Low       0.0  1.0  5215 
Religios1Moderate  0.0  1.0  5383 
Housing1Low        0.0  1.0  5299 
Housing1Moderate   0.0  1.0  4977 
Low|Moderate       0.0  1.0  5183 
Moderate|High      0.0  1.0  5239 
mean_PPD:Low       0.0  1.0  4623 
mean_PPD:Moderate  0.0  1.0  4124 
mean_PPD:High      0.0  1.0  4300 
log-posterior      0.1  1.0  1474 

For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

I don’t know where your error comes from… hm…

yousifi · March 7, 2020, 2:32am

I am so sorry I though you want to see only how the data looks like.

yousifi · March 7, 2020, 2:40am

[quote=“Max_Mantei, post:5, topic:13511”]
Anyway… the model actually ran fine for me:

Thank you for your interest.
I am also got the same result when I did the first run. After that I got an error as I mentioned. However, when I removed " prior_count=dirichlet(1) " it works.

What is the difference if we use " prior_count=dirichlet(1) " or remove it from stan_polr() ?
How to use past data as a prior ? which prior we can use instead of R2(0.2,“mean”) for this data?

Highly appreciated

Max_Mantei · March 7, 2020, 1:45pm

No worries! I should have been more clear. :)

Did you include the prior specification in your first run?

This is all very well documented in the rstanarm vignettes. If you remove prior_count=dirichlet(1), it would not really change anything, since the default prior is dirichlet(1). Also, you can can obviously change the R^2 prior location: The more explanatory power you think your predictors have (jointly), the higher should be your prior R^2. You can also specify prior=NULL which will result in flat priors (uniform on the real number line) for the regression coefficients. This is generally not advisable.

You can verify that the default prior for the counts is indeed \text{Dirichlet(1)} and that prior=NULL yields flat priors:

fit_default <- stan_polr(
  FWS1 ~ Ethnic1 + Fam1 + Eco1 + Health1 + Safety1 + Community1 + Religios1 + Housing1, 
  data = df,
  method = "logistic",
  prior = NULL,
  init_r = 0.1, 
  seed = 12345,
  algorithm = "sampling"
)

Now call prior_summary on the fit_default object:

> prior_summary(fit_default)
Priors for model 'fit_default' 
------

Coefficients
 ~ flat

Counts
 ~ dirichlet(concentration = [1,1,1])
------
See help('prior_summary.stanreg') for more details

You should really check out the rstanarm vignettes. They are awesome!

yousifi · March 7, 2020, 3:03pm

Dear Max
Thank you so much
I have read rstanarm` vignettes and I have check it.
Now my question is how to used the past data (data of 2011) as a prior to estimate the parameters of data of 2016.

How to use covariance matrix as a prior?

Thank you

Max_Mantei · March 7, 2020, 3:20pm

I can only recommend what @bbbales2 said in the other thread. I hope I don’t sound harsh here, but I think the problems are a bit more conceptual. I know that Bayesian reasoning is often sold as updating prior beliefs after seeing data, then looking at new data with the new prior and update again and so on. But this is not what is typically done Bayesian Statistics. At least I don’t think it is helpful to think about it that way. I guess that is what @bbbales2 was trying to convey (just estimate your data jointly!).

I’d recommend you pick up a good textbook and familiarize yourself a bit more with this approach to Bayesian modeling. I always recommend the Gelman and Hill textbook or Statistical Rethinking by Richard McElreath.

yousifi · March 7, 2020, 4:15pm

s

yousifi · March 7, 2020, 4:28pm

Thank you

I will try to use this.
What about using prior of covariance matrix that mentioned in rstanarm?
How to use it as a prior?

greatly appreciated

Topic		Replies	Views
Bayesian ordinal regression model (Empirical Bayes ordinal regression model) rstanarm	13	1133	March 3, 2020
Improving Performance on Logistic Regression with Informative Priors Modeling performance , rstanarm	4	1545	May 1, 2020
How to use a Covariance matrix prior in Bayesian ordinal regression model? rstanarm	3	672	March 18, 2020
Jacobian determinant adjustment for Rstanarm's implementation of the induced dirichlet prior in stan_polr Developers rstanarm	10	292	July 4, 2024
Prior for logistic regression when having knowledge on outcome rstanarm	4	1944	July 28, 2017

Empirical Bayesian ordinal regression model

Fit summary:

Related topics