# What is the bayesian counterpart to the chi-square test?

I know a little about Bayesian models of Fisher’s exact test, from the rate of left-handedness different between the male and female groups.

library(tidyverse)
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

tb <- tibble::tribble(
~sex,   ~left_handed, ~right_handed,
"male",           9L,           43L,
"female",         4L,           44L
)

stan_program <- "
data {
int<lower=1> event_1;
int<lower=1> event_2;
int<lower=1> n_1;
int<lower=1> n_2;
}
parameters {
real<lower=0,upper=1> p_1;
real<lower=0,upper=1> p_2;
}
model {
event_1 ~ binomial(n_1, p_1);
event_2 ~ binomial(n_2, p_2);
p_1 ~ beta(5, 40);
p_2 ~ beta(5, 40);
}
generated quantities {
real diff = p_1 - p_2;
}
"

stan_data <- list(
n_1      = 52, # male
event_1  = 9,  # left-handed male
n_2      = 48, # female
event_2  = 4   # left-handed female
)

stan_fit <- stan(model_code = stan_program, data = stan_data)


But I have a new @mcmc_stan questions for chi-square test, such as the case below

library(tidyverse)

gss %>%
select(party, NASA) %>%
ggplot(aes(x = party, fill = NASA)) +
geom_bar()


Classical approach

chisq.test(gss$party, gss$NASA)


What is the bayesian counterpart to the chi-square test ?
Can you give me some advice?

Hi,

it looks like you would need a categorical/multinomial model, which you then use to draw samples from the posterior to examine what differences there are between the categories?

2 Likes

@torkar I try to write it, but I’m not sure it’s right.

stan_program <- "
data {
int<lower=1> N;
int<lower=2> C;
int<lower=2> J;
int<lower=1, upper=C> y[N];
int<lower=1, upper=J> g[N];
}
parameters {
simplex[C] theta[J];
}
model {
for(j in 1:J) {
target += dirichlet_lpdf(theta[j] | rep_vector(2, C));
}

for(n in 1:N) {
target += categorical_lpmf(y[g[n]] | theta[g[n]]);
}
}
generated quantities {
matrix[C, J-1] diff;
for(i in 1:C) {
for(j in 1:J-1) {
diff[i, j] = theta[i, j] - theta[i, j+1];
}
}
}
"

stan_data <- gss %>%
tidybayes::compose_data(
N     = nrow(gss),
C     = n_distinct(NASA),
J     = n_distinct(party),
g     = gss$party, y = gss$NASA
)

stan_fit <- stan(model_code = stan_program, data = stan_data,iter = 1000, chains = 4)


4 chains, each with iter=1000; warmup=500; thin=1;
post-warmup draws per chain=500, total post-warmup draws=2000.

         mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat


theta[1,1] 0.92 0.00 0.04 0.83 0.90 0.92 0.95 0.98 2382 1
theta[1,2] 0.04 0.00 0.03 0.00 0.02 0.03 0.05 0.11 2379 1
theta[1,3] 0.04 0.00 0.03 0.01 0.02 0.03 0.05 0.11 2650 1
theta[2,1] 0.03 0.00 0.02 0.00 0.01 0.02 0.03 0.07 2912 1
theta[2,2] 0.95 0.00 0.02 0.89 0.94 0.95 0.97 0.99 3076 1
theta[2,3] 0.03 0.00 0.02 0.00 0.01 0.02 0.03 0.07 3079 1
theta[3,1] 0.05 0.00 0.03 0.01 0.03 0.04 0.07 0.13 3321 1
theta[3,2] 0.90 0.00 0.04 0.80 0.88 0.91 0.93 0.97 2983 1
theta[3,3] 0.05 0.00 0.03 0.01 0.03 0.04 0.07 0.13 3137 1
diff[1,1] 0.88 0.00 0.06 0.73 0.85 0.89 0.93 0.97 2361 1
diff[1,2] 0.00 0.00 0.04 -0.08 -0.02 0.00 0.02 0.08 2662 1
diff[2,1] -0.92 0.00 0.04 -0.98 -0.95 -0.93 -0.90 -0.83 3054 1
diff[2,2] 0.92 0.00 0.04 0.83 0.90 0.93 0.95 0.98 3088 1
diff[3,1] -0.85 0.00 0.07 -0.96 -0.90 -0.86 -0.82 -0.69 3080 1
diff[3,2] 0.85 0.00 0.07 0.69 0.81 0.86 0.90 0.96 2983 1
lp__ -39.75 0.06 1.81 -44.28 -40.70 -39.43 -38.43 -37.28 784 1

Samples were drawn using NUTS(diag_e) at Mon Feb 01 18:54:14 2021.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

Is this bayesian’s result consistent with Classical approach?

chisq.test(gss$party, gss$NASA)


Pearson’s Chi-squared test

data: gss$party and gss$NASA
X-squared = 1.3261, df = 4, p-value = 0.8569

Hi,
here’s a lot of input you can check out :)

@torkar Thank you. That’s what I need.

1 Like