What is the bayesian counterpart to the chi-square test?

I know a little about Bayesian models of Fisher’s exact test, from the rate of left-handedness different between the male and female groups.

library(tidyverse)
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

tb <- tibble::tribble(
  ~sex,   ~left_handed, ~right_handed,
  "male",           9L,           43L,
  "female",         4L,           44L
)


stan_program <- "
data {
  int<lower=1> event_1;        
  int<lower=1> event_2;        
  int<lower=1> n_1;            
  int<lower=1> n_2;            
}
parameters {
  real<lower=0,upper=1> p_1;    
  real<lower=0,upper=1> p_2;    
}
model {
  event_1 ~ binomial(n_1, p_1);
  event_2 ~ binomial(n_2, p_2);
  p_1 ~ beta(5, 40);
  p_2 ~ beta(5, 40);
}
generated quantities {
  real diff = p_1 - p_2;
}
"


stan_data <- list(
  n_1      = 52, # male
  event_1  = 9,  # left-handed male
  n_2      = 48, # female
  event_2  = 4   # left-handed female
)



stan_fit <- stan(model_code = stan_program, data = stan_data)

But I have a new @mcmc_stan questions for chi-square test, such as the case below

library(tidyverse)
load(url("https://bit.ly/2E65g15"))

gss %>%
  select(party, NASA) %>%
  ggplot(aes(x = party, fill = NASA)) +
  geom_bar()

Classical approach

chisq.test(gss$party, gss$NASA)

What is the bayesian counterpart to the chi-square test ?
Can you give me some advice?

Hi,

it looks like you would need a categorical/multinomial model, which you then use to draw samples from the posterior to examine what differences there are between the categories?

Please see here for more information:

2 Likes

@torkar I try to write it, but I’m not sure it’s right.

stan_program <- "
data {
  int<lower=1> N;        
  int<lower=2> C;            
  int<lower=2> J;            
  int<lower=1, upper=C> y[N]; 
  int<lower=1, upper=J> g[N]; 
}
parameters {
  simplex[C] theta[J];
}
model {
  for(j in 1:J) {
    target += dirichlet_lpdf(theta[j] | rep_vector(2, C));
  }
  
  for(n in 1:N) {
    target += categorical_lpmf(y[g[n]] | theta[g[n]]);
  }
}
generated quantities {
  matrix[C, J-1] diff;
  for(i in 1:C) {
    for(j in 1:J-1) {
      diff[i, j] = theta[i, j] - theta[i, j+1];
    }
  }
}
"


stan_data <- gss %>% 
  tidybayes::compose_data(
    N     = nrow(gss), 
    C     = n_distinct(NASA),
    J     = n_distinct(party),
    g     = gss$party, 
    y     = gss$NASA
)

stan_fit <- stan(model_code = stan_program, data = stan_data,iter = 1000, chains = 4)

Inference for Stan model: 632ef07f1f4765bada5f7672a4d94c91.
4 chains, each with iter=1000; warmup=500; thin=1;
post-warmup draws per chain=500, total post-warmup draws=2000.

         mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat

theta[1,1] 0.92 0.00 0.04 0.83 0.90 0.92 0.95 0.98 2382 1
theta[1,2] 0.04 0.00 0.03 0.00 0.02 0.03 0.05 0.11 2379 1
theta[1,3] 0.04 0.00 0.03 0.01 0.02 0.03 0.05 0.11 2650 1
theta[2,1] 0.03 0.00 0.02 0.00 0.01 0.02 0.03 0.07 2912 1
theta[2,2] 0.95 0.00 0.02 0.89 0.94 0.95 0.97 0.99 3076 1
theta[2,3] 0.03 0.00 0.02 0.00 0.01 0.02 0.03 0.07 3079 1
theta[3,1] 0.05 0.00 0.03 0.01 0.03 0.04 0.07 0.13 3321 1
theta[3,2] 0.90 0.00 0.04 0.80 0.88 0.91 0.93 0.97 2983 1
theta[3,3] 0.05 0.00 0.03 0.01 0.03 0.04 0.07 0.13 3137 1
diff[1,1] 0.88 0.00 0.06 0.73 0.85 0.89 0.93 0.97 2361 1
diff[1,2] 0.00 0.00 0.04 -0.08 -0.02 0.00 0.02 0.08 2662 1
diff[2,1] -0.92 0.00 0.04 -0.98 -0.95 -0.93 -0.90 -0.83 3054 1
diff[2,2] 0.92 0.00 0.04 0.83 0.90 0.93 0.95 0.98 3088 1
diff[3,1] -0.85 0.00 0.07 -0.96 -0.90 -0.86 -0.82 -0.69 3080 1
diff[3,2] 0.85 0.00 0.07 0.69 0.81 0.86 0.90 0.96 2983 1
lp__ -39.75 0.06 1.81 -44.28 -40.70 -39.43 -38.43 -37.28 784 1

Samples were drawn using NUTS(diag_e) at Mon Feb 01 18:54:14 2021.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

Is this bayesian’s result consistent with Classical approach?

chisq.test(gss$party, gss$NASA)

Pearson’s Chi-squared test

data: gss$party and gss$NASA
X-squared = 1.3261, df = 4, p-value = 0.8569

Hi,
here’s a lot of input you can check out :)

@torkar Thank you. That’s what I need.

1 Like