Posterior predictive checking for multinomial logit

dkaplan · November 8, 2020, 5:38pm

Greetings all. I’m trying to figure you how to get posterior predictive checks for mutinomial regression. Here is the code.

title: “MultiNomialReg.stan”
date: “10/16/2020”
output: html_document

knitr::opts_chunk$set(echo = TRUE)

library(rstan)
library(bayesplot)
options(mc.cores = 3)

Read in data

male: (1=male)

ASBG04: # of books in home 1) 0-10, 2) 11-25, 3) 26-100, 4) 101-200 5) More than 200

ASBGSMR: Students motivated to read (4pt, strongly agree to strongly disagree)

ASBR07A: I do well in reading

Canadareg <-read.csv("~/desktop/Canada.csv",header=TRUE) #browse to select data "Canada.csv"
Canadareg <- Canadareg[1:5000,]  # Just to cut down the sample size.
Canadareg[Canadareg==999999]=NA
Canadareg <- na.omit(Canadareg)
male <- ifelse(Canadareg$ITSEX==2,0,1)
Canadareg1 <- subset(Canadareg,select=c(ASBR07A, male, ASBG04,ASBGSMR))
Canadareg1 <- data.frame(Canadareg1)

N <- nrow(Canadareg1)
f <- as.formula("ASBR07A ~ male")
M <- model.matrix(f,Canadareg1)

data.list <- list(N=nrow(M), 
                  K=length(unique(Canadareg1[,1])),
                  D=ncol(M), 
                  x=M, 
                  ASBR07A=as.numeric(Canadareg1[,1]))

modelString <- "

data {
  int <lower=2> K;              // This is 4, the number of outcomes categories
  int <lower=0> N;            
  int <lower=1> D;              // This is the number of columns in the design                                    matrix - 2
  int <lower = 1, upper = K> ASBR07A[N];
  matrix[N, D] x;              // This will be N by 2 matrix of data
}

parameters {
  matrix[D, K] beta;     // This is a 2 x 4 matrix of betas
}


model {
  matrix[N, K] x_beta = x * beta;   //  N x 2  * 2  x 4 
  
  to_vector(beta) ~ normal(0, 5);

  for (i in 1:N)
    ASBR07A[i] ~ categorical_logit(x_beta[i]');
}

generated quantities {
 int<lower=1,upper=K> ASBR07A_rep[N];
  for (i in 1:N){
   ASBR07A_rep[i] = categorical_logit_rng(matrix beta);
 }
}

"

Start estimation

nChains = 3
nIter= 10000
thinSteps = 10
burnInSteps = floor(nIter/2)

ASBR07A = data.list$ASBR07A

MultiNomRegFit = stan(data=data.list,model_code=modelString,chains=nChains,
              iter=nIter,warmup=burnInSteps,thin=thinSteps)

Stan plots

stan_plot(MultiNomRegFit, pars=c("beta"))
stan_trace(MultiNomRegFit,inc_warmup=T, pars=c("beta"))
stan_dens(MultiNomRegFit, pars=c("beta"))
stan_ac(MultiNomRegFit, pars=c("beta"))

These plots show good evidence of convergence for all parameters in the model.

Stan output

print(MultiNomRegFit,pars=c("beta"))

Posterior predictive checks

ASBR07A_rep = extract(MultiNomRegFit,"ASBR07A_rep")$ASBR07A_rep
ppc_bars(y=ASBR07A, yrep= ASBR07A_rep)

The error message I am receiving is

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Variable “matrix” does not exist.
error in ‘model93e3d1c8db5_9c080fd17354e62514567a23c3da5154’ at line 28, column 48

26:  int<lower=1,upper=K> ASBR07A_rep[N];
27:   for (i in 1:N){
28:    ASBR07A_rep[i] = categorical_logit_rng(matrix beta);
                                                   ^
29:  }

Error in stanc(file = file, model_code = model_code, model_name = model_name, :
failed to parse Stan model ‘9c080fd17354e62514567a23c3da5154’ due to the above error.

I have tried other values for categorical_logit_rng(matrix beta), including (beta) and (vector beta) to no avail - same error message.

Thanks in advance for your suggestions.

David

bbbales2 · November 9, 2020, 12:29pm

If you’re trying to generate data according to your likelihood:

for (i in 1:N)
    ASBR07A[i] ~ categorical_logit(x_beta[i]');

then use:

int<lower=1,upper=K> ASBR07A_rep[N];
for (i in 1:N){
  ASBR07A_rep[i] = categorical_logit_rng(x_beta[i]');
}

The docs for this one are over here (and some of the functions can be pretty picky so always worth checking): https://mc-stan.org/docs/2_25/functions-reference/categorical-distribution.html

That work?

dkaplan · November 10, 2020, 3:56am

Thanks, but I’m afraid it didn’t work.

Here is the error message

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Variable "x_beta" does not exist.
 error in 'model254b51e596d_5340eb0229dbdee84ac2f7d20844c9a3' at line 27, column 47
  -------------------------------------------------
    25: int<lower=1,upper=K> ASBR07A_rep[N];
    26: for (i in 1:N){
    27:   ASBR07A_rep[i] = categorical_logit_rng(x_beta[i]');
                                                      ^
    28: }
  -------------------------------------------------

Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model '5340eb0229dbdee84ac2f7d20844c9a3' due to the above error.

Here is the code,

modelString <- "

data {
  int <lower=2> K;              // This is 4, the number of outcomes categories
  int <lower=0> N;            
  int <lower=1> D;              // This is the number of columns in the design                                    matrix - 2
  int <lower = 1, upper = K> ASBR07A[N];
  matrix[N, D] x;              // This will be N by 2 matrix of data
}

parameters {
  matrix[D, K] beta;     // This is a 2 x 4 matrix of betas
}


model {
  matrix[N, K] x_beta = x * beta;   //  N x 2  * 2  x 4 
  
  to_vector(beta) ~ normal(0, 5);

  for (i in 1:N)
    ASBR07A[i] ~ categorical_logit(x_beta[i]');
}
generated quantities {
int<lower=1,upper=K> ASBR07A_rep[N];
for (i in 1:N){
  ASBR07A_rep[i] = categorical_logit_rng(x_beta[i]');
}
}
"

andrjohns · November 10, 2020, 4:31am

Any variables declared in the model block can only used within the model block. If you want to be able to use transformations in both the model and generated quantities block, then you need to use transformed parameters:

modelString <- "

data {
  int <lower=2> K;              // This is 4, the number of outcomes categories
  int <lower=0> N;            
  int <lower=1> D;              // This is the number of columns in the design                                    matrix - 2
  int <lower = 1, upper = K> ASBR07A[N];
  matrix[N, D] x;              // This will be N by 2 matrix of data
}

parameters {
  matrix[D, K] beta;     // This is a 2 x 4 matrix of betas
}
transformed parameters {
  matrix[N, K] x_beta = x * beta;   //  N x 2  * 2  x 4 
}

model {
  to_vector(beta) ~ normal(0, 5);

  for (i in 1:N)
    ASBR07A[i] ~ categorical_logit(x_beta[i]');
}
generated quantities {
int<lower=1,upper=K> ASBR07A_rep[N];
  for (i in 1:N){
    ASBR07A_rep[i] = categorical_logit_rng(x_beta[i]');
  }
}
"

Topic		Replies	Views
Posterior predictive checks and LOOCV for multinomial logistic regression Modeling	3	479	April 10, 2022
Interpreting multinomial logit coefficients Modeling	2	764	April 12, 2022
Predicted Probabilities with posteriors in hierarchical multinomial (RStan) Modeling	2	438	November 19, 2019
Posterior Prediction from Logit Regression General	8	2039	December 14, 2019
Labeling and order of multinomial regression coefficients Modeling	18	710	November 29, 2020