Hi
I’m working with a dataset from Statistical Rethinking, the Trolley dataset. I’m following the book’s approach for this problem, but I’m using Stan directly instead of the “rethinking” package. I’m regressing an ordinal outcome on an ordinal predictor using the model below. “R” is an integer from 1 to 7 indicating how morally permissible the participant found a certain action, and “E” is an integer from 1 to 8 representing that participant’s education level.

Data prep

dat <- list(
  R = R,                 # response: moral permissibility, 1..7
  E = E,                 # predictor: education level, 1..8
  alpha = rep( 2 , 7 )   # Dirichlet prior concentration for delta
)

STAN model

data{
  int R[9930];
  int E[9930];
  vector[7] alpha;
}
parameters{
  ordered[6] kappa;
  real bE;
  simplex[7] delta;
}
transformed parameters{
  vector[9930] delta_sum;
  vector[8] delta_j = append_row(0, delta);
  vector[9930] phi;
  for ( i in 1:9930 ) {
    delta_sum[i] = sum(delta_j[1:E[i]]);
  }
  phi = bE * delta_sum;
}
model{
  delta ~ dirichlet( alpha );
  bE ~ normal( 0 , 1 );
  kappa ~ normal( 0 , 1.5 );
  R ~ ordered_logistic( phi , kappa );
}

The model runs well with no issues. However, I found that some delta_sum entries show NaN values in the summary. These NaN values all occur in cases where E = 1. Why is that? What am I doing wrong?

I have a second question. Suppose I have reason to believe the association between R and E might change depending on the participant’s religion, country, state, etc. How could/should I add varying effects for religion/country/state?
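For concreteness, here is a sketch of what I had in mind for, say, country: a non-centered varying slope on bE, assuming a hypothetical integer index cid (1..N_country) that I would add to the data list. Is something along these lines reasonable?

```stan
data{
  int R[9930];
  int E[9930];
  int cid[9930];           // hypothetical country index, 1..N_country
  int N_country;
  vector[7] alpha;
}
parameters{
  ordered[6] kappa;
  real bE;                       // population-average slope
  vector[N_country] z;           // standardized country deviations
  real<lower=0> sigma_country;   // scale of country deviations
  simplex[7] delta;
}
transformed parameters{
  vector[8] delta_j = append_row(0, delta);
  vector[N_country] bE_c = bE + z * sigma_country;  // non-centered slopes
  vector[9930] phi;
  for ( i in 1:9930 ) {
    phi[i] = bE_c[cid[i]] * sum(delta_j[1:E[i]]);
  }
}
model{
  delta ~ dirichlet( alpha );
  bE ~ normal( 0 , 1 );
  z ~ normal( 0 , 1 );
  sigma_country ~ exponential( 1 );
  kappa ~ normal( 0 , 1.5 );
  R ~ ordered_logistic( phi , kappa );
}
```

The non-centered form (z standardized, scaled by sigma_country) is what I’ve seen recommended for hierarchical models like this, but I’m not sure it’s the right structure here.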

Thanks

I’m attaching the data in case it helps: data.csv (38.8 KB)

Thanks. These are some of the NaN values I was referring to.

print(fit,pars=c("delta_sum[3961]","delta_sum[3962]"))
Inference for Stan model: model.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
                mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
delta_sum[3961]    0     NaN  0    0   0   0   0     0   NaN  NaN
delta_sum[3962]    0     NaN  0    0   0   0   0     0   NaN  NaN
Samples were drawn using NUTS(diag_e) at Tue Jul 28 10:13:28 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

After looking at this some more, I think it makes sense that “delta_sum” is zero when E=1: since delta_j[1] = 0, sum(delta_j[1:1]) = 0. The NaN values are then just in the summary statistics (se_mean, n_eff, Rhat are undefined when every draw is exactly 0). Consistently, when E=8, delta_sum = sum(delta) = 1 because delta is a simplex:

Inference for Stan model: model.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
               mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
delta_sum[601]    1       0  0    1   1   1   1     1    85    1
Samples were drawn using NUTS(diag_e) at Thu Jul 30 14:55:59 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

I also went back to “Statistical Rethinking” and noticed that McElreath wrote that “the first level (E=1) will be absorbed into the intercept” which seems to support the above conclusion.

I still have one more question, which is how to turn this into a hierarchical model (see first post).

Question to the moderators: should I open a separate question for this?