RStan model crashes R when I provide it with new data

I’m pretty new to Stan. I was checking out a Stan model for NCAA basketball performance from this blog post (http://doogan.us/nj/NCAA2018.html), which seemed pretty cool. I got it to work with the 2018 NCAA basketball data, but when I run the same model on the 2019 data, Stan crashes my R session. It crashes hard: R shuts down entirely with no error message at all.

If anyone has any feedback, I’d greatly appreciate it…I need this model to win my pool!

Here’s the R code that runs with the 2018 data:

library(tidyverse)

colnames <- c('days', 'date', 'team1', 'home1', 'score1', 'team2', 'home2', 'score2')

teams_2018 <- read_csv('https://www.masseyratings.com/scores.php?s=298892&sub=11590&all=1&mode=3&format=2', col_names = c('id', 'name'))
games_2018 <- read_csv('https://www.masseyratings.com/scores.php?s=298892&sub=11590&all=1&mode=3&format=1', col_names = colnames)

games_2018 <- games_2018 %>% 
  mutate(margin=score1-score2)

sd1 <- list(N=       nrow(games_2018),
           y=       games_2018$margin,
           h_i=     ifelse(games_2018$home1==1,1,0),
           h_j=     ifelse(games_2018$home2==1,1,0),
           team_i=  games_2018$team1,
           team_j=  games_2018$team2,
           N_g=     nrow(teams_2018))


require(rstan)
team.fit.2018 <- stan(file='ncaa_homecourt.stan', 
                      iter=800, warmup=500, 
                      data=sd1, refresh=100, chains=4, cores=1, verbose = TRUE)

Here’s the R code that fails with the 2019 data:

teams_2019 <- read_csv('https://www.masseyratings.com/scores.php?s=305972&sub=11590&all=1&mode=3&format=2', col_names = c('id', 'name'))
games_2019 <- read_csv('https://www.masseyratings.com/scores.php?s=305972&sub=11590&all=1&mode=3&format=1', col_names = colnames)

games_2019 <- games_2019 %>% 
  mutate(margin=score1-score2)

sd2 <- list(N=       nrow(games_2019),
           y=       games_2019$margin,
           h_i=     ifelse(games_2019$home1==1,1,0),
           h_j=     ifelse(games_2019$home2==1,1,0),
           team_i=  games_2019$team1,
           team_j=  games_2019$team2,
           N_g=     nrow(teams_2019))

require(rstan)
team.fit.2019 <- stan(file='ncaa_homecourt.stan', 
                      iter=800, warmup=500, 
                      data=sd2, refresh=100, chains=4, cores=1, verbose = TRUE)

And here are the contents of my ncaa_homecourt.stan file:

  data{
    int N;
    vector[N] y;
    int team_i[N];
    int team_j[N];
    int h_i[N];
    int h_j[N];
    int N_g;
  }
  parameters{
    vector[N_g] alpha_raw;
    vector[N_g] theta_raw;
    real eta;
    real<lower=0> tau_theta;
    real<lower=0> tau_alpha;
    real<lower=0> sigma;
  }
  transformed parameters{
    vector[N_g] alpha;
    vector[N_g] theta;
    alpha = eta + alpha_raw*tau_alpha;
    theta = theta_raw*tau_theta;
  }
  model{
    // vector for conditional mean storage
    vector[N] mu;

    // priors
    tau_theta ~ cauchy(0,1)T[0,];
    tau_alpha ~ cauchy(0,.25)T[0,];
    sigma ~ cauchy(0,1)T[0,];
    eta ~ normal(4,1);
    theta_raw ~ normal(0,1);
    alpha_raw ~ normal(0,1);

    // define mu for the Gaussian
    for( t in 1:N ) {
      mu[t] = (theta[team_i[t]] + alpha[team_i[t]]*h_i[t]) - 
              (theta[team_j[t]] + alpha[team_j[t]]*h_j[t]);
    }

    // the likelihood
    y ~ normal(mu,sigma);
  }

I’m running R version 3.5.2
I’m running RStudio version 1.1.463
I’m on a Windows machine (Windows 10)
I’m running Stan version 2.18.1

Thanks for any help!

For me, the 2019 model does not crash but it does have a handful of divergent transitions.

That’s very interesting. I had a coworker try to run the model on his machine, and he got the same result (i.e., we can both run 2018, but 2019 crashes R). I also tried subsets of the 2019 data and got even stranger behavior: I can run the first 5 rows through the model, but with 6 or more rows my R session crashes.
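
Since the crash depends on which rows are passed in, one thing worth ruling out is a problem in the data list itself, such as a missing value or a team index outside 1..N_g. Below is a minimal sketch in base R. The helper `check_stan_data` is hypothetical (it is not part of rstan), and the `toy` list only stands in for `sd2`. Nothing in this thread establishes that bad indices are the actual cause of the crash; this only checks for them.

```r
# Hypothetical helper (not part of rstan): basic sanity checks on a data list
# shaped like sd1 / sd2 before it is handed to stan().
check_stan_data <- function(sd) {
  problems <- character(0)

  # Every per-game vector should have length N and contain no missing values.
  for (nm in c("y", "h_i", "h_j", "team_i", "team_j")) {
    x <- sd[[nm]]
    if (length(x) != sd$N) {
      problems <- c(problems,
                    sprintf("%s has length %d, expected N = %d",
                            nm, length(x), as.integer(sd$N)))
    }
    if (anyNA(x)) {
      problems <- c(problems, sprintf("%s contains NA values", nm))
    }
  }

  # Team ids are used to index vectors of length N_g in the Stan model,
  # so they must all fall in 1..N_g.
  idx <- c(sd$team_i, sd$team_j)
  idx <- idx[!is.na(idx)]
  if (length(idx) > 0 && (min(idx) < 1 || max(idx) > sd$N_g)) {
    problems <- c(problems,
                  sprintf("team indices span %s to %s, expected 1 to %s",
                          min(idx), max(idx), sd$N_g))
  }

  problems
}

# Toy data standing in for sd2: team_i contains 4, but N_g is only 3.
toy <- list(N = 3, y = c(5, -2, 7),
            h_i = c(1, 0, 0), h_j = c(0, 1, 0),
            team_i = c(1, 2, 4), team_j = c(2, 3, 1),
            N_g = 3)
problems <- check_stan_data(toy)
print(problems)
```

Calling `check_stan_data(sd2)` on the real list returns an empty character vector if none of these checks fail, in which case the cause lies elsewhere.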

Is there any log file I can inspect when R crashes? Right now, I’m running RStudio, and all I see is a message box stating that my R session aborted, with no further info.
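
One possible way to get more information, offered as a sketch rather than a known fix: run the fit in a separate R process and send its output to files, so that whatever is printed before the crash survives and the main session stays alive. The wrapper `run_in_child` and the stand-in script below are hypothetical. In practice the script would be a file containing the data preparation and the `stan()` call.

```r
# Hypothetical wrapper: run an R script in a child Rscript process and keep
# its standard output, standard error, and exit status.
run_in_child <- function(script, out_file, err_file) {
  rscript <- file.path(R.home("bin"), "Rscript")
  status <- system2(rscript, shQuote(script),
                    stdout = out_file, stderr = err_file)
  list(status = status,
       out = readLines(out_file, warn = FALSE),
       err = readLines(err_file, warn = FALSE))
}

# Stand-in script that prints a message and exits with a non-zero status.
# A real script would load the data and call stan() instead.
script <- tempfile(fileext = ".R")
writeLines(c('message("starting fit")', 'quit(status = 3)'), script)

res <- run_in_child(script,
                    out_file = tempfile(fileext = ".out"),
                    err_file = tempfile(fileext = ".err"))
print(res$status)  # exit status reported by the child process
print(res$err)     # anything the child wrote to standard error
```

A non-zero status with an empty error log would suggest that the process died before Stan could report anything, which is still useful to know.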

Also, would you be able to share the stanfit object that was generated when you ran the 2019 data? I don’t know exactly how to facilitate that, but it would sure help me fill out my bracket. I apologize if this is asking too much. Thanks!

I DM’d you with a link to the posterior draws.