I’m pretty new to Stan. I came across a Stan model of NCAA basketball team performance in this blog post (http://doogan.us/nj/NCAA2018.html), which seemed pretty cool. I got it working with the 2018 NCAA basketball data, but when I run the same model on the 2019 data, Stan crashes my R session. It crashes pretty hard: R shuts down entirely, with no error message or anything.
If anyone has any feedback, I’d greatly appreciate it. I need this model to win my pool!
Here’s the R code that runs with the 2018 data:
```r
library(tidyverse)

colnames <- c('days', 'date', 'team1', 'home1', 'score1', 'team2', 'home2', 'score2')
teams_2018 <- read_csv('https://www.masseyratings.com/scores.php?s=298892&sub=11590&all=1&mode=3&format=2', col_names = c('id', 'name'))
games_2018 <- read_csv('https://www.masseyratings.com/scores.php?s=298892&sub=11590&all=1&mode=3&format=1', col_names = colnames)

games_2018 <- games_2018 %>%
  mutate(margin=score1-score2)

sd1 <- list(N= nrow(games_2018),
            y= games_2018$margin,
            h_i= ifelse(games_2018$home1==1,1,0),
            h_j= ifelse(games_2018$home2==1,1,0),
            team_i= games_2018$team1,
            team_j= games_2018$team2,
            N_g= nrow(teams_2018))

require(rstan)
team.fit.2018 <- stan(file='ncaa_homecourt.stan',
                      iter=800, warmup=500,
                      data=sd1, refresh=100, chains=4, cores=1, verbose = TRUE)
```
Here’s the R code that fails with the 2019 data:
```r
teams_2019 <- read_csv('https://www.masseyratings.com/scores.php?s=305972&sub=11590&all=1&mode=3&format=2', col_names = c('id', 'name'))
games_2019 <- read_csv('https://www.masseyratings.com/scores.php?s=305972&sub=11590&all=1&mode=3&format=1', col_names = colnames)

games_2019 <- games_2019 %>%
  mutate(margin=score1-score2)

sd2 <- list(N= nrow(games_2019),
            y= games_2019$margin,
            h_i= ifelse(games_2019$home1==1,1,0),
            h_j= ifelse(games_2019$home2==1,1,0),
            team_i= games_2019$team1,
            team_j= games_2019$team2,
            N_g= nrow(teams_2019))

require(rstan)
team.fit.2019 <- stan(file='ncaa_homecourt.stan',
                      iter=800, warmup=500,
                      data=sd2, refresh=100, chains=4, cores=1, verbose = TRUE)
```
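I haven’t confirmed this is the cause, but one thing I’d like to rule out is something in the 2019 data that the model can’t handle: a team index outside 1..N_g (for example, if the largest id in games_2019 is bigger than nrow(teams_2019)), an NA margin, or a home flag other than 0/1. Here is a minimal base-R sketch of that check. check_stan_data is a hypothetical helper I wrote for this, not part of rstan or the blog post.

```r
# Hypothetical helper (not from rstan or the blog post): returns a character
# vector describing problems in a data list meant for ncaa_homecourt.stan.
# An empty result means none of the checks below failed.
check_stan_data <- function(sd) {
  problems <- character(0)
  per_game <- c("y", "h_i", "h_j", "team_i", "team_j")

  # Every per-game entry should have exactly N values
  for (nm in per_game) {
    if (length(sd[[nm]]) != sd$N) {
      problems <- c(problems, sprintf("%s has length %d but N = %d",
                                      nm, length(sd[[nm]]), sd$N))
    }
  }

  # Stan cannot take NA values in the data block
  for (nm in names(sd)) {
    if (anyNA(sd[[nm]])) {
      problems <- c(problems, sprintf("%s contains NA values", nm))
    }
  }

  # Team ids index vectors of length N_g, so they must be whole numbers in 1..N_g
  for (nm in c("team_i", "team_j")) {
    idx <- sd[[nm]][!is.na(sd[[nm]])]
    n_bad <- sum(idx < 1 | idx > sd$N_g | idx != round(idx))
    if (n_bad > 0) {
      problems <- c(problems, sprintf("%s has %d value(s) outside 1..%d (max is %s)",
                                      nm, n_bad, sd$N_g, max(idx)))
    }
  }

  # Home indicators are expected to be 0 or 1
  for (nm in c("h_i", "h_j")) {
    h <- sd[[nm]][!is.na(sd[[nm]])]
    if (!all(h %in% c(0, 1))) {
      problems <- c(problems, sprintf("%s has values other than 0/1", nm))
    }
  }

  problems
}
```

Running `check_stan_data(sd2)` (and `check_stan_data(sd1)` for comparison) should return `character(0)` if none of these problems are present in the data list.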
And here are the contents of my ncaa_homecourt.stan file:
```stan
data{
  int N;
  vector[N] y;
  int team_i[N];
  int team_j[N];
  int h_i[N];
  int h_j[N];
  int N_g;
}
parameters{
  vector[N_g] alpha_raw;
  vector[N_g] theta_raw;
  real eta;
  real<lower=0> tau_theta;
  real<lower=0> tau_alpha;
  real<lower=0> sigma;
}
transformed parameters{
  vector[N_g] alpha;
  vector[N_g] theta;
  alpha = eta + alpha_raw*tau_alpha;
  theta = theta_raw*tau_theta;
}
model{
  // vector for conditional mean storage
  vector[N] mu;
  // priors
  tau_theta ~ cauchy(0,1)T[0,];
  tau_alpha ~ cauchy(0,.25)T[0,];
  sigma ~ cauchy(0,1)T[0,];
  eta ~ normal(4,1);
  theta_raw ~ normal(0,1);
  alpha_raw ~ normal(0,1);
  // define mu for the Gaussian
  for( t in 1:N ) {
    mu[t] = (theta[team_i[t]] + alpha[team_i[t]]*h_i[t]) -
            (theta[team_j[t]] + alpha[team_j[t]]*h_j[t]);
  }
  // the likelihood
  y ~ normal(mu,sigma);
}
```
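Another way to narrow it down (my own idea, not something from the blog post) is to simulate fake data from the model’s generative process with the same dimensions as the 2019 data and fit that. If the fake data samples fine, the 2019 data becomes the more likely suspect. simulate_ncaa_data is a hypothetical helper, and its default parameter values are arbitrary choices, not estimates from the real data.

```r
# Hypothetical helper (my own sketch, not from the blog post): simulate a data
# list shaped like sd1/sd2 from the generative model in ncaa_homecourt.stan.
# The default parameter values are made up for illustration only.
simulate_ncaa_data <- function(N, N_g, eta = 4, tau_alpha = 0.5,
                               tau_theta = 5, sigma = 10, seed = 1) {
  set.seed(seed)

  # Same non-centered construction as the transformed parameters block
  alpha <- eta + rnorm(N_g) * tau_alpha  # team-specific home-court effect
  theta <- rnorm(N_g) * tau_theta        # team strength

  # Two distinct teams per game (one row per game)
  teams <- t(replicate(N, sample.int(N_g, 2)))
  team_i <- teams[, 1]
  team_j <- teams[, 2]

  # 1 = team_i at home, -1 = team_j at home, 0 = neutral site
  site <- sample(c(-1, 0, 1), N, replace = TRUE)
  h_i <- as.integer(site == 1)
  h_j <- as.integer(site == -1)

  # Conditional mean from the model block, vectorized over games
  mu <- (theta[team_i] + alpha[team_i] * h_i) -
        (theta[team_j] + alpha[team_j] * h_j)
  y <- rnorm(N, mean = mu, sd = sigma)

  list(N = N, y = y, h_i = h_i, h_j = h_j,
       team_i = team_i, team_j = team_j, N_g = N_g)
}
```

Something like `fake <- simulate_ncaa_data(N = sd2$N, N_g = sd2$N_g)`, followed by the same `stan()` call with `data = fake`, would show whether the model itself runs at the 2019 size.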
I’m running R 3.5.2 and RStudio 1.1.463 with Stan 2.18.1 on Windows 10.
Thanks for any help!