Help understanding stan_glm() commands

Longshot408 · December 6, 2019, 10:40pm

Evening all,
I’m hoping I can get some help understanding a few arguments of the stan_glm() command so that I can finalize my models. Any pointers would be appreciated. I spent a bunch of time digging through this article on priors, another rstan article on priors and their options, and the stan_glm() info page, but I’m still unclear on a few things. Some of the language/terms are just over my head right now…I’m very new to this.

Specifically, could anyone help explain what the following do?

x
y
prior_PD
QR
sparse

I’m also not clear on exactly what autoscale does, but running a model with and without it produced almost exactly the same results, so I figured that as long as I’m specifying priors myself it’s OK to turn it off.

The bottom piece of code is a test model where I’ve slowly been adding in bits as I figure them out. The top code is my current model (copy-pasted from a vignette), and works just fine. But I don’t want to rely on the package defaults as a crutch.

Info on variables to aid model interpretation: Discount is a 3-level categorical variable representing the percentage reduction between the threatened trial sentence and the plea bargain the defendant has been offered (20%, 50%, and 70%). PTS is the Potential Trial Sentence that the defendant is threatened with and has only two levels (5 years; 25 years). The DV (Accept_Reject) is whether the plea deal was accepted or rejected.

# Current model
library(rstanarm)

Discountmodel <- stan_glm(
  Accept_Reject ~ Discount + PTS,
  data = pubdata,
  family = binomial(link = "logit"),
  prior_intercept = NULL,  # NULL requests a flat (improper uniform) prior on the intercept
  cores = 3,
  QR = FALSE,
  chains = 3, iter = 50000,
  diagnostic_file = file.path(tempdir(), "df.csv"))

# Model for test runs
testmodel1 <- stan_glm(Accept_Reject ~ Discount + PTS,
        family = binomial(link = "logit"),
        data = pubdata,
        #x = FALSE,
        #y = TRUE,
        prior = normal(location = 1.1, scale = 2.5, autoscale = FALSE),
        #prior_intercept = normal(),
        #prior_PD = FALSE,
        algorithm = "sampling",
        mean_PPD = TRUE,
        adapt_delta = 0.95,
        #QR = FALSE,
        #sparse = FALSE,
        chains = 3, iter = 1000, cores = 3)

martinmodrak · December 9, 2019, 12:54pm

All of these arguments are for semi-advanced uses, so you shouldn’t worry too much about leaving them at their defaults. x and y only add additional output (the model matrix and the response vector are stored as fields in the fitted model object) - if you are not missing any output, you are good. prior_PD is useful for prior predictive checks (see the Visualisation paper if you want more). Those checks are a good procedure for choosing priors, but they can be skipped for simple use cases.
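To make the x and prior_PD arguments concrete, here is a minimal sketch of a prior predictive check. It runs on simulated stand-in data, not the original pubdata: the data frame sim, its values, and the normal(0, 2.5) priors are all assumptions chosen for illustration. Note that prior_PD = TRUE needs proper priors, so the intercept prior is set explicitly here instead of NULL.

```r
library(rstanarm)

# Simulated stand-in for the poster's data; every value here is made up.
set.seed(408)
n <- 200
sim <- data.frame(
  Discount = factor(sample(c("20%", "50%", "70%"), n, replace = TRUE)),
  PTS      = factor(sample(c("5 years", "25 years"), n, replace = TRUE))
)
sim$Accept_Reject <- rbinom(n, size = 1, prob = 0.5)

prior_fit <- stan_glm(
  Accept_Reject ~ Discount + PTS,
  data = sim,
  family = binomial(link = "logit"),
  # Illustrative priors only (an assumption, not a recommendation)
  prior = normal(location = 0, scale = 2.5, autoscale = FALSE),
  prior_intercept = normal(location = 0, scale = 2.5, autoscale = FALSE),
  prior_PD = TRUE,  # sample from the priors only; the outcome does not update them
  x = TRUE,         # additionally store the model matrix in the returned object
  chains = 2, iter = 500, refresh = 0, seed = 408
)

# The stored model matrix: one row per observation
dim(prior_fit$x)

# Each row is one simulated data set of 0/1 outcomes implied by the priors
prior_draws <- posterior_predict(prior_fit)
acceptance_rates <- rowMeans(prior_draws)
summary(acceptance_rates)
```

If the simulated acceptance rates pile up near 0 and 1, the priors are probably wider than intended on the probability scale, which is the kind of thing a prior predictive check is meant to reveal.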

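QR applies a QR decomposition to the predictor matrix, which is useful when your model does not converge (correlated predictors are a typical cause). sparse can potentially speed up some models. Once again, if your model converges fine and doesn’t take ages, you are safe to ignore those.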
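As a rough illustration of the idea behind QR = TRUE, the sketch below uses base R only. It is not rstanarm’s internal code, and the predictors x1 and x2 are hypothetical. It shows that a QR decomposition turns two strongly correlated columns into orthogonal ones, which tends to be an easier geometry for the sampler.

```r
# Two hypothetical, strongly correlated predictors
set.seed(1)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.1)
X  <- scale(cbind(x1, x2), center = TRUE, scale = FALSE)  # centered predictors

cor(X)[1, 2]     # correlation between the original columns (close to 1)

decomp <- qr(X)
Q <- qr.Q(decomp)  # orthonormal columns
R <- qr.R(decomp)  # upper-triangular factor

crossprod(Q)       # approximately the identity matrix: columns of Q are orthogonal
all.equal(Q %*% R, X, check.attributes = FALSE)  # Q %*% R reproduces X

# In rstanarm the equivalent switch is simply:
#   stan_glm(..., QR = TRUE)
# and the reported coefficients are still on the scale of the original predictors.
```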

Hope that helps.

Longshot408 · December 9, 2019, 3:56pm

Thanks! It definitely does. But just to clarify, by “ignore” do you mean “not specify” (i.e. leave them at their defaults), or disable them and forget about them?

martinmodrak · December 9, 2019, 4:49pm

I meant leave them to their defaults. Best of luck.
