Trying to specify intercept only model

Hello all,
Very simple question on creating an intercept-only model: How do I do it? I would have thought my code below would work, but it is returning an error.


    > InterceptModel <- stan_glm(
+   Accept_Reject ~ 1,
+   data = pubdata, 
+   family = binomial(link = "logit"), 
+   prior_intercept = NULL,
+   cores=3,
+   QR = FALSE,
+   chains = 3, iter = 50000,
+   diagnostic_file=file.path(tempdir(), "df.csv"))

bridge4<-bridge_sampler(InterceptModel)

Error in read.table(file = file, header = header, sep = sep, quote = quote, :
no lines available in input

describe_posterior(InterceptModel)

Error in dat[, -1, drop = FALSE] : incorrect number of dimensions

1 Like

I think stan_glm is working but bridge_sampler is returning an error, but I’m not sure why. Does file.path(tempdir(), "df.csv") exist and have values?

Not sure to be honest, as I don’t even understand that line of code. I got it from your suggestion actually, in this post I made a while ago. I’ve been sticking it in every model I’ve been running so that I can compare all the models with Bayes Factors later with performance::compare_performance() and bayestestR::bayesfactor_models()

So far I’ve copied and pasted that one same line into three other models in my RMarkdown file and all of them work fine; its just this 4th, intercept-only model that breaks.

I can’t replicate this error with

library(rstanarm)
post <- stan_glm(cbind(incidence, size - incidence) ~ 1, data = lme4::cbpp, 
                 family = binomial, refresh = 0, 
                 diagnostic_file=file.path(tempdir(), "df.csv"))
library(bridgesampler)
bridge_post <-bridge_sampler(post)

Can you?

1 Like

Just ran that and no, I couldn’t repro it either.
After running your “post” model and my intercept model though, there does appear to be some differences in the model summaries, before you even look at the bridge sampler…

my model:
summary(InterceptModel)

Model Info:
 function:     stan_glm
 family:       binomial [logit]
 formula:      Accept_Reject ~ 1
 algorithm:    sampling
 sample:       75000 (posterior sample size)
 priors:       see help('prior_summary')
 observations: 610
 predictors:   1

Estimates:
          mean   sd   10%   50%   90%
(Intercept) 0.0    0.0  0.0   0.0   0.0  

Fit Diagnostics:
       mean   sd   10%   50%   90%
mean_PPD 0.0    0.0  0.0   0.0   0.0  

The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).

MCMC diagnostics
          mcse Rhat n_eff
(Intercept)   NaN  NaN  NaN  
mean_PPD      NaN  NaN  NaN  
log-posterior NaN  NaN  NaN  

vs. your model…
summary(post)

Model Info:
 function:     stan_glm
 family:       binomial [logit]
 formula:      cbind(incidence, size - incidence) ~ 1
 algorithm:    sampling
 sample:       4000 (posterior sample size)
 priors:       see help('prior_summary')
 observations: 56
 predictors:   1

Estimates:
              mean   sd   10%   50%   90%
(Intercept) -2.0    0.1 -2.2  -2.0  -1.9 

Fit Diagnostics:
           mean   sd   10%   50%   90%
mean_PPD 1.8    0.2  1.5   1.8   2.1  

The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).

MCMC diagnostics
              mcse Rhat n_eff
(Intercept)   0.0  1.0  1537 
mean_PPD      0.0  1.0  2285 
log-posterior 0.0  1.0   975

Not sure what all those NaN’s mean in mine. Might they have something to do with it? Did mine not converge or something?

Yours is definitely messed up. What version of rstanarm are you using? Does

post <- stan_glm(switch ~ 1, data = wells, family = binomial, refresh = 0, 
                 diagnostic_file=file.path(tempdir(), "df.csv"))

work?

1 Like

Yep, that code works too:

> post <- stan_glm(switch ~ 1, data = wells, family = binomial, refresh = 0, 
+                  diagnostic_file=file.path(tempdir(), "df.csv"))
> summary(post)

Model Info:
 function:     stan_glm
 family:       binomial [logit]
 formula:      switch ~ 1
 algorithm:    sampling
 sample:       4000 (posterior sample size)
 priors:       see help('prior_summary')
 observations: 3020
 predictors:   1

Estimates:
              mean   sd    10%   50%   90%
(Intercept)  -0.2    9.5 -12.5  -0.2  12.0

Fit Diagnostics:
           mean   sd   10%   50%   90%
mean_PPD 0.5    0.5  0.0   0.5   1.0  

The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).

MCMC diagnostics
              mcse Rhat n_eff
(Intercept)   0.2  1.0  1605 
mean_PPD      0.0  1.0  2285 
log-posterior 0.0  1.0  1285 

And btw I’m running rstanarm ver 2.19.2

Running diagnostics on this intercept model produced some interesting results.

AutoCorr_interceptModel=plot(InterceptModel, "acf", pars = "(Intercept)")
> plot(AutoCorr_interceptModel)
Warning messages:
1: Removed 63 rows containing missing values (geom_segment). 
2: Removed 63 rows containing missing values (geom_path). 

> heidel.diag(InterceptModel)
                                          
            Stationarity start     p-value
            test         iteration        
(Intercept) failed       NA        NA     
                                    
            Halfwidth Mean Halfwidth
            test                    
(Intercept) <NA>      NA   NA 

> trace_int <- plot(InterceptModel, "trace", pars = "(Intercept)")
> plot(trace_int)

(upload://8jFutmXqcnnisS1QIqtC9QtZXdw.png)

So running with fake data

InterceptModel <- stan_glm(y ~ 1, data = df, family = binomial(link = "logit"), 
                           prior_intercept = NULL,
                           cores=6,
                           QR = FALSE,
                           chains = 4, iter = 4000, 
                           diagnostic_file=file.path(tempdir(), "df.csv"))
> bridge4<-bridge_sampler(InterceptModel)
Iteration: 1
Iteration: 2
Iteration: 3
Iteration: 4
> bridge4
Bridge sampling estimate of the log marginal likelihood: -67.96922
Estimate obtained in 4 iteration(s) via method "normal".

Not sure if that is going to help.

Nope; copy-paste of this code and replacing “y” with my DV and “df” with my data repro’d the same errors as above

Hmm ok. Let me get back to fiddling with it then.

Oh which bridge_sampler are you using? There is one in the brms package and another in the bridgesampling package.

I’m using the latter

Ok! I am using the brms one. That might be part of the difference.

Before going deeper, did you run it with built in data, just as was posted.

I think the problem may be that you are running an intercept only model with no prior on the intercept (I made some random binomial data and did an intercept only model with prior=NULL and got the same strange output you have.

3 Likes