Stan_betareg intercept coefficient unaffected by prior

Hi, I'm relatively new to Stan and rstanarm, so I'm not always sure what I'm doing, and I have now run across a problem I don't understand at all.

Basically I have a dataset of proportions (hours/maximum hours) measured weekly over a 23-week period. After finding out about beta regression I figured that would be an appropriate way to model it. I roughly know what the progression of my response variable would be under normal conditions, and have made an 'example data set' based on this. I want to set the priors in such a way that this shape is assumed.

I tried to do this by fitting the ideal model using stan_betareg with weakly informative priors and then using the parameters from this model as priors (the model was a good fit to the data).

The idea is to run a weekly model and predict the values for the rest of the 23-week period, initially assuming it will be similar to the ideal set. However, when modelling e.g. just the first 4 weeks of the example data set itself with these priors, the intercept estimate seems completely unaffected by the intercept prior, and the estimates are way off.

I feel like I must be doing something fundamentally wrong here, but I’m not sure what. Any help would be greatly appreciated.

dat <- as.data.frame(cbind(c(1:23), c((1:23)^2),
                           c(0.1,0.225,0.35,0.475,0.6,0.7,0.8,0.85,
                             0.9,0.9,0.9,0.9,0.9,0.885,0.87,0.825,
                             0.78,0.66,0.54,0.43,0.32,0.21,0.1))
)
colnames(dat) <- c("Week","Week2","Hours")

m0 <- stan_betareg(Hours ~ Week + Week2,
                   prior=normal(),
                   prior_intercept=normal(),
                   QR=TRUE, data=dat
)
summary(m0, digits=3)

m1 <- stan_betareg(Hours ~ Week + Week2,
                   prior=normal(location=c(0.85,-0.036), scale=c(0.1,0.05)),
                   prior_intercept=normal(-2.9,0.001),
                   prior_phi=normal(400,10), data=dat[1:4,]
)

m2 <- stan_betareg(Hours ~ Week + Week2,
                   prior=normal(location=c(0.85,-0.036), scale=c(0.1,0.1)),
                   prior_intercept=normal(),
                   prior_phi=normal(400,10), data=dat[1:4,]
)

coef(m1) #intercept 1.87
coef(m2) #intercept 1.87

Hi,

When comparing models you should try to vary only one component of the model at a time. In this case you're interested in prior sensitivity for the intercept parameter, so the prior argument should be identical in m1 and m2.

Incorporating this into your example we have,


f <- formula(Hours ~ 1 + Week + Week2)
m1 <- stan_betareg(f,
                   prior=normal(location=c(0.85,-0.036),scale=c(0.1,0.1)),
                   prior_intercept=normal(-2.9,0.001),
                   prior_phi=normal(400,10), data=dat[1:4,]
)

m2 <- stan_betareg(f,
                   prior=normal(location=c(0.85,-0.036),scale=c(0.1,0.1)),
                   prior_intercept=normal(0,1),
                   prior_phi=normal(400,10), data=dat[1:4,]
)

For coef(m1) we have,

(Intercept)         Week        Week2        (phi) 
 -6.45180710   1.29189305   0.04307996 401.52851097 

And for coef(m2) we have,

(Intercept)         Week        Week2        (phi) 
 -2.87419520   0.85882358  -0.03870345 406.59813396 

These are fairly different, so there is prior sensitivity in the intercept term (and we would hope so, since only 4 observations are contributing to the likelihood in the model).
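
If it helps to visualize this, rstanarm's posterior_vs_prior() function compares draws from the prior and posterior for a given parameter. A quick sketch using the two models above:

posterior_vs_prior(m1, pars = "(Intercept)") # strong intercept prior
posterior_vs_prior(m2, pars = "(Intercept)") # weak intercept prior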

Also you mentioned,

… and the estimates are way off.

How do you know they’re way off? Do you know the true parameter distributions?

Thanks a lot for your reply. Yeah, sorry, the difference in scale was a typo, but I don't think that's the cause of my problem. After copying and running your code I still get the exact same results (coef(m1) = coef(m2) = -1.87), which makes me think there is something wrong with my version of R or rstanarm.

Did you confuse coef(m1) and coef(m2)? Because the coefficient of the intercept of m2 is almost exactly the same as the prior for m1.

I might have worded this wrong. What I meant was that I used the parameters of a model that fitted the data very well (including the first 4 observations) as strong priors for the second model (with just 4 observations). However, the output from this second model is very different from these 4 observations (which it wouldn't be if it had just used the priors as parameter estimates), which makes no sense to me.

Did you confuse coef(m1) and coef(m2)? Because the coefficient of the intercept of m2 is almost exactly the same as the prior for m1.

No, those are the correct values.

… which makes me think there is something wrong with my version of R or rstanarm.

If you haven’t already done so, try restarting R/RStudio. If that doesn’t work then try reinstalling rstanarm.
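
For example, a clean reinstall from CRAN:

remove.packages("rstanarm")
install.packages("rstanarm")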

I used the parameters of a model that fitted the data very well (including the first 4 observations) as strong priors for the second model (with just 4 observations). However, the output from this second model is very different from these 4 observations (which it wouldn't be if it had just used the priors as parameter estimates), which makes no sense to me.

It's still unclear to me what you mean by "output from this second model". Are you referring to parameter estimates? Posterior predictions? Also, it's still unclear to me what Week and Week2 refer to in your data.

I agree that, with such small data and such strong priors, your parameter estimates should be driven by your priors.

Yeah, I completely removed and reinstalled everything, but I'm still getting the same results.

Sorry, I meant the posterior predictions from m2. Week and Week2 are just measures of time, i.e. logit(μ) = β0 + β1·t + β2·t² (where Week = t and Week2 = t²).
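
For reference, here is a quick sketch (my own check, using base R's plogis() as the inverse logit) of the curve implied by those prior means; it reproduces the shape of the example data:

t <- 1:23
mu <- plogis(-2.9 + 0.85 * t - 0.036 * t^2) # inverse logit of the prior-mean linear predictor
plot(t, mu, type = "l", ylim = c(0, 1), xlab = "Week", ylab = "Hours")
points(dat$Hours) # example data for comparison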

I have attached a plot that I hope makes my issue clearer. The points are the data, the grey dotted line is the model I based the priors on, and the black line is the posterior. I don't understand why, at any given point in time, the posterior predictions are higher than both the prior and the data.

[plot]

f <- formula(Hours ~ 1 + Week + Week2)

m0 <- stan_betareg(Hours ~ Week+Week2,
                   prior=normal(),
                   prior_intercept=normal(),  data=dat)

m1 <- stan_betareg(f,
                   prior=normal(location=c(0.85,-0.036),scale=c(0.1,0.1)),
                   prior_intercept=normal(-2.9,0.001),
                   prior_phi=normal(400,10), data=dat[1:4,]
)

m2 <- stan_betareg(f,
                   prior=normal(location=c(0.85,-0.036),scale=c(0.1,0.1)),
                   prior_intercept=normal(0,1),
                   prior_phi=normal(400,10), data=dat[1:4,]
)

post <- posterior_predict(m0,dat)
post1 <- posterior_predict(m1,dat)
post2 <- posterior_predict(m2,dat)
ymean0 <- colMeans(post)
ymean1 <- colMeans(post1)
ymean2 <- colMeans(post2)
plot(dat$Hours[1:4],ylim=c(0,1), xlab="Time",col="lightgrey",ylab="Hours") # only used period
# plot(dat$Hours,ylim=c(0,1), xlab="Time",col="lightgrey") # Whole period
points(dat$Hours[1:4],col="black") # used data points
lines(ymean1,lty=1,col="black") # strong intercept prior
lines(ymean2,lty=2,col="brown") # weak intercept prior
lines(ymean0,lty=2,col="grey") #prior

Sorry for the delayed reply. Yes, your posterior predictions (for the model with the strong prior on the intercept) seem odd since (based on the priors you’re specifying) most of the posterior mass is around 0. Below is what I get when I run the code to create your plot (I changed some of the colors/line types). Also, I think you meant to refer to the prior predictive distribution in your plot. You can generate this by using the prior_PD = TRUE argument in your stan_betareg call.

I think the first order of business is to figure out why the version of rstanarm installed on your machine is producing strange results. You mentioned you reinstalled rstanarm; would you mind telling us which version of rstan/rstanarm you're using, via packageVersion("rstanarm")? I'm mentioning @jonah since he might have a better idea as to what's going on.

library(rstanarm)

dat <- as.data.frame(cbind(c(1:23),c((1:23)^2),
                           c(0.1,0.225,0.35,0.475,0.6,0.7,0.8,0.85,
                             0.9,0.9,0.9,0.9,0.9,0.885,0.87,0.825,
                             0.78,0.66,0.54,0.43,0.32,0.21,0.1))
)
colnames(dat)<-c("Week","Week2","Hours")

f <- formula(Hours ~ 1 + Week + Week2)

m0 <- stan_betareg(Hours ~ Week+Week2,
                   prior=normal(),
                   prior_intercept=normal(),  data=dat)

m1 <- stan_betareg(f,
                   prior=normal(location=c(0.85,-0.036),scale=c(0.1,0.1)),
                   prior_intercept=normal(-2.9,0.001),
                   prior_phi=normal(400,10), data=dat[1:4,]
)

m2 <- stan_betareg(f,
                   prior=normal(location=c(0.85,-0.036),scale=c(0.1,0.1)),
                   prior_intercept=normal(0,1),
                   prior_phi=normal(400,10), data=dat[1:4,]
)

m2pd <- stan_betareg(f,
                   prior=normal(location=c(0.85,-0.036),scale=c(0.1,0.1)),
                   prior_intercept=normal(0,1),
                   prior_phi=normal(400,10), data=dat[1:4,], prior_PD = TRUE
)

pp0 <- m0$fitted.values
pp1 <- m1$fitted.values
pp2 <- m2$fitted.values
plot(0, type = "n" ,ylim = c(0,1), xlim = c(1,4), xlab = "Time", col = "lightgrey", ylab = "Hours")
points(dat$Hours[1:4],col="black") # used data points
lines(pp1, lty=1, col="black") # strong intercept prior
lines(pp2, lty=2, col="black") # weak intercept prior
# lines(ymean0[1:4],lty=2,col="grey") #prior
lines(m2pd$fitted.values, col = "blue")
legend("topleft", c("data", "strong int prior", "weak int prior", "(strong) prior pd"),
       col = c("black", "black", "black", "blue"), pch = c(21,NA,NA,NA),
       lty = c(NA,1,2,1))

@imadmali Did you make that plot using the current version of rstanarm on GitHub or using the latest CRAN release?

@bramre If you can’t reproduce the plot that @imadmali made then can you try installing rstanarm from GitHub by running the following in R

library("devtools") 
install_github("stan-dev/rstanarm", args = "--preclean", build_vignettes = FALSE)

and then see if that makes a difference and let us know. Sorry for the hassle!

@jonah it was made using the current version on GitHub. Here's what it looks like when I plot the results using the CRAN version. It seems that assigning different priors to the intercept doesn't have any effect (the solid and dashed lines are almost identical).

That looks like a bug in the CRAN version that is fixed on GitHub.

This has helped, thanks! The plot I initially got was the exact same as @imadmali’s second one, with the solid and dashed line being almost identical. After installing it from GitHub I got the plot from the post before, so I am getting the same results now.

I still don't understand, however, why the posterior mean for the intercept is -4.2 when I set the intercept prior as N(-2.9, 0.001). Also, am I wrong in thinking that the prior predictive distribution simply samples from the prior distribution? If so, shouldn't e.g. the estimate of the coefficient of m1pd be near 0 instead of -1.89?

My hunch is that the issue here has to do with the interpretation of the prior_intercept argument. The documentation says

Note: If using a dense representation of the design matrix —i.e., if the sparse argument is left at its default value of FALSE— then the prior distribution for the intercept is set so it applies to the value when all predictors are centered.

So prior_intercept applies to the linear predictor when the predictors in X are at their mean values rather than at 0. (And if the predictors already have mean zero then there's no difference.) It's probably easy to miss that, and we should provide a way of getting around it. So I just updated rstanarm on GitHub to make it possible to do something like the following:

d <- data.frame(
  const = rep(1, 20), # can name it whatever you want
  x = rnorm(20),
  y = rnorm(20)
)
fit <- stan_glm(
  y ~ 0 + const + x, # drop intercept with '~ 0' but add constant 'const' variable
  data = d,
  prior = normal(location = c(-10, 5), scale = c(1, .2), autoscale = FALSE),
  prior_PD = TRUE # draw from prior
)
print(fit)


stan_glm
 family:   gaussian [identity]
 formula:  y ~ 0 + const + x
 num. obs: 20
------

Estimates:
      Median MAD_SD
const -10.0    1.0 
x       5.0    0.2 
sigma   0.6    0.6 

That is, we can trick rstanarm into treating a constant variable as a regular predictor, and therefore specify its prior using the prior argument like the other coefficients, instead of the prior_intercept argument. This can be used to avoid having to think about the intercept in terms of the predictors being centered. My example uses stan_glm, but it should work with stan_betareg too with a fresh installation from GitHub.

library("devtools") 
install_github("stan-dev/rstanarm", args = "--preclean", build_vignettes = FALSE)
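
To make the centering point concrete, here is a back-of-the-envelope sketch (my own arithmetic, using the prior means of the slopes rather than the actual posterior draws) of how the intercept rstanarm reports relates to the centered one:

xbar_week  <- mean(1:4)     # mean of Week over the 4 observations used: 2.5
xbar_week2 <- mean((1:4)^2) # mean of Week2: 7.5
beta <- c(0.85, -0.036)     # prior means of the slopes
# reported (raw) intercept = centered intercept - sum(beta * predictor means)
-2.9 - sum(beta * c(xbar_week, xbar_week2)) # approximately -4.76

which is in the same neighborhood as the -4.2 posterior mean you saw.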

Yeah, I missed that; that explains a lot, thanks. I updated rstanarm, but when I try to run your stan_glm(…) call I get this error:

Error: Constant variable(s) found: const

Any idea what’s going wrong?

Did you reinstall from GitHub again? Until I changed it yesterday you would get that error because it was checking for any constant variables in the model matrix. But I changed it to allow a constant variable if that constant is 1. So even if you installed from GitHub a few days ago you'd need to install it again. (When we do the next CRAN release this will be included, but for the time being it's just on GitHub.)

Sorry for the hassle, but I'm glad you brought this to our attention because it motivated adding this alternate way of specifying a prior on the intercept, which seems useful.

Let me know if reinstalling from GitHub still doesn't allow you to run that code.

Here’s the equivalent for the code you were running @bramre,

dat <- as.data.frame(cbind(c(1:23),c((1:23)^2),
                           c(0.1,0.225,0.35,0.475,0.6,0.7,0.8,0.85,
                             0.9,0.9,0.9,0.9,0.9,0.885,0.87,0.825,
                             0.78,0.66,0.54,0.43,0.32,0.21,0.1))
)
colnames(dat)<-c("Week","Week2","Hours")
dat$const <- rep(1, nrow(dat))
f <- formula(Hours ~ 0 + const + Week + Week2)
m2pd <- stan_betareg(f,
                     prior=normal(location=c(0,0.85,-0.036),scale=c(1,0.1,0.1)),
                     prior_phi=normal(400,10), data=dat[1:4,], prior_PD = TRUE)
m2pd

As @jonah mentioned it should work once you reinstall rstanarm from GitHub with the steps in this previous response.

Although, I'm not sure this is what you should be using, since the model you're fitting performs an intercept transformation. It's a bit strange to omit the transformation in the prior predictive distribution and then use that to compare against fitted models that include the transformation.

Sorry for my late reply, I was busy moving to a new country, ha.

I got it working after uninstalling rstanarm before reinstalling it again, thanks a lot! I played around with it for a bit and I am getting the results I was expecting now, so I’m happy with that.

@imadmali I think I understand what you're saying, and I agree that it technically doesn't really make sense. Can I just omit the transformation in the first model, or am I thinking too simplistically now?

@bramre that should be fine. The important thing is to make sure you’re consistent across your models, especially if you’re making comparisons among them.