Obtaining effect sizes from the “rstanarm” package's linear regression

#1

In my study, a control group (c) is pretested (pre.c) and post-tested (pos.c). Similarly, a treatment group (t) is pretested (pre.t) and post-tested (pos.t). So I have two groups (a group factor) tested at two time points (a time factor).

I have fit a linear regression using stan_glm() from the rstanarm package. I'm wondering, though, how to interpret the results, since no effect sizes are reported in the output (see below).

Is it possible to obtain effect sizes for the main and interaction effects with the rstanarm package?

pre.c = c(0.2521539, -0.8839510, 1.0106639, 1.2189900, -2.5187683, 0.1119494, 1.1506801, 1.2056992, 1.7262407, 3.1810580)
pos.c = c(1.5293298, 0.6043102, 0.5317992, 2.4416711, -2.6809858, 0.4981319, 4.0499866, 0.5683203, 4.0376835, 2.6794024)
pre.t = c(-1.0432592, 4.4818748, 3.5557269, 0.5164588, 3.3919210, 3.9045262, -0.6085360, 2.1122688, -0.2043023, 4.6272319)
pos.t = c(6.451760, 10.089001, 8.472512, 5.555241, 8.743465, 9.036619, 7.050737, 9.584346, 5.852163, 8.372967)

data <- data.frame(y = c(pre.c, pos.c, pre.t, pos.t),
                   time = rep(rep(0:1, each = 10), 2),  # 0 = pre-test, 1 = post-test
                   group = rep(c(0, 1), each = 20))     # 0 = control, 1 = treatment

library(rstanarm)
fit <- stan_glm(y ~ group * time, data = data,
                prior_intercept = normal(0, 10),
                prior = normal(0, 2.5),
                prior_aux = normal(0, 10))
Output:

Estimates:
              mean   sd   2.5%   25%   50%   75%   97.5%
(Intercept)    0.9  0.9   -0.9   0.3   0.9   1.6    2.7
group          3.2  1.3    0.6   2.4   3.2   4.0    5.7
time           0.2  1.3   -2.3  -0.7   0.3   1.1    2.8
group:time     1.4  1.8   -2.3   0.2   1.5   2.6    5.0


#2

It depends on which definition of “effect size” you are using, but in general the key idea is to apply the transformation to each draw from the posterior distribution, rather than to the mean or median of the posterior draws. The answer will be something like the one I gave at


and the recommendation still applies: move away from things associated with frequentist estimation (standardization, effect sizing, etc.) toward the more Bayesian approach of simply graphing how the posterior predictive distribution changes when you manipulate a predictor in a plausible way.
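
For instance, a minimal sketch of that approach, assuming the fit object from #1: compare posterior predictive draws with time set to 0 versus 1 and plot the shift (the histogram is just one plotting choice).

library(rstanarm)

# posterior predictive draws with 'time' manipulated, all else as observed
nd <- model.frame(fit)
nd$time <- 0
ppd_pre <- posterior_predict(fit, newdata = nd)
nd$time <- 1
ppd_post <- posterior_predict(fit, newdata = nd)

# graph how the posterior predictive distribution changes
hist(ppd_post - ppd_pre, breaks = 50,
     main = "Change in posterior predictive draws, time: 0 -> 1",
     xlab = "difference in predicted y")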


#3

I’m interested in two effect sizes.

First, the proportion of variance (POV) accounted for by each effect (in this case, the two main effects and the one interaction effect), along with their uncertainties.

Second, the standardized mean difference for each group between the two time points (a Cohen's d effect size, along with its uncertainty).

Is it possible to obtain either or both of these effect sizes in my case, given the data and “rstanarm” code provided above?

I would highly appreciate a demonstration.


#4

For all these “proportion of variance” things, the Bayesian way is to look at the posterior predictive distribution:

PPD_v <- posterior_predict(fit)                # PPD with the predictors as observed
nd <- model.frame(fit)
nd$x <- 0                                      # 'x' stands for the predictor of interest (here, group or time)
PPD_c <- posterior_predict(fit, newdata = nd)  # PPD with that predictor held constant
POV <- 1 - apply(PPD_c, 1, var) / apply(PPD_v, 1, var)
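
Applied to the model from #1, with group playing the role of the generic x, that might look like the following sketch:

PPD_v <- posterior_predict(fit)
nd <- model.frame(fit)
nd$group <- 0                                  # hold 'group' constant
PPD_c <- posterior_predict(fit, newdata = nd)
POV_group <- 1 - apply(PPD_c, 1, var) / apply(PPD_v, 1, var)
quantile(POV_group, c(.025, .5, .975))         # POV for 'group' with its uncertainty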

Cohen’s d would be something like

nd <- model.frame(fit)
nd$x <- 1
PPD_1 <- posterior_predict(fit, newdata = nd)
nd$x <- 0
PPD_0 <- posterior_predict(fit, newdata = nd)
# scale each draw's difference by that draw's posterior predictive standard deviation
d <- sweep(PPD_1 - PPD_0, MARGIN = 1, FUN = `/`,
           STATS = apply(posterior_predict(fit), 1, sd))
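
Here d is a draws-by-observations matrix; one way to summarize it with its uncertainty (a sketch, assuming the objects above) is to average within each draw and take quantiles across draws:

d_draws <- rowMeans(d)                 # one standardized difference per posterior draw
quantile(d_draws, c(.025, .5, .975))   # Cohen's-d-style estimate with uncertainty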

#5

Thanks, but which effect does this particular POV you have kindly computed relate to? I see that quantile(POV, c(.025, .975)) gives an interval that extends well below zero (i.e., -0.8311958 to 0.4494211)!?

Also, which difference does this d represent?

P.S. My machine strangely stops each time I run the last part of the d calculation.


#6

Do you assume a centered x here?

And do you have a non-centered x?


#7

No, I have not centered anything.


#8

Sorry for not being more explicit. My guess is that

nd$x <- 0

should be, e.g.,

nd$x <- mean(nd$x)

Do you now see what I mean by the question about centered vs. non-centered x?
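
In other words, under this suggestion the comparison baseline holds the predictor at its sample mean rather than at 0. A minimal sketch, again with group standing in for the generic x:

nd <- model.frame(fit)
nd$group <- mean(nd$group)                     # hold the predictor at its sample mean
PPD_c <- posterior_predict(fit, newdata = nd)
POV <- 1 - apply(PPD_c, 1, var) / apply(posterior_predict(fit), 1, var)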


#9

Dear Ben,

Thanks, but which effect does this particular POV you have kindly computed relate to? I see that quantile(POV, c(.025, .975)) gives an interval that extends well below zero (i.e., -0.8311958 to 0.4494211)!?

Also, which difference does this d represent?

P.S. My machine strangely stops each time I run the last part of the d calculation.


#10

I will defer to Ben, who initially provided the R code, for clarification. Please ask him so we can both be sure.


#11

It basically means that the POV of this variable is negligible. With a finite number of draws, 1 - apply(PPD_c, 1, var) / apply(PPD_v, 1, var) can be negative, even though PPD_c should have less variance than PPD_v: the former sets x to a constant (such as its sample mean), whereas in the actual data x varies.
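
A minimal illustration of that Monte Carlo noise, independent of rstanarm: even when two samples have exactly the same true variance, the finite-sample variance ratio scatters around 1, so 1 minus the ratio is negative about half the time.

set.seed(1)
ratios <- replicate(4000, var(rnorm(40)) / var(rnorm(40)))
quantile(1 - ratios, c(.025, .975))   # the interval straddles 0 despite equal true variances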


#12

To be clear, which effect (the main effects or the interaction) does this particular POV you have kindly computed relate to?


#13

total


#14

Dear Ben,

Is it possible to know what proportion of the total variance in the dependent variable is accounted for by each main effect and by the interaction effect?

Also, if I may, the Cohen's d that you have shown corresponds to which mean difference?


#15

@bgoodri I also tested this code in a setting where x1 and x2 are the relevant predictors (note that the data are generated from a Poisson model, but for simplicity I fit a Gaussian model here).

library(rstanarm)
library(dplyr)   # provides data_frame() (now superseded by tibble())
options(mc.cores = parallel::detectCores())
set.seed(87)
df <- data_frame(
  pos.tot   = runif(200, min = 0.8, max = 1.0),
  urban.tot = pmin(runif(200, min = 0.0, max = 0.02), 1.0 - pos.tot),
  neg.tot   = 1.0 - pmin(pos.tot + urban.tot, 1),
  x1 = pmax(pos.tot - runif(200, min = 0.05, max = 0.30), 0),
  x3 = pmax(neg.tot - runif(200, min = 0.0, max = 0.10), 0),
  x2 = pmax(pos.tot - x1 - x3 / 2, 0),
  x4 = pmax(1 - x1 - x2 - x3 - urban.tot, 0))
# true model and 200 Poisson observations
mean.y <- exp(-5.8 + 6.3 * df$x1 + 15.2 * df$x2)
df$y <- rpois(200, mean.y)

fit <- stan_glm(y ~ x1 + x2 + x3 + x4, data = df, seed = 170402030, refresh = 0)

PPD_v <- posterior_predict(fit)
nd <- model.frame(fit)
nd$x1 <- 0
PPD_c <- posterior_predict(fit, newdata = nd)
POV <- 1 - apply(PPD_c, 1, var) / apply(PPD_v, 1, var)
quantile(POV, c(.025, .975))

and the result is

      2.5%      97.5% 
-2.0270499  0.1486391 

The result is similar for all the variables.
So either there is something wrong with the code, or this approach is not useful for correlated covariates…