Obtaining effect size from “rstanarm” package's linear regression

In my study, a control group (c) is pretested (pre.c) and post-tested (pos.c). Similarly, a treatment group (t) is pretested (pre.t) and post-tested (pos.t). So I have two groups (a group factor) tested at two time points (a time factor).

I have fit a linear regression using stan_glm() from the rstanarm package. I’m wondering, though, how to interpret the results given that no effect sizes are reported in the output (see below).

Is it possible to obtain effect sizes for the main and interaction effects with rstanarm?

pre.c = c(0.2521539, -0.8839510, 1.0106639, 1.2189900, -2.5187683, 0.1119494, 1.1506801, 1.2056992, 1.7262407, 3.1810580)
pos.c = c(1.5293298, 0.6043102, 0.5317992, 2.4416711, -2.6809858, 0.4981319, 4.0499866, 0.5683203, 4.0376835, 2.6794024)
pre.t = c(-1.0432592, 4.4818748, 3.5557269, 0.5164588, 3.3919210, 3.9045262, -0.6085360, 2.1122688, -0.2043023, 4.6272319)
pos.t = c(6.451760, 10.089001, 8.472512, 5.555241, 8.743465, 9.036619, 7.050737, 9.584346, 5.852163, 8.372967)

data <- data.frame(y = c(pre.c, pos.c, pre.t, pos.t),
                   # time is 0 for each group's first 10 rows (pretest) and 1 for the next 10 (post-test)
                   time = rep(rep(0:1, each = 10), 2),
                   group = rep(c(0, 1), each = 20))

library(rstanarm)

fit <- stan_glm(y ~ group * time, data = data,
                prior_intercept = normal(0, 10),
                prior = normal(0, 2.5),
                prior_aux = normal(0, 10))
Output:

Estimates:
              mean   sd    2.5%   25%    50%    75%    97.5%
(Intercept)   0.9    0.9   -0.9    0.3    0.9    1.6    2.7
group         3.2    1.3    0.6    2.4    3.2    4.0    5.7
time          0.2    1.3   -2.3   -0.7    0.3    1.1    2.8
group:time    1.4    1.8   -2.3    0.2    1.5    2.6    5.0

It depends on which definition of “effect size” you are using, but in general the key idea is to apply the transformation to each draw from the posterior distribution rather than to the mean or median of the posterior draws. The answer will be something like the one I gave at

and the recommendation still applies: move away from things associated with frequentist estimation (standardization, effect sizing, etc.) toward a more Bayesian approach of just graphing how the posterior predictive distribution changes when you manipulate a predictor in a plausible way.
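As a minimal sketch of the per-draw mechanics with the model above (dividing the interaction coefficient by the residual SD is just one illustrative transformation, not a recommendation):

draws <- as.matrix(fit)   # one row per posterior draw; includes "sigma" for Gaussian models
std_int <- draws[, "group:time"] / draws[, "sigma"]   # transform each draw separately
quantile(std_int, c(0.025, 0.5, 0.975))               # the uncertainty carries through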

I’m interested in two effect sizes.

First, the proportion of variance (POV) accounted for by each effect (in this case, two main effects and one interaction effect), with their uncertainties.

Second, the standardized mean difference between the two time points for each group (Cohen’s d), along with its uncertainty.

Is it possible to obtain either or both of these effect sizes for my case, given my data and the rstanarm code provided above?

I would highly appreciate a demonstration.

For all these “proportion of variance” things, the Bayesian way is to look at the posterior predictive distribution

PPD_v <- posterior_predict(fit)   # PPD with the predictors as observed
nd <- model.frame(fit)
nd$x <- 0   # x stands for the predictor of interest (e.g., group or time above)
PPD_c <- posterior_predict(fit, newdata = nd)   # PPD with that predictor held constant
POV <- 1 - apply(PPD_c, 1, var) / apply(PPD_v, 1, var)   # per-draw proportion of variance
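To summarize the per-draw POV with a posterior interval (the same summary used later in this thread):

quantile(POV, probs = c(0.025, 0.5, 0.975))   # posterior interval for the POV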

Cohen’s d would be something like

nd <- model.frame(fit)
nd$x <- 1
PPD_1 <- posterior_predict(fit, newdata = nd)
nd$x <- 0
PPD_0 <- posterior_predict(fit, newdata = nd)
# divide each draw's predicted differences by that draw's predictive SD
d <- sweep(PPD_1 - PPD_0, MARGIN = 1, FUN = `/`,
           STATS = apply(posterior_predict(fit), 1, sd))
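The result d is a draws-by-observations matrix. One way to reduce it to a posterior distribution for a single d (my assumption, not something stated in the thread) is to average over observations within each draw:

d_draws <- rowMeans(d)                    # one standardized difference per draw
quantile(d_draws, c(0.025, 0.5, 0.975))   # posterior interval for Cohen's d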

Thanks, but which effect does this particular POV you have kindly computed relate to? I see that quantile(POV, c(.025, .975)) gives an interval with a strongly negative lower bound (i.e., -0.8311958 to 0.4494211)!

Also, which difference does this d represent?

P.S. My R session strangely stops each time I run the last part of the d calculation.

Do you assume centered x here?

And do you have non-centered x?

No, I have not centered anything.

Sorry for not being more explicit. My guess is that

nd$x <- 0

should be, e.g.,

nd$x <- mean(nd$x)

Do you now see what I mean by the question on centered vs. non-centered x?
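Applied to the model above, that suggestion would look something like this (using group as the predictor of interest is my choice for illustration):

nd <- model.frame(fit)
nd$group <- mean(nd$group)   # hold the predictor at its sample mean rather than 0
PPD_c <- posterior_predict(fit, newdata = nd)
POV_group <- 1 - apply(PPD_c, 1, var) / apply(posterior_predict(fit), 1, var)
quantile(POV_group, c(0.025, 0.975))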

Dear Ben,

To repeat my earlier questions: which effect does this particular POV relate to, and which difference does this d represent? And my session still stops each time I run the d calculation.

I will defer to Ben, who initially provided the R code, to clarify. Please ask him so we can both make sure.

It basically means that the POV of this variable is negligible. With a finite number of draws, 1 - apply(PPD_c, 1, var) / apply(PPD_v, 1, var) can be negative even though PPD_c should have less variance than PPD_v, because the former sets x to a constant (such as its sample mean) whereas in the actual data it has variation.
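A quick check (my suggestion) of whether the interval is just Monte Carlo noise around zero:

median(POV)     # point summary; a value near 0 suggests a negligible POV
mean(POV > 0)   # posterior probability that the POV is positive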

To be clear, which effect (the main effects or the interaction) does this particular POV you have kindly computed relate to?

total

Dear Ben,

Is it possible to know what proportion of the total variance in the dependent variable is accounted for by each main and interaction effect?

Also, if I may: the Cohen’s d that you have shown is for which mean difference?

@bgoodri I also tested this code with the following, where x1 and x2 are the relevant predictors (note that the data are generated from a Poisson model, but for simplicity I fit a Gaussian model).

library(rstanarm)
library(tibble)  # for tibble(), which lets columns reference earlier ones
options(mc.cores = parallel::detectCores())
set.seed(87)
df <- tibble(
  pos.tot = runif(200, min = 0.8, max = 1.0),
  urban.tot = pmin(runif(200, min = 0.0, max = 0.02), 1.0 - pos.tot),
  neg.tot = (1.0 - pmin(pos.tot + urban.tot, 1)),
  x1 = pmax(pos.tot - runif(200, min = 0.05, max = 0.30), 0),
  x3 = pmax(neg.tot - runif(200, min = 0.0, max = 0.10), 0),
  x2 = pmax(pos.tot - x1 - x3/2, 0),
  x4 = pmax(1 - x1 - x2 - x3 - urban.tot, 0))
# true model and 200 Poisson observations
mean.y <- exp(-5.8 + 6.3*df$x1 + 15.2*df$x2)
df$y <- rpois(200, mean.y)

fit <- stan_glm(y ~ x1 + x2 + x3 + x4, data = df, seed = 170402030, refresh = 0)

PPD_v <- posterior_predict(fit)
nd <- model.frame(fit)
nd$x1 <- 0
PPD_c <- posterior_predict(fit, newdata = nd)
POV <- 1 - apply(PPD_c, 1, var) / apply(PPD_v, 1, var)
quantile(POV, c(.025, .975))

and the result is

      2.5%      97.5% 
-2.0270499  0.1486391 

The result is similar for all variables.
So either there is something wrong with the code, or this is not useful for correlated covariates…

Sorry to bring this topic back to life after a thousand years…

Ben, is there a way to apply the method you suggest to factors (without first converting them into dummy variables)?

Also, would this alternative be appropriate?

  1. Get PPD
  2. For each sample in the PPD, compute the effect size, resulting in a distribution of effect sizes.
  3. Get the HDI / ROPE etc. on this distribution of effect sizes.

(If I’m not mistaken, this is what rstantools::bayes_R2 does?)
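A minimal sketch of those three steps for the two-group model at the top of the thread; the bayestestR package and the ROPE bounds are my assumptions:

library(bayestestR)

# 1. posterior predictive draws for the treatment group at both time points
nd <- model.frame(fit)   # fit is the y ~ group * time model from above
nd$group <- 1
nd$time <- 1
ppd_post <- posterior_predict(fit, newdata = nd)
nd$time <- 0
ppd_pre <- posterior_predict(fit, newdata = nd)

# 2. one standardized mean difference per posterior draw
d_draws <- rowMeans(ppd_post - ppd_pre) / apply(posterior_predict(fit), 1, sd)

# 3. HDI and ROPE on the resulting distribution of effect sizes
hdi(d_draws)
rope(d_draws, range = c(-0.1, 0.1))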

Well, rstanarm is going to utilize dummy variables, but the idea of calculating the posterior predictive distribution under various counterfactuals is widely applicable.

Thanks Ben!

Right, I just meant that setting the column to a numeric constant, as in nd$x1 <- 0, won't work if x1 is a factor.

E.g.:

library(rstanarm)

fit <- stan_glm(Petal.Length ~ Petal.Width + Species,
                data = iris,
                refresh = 0)

PPD_v <- posterior_predict(fit)
nd <- model.frame(fit)
nd$Species <- factor(0)
PPD_c <- posterior_predict(fit, newdata = nd)
#> Error in model.frame.default(Terms, newdata, xlev = object$xlevels): factor Species has new level 0

It will work if you do nd$Species <- "virginica" or whatever. What are you trying to accomplish by setting it to zero?
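For example, a sketch with the iris model above (the choice of level is arbitrary):

nd <- model.frame(fit)
nd$Species <- factor("virginica", levels = levels(iris$Species))   # hold the factor at one observed level
PPD_c <- posterior_predict(fit, newdata = nd)
POV_species <- 1 - apply(PPD_c, 1, var) / apply(PPD_v, 1, var)
quantile(POV_species, c(0.025, 0.975))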

I’d like to estimate the proportion of total variance accounted for by that variable.
Fixing its value will do this in this case, but things get more complicated when interactions are involved.
I’d like to have something akin to a type-3 SS ANOVA table of effect sizes.

This is the only way I can think of that actually works:

If you have any other ideas, I’d love to hear them!