Regression discontinuity with binomial logit model

danielw2904 · August 9, 2018, 3:43pm

Hi All,
I am trying to predict whether the introduction of a paywall changes the anger level of comments on, say, a youtube chanel. We have already run a sentiment analysis and have data of the format:

y = # of angry comments
n = # of total comments
delta = 0 if post was published before paywall, 1 if post was poublished after
distPre = weeks before paywall (introduction is 0)
distPost = weeks after paywall (introduction is 0)

I am attempting to model this using a binomial_logit model and predicting the difference at the introduction of the paywall. I would like to account for possible pre- and post-treatment trends and thus included two variables that indicate how many weeks the current post is away from the introduction of the paywall. I then predict at the point where both distances are 0 and compare the predictive densities. Is this a reasonable approach for causal inference?
I’d be very greatful for any help/ suggestions (also other models?).

Thanks!

library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

pewdat <- read.csv("./data.csv")
pewdat$post <- ifelse(as.Date(pewdat$created_at) > as.Date("2016-02-10"), 1, 0)
y <- pewdat$anger
n <- pewdat$total
T <- length(y)
delta <- pewdat$post
#distPost <- c(rep(0, T-sum(delta)),(1:sum(delta))/sum(delta))
#distPre <- c(((T-sum(delta)):1)/(T-sum(delta)), rep(0, sum(delta)))

distPost <- c(rep(0, T-sum(delta)),(1:sum(delta)))
distPre <- c(((T-sum(delta)):1), rep(0, sum(delta)))


mod <- stan(file = "./logbinom2.stan", thin = 5, iter = 20000)

predPre <- as.data.frame(mod, pars = "y_hat_pre")
predPost <- as.data.frame(mod, pars = "y_hat_post")
plot(density(as.matrix(predPre)), col = 'red', xlim = c(0, 1))
lines(density(as.matrix(predPost)), col = 'blue')
legend('topleft', legend = c("pre", "post"), col = c('red', 'blue'), lty = 1)

library(shinystan)
launch_shinystan(mod)

data {
  int<lower=1> T;
  // number of data points
  int y[T];
  // success
  int n[T];
  // total
  vector<lower=0, upper=1>[T] delta;
  // treatment
  vector[T] distPre;
  vector[T] distPost;
}
parameters {
  real beta;
  real alpha;
  real gamma1;
  real gamma2;
}
model {
  beta ~ normal(0, 100);
  alpha ~ normal(0, 100);
  gamma1 ~ normal(0, 100);
  gamma2 ~ normal(0, 100);
  y ~ binomial_logit(n, alpha + delta * beta + distPre * gamma1 + distPost * gamma2);
}
generated quantities {
  real y_hat_pre;
  real y_hat_post;
  y_hat_pre = inv_logit(alpha);
  y_hat_post = inv_logit(alpha + beta);
}

Bob_Carpenter · August 22, 2018, 9:42am

Sorry for the slow response. That model’s reasonable other than for the priors. You’re rarely going to see effects larger than 4 or 5 on the log odds scale, so normal(0, 100) seems unrealistic. We usually want at least weakly informative priors to identify the scale of the result, say normal(0, 5) or thereabouts.

If your predictors aren’t standardized, that’s a big help, too.

I’d be trying to build a hierarchical model here for each of the items in order to estimate the angry comment rate and then I’d try to do some measurements directly on that quantity of interest.

I’m not really sure how to do the causal part of it, which is what you’re after—does the paywall treatment have an effect? I’d suggest the Gelman and Hill book chapters on that.

danielw2904 · August 27, 2018, 4:44pm

No problem. Thank you for your answer!
Sounds good! I’ll tweak the priors.

As for the hierarchical model you mean a common distribution for anger and then separate intercept for each channel?

I’ve read the relevant chapters in Gelman & Hill and they are very informative. Thanks for the suggestion!!

Bob_Carpenter · September 3, 2018, 9:33am

That’d be one approach. If you have predictors per channel, then you can also have varying slopes. There’s another example of that in Gelman and Hill’s discussion of the red-state/blue-state model (with a covariance prior on the multiple random effects).

Topic		Replies	Views
Need Help with Negative Binomial Regression Model please Modeling fitting-issues , specification , interpret-results , count-data	1	434	August 31, 2023
Using bernoulli_logit_lpmf to obtain deviance and waic Modeling	10	1542	January 26, 2020
Truncated model for neg_binomial_2 Modeling	20	1568	June 9, 2017
Continuous Binomial: further questions Modeling	3	323	September 18, 2023
Regression discontinuity with discrete latent variable (advice please) Modeling fitting-issues	1	435	June 13, 2020

Regression discontinuity with binomial logit model

Related Topics