Questions on RCT example

I have a few questions on this RCT example by Joon-Ho Lee

https://mc-stan.org/users/documentation/case-studies/model-based_causal_inference_for_RCT.html

Typos (in case someone wants to fix them)

  • It mentions a half-Cauchy prior when the code actually uses a half-normal for sigma
  • Change “Model 2” to “Model 3”
  • Model 1: the text and the printed R output report different SDs (0.48)

Questions

  • Only rho = 0 and rho = 1 are modelled, but why not a rho in between? What does it actually mean when we say Y1 and Y0 are correlated? How would we estimate or derive a good prior for rho?
  • tau_fs has a smaller variance despite normal_rng noise, why?
  • for the counterfactual y0 and y1, should we set a lower bound of 0? As earnings cannot be negative.
  • in section 3.2.2, I don’t get the interaction part for the treatment, why do we need it?

I’m not one of the authors so I can’t speak for them (possibly @Avi_Feller can chime in). My thoughts on your questions:

Only rho = 0 and rho = 1 are modelled, but why not a rho in between? What does it actually mean when we say Y1 and Y0 are correlated? How would we estimate or derive a good prior for rho?

I’m not sure I understand what you mean by rho 0/1 versus a rho in between. They use rho for the correlation between the errors of the two potential outcomes, and they point to this reference: “Causal Inference: A Missing Data Perspective”, p. 22. There’s some more information about this type of correlated outcomes in “Chapter 1 Fundamental Problem of Causal Inference” of “Statistical Tools for Causal Inference”.

In the potential outcomes framework, person i has a (continuous) outcome y_t under each treatment state t, where t = 0 means not treated and t = 1 means treated. A positive correlation between these two states says that low (high) values in the not-treated state go with low (high) values in the treated state. For example, say we want to know whether wealth increases after randomly assigning people a course in financial management. People who would have high wealth without the course will probably also have high wealth with it.

As for a prior on rho, I think an uninformative prior centered at 0 makes sense, unless you have prior information about the correlation.
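To make the meaning of rho concrete, here is a small simulation sketch in base R. Everything in it is an assumption for illustration (the helper `sim_rho`, the effect size `tau = 0.25`, standard normal errors); it is not taken from the case study. It draws both potential outcomes with a chosen correlation, then hides one of them per unit by random assignment. The difference in observed means should stay close to tau for every rho, while the spread of the unit-level effects changes, which suggests why the observed data alone say little about rho.

```r
set.seed(1)
n   <- 10000
tau <- 0.25  # hypothetical average treatment effect

# Hypothetical helper: simulate both potential outcomes with correlation rho,
# then reveal only one of them per unit through random assignment.
sim_rho <- function(rho) {
  z0 <- rnorm(n)
  z1 <- rnorm(n)
  y0 <- z0                                     # outcome if not treated
  y1 <- tau + rho * z0 + sqrt(1 - rho^2) * z1  # outcome if treated
  w     <- rbinom(n, 1, 0.5)                   # randomized treatment
  y_obs <- ifelse(w == 1, y1, y0)              # the only outcome we get to see
  c(rho            = rho,
    cor_y0_y1      = cor(y0, y1),
    sd_unit_effect = sd(y1 - y0),
    diff_in_means  = mean(y_obs[w == 1]) - mean(y_obs[w == 0]))
}

res <- as.data.frame(t(sapply(c(0, 0.5, 1), sim_rho)))
print(round(res, 2))
```

With rho = 1 every unit has exactly the same effect, and with rho = 0 the unit-level effects vary the most, yet the difference in means does not distinguish the two cases.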

tau_fs has a smaller variance despite normal_rng noise, why?

tau_fs is a sample average, so its sd keeps shrinking as the sample size grows. We can approximate it using the fact that the variance of the difference of two independent normal variates is \sigma_1^2 + \sigma_2^2, so the sd of the average difference is \sigma_{E(y_1 - y_0)} = \sqrt{\frac{\sigma_1^2 + \sigma_2^2}{N}}. We also become more certain about the super-population \tau as the sample size grows.

So, back to your original question: why would it be smaller? I think the intuition is that in the model block we only have information on the outcomes we observe, whereas in the generated quantities block we condition on the effect and then draw the missing potential outcomes. This conditioning reduces the variance of the finite-sample effect.

for the counterfactual y0 and y1, should we set a lower bound of 0? As earnings cannot be negative.

Probably.
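If you wanted to try that, one possible way to draw a counterfactual outcome with a lower bound of 0 is inverse-CDF sampling from a truncated normal. The base R sketch below is only an illustration: the helper name `rnorm_lower0` and the values `mean = 0.5`, `sd = 1` are made up, and the case study itself does not do this. Note that truncating shifts the mean of the imputed outcomes upward, so the model for the observed outcomes would need to be consistent with it.

```r
set.seed(7)

# Hypothetical helper: draw from a normal distribution truncated below at 0
# by sampling uniformly on the part of the CDF that lies above 0.
rnorm_lower0 <- function(n, mean, sd) {
  p_lo <- pnorm(0, mean, sd)          # probability mass below the bound
  u    <- runif(n, min = p_lo, max = 1)
  pmax(qnorm(u, mean, sd), 0)         # pmax guards against rounding just below 0
}

draws <- rnorm_lower0(1000, mean = 0.5, sd = 1)
range(draws)
```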

in section 3.2.2, I don’t get the interaction part for the treatment, why do we need it?

They say, “Instead of imposing restrictions that the effects of X_i are the same for both potential outcomes, we define two different vectors of the slope coefficients \beta_c and \beta_t for the control and treated units respectively. The difference in the two vectors, \beta_t−\beta_c, can be obtained by including an interaction term between X and W in the model”
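A quick way to see what the quoted passage means is to compare separate regressions with a single interacted one. The sketch below uses simulated data with made-up slopes (`beta_c = 0.5`, `beta_t = 1.2`) and plain `lm`, not the Stan model from the case study. The coefficient on the `w:x` interaction should reproduce the difference between the treated and control slopes.

```r
set.seed(123)
n <- 500
x <- rnorm(n)               # a single covariate
w <- rbinom(n, 1, 0.4)      # treatment indicator

beta_c <- 0.5               # hypothetical slope for control units
beta_t <- 1.2               # hypothetical slope for treated units
y <- ifelse(w == 1, 1.0 + beta_t * x, 0.2 + beta_c * x) + rnorm(n)

# Separate regressions for each arm
fit_c <- lm(y ~ x, subset = w == 0)
fit_t <- lm(y ~ x, subset = w == 1)

# One regression with a treatment-by-covariate interaction
fit_int <- lm(y ~ w * x)

slope_diff       <- unname(coef(fit_t)["x"] - coef(fit_c)["x"])
interaction_coef <- unname(coef(fit_int)["w:x"])
c(slope_diff = slope_diff, interaction = interaction_coef)
```

Without the interaction, the model would force the covariate to have the same slope under treatment and control.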


On your question

tau_fs has a smaller variance despite normal_rng noise, why?

In the generated quantities block they compute the missing potential outcome with an _rng call, but the observed outcome is kept as-is rather than passed through that function. It’s as if we had doubled our sample size.

In the case study they use N = 500 with N_T = 200 and N_C = 300. The standard deviation of the super-population effect can be estimated from the sample standard deviations as

> sqrt(0.984^2/300 + 1.021^2/200)
[1] 0.09186798

using the variance of the difference of two independent normal RVs.

The finite-sample standard deviation uses the potential outcomes framework, where every unit has both a Y(0) and a Y(1) (one observed, one drawn), and is thus similar to

> sqrt(0.984^2/500 + 1.021^2/500)
[1] 0.06341446

which lines up closely with what they have.
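The comparison can also be checked by simulation. The sketch below is a rough stand-in for the Stan model, not a reproduction of it: it assumes rho = 0, treats the two sds as known, approximates the posterior of each group mean by a normal centered at the sample mean, and uses a made-up true effect (`tau_true = 0.25`). For each draw it imputes only the missing potential outcomes and keeps the observed ones fixed, which is the idea described for the generated quantities block.

```r
set.seed(42)
n_c <- 300; n_t <- 200; n <- n_c + n_t
sigma_c <- 0.984; sigma_t <- 1.021   # sample sds quoted above, treated as known
tau_true <- 0.25                     # hypothetical true effect

# Simulated observed outcomes for each arm
y_c <- rnorm(n_c, 0, sigma_c)
y_t <- rnorm(n_t, tau_true, sigma_t)

# Assumption: normal approximation to the posterior of the group means
n_draws <- 4000
mu_c <- rnorm(n_draws, mean(y_c), sigma_c / sqrt(n_c))
mu_t <- rnorm(n_draws, mean(y_t), sigma_t / sqrt(n_t))

# Super-population effect: difference of the mean parameters
tau_sp <- mu_t - mu_c

# Finite-sample effect: impute the missing potential outcome for every unit,
# keep the observed outcome as-is, then average the unit-level effects
tau_fs <- vapply(seq_len(n_draws), function(s) {
  y0_mis <- rnorm(n_t, mu_c[s], sigma_c)   # missing Y(0) for treated units
  y1_mis <- rnorm(n_c, mu_t[s], sigma_t)   # missing Y(1) for control units
  (sum(y_t - y0_mis) + sum(y1_mis - y_c)) / n
}, numeric(1))

c(sd_superpop = sd(tau_sp), sd_finite = sd(tau_fs))
```

Under these assumptions the finite-sample sd should come out smaller than the super-population sd, in line with the two calculations above.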