I’m trying to show that the simultaneous analysis of intermediate and final outcomes of a randomized trialis necessary to assess the global effect of the drug under trial.
Highly idealized situation. a drug Med
is supposed to have a large effect protecting against the occurrence of of arterial hypertension HTA
, which highly eases the occurence of a myocardial infarction IDM
:
Highly idealized (caricatural) data
HTA Med NIDM NSain
1 FALSE FALSE 1 19
2 TRUE FALSE 12 28
3 FALSE TRUE 3 37
4 TRUE TRUE 8 12
It is easier to work to wotrk on “Long format” dataframes (i. e. one row per invidual observation):
LongData <-
structure(list(Sexe = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L), .Label = c("F", "M"), class = "factor"), HTA = c(FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE), Med = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE), IDM = c(FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE)), row.names = c(NA, -120L), class = "data.frame")
A few verifications show that :
- The drug is indeed protective against HTA onset :
> summary(glm(HTA~Med, data=LongData, family=binomial))
Call:
glm(formula = HTA ~ Med, family = binomial, data = LongData)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.4823 -0.9005 0.0000 0.9005 1.4823
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.6931 0.2739 2.531 0.011373 *
MedTRUE -1.3863 0.3873 -3.579 0.000344 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 166.36 on 119 degrees of freedom
Residual deviance: 152.76 on 118 degrees of freedom
AIC: 156.76
Number of Fisher Scoring iterations: 4
- The HTA is indeed prgnostic of MI :
> summary(glm(IDM~HTA, data=LongData, family=binomial))
Call:
glm(formula = IDM ~ HTA, family = binomial, data = LongData)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.9005 -0.9005 -0.3715 -0.3715 2.3272
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.6391 0.5175 -5.099 3.41e-07 ***
HTATRUE 1.9459 0.5855 3.323 0.00089 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 120.10 on 119 degrees of freedom
Residual deviance: 105.77 on 118 degrees of freedom
AIC: 109.77
Number of Fisher Scoring iterations: 5
- The experiment is unable to show a “significant” protective effect of the drug againts MI :
> summary(glm(IDM~Med, data=LongData, family=binomial))
Call:
glm(formula = IDM ~ Med, family = binomial, data = LongData)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.6988 -0.6988 -0.6364 -0.6364 1.8420
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.2852 0.3134 -4.101 4.11e-05 ***
MedTRUE -0.2087 0.4577 -0.456 0.648
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 120.10 on 119 degrees of freedom
Residual deviance: 119.89 on 118 degrees of freedom
AIC: 123.89
Number of Fisher Scoring iterations: 4
A misguided attempt to adjust on the HTA shows a (non-“significant”) deleterious effect of the drug on MI :
> summary(glm(IDM~Med+HTA, data=LongData, family=binomial))
Call:
glm(formula = IDM ~ Med + HTA, family = binomial, data = LongData)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.0103 -0.8448 -0.3952 -0.3194 2.4500
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.9503 0.6455 -4.571 4.86e-06 ***
MedTRUE 0.4400 0.5157 0.853 0.393565
HTATRUE 2.1037 0.6190 3.398 0.000678 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 120.10 on 119 degrees of freedom
Residual deviance: 105.04 on 117 degrees of freedom
AIC: 111.04
Number of Fisher Scoring iterations: 5
The problem is, of course, that these analyses can only show the direct effect of the drug on MI ; the protective effect against HTA, which is prognostic of MI, doesn’t appear. Time for mediation analysis…
The direct effect of the drug on MI is represented by the “MedTRUE” coefficient of the last analysis, whose mean is 0.4400 ; the effect mediated by HTA should be the product of the effect of Med on HTA, (i. e. the coefficient MedTRUE of the first analysis, of mean -1.3863) multiplied by the effect of HTA on MI (i. e. the coefficient HTATrue of the last analysis, of mean 2.1037). The product of means is about -2.92 which would indicate a strong protective effect against MI ; but, of course, this indication is false (the mean of the product is not the product of the means, and its disribution has but remote links with the distributions of its terms…).
There comes the idea of building a simultaneous model of HTA and IDM. A piece of cake with brms
:
B1 <- brm(
bf(HTA~Med, family=bernoulli) +
bf(IDM~HTA+Med, family=bernoulli),
data=LongData,
save_all_pars=TRUE, sample_prior="yes",
prior=prior(student_t(3, 0, 10), class="b"))
This runs OK :
> B1
Family: MV(bernoulli, bernoulli)
Links: mu = logit
mu = logit
Formula: HTA ~ Med
IDM ~ HTA + Med
Data: LongData (Number of observations: 120)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
HTA_Intercept 0.71 0.28 0.16 1.27 5195 1.00
IDM_Intercept -3.08 0.66 -4.46 -1.87 3618 1.00
HTA_MedTRUE -1.42 0.40 -2.20 -0.67 5041 1.00
IDM_HTATRUE 2.20 0.64 1.05 3.51 3795 1.00
IDM_MedTRUE 0.44 0.54 -0.58 1.47 4920 1.00
Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
But I still have to loop through the samples to re-estimate the indirect (mediated) effect, the total effect and proportion mediated.
That’s where I want to add the relevant code in the “generated quantities” paragraph of the stan code :
B2 <- brm(
bf(HTA~Med, family=bernoulli) +
bf(IDM~HTA+Med, family=bernoulli),
data=LongData,
save_all_pars=TRUE, sample_prior="yes",
prior=prior(student_t(3, 0, 10), class="b"),
stanvars=stanvar(scode="real DE; real IE; real TE; real PM;
DE=b_IDM_MedTRUE; IE=b_HTA_MedTRUE*b_IDM_HTATRUE;
TE=DE+IE; PM=IE/TE;",
block="genquant"))
This fails to compile :
> B2 <- brm(
+ bf(HTA~Med, family=bernoulli) +
+ bf(IDM~HTA+Med, family=bernoulli),
+ data=LongData,
+ save_all_pars=TRUE, sample_prior="yes",
+ prior=prior(student_t(3, 0, 10), class="b"),
+ stanvars=stanvar(scode="real DE; real IE; real TE; real PM;
+ DE=b_IDM_MedTRUE; IE=b_HTA_MedTRUE*b_IDM_HTATRUE;
+ TE=DE+IE; PM=IE/TE;",
+ block="genquant"))
Setting 'rescor' to FALSE by default for this model
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
variable "b_IDM_MedTRUE" does not exist.
error in 'model724b4d97b3db_file724b2b2a572c' at line 65, column 42
-------------------------------------------------
63: real prior_b_IDM = student_t_rng(3,0,10);
64: real DE; real IE; real TE; real PM;
65: DE=b_IDM_MedTRUE; IE=b_HTA_MedTRUE*b_IDM_HTATRUE;
^
66: TE=DE+IE; PM=IE/TE;
-------------------------------------------------
Error in stanc(model_code = paste(program, collapse = "\n"), model_name = model_cppname, :
failed to parse Stan model 'file724b2b2a572c' due to the above error.
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
variable "b_IDM_MedTRUE" does not exist.
error in 'model724b629402e3_file724b2b2a572c' at line 65, column 42
-------------------------------------------------
63: real prior_b_IDM = student_t_rng(3,0,10);
64: real DE; real IE; real TE; real PM;
65: DE=b_IDM_MedTRUE; IE=b_HTA_MedTRUE*b_IDM_HTATRUE;
^
66: TE=DE+IE; PM=IE/TE;
-------------------------------------------------
Error in stanc(model_code = paste(program, collapse = "\n"), model_name = model_cppname, :
failed to parse Stan model 'file724b2b2a572c' due to the above error.
>
I also tried to use the “final” names of the coefficients (i. e. without the b_
prefix), with the same result.
The question is therefore : how to refer to the regression parameters in the “generated quantities” paragaph ?