Brms : How to refer to model parameters in the "generated quantities" of the Stan code of a brms model?

I’m trying to show that the simultaneous analysis of intermediate and final outcomes of a randomized trialis necessary to assess the global effect of the drug under trial.

Highly idealized situation. a drug Med is supposed to have a large effect protecting against the occurrence of of arterial hypertension HTA , which highly eases the occurence of a myocardial infarction IDM :

CrobardHTA

Highly idealized (caricatural) data

   HTA   Med NIDM NSain
1 FALSE  FALSE    1    19
2  TRUE  FALSE   12    28
3 FALSE  TRUE     3    37
4  TRUE  TRUE     8    12

It is easier to work to wotrk on “Long format” dataframes (i. e. one row per invidual observation):

LongData <-
structure(list(Sexe = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L), .Label = c("F", "M"), class = "factor"), HTA = c(FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
TRUE, TRUE), Med = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
TRUE, TRUE, TRUE, TRUE, TRUE), IDM = c(FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, 
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, 
TRUE, TRUE, TRUE, TRUE)), row.names = c(NA, -120L), class = "data.frame")

A few verifications show that :

  • The drug is indeed protective against HTA onset :
> summary(glm(HTA~Med, data=LongData, family=binomial))

Call:
glm(formula = HTA ~ Med, family = binomial, data = LongData)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.4823  -0.9005   0.0000   0.9005   1.4823  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   0.6931     0.2739   2.531 0.011373 *  
MedTRUE      -1.3863     0.3873  -3.579 0.000344 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 166.36  on 119  degrees of freedom
Residual deviance: 152.76  on 118  degrees of freedom
AIC: 156.76

Number of Fisher Scoring iterations: 4
  • The HTA is indeed prgnostic of MI :
> summary(glm(IDM~HTA, data=LongData, family=binomial))

Call:
glm(formula = IDM ~ HTA, family = binomial, data = LongData)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.9005  -0.9005  -0.3715  -0.3715   2.3272  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -2.6391     0.5175  -5.099 3.41e-07 ***
HTATRUE       1.9459     0.5855   3.323  0.00089 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 120.10  on 119  degrees of freedom
Residual deviance: 105.77  on 118  degrees of freedom
AIC: 109.77

Number of Fisher Scoring iterations: 5
  • The experiment is unable to show a “significant” protective effect of the drug againts MI :
> summary(glm(IDM~Med, data=LongData, family=binomial))

Call:
glm(formula = IDM ~ Med, family = binomial, data = LongData)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.6988  -0.6988  -0.6364  -0.6364   1.8420  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.2852     0.3134  -4.101 4.11e-05 ***
MedTRUE      -0.2087     0.4577  -0.456    0.648    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 120.10  on 119  degrees of freedom
Residual deviance: 119.89  on 118  degrees of freedom
AIC: 123.89

Number of Fisher Scoring iterations: 4

A misguided attempt to adjust on the HTA shows a (non-“significant”) deleterious effect of the drug on MI :

> summary(glm(IDM~Med+HTA, data=LongData, family=binomial))

Call:
glm(formula = IDM ~ Med + HTA, family = binomial, data = LongData)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.0103  -0.8448  -0.3952  -0.3194   2.4500  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -2.9503     0.6455  -4.571 4.86e-06 ***
MedTRUE       0.4400     0.5157   0.853 0.393565    
HTATRUE       2.1037     0.6190   3.398 0.000678 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 120.10  on 119  degrees of freedom
Residual deviance: 105.04  on 117  degrees of freedom
AIC: 111.04

Number of Fisher Scoring iterations: 5

The problem is, of course, that these analyses can only show the direct effect of the drug on MI ; the protective effect against HTA, which is prognostic of MI, doesn’t appear. Time for mediation analysis…

The direct effect of the drug on MI is represented by the “MedTRUE” coefficient of the last analysis, whose mean is 0.4400 ; the effect mediated by HTA should be the product of the effect of Med on HTA, (i. e. the coefficient MedTRUE of the first analysis, of mean -1.3863) multiplied by the effect of HTA on MI (i. e. the coefficient HTATrue of the last analysis, of mean 2.1037). The product of means is about -2.92 which would indicate a strong protective effect against MI ; but, of course, this indication is false (the mean of the product is not the product of the means, and its disribution has but remote links with the distributions of its terms…).

There comes the idea of building a simultaneous model of HTA and IDM. A piece of cake with brms:

B1 <- brm(
  bf(HTA~Med, family=bernoulli) +
  bf(IDM~HTA+Med, family=bernoulli),
  data=LongData,
  save_all_pars=TRUE, sample_prior="yes",
  prior=prior(student_t(3, 0, 10), class="b"))

This runs OK :

> B1
 Family: MV(bernoulli, bernoulli) 
  Links: mu = logit
         mu = logit 
Formula: HTA ~ Med 
         IDM ~ HTA + Med 
   Data: LongData (Number of observations: 120) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Population-Level Effects: 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
HTA_Intercept     0.71      0.28     0.16     1.27       5195 1.00
IDM_Intercept    -3.08      0.66    -4.46    -1.87       3618 1.00
HTA_MedTRUE      -1.42      0.40    -2.20    -0.67       5041 1.00
IDM_HTATRUE       2.20      0.64     1.05     3.51       3795 1.00
IDM_MedTRUE       0.44      0.54    -0.58     1.47       4920 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).

But I still have to loop through the samples to re-estimate the indirect (mediated) effect, the total effect and proportion mediated.

That’s where I want to add the relevant code in the “generated quantities” paragraph of the stan code :

B2 <- brm(
  bf(HTA~Med, family=bernoulli) +
  bf(IDM~HTA+Med, family=bernoulli),
  data=LongData,
  save_all_pars=TRUE, sample_prior="yes",
  prior=prior(student_t(3, 0, 10), class="b"),
  stanvars=stanvar(scode="real DE; real IE; real TE; real PM;
                          DE=b_IDM_MedTRUE; IE=b_HTA_MedTRUE*b_IDM_HTATRUE;
                          TE=DE+IE; PM=IE/TE;",
                   block="genquant"))

This fails to compile :

> B2 <- brm(
+   bf(HTA~Med, family=bernoulli) +
+   bf(IDM~HTA+Med, family=bernoulli),
+   data=LongData,
+   save_all_pars=TRUE, sample_prior="yes",
+   prior=prior(student_t(3, 0, 10), class="b"),
+   stanvars=stanvar(scode="real DE; real IE; real TE; real PM;
+                           DE=b_IDM_MedTRUE; IE=b_HTA_MedTRUE*b_IDM_HTATRUE;
+                           TE=DE+IE; PM=IE/TE;",
+                    block="genquant"))
Setting 'rescor' to FALSE by default for this model
SYNTAX ERROR, MESSAGE(S) FROM PARSER:

variable "b_IDM_MedTRUE" does not exist.
  error in 'model724b4d97b3db_file724b2b2a572c' at line 65, column 42
  -------------------------------------------------
    63:   real prior_b_IDM = student_t_rng(3,0,10);
    64: real DE; real IE; real TE; real PM;
    65:                           DE=b_IDM_MedTRUE; IE=b_HTA_MedTRUE*b_IDM_HTATRUE;
                                                 ^
    66:                           TE=DE+IE; PM=IE/TE;
  -------------------------------------------------

Error in stanc(model_code = paste(program, collapse = "\n"), model_name = model_cppname,  : 
  failed to parse Stan model 'file724b2b2a572c' due to the above error.
SYNTAX ERROR, MESSAGE(S) FROM PARSER:

variable "b_IDM_MedTRUE" does not exist.
  error in 'model724b629402e3_file724b2b2a572c' at line 65, column 42
  -------------------------------------------------
    63:   real prior_b_IDM = student_t_rng(3,0,10);
    64: real DE; real IE; real TE; real PM;
    65:                           DE=b_IDM_MedTRUE; IE=b_HTA_MedTRUE*b_IDM_HTATRUE;
                                                 ^
    66:                           TE=DE+IE; PM=IE/TE;
  -------------------------------------------------

Error in stanc(model_code = paste(program, collapse = "\n"), model_name = model_cppname,  : 
  failed to parse Stan model 'file724b2b2a572c' due to the above error.
> 

I also tried to use the “final” names of the coefficients (i. e. without the b_ prefix), with the same result.

The question is therefore : how to refer to the regression parameters in the “generated quantities” paragaph ?

Please use the “Interfaces - brms” tag for brms related questions. I changed it manually this time.

I would recommend not doing stuff in the generated quantities manually, but to do the relevant computations in R after fitting the model. All you have to do it extract the posterior samples via posterior_samples and then work on those.

The reason, your approach is not working is because in the Stan code the regression coefficients are all combined in a vector named “b”. You can see this by extracting the stan code via make_stancode.

Okay : if I follow you correctly, the names associated to the parameters are unknown to the Stan code, so there is no “general” way to use them un stanvars().

Another way to get what I wish: is there a way to “reintegrate” the results of the post-fit computations in the brmsfit object ? This would allow to use the multitude of methods available for such objects, such as the nice plots for effects…

Can you explain what you mean be “reintegrate”?

Add them to the model as if they were specified in the first place (i. e. treating them as if they were model parameters). For example, one could ask for their distribution as one asks for the posterior distribution of a parameter.

Tha’ts what Stan does with the generated quantities which are, post-processing, indistinguable from model parameters.

No options for this at the moment and adding them would require working on the internals of the stanfit objects, which is currently super painful. If you can avoid it, I would try another approach to working with quantities generated after model fitting.

That’s what I feared…

But that’s your penalty of creating a good working environment : once one is acclimatized to it, one wants to use it for all things close to it, even if they don’t belong to it. And emits unreasonable requests to you, the poor creator of this environment…

“No good omen will go unpunished”…

Thanks for the compliment! I hope we will improve stanfit objects in rstan 3.0 so that they become easier to work with.