Marginal posteriors for additive Bayesian networks (ABNs)

Hello all,

In order to do parametric bootstrapping for ABNs, I need to perform Bayesian regression on given data and then construct marginal posteriors of the parameters, which I will use to simulate new data for the network. Consider the simple network A → B, where both variables are normally distributed. We then have five parameters (beta_A0, beta_B0, beta_B1, sigmaA, and sigmaB), which gives
A ~ N(beta_A0, sigmaA) and
B ~ N(beta_B0 + beta_B1 * A, sigmaB)

I implemented the regression as follows:

data {
  int N;
  vector[N] B;
  vector[N] A;
}
parameters {
  real betaA0; //Intercept of A
  real sigmaA; //Standard deviation of A
  real betaB0; //Intercept of B
  real betaB1; //Slope of B
  real sigmaB; //Standard deviation of B
}
model {
  betaA0 ~ cauchy(0,10); //prior for the intercept following Gelman 2008
  betaB0 ~ cauchy(0,10); //prior for the intercept following Gelman 2008
  betaB1 ~ cauchy(0,2.5); //prior for the slopes following Gelman 2008
  A ~ normal(betaA0, sigmaA);
  B ~ normal(betaB0 + betaB1 * A, sigmaB);
}

Now, I’m quite confused about the construction of the marginal posteriors. Do I need to apply a marginalization formula, as is done in the “Latent discrete parameters” section of the manual, or can I just use something like

categorical_rng(softmax(betaA0))

Any help or pointing in the right direction would be greatly appreciated.

Best,
Marco

(edit: uploaded incomplete topic by accident)

I’m probably misunderstanding what you actually want, but to get the marginal posterior distributions of the parameters in your Stan model you just need to extract the posterior draws of the variables declared in the parameters block of your Stan program. For example, with rstan you could use the as.matrix function. If you have a matrix of posterior draws with one column per parameter, then the marginal posterior (approximated by MCMC) for a given parameter is just that parameter’s column of the matrix.
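Something like this, for instance (a minimal sketch, assuming fit is your stanfit object returned by rstan):

draws <- as.matrix(fit)   # one row per post-warmup draw, one column per parameter
hist(draws[, "betaA0"])   # MCMC approximation of the marginal posterior of betaA0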

But I may have misinterpreted your question. (My hunch is that you were trying to ask about something else, although if not that’s fine too!)

Thank you for the quick answer. The info that the posterior draws are already the marginal posteriors resolved a big part of my confusion.
I put the draws in a matrix params (with column order beta_A0, sigmaA, beta_B0, beta_B1, sigmaB) and now want to simulate
A ~ N(beta_A0, sigmaA) and
B ~ N(beta_B0 + beta_B1 * A, sigmaB)
with the Stan code:

data {
  int N;
  int K;
  matrix[N, K] params;
}
model {}
generated quantities {
  int distdraws[K];
  real sp[K];
  real A;
  real B;
  for (i in 1:K) {
    distdraws[i] = categorical_rng(softmax(params[, i]));
  }
  for (i in 1:K) {
    sp[i] = params[distdraws[i], i];
  }
  A = normal_rng(sp[1], sp[2]);
  B = normal_rng(sp[3] + sp[4] * A, sp[5]);
}

Do you think that makes sense?

If it helps (otherwise ignore this part), I have a BUGS model as a guideline where A is simulated as follows:

a ~ dnorm(mu.a, prec.a);
mu.a <- a.C0;

a.M0 ~ dcat(a.p[,2]);
a.C0 <- a.p[a.M0,1];

where a.p contains the posterior draws, though in two columns, x and f(x), which I don’t quite understand.

Thanks again for helping.

I haven’t dealt much with ABNs, so I could still be misinterpreting what you are after here, but if you really just need to simulate new data based on the posterior distribution of the estimated parameters, you can do this within the same Stan program you use to fit the model.

The generated quantities block of a Stan program (optional, but if you have one it has to go last after the model block) gives you access to the posterior draws of the parameters at each iteration and lets you use them to create simulations. To take a super simple example for demonstration purposes, if you just had a simple linear regression

data {
  int N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  // ... priors for alpha, beta, sigma

  y ~ normal(alpha + beta * x, sigma);
}

then if you wanted to simulate some new data using the posterior draws of the parameters you could add this at the end of the Stan program:

generated quantities {
  vector[N] y_sim;
  for (n in 1:N) {
    y_sim[n] = normal_rng(alpha + beta * x[n], sigma);
  }
}

It seems like you should be able to apply this approach to your problem.

Oh and one other thing before I forget. In the Stan program in your original post you need to add <lower=0> to the declarations of sigmaA and sigmaB (like sigma in my linear regression example).
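That is, the declarations become:

real<lower=0> sigmaA; //Standard deviation of A, constrained to be positive
real<lower=0> sigmaB; //Standard deviation of B, constrained to be positive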

I think you’re interpreting everything correctly and your advice sounds absolutely reasonable. I guess what confuses me is the way this is programmed in the JAGS/BUGS guideline code with the dcat() function. Maybe this is because the posteriors were computed in a separate step, or it may just be in the nature of the BUGS implementation…

Is there a categorical variable here? That’s what it looks like from the BUGS code. But from your original question it looks like everything is actually continuous.

Yes, A and B are continuous (Gaussian); however, technically a.p[,2] is not continuous (it’s a vector containing posterior draws). Maybe dcat() is needed so that the distribution of the draws can be assigned to a.M0.

edit: By the way, I implemented the generated quantities block without the loop; otherwise Stan gives me back N columns of simulations (A[1], A[2], …, A[N]):

data {
  int N;
  vector[N] B;
  vector[N] A;
}
parameters {
  real betaA0; //Intercept of A
  real<lower=0> sigmaA; //Standard deviation of A
  real betaB0; //Intercept of B
  real betaB1; //Slope of B
  real<lower=0> sigmaB; //Standard deviation of B
}
model {
  betaA0 ~ cauchy(0,10); //prior for the intercept following Gelman 2008
  betaB0 ~ cauchy(0,10); //prior for the intercept following Gelman 2008
  betaB1 ~ cauchy(0,2.5); //prior for the slopes following Gelman 2008
  A ~ normal(betaA0, sigmaA);
  B ~ normal(betaB0 + betaB1 * A, sigmaB);
}
generated quantities {
  real Asim;
  real Bsim;
  Asim = normal_rng(betaA0, sigmaA);
  Bsim = normal_rng(betaB0 + betaB1 * Asim, sigmaB);
}

Does that still make use of the posterior distributions properly?

Right, if you do it in a loop from 1:N then for every iteration of the Markov chain you will get a simulation of the entire A vector (which has N elements). The resulting matrix will have dimensions S x N where S is the number of simulations (post-warmup iterations) and N is the number of data points in A. So each row is a simulation of the whole N-vector A.

The way you now have the generated quantities block it simulates a single value for A (instead of the whole A vector) at each iteration of the Markov chain.

Both ways give you one simulation per iteration (i.e. they both result in the same number of rows when you extract as a matrix), so they both account for the posterior uncertainty.
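For reference, the loop version would look something like this (just a sketch of the alternative, in which Asim and Bsim act as posterior predictive replications of the full data vectors):

generated quantities {
  vector[N] Asim;
  vector[N] Bsim;
  for (n in 1:N) {
    // one replicated data point per observation, per posterior draw
    Asim[n] = normal_rng(betaA0, sigmaA);
    Bsim[n] = normal_rng(betaB0 + betaB1 * Asim[n], sigmaB);
  }
}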


Yes great, that makes sense. If I may ask one last question: I tried to apply the same procedure to the same network, but with A binomially distributed instead of Gaussian.

data{
  int N;
  int A[N];
  vector[N] B;
}
parameters {
  real betaA0; //Intercept of A
  real betaB0; //Intercept of B
  real betaB1; //Slope of B
  real<lower=0> sigmaB; //Standard deviation of B
}
model {
  betaA0 ~ cauchy(0,10); //prior for the intercept following Gelman 2008
  betaB0 ~ cauchy(0,10); //prior for the intercept following Gelman 2008
  betaB1 ~ cauchy(0,2.5);//prior for the slopes following Gelman 2008
  A ~ bernoulli_logit(betaA0);
  B ~ normal(betaB0 + betaB1 * A, sigmaB);
}
generated quantities {
  int Asim;
  real Bsim;
  Asim = bernoulli_rng(inv_logit(betaA0)); // inverse logit is 1/(1 + exp(-betaA0))
  Bsim = normal_rng(betaB0 + betaB1 * Asim, sigmaB);
}

Unfortunately, I’m multiplying a real value by an integer array when modelling B, which gives the error:

No matches for: 
  real * int[]

Is there any way around that?

You can multiply a real number by an integer but not by a whole array of integers at once. One way to get around that is to do the multiplication in a loop. At the top of the model block you could use a temporary variable and do something like this:

vector[N] b_mean;
for (n in 1:N) {
  b_mean[n] = betaB0 + betaB1 * A[n];
}

and then use b_mean inside of normal() later in the model block. b_mean won’t get saved in the output; it’s just used locally in the model block.
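That is, the sampling statement for B would then be:

B ~ normal(b_mean, sigmaB);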


Thank you very much, Jonah. Great support.

Glad you’re giving Stan a try. Happy to help.

One more thing I overlooked before: these Cauchy priors are probably heavier tailed than what we (including @andrewgelman) would recommend these days. We have a wiki with a collection of tips about priors (from Andrew and others) that might be useful:

http://andrewgelman.com/2017/04/28/prior-choice-recommendations-wiki/
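For instance, a lighter-tailed alternative for the slope prior (just an illustration; see the wiki for the actual recommendations) would be something like:

betaB1 ~ normal(0, 2.5); // lighter tails than cauchy(0, 2.5)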

Oh okay, I’ll have a look. Thanks.

The posterior draws are draws from the joint posterior. If you take subsets of the parameters, the draws are from the joint marginal posteriors. In the limit, if you just look at one parameter, it’s from the marginal posterior for that parameter.
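If draws is the matrix of posterior draws (e.g., from as.matrix in rstan), that means something like:

draws[, c("betaB0", "betaB1")]  # draws from the joint marginal of (betaB0, betaB1)
draws[, "betaB1"]               # draws from the marginal of betaB1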

As for the multiplication issue, instead of the loop you can also use to_vector:

int b[N];
real a;
...
vector[N] ab = a * to_vector(b); // to_vector promotes the integer array to a vector
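Applied to your model, that would be something like:

B ~ normal(betaB0 + betaB1 * to_vector(A), sigmaB);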

Yeah this bit is what I was attempting to convey.

Thanks for the reminder about to_vector as an alternative to the loop. I keep forgetting we have that.