Optimization of model with dependent Beta distributions

specification
performance

#1

Hi everybody,

I am currently trying to infer the levels of 5mC and 5hmC in a typical DNA methylation microarray experiment, and I would like to develop a Bayesian hierarchical model that can not only estimate those levels but also describe the posteriors of the differences among groups, leading to a ROPE-based Bayesian decision-making process.

I am a complete newbie when it comes to Bayesian modeling, but I have developed a model that seems to work nicely. The problem is that, although sampling is very fast for a single model, I need to fit on the order of 400K models, so I need to push the performance limits to achieve a decent compromise between execution time and the flexibility this kind of approach provides.

Let me define the problem a bit. The input data comes as two matrices of the same dimensions. Each row represents a probe from the array, and each column a sample. The model also receives a vector containing the group each sample belongs to. My idea was to model the methylation state with a Dirichlet distribution over the 3-dimensional simplex (%no_methylation, %5mC, %5hmC) for each group. Those are the latent methylation values for the problem. The matrices contain values between 0 and 1 indicating the amount of methylation, and they come from different experimental techniques, from which we know that one of the matrices represents the amount of 5mC and the other the sum of 5mC and 5hmC.

I have modeled the observed matrices as Beta distributions that depend on the Dirichlet-generated values. One of the Beta distributions has a mean equal to the second component of the simplex, and the other's mean is the sum of the second and third components. The Beta distributions are parameterized by their mean and sample size, and I have set a Gamma prior on the sample size with shape = 2 and mean = 1000.
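For a sanity check, the generative process described above can be forward-simulated outside Stan. This is a minimal numpy sketch (the sizes and variable names are made up for illustration, not taken from the real data):

```python
import numpy as np

rng = np.random.default_rng(0)

n_probes, n_samples_per_group, n_groups = 10, 6, 2

# Latent methylation state per group: (%no_methylation, %5mC, %5hmC),
# drawn from a Dirichlet over the 3-simplex (alpha = 1 gives a uniform).
alpha = np.ones(3)
meth_state = rng.dirichlet(alpha, size=n_groups)   # shape (n_groups, 3)

# Beta "sample size" (concentration) nu; a Gamma prior with shape k and
# mean m has rate k / m, so shape = 2, mean = 1000 gives rate = 0.002,
# i.e. scale = 500.
nu = rng.gamma(shape=2.0, scale=1000.0 / 2.0)

for g in range(n_groups):
    no_meth, mc, hmc = meth_state[g]
    # Mean/sample-size parameterization: alpha = mu * nu, beta = (1 - mu) * nu.
    # OX measures 5mC only; BS measures 5mC + 5hmC.
    ox = rng.beta(mc * nu, (1 - mc) * nu,
                  size=(n_probes, n_samples_per_group))
    bs = rng.beta((mc + hmc) * nu, (1 - mc - hmc) * nu,
                  size=(n_probes, n_samples_per_group))
```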

This is the model I have developed:

data {
  int <lower = 1> n_probes;
  int <lower = 1> n_samples;
  int <lower = 1> n_groups;
  real <lower = 0, upper = 1> BS[n_probes, n_samples];
  real <lower = 0, upper = 1> OX[n_probes, n_samples];
  int <lower = 1> group_id[n_samples];
  vector <lower = 1>[3] alpha;
  real <lower = 0> nu_shape;
  real <lower = 0> nu_mean;
}

parameters {
  real <lower = 0> nu_minus_one;
  simplex[3] meth_state[n_groups];
}

model {
  vector[n_groups] BS_alpha;
  vector[n_groups] BS_beta;
  vector[n_groups] OX_alpha;
  vector[n_groups] OX_beta;
  vector[n_groups] mc;
  vector[n_groups] hmc;
  real nu;

  nu_minus_one ~ gamma(nu_shape, nu_shape / nu_mean);
  nu = nu_minus_one + 1;

  for (j in 1:n_groups) {
    meth_state[j] ~ dirichlet(alpha);

    mc[j] = meth_state[j][2];
    hmc[j] = meth_state[j][3];
  }

  OX_alpha = mc * nu;
  OX_beta = (1 - mc) * nu;
  BS_alpha = (mc + hmc) * nu;
  BS_beta = (1 - mc - hmc) * nu;

  for (j in 1:n_samples) {
    BS[ : , j] ~ beta(BS_alpha[group_id[j]], BS_beta[group_id[j]]);
    OX[ : , j] ~ beta(OX_alpha[group_id[j]], OX_beta[group_id[j]]);
  }
}

Currently I am achieving an average time of approximately 5 s per probe, with the usual 4000 iterations and 4 chains. The model diagnostics seem to be OK, and the simple test cases work. But I would like to improve the model's performance, as any improvement will lead to substantially lower execution times over more than 400K fits. I have tried to vectorize as much as I could, but given my inexperience in this field, I suspect I might be missing the point completely.

Any hint or help would be much appreciated.

Regards,
Gustavo


#2

The model’s pretty efficient as written.

What is your n_eff when you’re done, and what do you need for inference? You can tune at run time by lowering the number of warmup iterations if you’re converging quickly (say, down to a few hundred) and seeing if that works. Then run only as many post-warmup iterations as you need.

I don’t know what your priors on the gamma are, but I’d be inclined to do something like this jointly with more than one trial and use the replications to simultaneously estimate the priors for nu. In any case, setting an appropriately weakly informative prior on nu is important here for efficiency.

Why add one to nu_minus_one? (I see that it lower-bounds nu at 1.) If you just take nu as distributed according to that gamma with <lower = 0>, do you get a lot of estimates with nu << 1? If so, imposing this constraint can cause efficiency problems, because probability mass piles up near the boundary (0 for nu_minus_one).
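To quantify that boundary concern: under the Gamma prior from the first post (shape = 2, mean = 1000, hence rate = shape / mean = 0.002), the density vanishes at 0, so the prior itself puts very little mass near the boundary. A quick check in plain Python, using the closed-form CDF for a shape-2 Gamma, F(x) = 1 - exp(-rate * x) * (1 + rate * x):

```python
import math

shape, mean = 2.0, 1000.0
rate = shape / mean          # Gamma(shape, rate) has mean shape / rate = 1000

# Prior mass below nu = 1 (closed form valid for shape = 2):
p_near_zero = 1 - math.exp(-rate * 1.0) * (1 + rate * 1.0)
# p_near_zero is on the order of 1e-6, i.e. essentially no prior
# mass near zero -- so whether nu << 1 matters will come down to
# what the likelihood wants, not the prior.
```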

You can just write

vector[n_groups] mc = to_vector(meth_state[ : , 2]);

and similarly for hmc.

But the Dirichlet isn’t vectorizable (yet). That again is something where you probably want to use a reasonable value for alpha. If you’re just using 1 to get a uniform, you can just drop the loop (not that it’ll take much time).

I couldn’t follow the definitions of OX and BS in terms of the underlying simplexes and how these relate to the beta distributions. Isn’t (1 - mc - hmc) just meth_state[j,1]? If so, that’s a better thing to use. Avoid arithmetic if you can at all manage it.
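That simplification is just simplex algebra: the three components sum to 1, so 1 - mc - hmc recovers the first component exactly. A quick numerical check (a numpy sketch, not Stan):

```python
import numpy as np

rng = np.random.default_rng(1)

# Draw a few points on the 3-simplex, as the Dirichlet in the model does.
meth_state = rng.dirichlet(np.ones(3), size=5)   # rows sum to 1
no_meth, mc, hmc = meth_state.T                  # components 1, 2, 3

# Since no_meth + mc + hmc = 1, the beta parameter (1 - mc - hmc)
# equals the first simplex component, up to floating-point error:
residual = (1 - mc - hmc) - no_meth
```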

P.S. Variables named mc and hmc are very confusing!