Markov Random Field Terms in Stan Model

Hi all,

I was hoping to develop a model in Stan comprising a Markov random field term (by this, I mean, a model with parametric inter-dependencies that are neither directed nor necessarily acyclic). Unfortunately, the model that I have developed has trouble with — at least — treedepth issues.

I have written a simplified version of the developed model to illustrate the problem in the hope that somebody might be able to provide some feedback. It may be worth noting that in reality, the actual model produces better convergence and space exploration performance (in terms of Rhat and n_eff) than the simplified model given here, but the simplified model seems sufficient to illustrate the concept.

Specifically, given the following scenario:

  • Each observation in 1,…,N has a corresponding y_n value in {0,1} and x_n value in (0,1) denoting a dependent variable and independent variable respectively.
  • Each observation 1,…,N is has a corresponding group, c_n in C for n in 1,…,N.
  • For each observation 1,…,N, the primary quantity of interest is theta_n for n in 1,…,N.

The goal is to model theta_n under the following conditions:

y_n ~ bernoulli(theta_n),
theta_n ~ beta(mu_n, phi),
mu_n = 0.5*x_n + 0.5+*gamma_n and phi is a constant,
gamma_n = binomial_cdf( successes_n | above-t, (l_n/(N-1)) )

where successes_n is the number of observations that are members of c_n (excluding observation n itself) with a theta value greater than a threhsold t under the current iteration, above-t is the total number of observations (excluding observation n itself) with a theta value greater than t under the current iteration and l_n is the size of c_n (excluding observation n itself).

The code for the model is as follows:

//////////////////////////////////////////////////

data{
 int N; // Dimension size.
 int M; // Dimension size.

 int<lower=0,upper=1> y[N]; // Dependent variable.
 vector<lower=0,upper=1>[N] x; // Independent variable.

 int c[M]; // 2D membership structure represented as single vector --- analogous to approach recommended 16.2 Ragged Data Structures section of the June 2017 Stan Modeling Language manual for version 2.16.0.
 int s[N]; // Starting index (in c) for each of the N groups.
 int l[N]; // Length of each of the N groups.

 real<lower=0,upper=1> t; // Threshold value.

 real<lower=0> phi; // Scale parameter.
}

parameters {
 vector<lower=0,upper=1>[N] theta;
}

transformed parameters{
 vector<lower=0,upper=1>[N] gamma;
 vector<lower=0,upper=1>[N] mu;

 vector<lower=0>[N] alpha;
 vector<lower=0>[N] beta;

 {
  // Total number of observations where theta > t: //
  int above_t;
  above_t = 0;
  for(n in 1:N){
   above_t = above_t + ( theta[n] > t ? 1 : 0 );
  }

  for(n in 1:N){
   // Total number of observations associated with observation n (i.e. group n) where theta > t --- "successes": //
   int above_t_corrected;
   int successes;
   if( theta[n] > t ){ above_t_corrected = above_t-1; }else{ above_t_corrected = above_t; }
   successes = 0;
   for(i in 1:l[n]){
    successes = successes + ( theta[ c[ (s[n]+i-1) ] ] > t ? 1 : 0 );
   }

   // gamma score - function of binomial cdf --- Markov random field term: //
   gamma[n] = binomial_cdf( successes, above_t_corrected, (l[n]/(N-1.0)) );
  }
 }

 // Prior hyperparameter for mean is a linear combination of input data and Markov random field term: //
 mu = 0.5*x + 0.5*gamma;

 // Reparameterization of parameters for beta distribution: //
 alpha = mu * phi;
 beta = (1-mu) * phi;
}

model {
 y ~ bernoulli(theta); // Likelihood.
 theta ~ beta(alpha,beta); // Prior.
}

//////////////////////////////////////////////////

If anybody has any comments on why the model is doomed/how to improve the model/general feedback, I’d love to hear from you.

Thanks!

1 Like

Sorry for the delay here—I’m just sweeping up old mesages now. Complex modeling questions tend to go unanswered.

I’m not sure why your MRF involves a binomial CDF.

Conditioning on a parameter, like theta[n] > t can cause non-differentiability. What’s happening is that you lose gradient information and Stan’s going to devolve to a random walk.

Isn’t there a way to formulate all the clique potentials and then observations without either CDFs or discontinuities?

Data-dependent priors can spell trouble in all sorts of ways for the behavior of Bayesian models, but that’s probably not your problem here. You need to keep alpha and beta positive, of course, and I don’t see how that’s enforced.