Hi all,

I was hoping to develop a model in Stan comprising a Markov random field term (by this, I mean, a model with parametric inter-dependencies that are neither directed nor necessarily acyclic). Unfortunately, the model that I have developed has trouble with — at least — treedepth issues.

I have written a simplified version of the developed model to illustrate the problem in the hope that somebody might be able to provide some feedback. It may be worth noting that in reality, the *actual* model produces better convergence and space exploration performance (in terms of Rhat and n_eff) than the simplified model given here, but the simplified model seems sufficient to illustrate the concept.

Specifically, given the following scenario:

- Each observation in 1,…,N has a corresponding
*y_n*value in {0,1} and*x_n*value in (0,1) denoting a dependent variable and independent variable respectively. - Each observation 1,…,N is has a corresponding group,
*c_n*in*C*for n in 1,…,N. - For each observation 1,…,N, the primary quantity of interest is
*theta_n*for n in 1,…,N.

The goal is to model *theta_n* under the following conditions:

*y_n* ~ bernoulli(*theta_n*),

*theta_n* ~ beta(*mu_n*, *phi*),

*mu_n* = 0.5**x_n* + 0.5+**gamma_n* **and** *phi* is a constant,

*gamma_n* = binomial_cdf( *successes_n* | *above-t*, (*l_n*/(*N*-1)) )

where *successes_n* is the number of observations that are members of *c_n* (excluding observation *n* itself) with a *theta* value greater than a threhsold *t* under the current iteration, *above-t* is the total number of observations (excluding observation *n* itself) with a *theta* value greater than *t* under the current iteration and *l_n* is the size of *c_n* (excluding observation *n* itself).

The code for the model is as follows:

```
//////////////////////////////////////////////////
data{
int N; // Dimension size.
int M; // Dimension size.
int<lower=0,upper=1> y[N]; // Dependent variable.
vector<lower=0,upper=1>[N] x; // Independent variable.
int c[M]; // 2D membership structure represented as single vector --- analogous to approach recommended 16.2 Ragged Data Structures section of the June 2017 Stan Modeling Language manual for version 2.16.0.
int s[N]; // Starting index (in c) for each of the N groups.
int l[N]; // Length of each of the N groups.
real<lower=0,upper=1> t; // Threshold value.
real<lower=0> phi; // Scale parameter.
}
parameters {
vector<lower=0,upper=1>[N] theta;
}
transformed parameters{
vector<lower=0,upper=1>[N] gamma;
vector<lower=0,upper=1>[N] mu;
vector<lower=0>[N] alpha;
vector<lower=0>[N] beta;
{
// Total number of observations where theta > t: //
int above_t;
above_t = 0;
for(n in 1:N){
above_t = above_t + ( theta[n] > t ? 1 : 0 );
}
for(n in 1:N){
// Total number of observations associated with observation n (i.e. group n) where theta > t --- "successes": //
int above_t_corrected;
int successes;
if( theta[n] > t ){ above_t_corrected = above_t-1; }else{ above_t_corrected = above_t; }
successes = 0;
for(i in 1:l[n]){
successes = successes + ( theta[ c[ (s[n]+i-1) ] ] > t ? 1 : 0 );
}
// gamma score - function of binomial cdf --- Markov random field term: //
gamma[n] = binomial_cdf( successes, above_t_corrected, (l[n]/(N-1.0)) );
}
}
// Prior hyperparameter for mean is a linear combination of input data and Markov random field term: //
mu = 0.5*x + 0.5*gamma;
// Reparameterization of parameters for beta distribution: //
alpha = mu * phi;
beta = (1-mu) * phi;
}
model {
y ~ bernoulli(theta); // Likelihood.
theta ~ beta(alpha,beta); // Prior.
}
//////////////////////////////////////////////////
```

If anybody has any comments on why the model is doomed/how to improve the model/general feedback, I’d love to hear from you.

Thanks!