# Stochastic Blockmodel by Stan

I am trying to sample from the density given by the last expression on page 5.
Here is my Stan code. It doesn’t converge.

(upload://8IPHO5hhKDQ5E1Hk00jUPVudszd.pdf) (544.2 KB) ](https://arxiv.org/pdf/1203.3083.pdf)

``````data{
int N;
int K;
int X[N*N];

}

parameters{
vector<lower=0>[K] alpha1; //各オブジェクトの混合割合
vector<lower=0>[K] alpha2;
simplex[K] pi1; //K面サイコロ(Dirichret分布)
simplex[K] pi2;
real<lower=0> a0;
real<lower=0> b0;
vector<lower=0,upper=1>[K*K] theta; //クラスタ間の関係の強さ
}

model{
real lp[K*K];

pi1 ~ dirichlet(alpha1);
pi2 ~ dirichlet(alpha2);
theta ~ beta(a0, b0);

for (k in 1:K){
for (l in 1:K){
lp[k+ K*(l-1)] = log(pi1[k]) + log(pi2[l]) + bernoulli_lpmf(X | theta[k+ K*(l-1)]);
}
}
target += lp;
}
``````

I’m new to Stan. I would like you to help coding this by Stan properly.

1 Like

The weirdest part to me in the model formulation is `bernoulli_lpmf(X | theta[k+ K*(l-1)])` are you sure you want to express the whole `X` as a mixture of components, each individually bernoulli? This would mean that the data can be fully summarized by `sum(X)` and `N` (your code would be equivalent to `binomial_lpmf(sum(X) | N * N, theta[k+ K*(l-1)])`)?

In its current form, the model likely suffers from non-identifiability ie. that there are multiple model configurations that lead to the same (or very similar) mode (maximum) of the log density, because if you swap `pi1` with `pi2` and transpose `theta` , you’ll get exactly the same likelihood.

A few additional notes:

• It is useful to provide more details about your issues (the exact messages, the data you pass to the model)
• Stan has types for matrices and 2D arrays, so you could have `matrix<lower=0,upper=1>[K,K] theta` or `int X[N,N]` making it easire to write your model
• Useful information can be extracted diagnostic plots from ShinyStan or as described in the Bayesplot Vignette https://cran.r-project.org/web/packages/bayesplot/vignettes/visual-mcmc-diagnostics.html

Best of luck!