Equivalent models and problem with mixing when some parameters are latent

I am working with the following generative model, which produces a graph G(V, E) for a network of N nodes:

  1. I generate i.i.d. rank values r_i for each node from a truncated exponential distribution with parameter \lambda and support [0, 1].
  2. For every pair of nodes (i, j) the probability of an undirected edge between i and j is f(i, j) = c^{-2+\max \{ r_i, r_j \}} for some 1 < c < e^\lambda.
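
For reference, here is a minimal Python sketch of this generative process (my own illustration, not part of the Stan models; the truncated exponential is drawn by inverse CDF):

```python
import math
import random

def simulate(N, lam, c, seed=0):
    """Simulate the generative model: truncated-exponential ranks on [0, 1],
    then independent edges with probability c**(-2 + max(r_i, r_j))."""
    rng = random.Random(seed)
    # inverse CDF of an exponential(lam) truncated to [0, 1]:
    # r = -log(1 - u * (1 - exp(-lam))) / lam for u ~ Uniform(0, 1)
    ranks = [-math.log(1 - rng.random() * (1 - math.exp(-lam))) / lam
             for _ in range(N)]
    edges = []
    for i in range(N):
        for j in range(i):
            if rng.random() < c ** (-2 + max(ranks[i], ranks[j])):
                edges.append((i, j))
    return ranks, edges
```

Since 1 < c < e^\lambda, every edge probability lies strictly between 0 and 1.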

Suppose I have a realized network with M edges, represented either as an N \times N adjacency matrix A[N, N] or as an M \times 2 edge matrix E[M, 2] in which each row is an edge.
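
The two representations carry the same information; for completeness, this is how I convert between them (a Python sketch with 0-indexed nodes; helper names are my own):

```python
def adjacency_to_edges(A):
    """Collect each undirected edge once as an (i, j) row with i > j."""
    N = len(A)
    return [(i, j) for i in range(N) for j in range(i) if A[i][j]]

def edges_to_adjacency(E, N):
    """Rebuild the symmetric 0/1 adjacency matrix from the edge rows."""
    A = [[0] * N for _ in range(N)]
    for i, j in E:
        A[i][j] = A[j][i] = 1
    return A
```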

I have created two mathematically equivalent models:

The first is a naive model in which each pass over the data costs O(N^2):

// naive.stan

data {
  int N;
  int A[N, N];
  real<lower=0, upper=1> ranks[N];
}

parameters {
  real<lower=1> c;
  real<lower=0> lambda;
}

model {
  // truncated exponential on [0, 1]; T[, 1] keeps the normalization correct
  for (n in 1:N)
    ranks[n] ~ exponential(lambda) T[, 1];

  for (i in 1:N) {
    for (j in 1:(i - 1)) {
      A[i, j] ~ bernoulli(pow(c, -2 + fmax(ranks[i], ranks[j])));
    }
  }
}

The second model is equivalent to the first but more sophisticated: after an O(M) preprocessing step, a pass runs in O(N), giving a per-step complexity of O(N + M).

// optimized.stan
data {
  int N;
  int M;
  int E[M, 2];
  real<lower=0, upper=1> ranks[N];
}

parameters {
  real<lower=1> c;
  real<lower=0> lambda;
}

model {
  int ordering[N];
  int argmax_rank;
  int sizes[N];
  int ordered_edges[M, 2];
  real ordered_ranks[N];

  // truncated exponential on [0, 1]; T[, 1] keeps the normalization correct
  for (n in 1:N)
    ranks[n] ~ exponential(lambda) T[, 1];

  ordering = sort_indices_desc(ranks);
  ordered_ranks = sort_desc(ranks);

  for (i in 1:N) {
    sizes[i] = 0;
  }

  for (m in 1:M) {
    for (k in 1:2) {
      ordered_edges[m, k] = ordering[E[m, k] + 1];  // E is 0-indexed
    }
  }

  for (m in 1:M) {
    argmax_rank = min(ordered_edges[m, 1:2]);  // sorted position of the higher-ranked endpoint
    sizes[argmax_rank] = sizes[argmax_rank] + 1;
  }

  for (i in 1:N) {
    // the node at sorted position i has N - i lower-ranked potential partners
    sizes[i] ~ binomial(N - i, pow(c, -2 + ordered_ranks[i]));
  }
}
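
To sanity-check the equivalence claim outside Stan, here is a standalone Python re-implementation of both likelihood computations (my own sketch; identifiers are illustrative, and the relabeling uses an explicit inverse permutation). For fixed ranks, the per-node binomial sums reproduce the naive pairwise Bernoulli log-likelihood exactly, up to the binomial coefficients, which do not involve c:

```python
import math

def naive_loglik(A, ranks, c):
    """O(N^2) pairwise Bernoulli log-likelihood, as in the naive model."""
    N = len(ranks)
    ll = 0.0
    for i in range(N):
        for j in range(i):
            p = c ** (-2 + max(ranks[i], ranks[j]))
            ll += math.log(p) if A[i][j] else math.log1p(-p)
    return ll

def fast_loglik(edges, ranks, c):
    """O(N + M) version: credit each edge to its higher-ranked endpoint,
    then sum one binomial-style term per sorted position."""
    N = len(ranks)
    order = sorted(range(N), key=lambda v: -ranks[v])  # nodes by descending rank
    pos = [0] * N
    for k, v in enumerate(order):
        pos[v] = k                                     # sorted position of node v
    sizes = [0] * N
    for u, v in edges:
        sizes[min(pos[u], pos[v])] += 1                # higher-ranked endpoint wins
    ll = 0.0
    for k in range(N):
        p = c ** (-2 + ranks[order[k]])
        trials = N - 1 - k                             # lower-ranked potential partners
        ll += sizes[k] * math.log(p) + (trials - sizes[k]) * math.log1p(-p)
    return ll
```

On simulated graphs the two functions agree to floating-point precision, which is how I convinced myself the models match when the ranks are supplied as data.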

I observe the following behavior (still with data simulated from the same model):

  1. When I supply both models with all the data, i.e. both the structure (adjacency or edge matrix) and the ranks, both models yield the same results (and the second model is faster, as expected).
  2. When I supply both models with only the structural data (adjacency or edge matrix) and leave the ranks as latent parameters, the naive model converges but the optimized model does not. More specifically, I get maximum treedepth warnings after sampling with the optimized model, and the results do not look good.

Thank you in advance