Test: Soft vs. hard sum-to-zero constraint + choosing the right prior for the soft constraint

I did some tests on the scale of the soft-centering using an ICAR prior and a simple Poisson model:

functions {
  void icar_normal_lp(int N, int[] node1, int[] node2, real s, vector phi) {
    // ICAR pairwise-difference term over adjacent regions
    target += -0.5 * dot_self(phi[node1] - phi[node2]);
    // soft sum-to-zero constraint on phi;
    // equivalent to mean(phi) ~ normal(0, s), but more efficient
    sum(phi) ~ normal(0, s * N);
  }
}
data {
  real<lower=0, upper=0.1> s;  // scale close to zero
  int<lower=0> N;
  int<lower=0> N_edges;
  int<lower=1, upper=N> node1[N_edges];  // node1[i] adjacent to node2[i]
  int<lower=1, upper=N> node2[N_edges];  // and node1[i] < node2[i]

  int<lower=0> y[N];              // count outcomes
  vector<lower=0>[N] E;           // exposure
}
transformed data {
  vector[N] log_E = log(E);
}
parameters {
  real beta0;             // intercept
  real<lower=0> sigma;    // overall standard deviation
  vector[N] phi;         // spatial effects
}
model {
  y ~ poisson_log(log_E + beta0 + phi * sigma);
  beta0 ~ normal(0.0, 1.0);
  sigma ~ normal(0.0, 1.0);
  icar_normal_lp(N, node1, node2, s, phi);

}
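
The comment in icar_normal_lp treats sum(phi) ~ normal(0, s * N) as a stand-in for mean(phi) ~ normal(0, s). A minimal numeric sketch of why that holds: the two log densities differ only by the constant -log(N), which does not depend on phi and so does not affect sampling. The helper names (normal_lpdf, soft_constraint_gap) are made up for this illustration, and the phi vectors are random draws, not fitted values.

```python
import math
import random

def normal_lpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) at x; sigma is a standard deviation."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def soft_constraint_gap(phi, s):
    """lpdf of the constraint written on sum(phi) minus the one written on mean(phi)."""
    n = len(phi)
    on_sum = normal_lpdf(sum(phi), 0.0, s * n)       # sum(phi) ~ normal(0, s * N)
    on_mean = normal_lpdf(sum(phi) / n, 0.0, s)      # mean(phi) ~ normal(0, s)
    return on_sum - on_mean

random.seed(1)
N, s = 1921, 0.01  # region count and one of the scales used in this post

# Hypothetical phi vectors: the gap should be the same for every one of them.
gaps = [soft_constraint_gap([random.gauss(0.0, 1.0) for _ in range(N)], s)
        for _ in range(5)]

print(gaps)
print(-math.log(N))  # the constant the gaps should all match
```

Because the gap is constant in phi, s here is effectively the scale on mean(phi), which is what the tests below vary.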

I ran this on the NYC pedestrian traffic data used in the ICAR case study, trying scales of 0.1, 0.01, and 0.001, each with 3 chains and the default 2000 iterations (1000 warmup, 1000 sampling). With scale 0.1, the Rhat values indicated failure to converge: there were no divergences or other warnings, only Rhat values above 1.1 (close, but no cigar).
Because the runs with scale 0.1 didn't converge well, sampling took much longer, and the warmup times seemed longer too. There wasn't much difference between scales 0.01 and 0.001. Here are the times for those two:

NYC: 1921 regions, intercept only, Poisson + ICAR
s = 0.01
3 parallel chains, elapsed time in seconds:
 Chain   Warm-up   Sampling     Total
     1   505.259    262.570   767.829
     2   555.046    250.358   805.404
     3   575.119    243.568   818.687

****************

s = 0.001
3 parallel chains, elapsed time in seconds:
 Chain   Warm-up   Sampling     Total
     1   528.069    273.678   801.747
     2   543.13     268.504   811.634
     3   556.438    265.584   822.021

I reran the model with scale 0.1 several times. There, warmup took on the order of 600 to 700 seconds, and so did sampling. Given that warmup and sampling each ran 1000 iterations, the fact that the sampling iterations took as long as the warmup iterations is another indication of failure to converge (in the runs with scale 0.01 and 0.001 above, sampling took roughly half as long as warmup).
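
To put a number on that comparison, here is a small sketch that computes the sampling-to-warmup time ratio for the two runs with reported per-chain times. The times are copied from the output above; the dictionary layout and function names are just for this illustration. Scale 0.1 is left out because only the approximate 600 to 700 second range was recorded for it, which would put its ratio near 1.

```python
# Per-chain elapsed times in seconds, copied from the timings reported above.
runs = {
    0.01: {
        "warmup": [505.259, 555.046, 575.119],
        "sampling": [262.570, 250.358, 243.568],
    },
    0.001: {
        "warmup": [528.069, 543.13, 556.438],
        "sampling": [273.678, 268.504, 265.584],
    },
}

def mean(xs):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(xs) / len(xs)

def sampling_to_warmup_ratio(run):
    """Mean sampling time across chains divided by mean warmup time."""
    return mean(run["sampling"]) / mean(run["warmup"])

ratios = {s: sampling_to_warmup_ratio(run) for s, run in runs.items()}

for s, ratio in ratios.items():
    print(f"s = {s}: sampling / warmup = {ratio:.2f}")
```

Both ratios come out close to one half, so a ratio near 1 at scale 0.1 stands out even without looking at the Rhat values.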