Differences in simplex constraining function vs normal simplex

Can you be more specific on “not mixes well”?

  • Do you see chains getting stuck locally? Numerical problems can cause chains to get stuck depending on the initial values, and the convergence diagnostics then indicate multimodality. We have also seen numerical problems that make the computed log density less smooth, which leads to a smaller adapted step size and therefore lower ESS (without chains getting stuck).
  • Are you using the default initialization? It’s quite often bad. What if you try with init=0.1 (helps if the default init=2 produces too extreme inits) or with Pathfinder initialization (helps with multimodality)?

Hey Aki,

There are some traceplots further up in the thread, and now in a different model (where the relevant parameters are described in the above post) I’m having the same issue, that is, poor mixing and no convergence. I always set init = 0.1 for more complex models like these. Tl;dr: constraining a sum-to-zero vector to a simplex, or declaring simplexes directly, performs as expected, but simplex_jacobian on free vectors often has terrible mixing.

Sorry, this probably isn’t very useful!

Do the traceplots look similar for the latest model + posterior? The problematic traceplot shows that many chains for many betas are at zero, and the one chain for many betas is so close to zero that numerical problems might explain the behavior for that model + posterior. If there is some underflow, or the values are otherwise too close to the limit of what a floating-point representation can accurately represent, then the numerical floating-point log density and gradient can be quite different from the theoretical ones.
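To make the underflow point concrete, here is a small pure-Python sketch. It uses a softmax as a stand-in for a simplex transform; that is an assumption for illustration only, not a claim about the transform Stan applies internally, and the helper names (softmax, log_softmax, safe_log) are hypothetical.

```python
import math

def softmax(y):
    """Naive softmax: exponentiate, then normalize."""
    m = max(y)  # subtract the max so exp() cannot overflow
    w = [math.exp(v - m) for v in y]
    total = sum(w)
    return [v / total for v in w]

def log_softmax(y):
    """Log of the softmax, computed on the log scale (log-sum-exp)."""
    m = max(y)
    log_total = m + math.log(sum(math.exp(v - m) for v in y))
    return [v - log_total for v in y]

def safe_log(p):
    """log(p), returning -inf at zero instead of raising ValueError."""
    return math.log(p) if p > 0.0 else float("-inf")

# One unconstrained coordinate is far below the other.
y = [0.0, -800.0]

theta = softmax(y)                              # second element underflows
naive_log_theta = [safe_log(p) for p in theta]  # log taken after underflow
stable_log_theta = log_softmax(y)               # never leaves the log scale

print(theta)
print(naive_log_theta)
print(stable_log_theta)
```

In double precision exp(-800) underflows to exactly zero, so the naive log of that simplex element is -inf, while the log-scale computation keeps the finite value -800. A log density built on the first version has lost the information the sampler needs in that region.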

Once again, as seems to be a trend for me with this issue, I’m struggling to replicate it. I’ll stay quiet on this for now until I have a fully reproducible example. Thank you.

Your definitions do exactly the same thing, so it must be something else in your model.

Here’s a demonstration.

v1.stan

transformed data {
  array[2] int<lower=0> V = { 3, 4 };
}
parameters {
  simplex[V[1]] phi_psi;
  simplex[V[2]] phi_mu;
}
transformed parameters {
  // ragged matrix of simplexes
  matrix[2, V[2]] phi = rep_matrix(0, 2, V[2]);
  phi[1, :V[1]] = phi_psi';
  phi[2] = phi_mu';
}
model {
  phi[1, 1:V[1]] ~ dirichlet(rep_vector(1, V[1]));
  phi[2, 1:V[2]] ~ dirichlet(rep_vector(1, V[2]));
}

v2.stan

transformed data {
  array[2] int<lower=0> V = { 3, 4 };
}
parameters {
  vector[sum(V) - 2] phi_u;
}
transformed parameters {
  matrix[2, V[2]] phi = rep_matrix(0, 2, V[2]);
  phi[1, :V[1]] = simplex_jacobian(head(phi_u, V[1] - 1))';
  phi[2] = simplex_jacobian(tail(phi_u, V[2] - 1))';
}
model {
  phi[1, 1:V[1]] ~ dirichlet(rep_vector(1, V[1]));
  phi[2, 1:V[2]] ~ dirichlet(rep_vector(1, V[2]));
}

And here’s a run in CmdStanPy showing they produce identical results.

$ python3 
Python 3.9.6 (default, Aug  8 2025, 19:06:38) 
[Clang 17.0.0 (clang-1700.3.19.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import cmdstanpy as csp
>>> m1 = csp.CmdStanModel(stan_file='v1.stan')
15:52:41 - cmdstanpy - INFO - compiling stan file /Users/bcarpenter/temp2/hollanders/v1.stan to exe file /Users/bcarpenter/temp2/hollanders/v1
15:52:44 - cmdstanpy - INFO - compiled model executable: /Users/bcarpenter/temp2/hollanders/v1
>>> m2 = csp.CmdStanModel(stan_file='v2.stan')
15:52:48 - cmdstanpy - INFO - compiling stan file /Users/bcarpenter/temp2/hollanders/v2.stan to exe file /Users/bcarpenter/temp2/hollanders/v2
15:52:52 - cmdstanpy - INFO - compiled model executable: /Users/bcarpenter/temp2/hollanders/v2
>>> f1 = m1.sample(seed=1234)
15:53:05 - cmdstanpy - INFO - CmdStan start processing
chain 1 |████████████████████████████████████████████████████████████████████████████████████████████████| 00:00 Sampling completed
chain 2 |████████████████████████████████████████████████████████████████████████████████████████████████| 00:00 Sampling completed
chain 3 |████████████████████████████████████████████████████████████████████████████████████████████████| 00:00 Sampling completed
chain 4 |████████████████████████████████████████████████████████████████████████████████████████████████| 00:00 Sampling completed
15:53:05 - cmdstanpy - INFO - CmdStan done processing.
>>> f2 = m2.sample(seed=1234)
15:53:11 - cmdstanpy - INFO - CmdStan start processing
chain 1 |████████████████████████████████████████████████████████████████████████████████████████████████| 00:00 Sampling completed
chain 2 |████████████████████████████████████████████████████████████████████████████████████████████████| 00:00 Sampling completed
chain 3 |████████████████████████████████████████████████████████████████████████████████████████████████| 00:00 Sampling completed
chain 4 |████████████████████████████████████████████████████████████████████████████████████████████████| 00:00 Sampling completed
15:53:12 - cmdstanpy - INFO - CmdStan done processing.
>>> f1.summary()
                 Mean      MCSE    StdDev       MAD         5%        50%       95%  ESS_bulk  ESS_tail  ESS_bulk/s    R_hat
lp__       -10.514400  0.047159  1.814510  1.673580 -14.047000 -10.162000 -8.278340   1566.32   2029.85     17599.1  1.00219
phi_psi[1]   0.332426  0.003178  0.227941  0.253611   0.032544   0.295013  0.759463   4518.17   2489.73     50766.0  1.00046
phi_psi[2]   0.341475  0.003245  0.235444  0.264922   0.027967   0.303745  0.783702   5076.52   2343.67     57039.6  1.00209
phi_psi[3]   0.326098  0.003420  0.236445  0.259068   0.017188   0.284675  0.768034   4297.89   2099.65     48290.9  1.00199
phi_mu[1]    0.248542  0.002433  0.189714  0.189978   0.017991   0.207200  0.621276   5159.20   2358.37     57968.5  1.00037
phi_mu[2]    0.248351  0.002469  0.189295  0.186581   0.018128   0.206972  0.622735   5006.86   2262.31     56256.9  1.00082
phi_mu[3]    0.248727  0.002508  0.191123  0.191591   0.017278   0.208531  0.628241   4558.43   2123.97     51218.3  1.00088
phi_mu[4]    0.254380  0.002562  0.193477  0.191099   0.018237   0.211426  0.637267   4052.58   1855.31     45534.6  1.00104
phi[1,1]     0.332426  0.003178  0.227941  0.253611   0.032544   0.295013  0.759463   4518.17   2489.73     50766.0  1.00046
phi[1,2]     0.341475  0.003245  0.235444  0.264922   0.027967   0.303745  0.783702   5076.52   2343.67     57039.6  1.00209
phi[1,3]     0.326098  0.003420  0.236445  0.259068   0.017188   0.284675  0.768034   4297.89   2099.65     48290.9  1.00199
phi[1,4]     0.000000       NaN  0.000000  0.000000   0.000000   0.000000  0.000000       NaN       NaN         NaN      NaN
phi[2,1]     0.248542  0.002433  0.189714  0.189978   0.017991   0.207200  0.621276   5159.20   2358.37     57968.5  1.00037
phi[2,2]     0.248351  0.002469  0.189295  0.186581   0.018128   0.206972  0.622735   5006.86   2262.31     56256.9  1.00082
phi[2,3]     0.248727  0.002508  0.191123  0.191591   0.017278   0.208531  0.628241   4558.43   2123.97     51218.3  1.00088
phi[2,4]     0.254380  0.002562  0.193477  0.191099   0.018237   0.211426  0.637267   4052.58   1855.31     45534.6  1.00104
>>> f2.summary()
               Mean      MCSE    StdDev       MAD         5%        50%       95%  ESS_bulk  ESS_tail  ESS_bulk/s    R_hat
lp__     -10.514400  0.047159  1.814510  1.673580 -14.047000 -10.162000 -8.278340   1566.32   2029.85     16147.7  1.00219
phi_u[1]  -0.010096  0.020046  1.229220  1.106380  -1.920970  -0.004964  1.999520   4062.58   2380.03     41882.3  1.00093
phi_u[2]   0.080824  0.023505  1.356300  1.206490  -1.871410  -0.010014  2.535870   3784.24   2015.03     39012.8  1.00172
phi_u[3]  -0.001905  0.019724  1.221390  1.049180  -2.038820   0.001182  2.035830   4025.58   2319.70     41500.9  1.00073
phi_u[4]   0.015602  0.022350  1.288220  1.160890  -1.906490  -0.075052  2.239230   3887.26   2216.25     40074.8  1.00111
phi_u[5]  -0.031973  0.024358  1.242770  1.098490  -1.882940  -0.150980  2.214260   3393.57   1743.39     34985.3  1.00487
phi[1,1]   0.332426  0.003178  0.227941  0.253611   0.032544   0.295013  0.759463   4518.17   2489.73     46579.1  1.00046
phi[1,2]   0.341475  0.003245  0.235444  0.264922   0.027967   0.303745  0.783702   5076.52   2343.67     52335.3  1.00209
phi[1,3]   0.326098  0.003420  0.236445  0.259068   0.017188   0.284675  0.768034   4297.89   2099.65     44308.2  1.00199
phi[1,4]   0.000000       NaN  0.000000  0.000000   0.000000   0.000000  0.000000       NaN       NaN         NaN      NaN
phi[2,1]   0.248542  0.002433  0.189714  0.189978   0.017991   0.207200  0.621276   5159.20   2358.37     53187.6  1.00037
phi[2,2]   0.248351  0.002469  0.189295  0.186581   0.018128   0.206972  0.622735   5006.86   2262.31     51617.1  1.00082
phi[2,3]   0.248727  0.002508  0.191123  0.191591   0.017278   0.208531  0.628241   4558.43   2123.97     46994.1  1.00088
phi[2,4]   0.254380  0.002562  0.193477  0.191099   0.018237   0.211426  0.637267   4052.58   1855.31     41779.2  1.00104
>>> 
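As an extra sanity check on both summaries, the rows of phi can be compared with the analytic moments of a flat Dirichlet, since both models put dirichlet(rep_vector(1, K)) on each simplex. The following is a minimal sketch using only the standard Dirichlet mean and variance formulas; the function name dirichlet_mean_sd is hypothetical.

```python
import math

def dirichlet_mean_sd(alpha):
    """Marginal means and standard deviations of a Dirichlet(alpha)."""
    a0 = sum(alpha)
    means = [a / a0 for a in alpha]
    # Var[theta_k] = alpha_k * (a0 - alpha_k) / (a0^2 * (a0 + 1))
    sds = [math.sqrt(a * (a0 - a) / (a0 ** 2 * (a0 + 1))) for a in alpha]
    return means, sds

# K = 3 corresponds to phi[1, 1:3]; K = 4 corresponds to phi[2, 1:4].
for K in (3, 4):
    means, sds = dirichlet_mean_sd([1.0] * K)
    print(K, round(means[0], 4), round(sds[0], 4))
```

The analytic values are a mean of 1/3 and a standard deviation of about 0.236 for K = 3, and a mean of 1/4 and a standard deviation of about 0.194 for K = 4. These are close to the Mean and StdDev columns reported above for both v1 and v2.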