Why do model estimates change between a UDF and a non-UDF version of the same model with the same seed?

I have two models testing a repulsive prior for mixtures. They are the same model; in one I use a user-defined pdf, and in the other I write the density out directly in the model block.

Here’s the version with everything written out in the model block (mix_betancourt.stan):

data {
 int<lower = 0> N;
 vector[N] y;
}

parameters {
  ordered[2] mu;
  real<lower=0> sigma[2];
  real<lower=0> tau;
  real<lower=0, upper=1> theta;
}

model {
 sigma ~ normal(0, 2);
 mu ~ normal(0, 2);
 theta ~ beta(5, 5);
 tau ~ gamma(2, 2);

 for (n in 1:N)
   target += log_mix(theta,
                     normal_lpdf(y[n] | mu[1], sigma[1]) + log(1 - exp(- squared_distance(mu[1], mu[2]) * tau)),
                     normal_lpdf(y[n] | mu[2], sigma[2]) + log(1 - exp(- squared_distance(mu[1], mu[2]) * tau)));
 }

and here’s the version with the user-defined pdf (mix_betancourt_fun.stan):

functions {
  real potential_lpdf(real y, vector a, real s, real tau, real mu) {
    real f = normal_lpdf(y | mu, s);
    real r = log(1 - exp(-squared_distance(a[1], a[2]) * tau));
   
    return f + r;
  }
}
data {
 int<lower = 0> N;
 vector[N] y;
}

parameters {
  ordered[2] mu;
  real<lower=0> sigma[2];
  real<lower=0> tau;
  real<lower=0, upper=1> theta;
}

model {
 sigma ~ normal(0, 2);
 mu ~ normal(0, 2);
 theta ~ beta(5, 5);
 tau ~ gamma(2, 2);

 for (n in 1:N)
   target += log_mix(theta,
                     potential_lpdf(y[n] | mu, sigma[1], tau, mu[1]),
                     potential_lpdf(y[n] | mu, sigma[2], tau, mu[2]));
 }
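
As a side note, I think the log(1 - exp(...)) term in that function could equivalently be written with Stan’s log1m_exp (which, as I understand it, computes log(1 - exp(x)) for x < 0). A sketch of the same function in that form, in case the way that line is evaluated matters:

real potential_lpdf(real y, vector a, real s, real tau, real mu) {
  real f = normal_lpdf(y | mu, s);
  // log1m_exp(x) = log(1 - exp(x)); here x = -squared_distance(a[1], a[2]) * tau, which is negative
  real r = log1m_exp(-squared_distance(a[1], a[2]) * tau);
  return f + r;
}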

What’s interesting is that the outputs differ. Maybe this is just normal Monte Carlo noise, but I’m curious why it happens at all given that I use the same seed for both fits.

Output 1 (the user-defined-pdf version, mix_betancourt_fun.stan):

3 chains, each with iter=600; warmup=200; thin=1; 
post-warmup draws per chain=400, total post-warmup draws=1200.

             mean se_mean   sd     2.5%      25%      50%      75%    97.5% n_eff Rhat
mu[1]       -0.74    0.02 0.22    -1.22    -0.87    -0.72    -0.58    -0.36   204 1.00
mu[2]        0.83    0.03 0.33     0.24     0.59     0.81     1.05     1.52   174 1.01
sigma[1]     0.97    0.00 0.07     0.82     0.92     0.98     1.03     1.12   265 1.00
sigma[2]     0.98    0.01 0.10     0.77     0.92     0.99     1.05     1.15   213 1.01
tau          3.37    0.05 0.85     2.04     2.79     3.28     3.81     5.18   325 1.01
theta        0.61    0.01 0.16     0.29     0.51     0.63     0.74     0.88   203 1.00
lp__     -1629.54    0.11 1.81 -1634.09 -1630.39 -1629.15 -1628.27 -1627.15   295 1.01

Samples were drawn using NUTS(diag_e) at Wed Sep 02 05:21:36 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

and output 2 (the written-out version, mix_betancourt.stan):

Inference for Stan model: mix_betancourt.
3 chains, each with iter=600; warmup=200; thin=1; 
post-warmup draws per chain=400, total post-warmup draws=1200.

             mean se_mean   sd     2.5%      25%      50%      75%    97.5% n_eff Rhat
mu[1]       -0.76    0.02 0.23    -1.30    -0.89    -0.73    -0.60    -0.40   150 1.02
mu[2]        0.79    0.02 0.32     0.17     0.56     0.79     1.00     1.43   182 1.01
sigma[1]     0.97    0.01 0.08     0.78     0.92     0.97     1.02     1.10   167 1.01
sigma[2]     1.00    0.01 0.10     0.80     0.93     1.00     1.06     1.17   265 1.01
tau          3.44    0.04 0.89     2.08     2.82     3.29     3.92     5.56   431 1.01
theta        0.59    0.01 0.16     0.24     0.49     0.61     0.71     0.86   156 1.02
lp__     -1629.70    0.11 1.79 -1634.17 -1630.53 -1629.39 -1628.41 -1627.23   289 1.01

Samples were drawn using NUTS(diag_e) at Wed Sep 02 05:23:06 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

The full call from R:

library(rstan)
options(mc.cores = parallel::detectCores())

# simulate data from a two-component normal mixture
N <- 1000
mu <- c(-0.75, 0.75);
sigma <- c(1, 1);
lambda <- 0.4
z <- rbinom(N, 1, lambda) + 1;
y <- rnorm(N, mu[z], sigma[z]);

# dump and reload the data so both fits use exactly the same values
stan_rdump(c("N", "y"), file="mix.data.R")
input_data <- read_rdump("mix.data.R")

# fit the user-defined-pdf version
singular_fit <- stan(file='..\\mix_betancourt_fun.stan', data=input_data[1:2],
                     chains = 3, iter = 600, warmup = 200,
                     control = list(adapt_delta = 0.9),
                     seed = 483892929)
singular_fit

# fit the written-out version with the same data and seed
singular_fit2 <- stan(file='..\\mix_betancourt.stan', data=input_data[1:2],
                     chains = 3, iter = 600, warmup = 200,
                     control = list(adapt_delta = 0.9),
                     seed = 483892929)
singular_fit2

Is it also not the same if the function body is written as a single expression?

real potential_lpdf(real y, vector a, real s, real tau, real mu) {
  return normal_lpdf(y | mu, s) + log(1 - exp(-squared_distance(a[1], a[2]) * tau));
}

That version does give the same output. Is this an issue where I should be using log_sum_exp?
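
For reference, this is what I mean by the log_sum_exp form. As far as I understand, log_mix(theta, lp1, lp2) is the same as log_sum_exp(log(theta) + lp1, log1m(theta) + lp2), so the loop would become something like (a sketch, using the same potential_lpdf as above):

// written-out equivalent of log_mix(theta, lp1, lp2) using log_sum_exp
for (n in 1:N)
  target += log_sum_exp(log(theta) + potential_lpdf(y[n] | mu, sigma[1], tau, mu[1]),
                        log1m(theta) + potential_lpdf(y[n] | mu, sigma[2], tau, mu[2]));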