Fitting a logistic multi-normal isn’t working, but the non-logistic version is fine

I’m trying to recover some injected (simulated) parameters, and for some reason, when I put bernoulli_logit on top of multi_normal_cholesky, I can’t recover my standard deviations (tau). Without the bernoulli_logit step the model fits just fine. I’m also getting divergent transitions. Any clues about what I might be doing wrong here? This seems really straightforward, and I’ve spent days trying everything to no avail.
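To be concrete, the generative process I’m simulating (R code further down; names match the code) is:

   logit_p_baseline ~ multi_normal(mu, diag_matrix(tau) * Omega * diag_matrix(tau)),  with Omega fixed to the identity
   y_p_baseline[m] ~ bernoulli_logit(logit_p_baseline),  for m in 1:M

The injected values are mu = (0.1, 0.4, -0.3, -0.9) and tau = (0.1, 0.2, 0.3, 0.4); mu and Omega are treated as known, so the only parameters are tau and the latent logits.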

Logit model that doesn’t work:

data {
   int<lower=1> M; // number of observations
   int<lower=1> N_features; // number of predictor variables
   int<lower=0, upper=1> y_p_baseline[M, N_features]; // binary (0/1) observations
}

transformed data {
   cholesky_factor_corr[N_features] L_Omega = to_matrix([[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]]);
   vector[N_features] mu = to_vector([0.1,0.4,-0.3,-0.9]);
}

parameters {
   vector<lower=0>[N_features] tau;
   vector[N_features] logit_p_baseline;
}

model {

   // scale the fixed (identity) correlation Cholesky factor by the standard deviations tau
   matrix[N_features, N_features] L_Sigma = diag_pre_multiply(tau, L_Omega);

   tau ~ normal(0, 0.5);
   logit_p_baseline ~ multi_normal_cholesky(mu, L_Sigma);

   // every observation shares the same vector of logits
   for (m in 1:M) {
      y_p_baseline[m] ~ bernoulli_logit(logit_p_baseline);
   }

}

Wrong values for tau (should be 0.1, 0.2, 0.3, 0.4):
In addition: Warning messages:
1: There were 12 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
2: Examine the pairs() plot to diagnose sampling problems

            mean    se_mean        sd        10%       90%    n_eff      Rhat
tau[1] 0.2074085 0.01745183 0.2115113 0.02606833 0.4987222 146.8875 0.9962213
tau[2] 0.2519658 0.01535765 0.2236631 0.03904499 0.5640318 212.0991 1.0017057
tau[3] 0.2553860 0.02014701 0.2468789 0.03511752 0.5762193 150.1574 1.0055040
tau[4] 0.2543119 0.01650095 0.2171723 0.03971491 0.5733396 173.2170 1.0171034

R code for injecting (simulating) the values:

M = 1000          # number of observations
N_features = 4    # number of columns/features
mu = c(0.1, 0.4, -0.3, -0.9)
Omega <- rbind(
      c(1, 0, 0, 0),
      c(0, 1, 0, 0),
      c(0, 0, 1, 0),
      c(0, 0, 0, 1)
)
tau = c(0.1, 0.2, 0.3, 0.4)
Sigma = diag(tau) %*% Omega %*% diag(tau)

# draw the latent logits and map them to probabilities
y_p_baseline = plogis(mvtnorm::rmvnorm(M, mu, Sigma))

# replace each probability with a single Bernoulli (0/1) draw
for (i in 1:N_features) {
   for (j in 1:M) {
      y_p_baseline[j, i] = rbinom(1, 1, y_p_baseline[j, i])
   }
}
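For completeness, this is roughly how the data get passed to Stan (a minimal sketch assuming rstan, which matches the warnings and summary output below; the .stan file name is just a placeholder):

library(rstan)

stan_data <- list(
   M = M,
   N_features = N_features,
   y_p_baseline = y_p_baseline  # M x N_features matrix of 0/1 outcomes
)

# "logit_model.stan" is a placeholder name for the logit model above
fit <- stan(file = "logit_model.stan", data = stan_data)

summary(fit, pars = "tau", probs = c(0.1, 0.9))$summary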

Non-logit model that works just fine:

data {
   int<lower=1> M; // number of observations
   int<lower=1> N_features; // number of predictor variables
   vector[N_features] y_p_baseline[M]; // continuous observations
}

transformed data {
   cholesky_factor_corr[N_features] L_Omega = to_matrix([[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]]);
   vector[N_features] mu = to_vector([0.1,0.4,-0.3,-0.9]);
}

parameters {
   vector<lower=0>[N_features] tau;
}

model {

   // scale the fixed (identity) correlation Cholesky factor by the standard deviations tau
   matrix[N_features, N_features] L_Sigma = diag_pre_multiply(tau, L_Omega);

   tau ~ normal(0, 0.5);
   y_p_baseline ~ multi_normal_cholesky(mu, L_Sigma);

}
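The data for this version are just the raw multivariate-normal draws, i.e. the simulation above without the plogis/rbinom step (sketch):

y_p_baseline = mvtnorm::rmvnorm(M, mu, Sigma)  # M x N_features matrix of continuous values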

summary(fit, pars = "tau", probs = c(0.1, 0.9))$summary
            mean      se_mean          sd        10%       90%    n_eff      Rhat
tau[1] 0.1006119 0.0001386487 0.002224633 0.09789271 0.1035016 257.4455 0.9963595
tau[2] 0.2005033 0.0002549114 0.004368046 0.19502377 0.2057183 293.6270 0.9960373
tau[3] 0.2863369 0.0004980778 0.006869206 0.27825528 0.2946214 190.2036 1.0009162
tau[4] 0.3894062 0.0004740003 0.008429819 0.37932946 0.4006192 316.2854 0.9976708