Why is one parameter not sampling at all?

I have the model below, which I run with 4 chains. It’s a logit-normal model with normal priors for the mean and uniform for the variance. Even after 4000 iterations with 1000 warmups the chains don’t converge at all. First I get an error message saying

Log probability evaluates to log(0), i.e. negative infinity.

Then I noticed in the diagnostic plots that while the beta_z parameters somewhat move around, y_sigma remains fixed in every sample in every chain. It’s constant at 5998. Finally I get the error that

The largest R-hat is Inf, indicating chains have not mixed.

and that every transition is divergent.

What is wrong?

data {
   int<lower = 1> N;
   int<lower = 1> M;
   vector[N] y;
   matrix[N, M] X;
   cholesky_factor_cov[M, M] prior_L;
   vector[M] prior_betas;
   int<lower = 1> test_N;
   matrix[test_N, M] test_X;
  parameters {
   vector[M] beta_z;
   real<lower = 0> y_sigma;
  model {
   vector[M] beta = prior_betas + prior_L * beta_z;
    beta_z  ~ normal(0, 1);
    y_sigma ~ uniform(0.5, 2.5);
    for (n in 1:N) {
      target += -pow((logit(y[n]) - (row(X, n) * beta)) / (sqrt2() * pi()), 2.0);
      target += -log(y[n] * (1 - y[n]));
      target += -log(y_sigma * sqrt(2 * pi()));
  generated quantities {
    vector[N] pred;
    vector[test_N] test_pred;
    vector[M] beta = prior_betas + prior_L * beta_z;
    for (n in 1:N) {
      pred[n] = inv_logit(normal_rng(row(X, n) * beta, y_sigma));
    for (n in 1:test_N) {
      test_pred[n] = inv_logit(normal_rng(row(test_X, n) * beta, y_sigma));

I am just a novice user, certainly not an expert, so I may not be entirely correct. A problem I see is that you have a lower bound of 0 and no upper bound on y_sigma but the prior you put on it has support only over the range [0.5,2.5]. Because the variable has a lower bound, it gets log transformed (section 10.2 of the reference manual version 2.21) to have an unconstrained range during fitting and on that scale it gets initialized on the range [-2,2], so [e^{-2},e^2]. The initial values can fall where the probability is zero for the prior and bad things happen, or the sampling can go to a region with zero probability. Try putting a prior on y_sigma that goes down to zero and does not have a hard upper limit. There are recommendations published for priors on scale parameters, though I cannot remember just where that is at the moment.

Edit: One source of recommendations on priors