Why is my basic linear regression model not converging?

I’m just learning Stan so I set up a simple model. Regression with 2 variables, everything simulated from standard normals. The prior is normal with specified mean and variance and likelihood is also normal.

After 4,000 iterations with 1,000 warm-up iterations, I get errors on R-Hat being too high (1.84), and the sample sizes being too low. Why is that? This seems like a trivial model with very well behaved data.


# Random X
X <- matrix(rnorm(6000 * 2), ncol = 2)

cov_betas_L <- t(X) %*% X
cov_betas_L <- t(chol(cov_betas_L))

# Sum the columns of X, add noise
y <- X %*% rep(1, ncol(X)) + rnorm(nrow(X))
y <- y[, 1]

prior_betas <- c(-2, 2)

stan_data <- list(N = nrow(X), M = ncol(X), X = X, y = y,
                  prior_betas = prior_betas, prior_L = cov_betas_L)

data {
 int<lower = 1> N;
 int<lower = 1> M;
 
 vector[N] y;
 matrix[N, M] X;
 
 cholesky_factor_cov[M, M] prior_L;
 vector[M] prior_betas;
}

parameters {
 real alpha;
 vector[M] beta_z;
 real<lower = 0> y_sigma;
}

model {
 vector[M] beta = prior_betas + prior_L * beta_z;
  
  beta_z ~ normal(0, 1);

  y ~ normal(X * beta, y_sigma);
}

The errors are:

Warning messages:
1: There were 4352 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
2: Examine the pairs() plot to diagnose sampling problems

3: The largest R-hat is 1.84, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat
4: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess
5: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess

Parameter alpha is not used anywhere in the model. It doesn’t even have a prior so the posterior is improper and sampling cannot work. Maybe you meant

  y ~ normal(alpha + X * beta, y_sigma);