Bayesian multiple regression with Stan

Could you please give me your suggestions and corrections for this Stan code?
When I checked this code, I received this error:

Error in stanc(filename, allow_undefined = TRUE) : 0
Semantic error in 'string', line 36, column 8 to column 46:
   -------------------------------------------------
    34:          }
    35:          // Likelihood
    36:          y[n] ~ multi_normal_cholesky(mu, L_e);
                 ^
    37:      }
    38:  }
   -------------------------------------------------

Ill-typed arguments supplied to function 'multi_normal_cholesky':
(real, vector, matrix)
Available signatures:
(vector, array[] row_vector, matrix) => real
  The first argument must be vector but got real
(vector, row_vector, matrix) => real
  The first argument must be vector but got real
(vector, array[] vector, matrix) => real
  The first argument must be vector but got real
(vector, vector, matrix) => real
  The first argument must be vector but got real
(row_vector, array[] row_vector, matrix) => real
  The first argument must be row_vector but got real
(Additional signatures omitted)
data {
    int<lower=0> N;               // Total number of valid observations
    int<lower=0> K;               // Number of predictors (tree-rings)
    int<lower=0> M;               // Number of unique streamflow stations
    matrix[N, K] X;               // Predictor matrix (only valid rows)
    vector[N] y;                  // Flattened response vector (non-missing responses)
    int<lower=1, upper=M> index[N]; // Index for streamflow stations (1 to M)
}

transformed data {
    real<lower=0> df_e;           // Degrees of freedom for the inverse Wishart distribution
    df_e = M + 1;                 // Degrees of freedom for the inverse Wishart distribution
}

parameters {
    vector[M] alphas;             // Mean for each streamflow station
    matrix[K, M] betas;           // Coefficients for predictors, varying by station
    cov_matrix[M] sigmas_e;       // Covariance matrix for residuals
}

transformed parameters {
    cholesky_factor_cov[M] L_e;   // Cholesky factor of the covariance matrix
    L_e = cholesky_decompose(sigmas_e); // Cholesky decomposition of the covariance matrix
}

model {
    // Priors
    alphas ~ normal(0, 100);
    to_vector(betas) ~ normal(0, 100);
    sigmas_e ~ inv_wishart(df_e, diag_matrix(rep_vector(1, M)));

    // Likelihood
    for (n in 1:N) {
        vector[M] mu; // Initialize mean vector for the current observation
        mu = alphas; // Start with alphas

        // Compute the contribution from the predictors
        for (m in 1:M) {
            mu[m] += dot_product(X[n], betas[, m]); // Each entry of mu
        }

        // Likelihood
        y[n] ~ multi_normal_cholesky(mu, L_e);
    }
}


Welcome to the Discourse! The issue you're having is that you're modeling a single value (y[n]) using a distribution that needs a vector of values.

Maybe you meant to pass y as a matrix?

At least, that's what the error is telling you; you may have other bugs that are not immediately apparent.


Dear Simonbrauer,
Thank you so much for your quick reply. Would you please explain your suggestion in more detail?
Best regards

Sure. The first part of the error is saying that you tried to call the function multi_normal_cholesky with three arguments: a (single-value) real, a vector, and a matrix. However, there is no implementation of that function that takes those three kinds of arguments.

The second part of the error is listing out all of the possible type signatures that are implemented. For example,

(vector, vector, matrix) => real

means that you can call multi_normal_cholesky with a vector, another vector, and a matrix. This makes sense, since multi_normal_cholesky is a multivariate distribution, and so the first argument (y in your case) needs to be the same length as the second argument (mu).
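For instance, here is a minimal self-contained sketch (illustrative names, not your model) just to show the types: the first argument is a vector of the same length as the mean.

```stan
data {
  int<lower=1> M;
  vector[M] y_n;              // one observation across all M dimensions
}
parameters {
  vector[M] mu;               // mean vector of the same length as y_n
  cholesky_factor_cov[M] L;   // M x M Cholesky factor of the covariance
}
model {
  y_n ~ multi_normal_cholesky(mu, L);  // (vector, vector, matrix): well-typed
}
```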

There are multiple ways of doing this. But I'll just copy your code and illustrate what I had in mind. Note how y is now an array of vectors, so y[n] refers to a vector of length M.

data {
  int<lower=0> N; // Total number of valid observations
  int<lower=0> K; // Number of predictors (tree-rings)
  int<lower=0> M; // Number of unique streamflow stations
  matrix[N, K] X; // Predictor matrix (only valid rows)
  array[N] vector[M] y; // Response values: y[n] is a vector of length M
  array[N] int<lower=1, upper=M> index; // Index for streamflow stations (1 to M)
}
transformed data {
  real<lower=0> df_e; // Degrees of freedom for the inverse Wishart distribution
  df_e = M + 1; // Degrees of freedom for the inverse Wishart distribution
}
parameters {
  vector[M] alphas; // Mean for each streamflow station
  matrix[K, M] betas; // Coefficients for predictors, varying by station
  cov_matrix[M] sigmas_e; // Covariance matrix for residuals
}
transformed parameters {
  cholesky_factor_cov[M] L_e; // Cholesky factor of the covariance matrix
  L_e = cholesky_decompose(sigmas_e); // Cholesky decomposition of the covariance matrix
}
model {
  // Priors
  alphas ~ normal(0, 100);
  to_vector(betas) ~ normal(0, 100);
  sigmas_e ~ inv_wishart(df_e, diag_matrix(rep_vector(1, M)));
  
  // Likelihood
  for (n in 1:N) {
      vector[M] mu; // Initialize mean vector for the current observation
      mu = alphas; // Start with alphas
  
      // Compute the contribution from the predictors
      for (m in 1:M) {
          mu[m] += dot_product(X[n], betas[, m]); // Each entry of mu
      }
  
      // Likelihood
      y[n] ~ multi_normal_cholesky(mu, L_e);
  }
}

As a side-note, you can surround your code in ```stan ``` to get syntax highlighting. For example,

```stan
data{
vector[N] Y;
}
```

becomes

data{
   vector[N] Y;
}

Thank you so much for your help


Aside from the coding issue, which @simonbrauer correctly diagnosed, there's a computational issue in this model.

You're better off here just using regular multi_normal and not doing the Cholesky factorization yourself. The multi-normal does it already.

Given that you're using a unit matrix in the inverse Wishart prior (i.e., not something informative), I would suggest instead decomposing the prior into an LKJ prior on the correlation matrix (shrink to unit correlation) plus an independent prior on the scales. Here's a description of why:

http://www.stat.columbia.edu/~gelman/research/unpublished/Visualization.pdf

Aside from the applied modeling motivation, it's a whole lot faster. The Cholesky factor operations are quadratic, whereas the full covariance operations are cubic, and with this approach you don't ever need to create the full covariance matrix. There's a description in the User's Guide chapter on regression of how to code it in Stan. We should really introduce Cholesky-parameterized inverse Wisharts, but we don't use them a lot ourselves given @bgoodri's paper linked above.
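For concreteness, here is a minimal sketch of that parameterization applied to this model (the prior shapes below, lkj_corr_cholesky(2) and a Student-t on the scales, are illustrative choices, not recommendations from this thread):

```stan
data {
  int<lower=1> N;        // observations (years)
  int<lower=1> K;        // predictors (tree-rings)
  int<lower=1> M;        // streamflow stations
  matrix[N, K] X;        // predictor matrix
  array[N] vector[M] y;  // one length-M response vector per observation
}
parameters {
  vector[M] alphas;
  matrix[K, M] betas;
  cholesky_factor_corr[M] L_corr; // Cholesky factor of the correlation matrix
  vector<lower=0>[M] tau;         // per-station residual scales
}
model {
  // combine scales and correlation into a covariance Cholesky factor;
  // the full covariance matrix is never formed
  matrix[M, M] L_Sigma = diag_pre_multiply(tau, L_corr);
  alphas ~ normal(0, 100);
  to_vector(betas) ~ normal(0, 100);
  L_corr ~ lkj_corr_cholesky(2);  // shape > 1 shrinks toward unit correlation
  tau ~ student_t(3, 0, 10);      // independent weakly informative scale prior
  for (n in 1:N)
    y[n] ~ multi_normal_cholesky(alphas + (X[n] * betas)', L_Sigma);
}
```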

This can probably be sped up with some of our linear algebra and compound operations like rows_dot_product. At the very least, if you're using it in this form, it'll be much more efficient to declare things this way, because no copying will be required to extract a row of X and a column of betas.

array[N] row_vector[K] X;    // rows stored directly: extracting X[n] copies nothing
array[M] vector[K] betas_t;  // columns of the original betas, stored as vectors
for (m in 1:M) {
  mu[m] += X[n] * betas_t[m];  // row_vector times vector: a scalar dot product
}

I used betas_t because it's the transpose of your original betas. X keeps the same shape, just a different declaration.
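Alternatively, if betas stays a matrix, the per-station loop can be collapsed into a single matrix-matrix product. Here is a sketch under the array-of-vectors data layout from earlier in the thread (priors omitted for brevity):

```stan
data {
  int<lower=1> N;
  int<lower=1> K;
  int<lower=1> M;
  matrix[N, K] X;
  array[N] vector[M] y;
}
parameters {
  vector[M] alphas;
  matrix[K, M] betas;
  cholesky_factor_cov[M] L_e;
}
model {
  matrix[N, M] Mu = X * betas;  // all N x M regression means at once
  for (n in 1:N)
    y[n] ~ multi_normal_cholesky(alphas + Mu[n]', L_e);  // Mu[n] is row n
}
```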

We do have this: Covariance Matrix Distributions

In fact, you helped with the docs and review!


The error is due to y[n] being a real instead of a vector[M]. Declare y so that y[n] is a vector[M], and make sure L_e is correctly set as a Cholesky factor. That should fix the issue!

I knew it was a plan, but I hadn't realized it'd gotten all the way through to being released! This is super cool. I've wanted to have this for years!

I converted Y to a data frame, removed NAs, and flattened it. In this example, I updated the Stan model with the Cholesky decomposition, but it is still not working; I received these errors:
"SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1: Unrecoverable error evaluating the log probability at the initial value.
Chain 1: Exception: multi_normal_cholesky_lpdf: Size of random variable (123) and rows of covariance parameter (4) must match in size (in 'anon_model', line 43, column 4 to column 39)
[1] "Error : Exception: multi_normal_cholesky_lpdf: Size of random variable (123) and rows of covariance parameter (4) must match in size (in 'anon_model', line 43, column 4 to column 39)""
###########################################################################
Here are the different steps:
Y ā† readRDS(ā€˜data/data.ann.stramflow.rdsā€™) %>% .[-c(1,5:36)]%>% as.matrix()
X ā† readRDS(ā€˜data/data.treerings.rdsā€™) %>% .[-c(1,4:27)] %>%
mutate_all(~ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)) %>% as.matrix()
nt ā† nrow(Y)
ns ā† ncol(Y)
np ā† ncol(X)

Convert Y to a data frame, remove NAs, and flatten

y_na ā† Y %>%
as.data.frame() %>%
mutate(row = row_number()) %>%
pivot_longer(-row, names_to = ā€œcolumnā€, values_to = ā€œvalueā€) %>%
filter(!is.na(value)) %>%
mutate(column = match(column, colnames(Y))) %>% # Convert column names to column numbers
set_colnames(c(ā€˜yearā€™,ā€˜siteā€™,ā€˜flowā€™))

nb <- nrow(y_na)

Data list for Stan

stan_data ā† list(nb = nb, ns = ns, nt = nt, np = np, x = X, y = y_na$flow, site = y_na$site, year = y_na$year, Wp_e = diag(ns))

stan_model_code <- '/Users/samir/Documents/Paleo_hydrology_Algeria/Bayesian_model_rcode/stan_np.stan'
stan_model <- stan(file = stan_model_code, data = stan_data, # init = init_list,
                   iter = 1000, chains = 1,
                   cores = 8, control = list(adapt_delta = .9, max_treedepth = 15))
###########################################################################
Here is the Stan code:
data {
    int<lower=0> nb;                 // Total number of observations (non-missing)
    int<lower=0> ns;                 // Number of streamflow stations
    int<lower=0> nt;                 // Number of years
    int<lower=0> np;                 // Number of predictors (tree-rings)

    matrix[nt, np] x;                // Predictor matrix (no missing values)
    vector[nb] y;                    // Response vector (non-missing values only)

    int<lower=1, upper=ns> site[nb]; // Index for streamflow stations (1 to ns)
    int<lower=1, upper=nt> year[nb]; // Index for years (1 to nt)

    matrix[ns, ns] Wp_e;             // Scale matrix for the inverse-Wishart prior
}

transformed data {
    real<lower=0> df_e;              // Degrees of freedom for the inverse-Wishart distribution
    df_e = ns + 1;
}

parameters {
    vector[ns] alphas;               // Mean for each streamflow station
    matrix[np, ns] betas;            // Coefficients for predictors, varying by station
    cov_matrix[ns] sigmas_e;         // Covariance matrix for residuals between stations
}

transformed parameters {
    vector[nb] mu;                   // Predicted means for non-missing observations

    for (n in 1:nb) {
        // Calculate the predicted mean for each non-missing observation
        mu[n] = alphas[site[n]] + dot_product(x[year[n], ], betas[, site[n]]);
    }
}

model {
    matrix[ns, ns] L_e;              // Cholesky factor of the covariance matrix

    // Priors
    alphas ~ normal(0, 100);             // Priors for station means
    to_vector(betas) ~ normal(0, 100);   // Priors for station-specific coefficients
    sigmas_e ~ inv_wishart(df_e, Wp_e);  // Prior for the covariance matrix

    // Cholesky decomposition of the covariance matrix
    L_e = cholesky_decompose(sigmas_e);

    // Likelihood
    y ~ multi_normal_cholesky(mu, L_e);  // Ensure mu is correct size and matches the covariance
}

I'm copying your model below but formatted for better legibility, using the method I described above.

```stan
data {
  int<lower=0> nb;                 // Total number of observations (non-missing)
  int<lower=0> ns;                 // Number of streamflow stations
  int<lower=0> nt;                 // Number of years
  int<lower=0> np;                 // Number of predictors (tree-rings)

  matrix[nt, np] x;                // Predictor matrix (no missing values)
  vector[nb] y;                    // Response vector (non-missing values only)

  int<lower=1, upper=ns> site[nb]; // Index for streamflow stations (1 to ns)
  int<lower=1, upper=nt> year[nb]; // Index for years (1 to nt)

  matrix[ns, ns] Wp_e;             // Scale matrix for the inverse-Wishart prior
}

transformed data {
  real<lower=0> df_e;              // Degrees of freedom for the inverse-Wishart distribution
  df_e = ns + 1;
}

parameters {
  vector[ns] alphas;               // Mean for each streamflow station
  matrix[np, ns] betas;            // Coefficients for predictors, varying by station
  cov_matrix[ns] sigmas_e;         // Covariance matrix for residuals between stations
}

transformed parameters {
  vector[nb] mu;                   // Predicted means for non-missing observations

  for (n in 1:nb) {
    // Calculate the predicted mean for each non-missing observation
    mu[n] = alphas[site[n]] + dot_product(x[year[n], ], betas[, site[n]]);
  }
}

model {
  matrix[ns, ns] L_e;              // Cholesky factor of the covariance matrix

  // Priors
  alphas ~ normal(0, 100);             // Priors for station means
  to_vector(betas) ~ normal(0, 100);   // Priors for station-specific coefficients
  sigmas_e ~ inv_wishart(df_e, Wp_e);  // Prior for the covariance matrix

  // Cholesky decomposition of the covariance matrix
  L_e = cholesky_decompose(sigmas_e);

  // Likelihood
  y ~ multi_normal_cholesky(mu, L_e);  // Ensure mu is correct size and matches the covariance
}
```

This error is telling you that y has 123 elements (nb) while L_e is only 4 × 4 (ns). I think your approach of flattening your data into one long vector is making this more difficult. I would instead pass y as an array of vectors or a matrix.

I used Y as one long vector with length nb = 123 because I removed the missing data (NAs).

You can estimate these models with missing data, but it requires slicing the mean vector and covariance matrix at each step, which can get complex. I'm not even sure whether that would work for the Cholesky decomposition version (someone can clarify here), so you might have to switch to y[n] ~ multi_normal(mu, Sigma).
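To sketch what that slicing might look like (this is illustrative, not code from this thread: the names n_obs, obs_idx, and y_obs are hypothetical, and missing entries are assumed padded), each year's likelihood uses only the stations observed that year:

```stan
data {
  int<lower=1> ns;                              // stations
  int<lower=1> nt;                              // years
  int<lower=1> np;                              // predictors
  matrix[nt, np] x;                             // complete predictor matrix
  array[nt] int<lower=0, upper=ns> n_obs;       // stations observed per year
  array[nt, ns] int<lower=1, upper=ns> obs_idx; // station indices, padded
  array[nt] vector[ns] y_obs;                   // observed values, padded
}
parameters {
  vector[ns] alphas;
  matrix[np, ns] betas;
  cov_matrix[ns] sigmas_e;
}
model {
  alphas ~ normal(0, 100);
  to_vector(betas) ~ normal(0, 100);
  sigmas_e ~ inv_wishart(ns + 1, diag_matrix(rep_vector(1, ns)));
  for (t in 1:nt) {
    int k = n_obs[t];
    if (k > 0) {
      array[k] int idx = obs_idx[t, 1:k];          // stations seen in year t
      vector[ns] mu_t = alphas + (x[t] * betas)';  // means for all stations
      // slice both the mean vector and the covariance matrix
      y_obs[t, 1:k] ~ multi_normal(mu_t[idx], sigmas_e[idx, idx]);
    }
  }
}
```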

With that in mind, I'd recommend that you get your model running with complete data and then figure out the slicing later. So restructure Y either as a matrix (matrix[N, M] Y) or as an array of vectors (array[N] vector[M] Y), as described above. You may still have other errors to resolve (I haven't looked closely at the rest of your model), but that should get you past this initial error.
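For reference, the two complete-data shapes suggested above would be declared like this (a sketch; the matrix alternative is shown as a comment, since a program would use one or the other):

```stan
data {
  int<lower=1> N;        // years
  int<lower=1> M;        // stations
  array[N] vector[M] y;  // array form: y[n] is a vector[M]
  // matrix[N, M] Y;     // matrix form: Y[n] is a row_vector[M]
}
```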


Thank you so much