Bayesian multiple regression with Stan

Could you please give me your suggestions and corrections for this Stan code?
When I checked this code, I received this error:

Error in stanc(filename, allow_undefined = TRUE) : 0
Semantic error in 'string', line 36, column 8 to column 46:
   -------------------------------------------------
    34:          }
    35:          // Likelihood
    36:          y[n] ~ multi_normal_cholesky(mu, L_e);
                 ^
    37:      }
    38:  }
   -------------------------------------------------

Ill-typed arguments supplied to function 'multi_normal_cholesky':
(real, vector, matrix)
Available signatures:
(vector, array[] row_vector, matrix) => real
  The first argument must be vector but got real
(vector, row_vector, matrix) => real
  The first argument must be vector but got real
(vector, array[] vector, matrix) => real
  The first argument must be vector but got real
(vector, vector, matrix) => real
  The first argument must be vector but got real
(row_vector, array[] row_vector, matrix) => real
  The first argument must be row_vector but got real
(Additional signatures omitted)
data {
    int<lower=0> N;               // Total number of valid observations
    int<lower=0> K;               // Number of predictors (tree-rings)
    int<lower=0> M;               // Number of unique streamflow stations
    matrix[N, K] X;               // Predictor matrix (only valid rows)
    vector[N] y;                  // Flattened response vector (non-missing responses)
    int<lower=1, upper=M> index[N]; // Index for streamflow stations (1 to M)
}

transformed data {
    real<lower=0> df_e;           // Degrees of freedom for the inverse Wishart distribution
    df_e = M + 1;                 // Degrees of freedom for the inverse Wishart distribution
}

parameters {
    vector[M] alphas;             // Mean for each streamflow station
    matrix[K, M] betas;           // Coefficients for predictors, varying by station
    cov_matrix[M] sigmas_e;       // Covariance matrix for residuals
}

transformed parameters {
    cholesky_factor_cov[M] L_e;   // Cholesky factor of the covariance matrix
    L_e = cholesky_decompose(sigmas_e); // Cholesky decomposition of the covariance matrix
}

model {
    // Priors
    alphas ~ normal(0, 100);
    to_vector(betas) ~ normal(0, 100);
    sigmas_e ~ inv_wishart(df_e, diag_matrix(rep_vector(1, M)));

    // Likelihood
    for (n in 1:N) {
        vector[M] mu; // Initialize mean vector for the current observation
        mu = alphas; // Start with alphas

        // Compute the contribution from the predictors
        for (m in 1:M) {
            mu[m] += dot_product(X[n], betas[, m]); // Each entry of mu
        }

        // Likelihood
        y[n] ~ multi_normal_cholesky(mu, L_e);
    }
}


Welcome to the Discourse! The issue you're having is that you're modeling a single value (y[n]) using a distribution that needs a vector of values.

Maybe you meant to pass y as a matrix?

At least, that's what the error is telling you; you may have other bugs that are not immediately apparent.


Dear Simonbrauer,
Thank you so much for your quick reply. Would you please explain your suggestion in more detail?
Best regards

Sure. The first part of the error is saying that you tried to call the function multi_normal_cholesky with three arguments: a (single-value) real, a vector, and a matrix. However, there is no implementation of that function that takes those three kinds of arguments.

The second part of the error is listing out all of the possible type signatures that are implemented. For example,

(vector, vector, matrix) => real

means that you can call multi_normal_cholesky with a vector, another vector, and a matrix. This makes sense, since multi_normal_cholesky is a multivariate distribution, and so the first argument (y in your case) needs to be the same length as the second argument (mu).
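For instance, here is a minimal self-contained sketch (illustrative names, not your model) just to show the types: the first argument is a vector of the same length as the mean.

```stan
data {
  int<lower=1> M;
  vector[M] y_n;              // one observation across all M dimensions
}
parameters {
  vector[M] mu;               // mean vector of the same length as y_n
  cholesky_factor_cov[M] L;   // M x M Cholesky factor of the covariance
}
model {
  y_n ~ multi_normal_cholesky(mu, L);  // (vector, vector, matrix): well-typed
}
```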

There are multiple ways of doing this. But I'll just copy your code and illustrate what I had in mind. Note how y is now an array of vectors, so y[n] refers to a vector of length M.

data {
  int<lower=0> N; // Total number of valid observations
  int<lower=0> K; // Number of predictors (tree-rings)
  int<lower=0> M; // Number of unique streamflow stations
  matrix[N, K] X; // Predictor matrix (only valid rows)
  array[N] vector[M] y; // Response values: y[n] is a vector of length M
  array[N] int<lower=1, upper=M> index; // Index for streamflow stations (1 to M)
}
transformed data {
  real<lower=0> df_e; // Degrees of freedom for the inverse Wishart distribution
  df_e = M + 1; // Degrees of freedom for the inverse Wishart distribution
}
parameters {
  vector[M] alphas; // Mean for each streamflow station
  matrix[K, M] betas; // Coefficients for predictors, varying by station
  cov_matrix[M] sigmas_e; // Covariance matrix for residuals
}
transformed parameters {
  cholesky_factor_cov[M] L_e; // Cholesky factor of the covariance matrix
  L_e = cholesky_decompose(sigmas_e); // Cholesky decomposition of the covariance matrix
}
model {
  // Priors
  alphas ~ normal(0, 100);
  to_vector(betas) ~ normal(0, 100);
  sigmas_e ~ inv_wishart(df_e, diag_matrix(rep_vector(1, M)));
  
  // Likelihood
  for (n in 1:N) {
      vector[M] mu; // Initialize mean vector for the current observation
      mu = alphas; // Start with alphas
  
      // Compute the contribution from the predictors
      for (m in 1:M) {
          mu[m] += dot_product(X[n], betas[, m]); // Each entry of mu
      }
  
      // Likelihood
      y[n] ~ multi_normal_cholesky(mu, L_e);
  }
}

As a side-note, you can surround your code in ```stan ``` to get syntax highlighting. For example,

```stan
data{
vector[N] Y;
}
```

becomes

data{
   vector[N] Y;
}

Thank you so much for your help


Aside from the coding issue, which @simonbrauer correctly diagnosed, there's a computational issue in this model.

You're better off here just using regular multi_normal and not doing the Cholesky factorization yourself. The multi-normal does it already.

Given that you're using a unit matrix in the inverse Wishart prior (i.e., not something informative), I would suggest instead decomposing the prior into an LKJ prior on the correlation matrix (shrink to unit correlation) plus an independent prior on the scales. Here's a description of why:

http://www.stat.columbia.edu/~gelman/research/unpublished/Visualization.pdf

Aside from the applied modeling motivation, it's a whole lot faster. The Cholesky factor operations are quadratic, whereas the full covariance operations are cubic, and with this approach you don't ever need to create the full covariance matrix. There's a description in the User's Guide chapter on regression of how to code it in Stan. We should really introduce Cholesky-parameterized inverse Wisharts, but we don't use them a lot ourselves given @bgoodri's paper linked above.
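For concreteness, here is a minimal sketch of that parameterization applied to this model (the prior shapes below, lkj_corr_cholesky(2) and a Student-t on the scales, are illustrative choices, not recommendations from this thread):

```stan
data {
  int<lower=1> N;        // observations (years)
  int<lower=1> K;        // predictors (tree-rings)
  int<lower=1> M;        // streamflow stations
  matrix[N, K] X;        // predictor matrix
  array[N] vector[M] y;  // one length-M response vector per observation
}
parameters {
  vector[M] alphas;
  matrix[K, M] betas;
  cholesky_factor_corr[M] L_corr; // Cholesky factor of the correlation matrix
  vector<lower=0>[M] tau;         // per-station residual scales
}
model {
  // combine scales and correlation into a covariance Cholesky factor;
  // the full covariance matrix is never formed
  matrix[M, M] L_Sigma = diag_pre_multiply(tau, L_corr);
  alphas ~ normal(0, 100);
  to_vector(betas) ~ normal(0, 100);
  L_corr ~ lkj_corr_cholesky(2);  // shape > 1 shrinks toward unit correlation
  tau ~ student_t(3, 0, 10);      // independent weakly informative scale prior
  for (n in 1:N)
    y[n] ~ multi_normal_cholesky(alphas + (X[n] * betas)', L_Sigma);
}
```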

This can probably be sped up with some of our linear algebra and compound operations like rows_dot_product. At the very least, if you're using it in this form, it'll be much more efficient to declare things this way, because no copying will be required to extract a row of X and a column of betas.

array[N] row_vector[K] X;    // rows stored directly: extracting X[n] copies nothing
array[M] vector[K] betas_t;  // columns of the original betas, stored as vectors
for (m in 1:M) {
  mu[m] += X[n] * betas_t[m];  // row_vector times vector: a scalar dot product
}

I used betas_t because it's the transpose of your original betas. X keeps the same shape, just a different declaration.
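Alternatively, if betas stays a matrix, the per-station loop can be collapsed into a single matrix-matrix product. Here is a sketch under the array-of-vectors data layout from earlier in the thread (priors omitted for brevity):

```stan
data {
  int<lower=1> N;
  int<lower=1> K;
  int<lower=1> M;
  matrix[N, K] X;
  array[N] vector[M] y;
}
parameters {
  vector[M] alphas;
  matrix[K, M] betas;
  cholesky_factor_cov[M] L_e;
}
model {
  matrix[N, M] Mu = X * betas;  // all N x M regression means at once
  for (n in 1:N)
    y[n] ~ multi_normal_cholesky(alphas + Mu[n]', L_e);  // Mu[n] is row n
}
```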

We do have this: Covariance Matrix Distributions

In fact, you helped with the docs and review!


The error is due to y[n] being a real instead of a vector[M]. Declare y so that y[n] is a vector[M], and make sure L_e is correctly set as a Cholesky factor. That should fix the issue!

I knew it was a plan, but I hadn't realized it'd gotten all the way through to being released! This is super cool. I've wanted to have this for years!

I converted Y to a data frame, removed NAs, and flattened it. In this example, I updated the Stan model with the Cholesky decomposition, but it is still not working; I received these errors:
"SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1: Unrecoverable error evaluating the log probability at the initial value.
Chain 1: Exception: multi_normal_cholesky_lpdf: Size of random variable (123) and rows of covariance parameter (4) must match in size (in 'anon_model', line 43, column 4 to column 39)
[1] "Error : Exception: multi_normal_cholesky_lpdf: Size of random variable (123) and rows of covariance parameter (4) must match in size (in 'anon_model', line 43, column 4 to column 39)""
###########################################################################
Here are the different steps:
Y ā† readRDS(ā€˜data/data.ann.stramflow.rdsā€™) %>% .[-c(1,5:36)]%>% as.matrix()
X ā† readRDS(ā€˜data/data.treerings.rdsā€™) %>% .[-c(1,4:27)] %>%
mutate_all(~ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)) %>% as.matrix()
nt ā† nrow(Y)
ns ā† ncol(Y)
np ā† ncol(X)

Convert Y to a data frame, remove NAs, and flatten

y_na ā† Y %>%
as.data.frame() %>%
mutate(row = row_number()) %>%
pivot_longer(-row, names_to = ā€œcolumnā€, values_to = ā€œvalueā€) %>%
filter(!is.na(value)) %>%
mutate(column = match(column, colnames(Y))) %>% # Convert column names to column numbers
set_colnames(c(ā€˜yearā€™,ā€˜siteā€™,ā€˜flowā€™))

nb <- nrow(y_na)

Data list for Stan

stan_data ā† list(nb = nb, ns = ns, nt = nt, np = np, x = X, y = y_na$flow, site = y_na$site, year = y_na$year, Wp_e = diag(ns))

stan_model_code <- '/Users/samir/Documents/Paleo_hydrology_Algeria/Bayesian_model_rcode/stan_np.stan'
stan_model <- stan(file = stan_model_code, data = stan_data, # init = init_list,
                   iter = 1000, chains = 1,
                   cores = 8, control = list(adapt_delta = .9, max_treedepth = 15))
###########################################################################
Here is the Stan code:
data {
    int<lower=0> nb;                 // Total number of observations (non-missing)
    int<lower=0> ns;                 // Number of streamflow stations
    int<lower=0> nt;                 // Number of years
    int<lower=0> np;                 // Number of predictors (tree-rings)

    matrix[nt, np] x;                // Predictor matrix (no missing values)
    vector[nb] y;                    // Response vector (non-missing values only)

    int<lower=1, upper=ns> site[nb]; // Index for streamflow stations (1 to ns)
    int<lower=1, upper=nt> year[nb]; // Index for years (1 to nt)

    matrix[ns, ns] Wp_e;             // Scale matrix for the inverse-Wishart prior
}

transformed data {
    real<lower=0> df_e;              // Degrees of freedom for the inverse-Wishart distribution
    df_e = ns + 1;
}

parameters {
    vector[ns] alphas;               // Mean for each streamflow station
    matrix[np, ns] betas;            // Coefficients for predictors, varying by station
    cov_matrix[ns] sigmas_e;         // Covariance matrix for residuals between stations
}

transformed parameters {
    vector[nb] mu;                   // Predicted means for non-missing observations

    for (n in 1:nb) {
        // Calculate the predicted mean for each non-missing observation
        mu[n] = alphas[site[n]] + dot_product(x[year[n], ], betas[, site[n]]);
    }
}

model {
    matrix[ns, ns] L_e;              // Cholesky factor of the covariance matrix

    // Priors
    alphas ~ normal(0, 100);             // Priors for station means
    to_vector(betas) ~ normal(0, 100);   // Priors for station-specific coefficients
    sigmas_e ~ inv_wishart(df_e, Wp_e);  // Prior for the covariance matrix

    // Cholesky decomposition of the covariance matrix
    L_e = cholesky_decompose(sigmas_e);

    // Likelihood
    y ~ multi_normal_cholesky(mu, L_e);  // Ensure mu is correct size and matches the covariance
}

I'm copying your model below but formatted for better legibility, using the method I described above.

```stan
data {
  int<lower=0> nb;                 // Total number of observations (non-missing)
  int<lower=0> ns;                 // Number of streamflow stations
  int<lower=0> nt;                 // Number of years
  int<lower=0> np;                 // Number of predictors (tree-rings)

  matrix[nt, np] x;                // Predictor matrix (no missing values)
  vector[nb] y;                    // Response vector (non-missing values only)

  int<lower=1, upper=ns> site[nb]; // Index for streamflow stations (1 to ns)
  int<lower=1, upper=nt> year[nb]; // Index for years (1 to nt)

  matrix[ns, ns] Wp_e;             // Scale matrix for the inverse-Wishart prior
}

transformed data {
  real<lower=0> df_e;              // Degrees of freedom for the inverse-Wishart distribution
  df_e = ns + 1;
}

parameters {
  vector[ns] alphas;               // Mean for each streamflow station
  matrix[np, ns] betas;            // Coefficients for predictors, varying by station
  cov_matrix[ns] sigmas_e;         // Covariance matrix for residuals between stations
}

transformed parameters {
  vector[nb] mu;                   // Predicted means for non-missing observations

  for (n in 1:nb) {
    // Calculate the predicted mean for each non-missing observation
    mu[n] = alphas[site[n]] + dot_product(x[year[n], ], betas[, site[n]]);
  }
}

model {
  matrix[ns, ns] L_e;              // Cholesky factor of the covariance matrix

  // Priors
  alphas ~ normal(0, 100);             // Priors for station means
  to_vector(betas) ~ normal(0, 100);   // Priors for station-specific coefficients
  sigmas_e ~ inv_wishart(df_e, Wp_e);  // Prior for the covariance matrix

  // Cholesky decomposition of the covariance matrix
  L_e = cholesky_decompose(sigmas_e);

  // Likelihood
  y ~ multi_normal_cholesky(mu, L_e);  // Ensure mu is correct size and matches the covariance
}
```

This error is telling you that y has 123 elements (nb) while L_e is only 4 × 4 (ns). I think your approach of flattening your data into one long vector is making this more difficult. I would instead pass y as an array of vectors or a matrix.

I used Y as one long vector with length nb = 123 because I removed the missing data (NAs).

You can estimate these models with missing data, but it requires slicing the mean vector and covariance matrix at each step, which can get complex. I'm not even sure whether that would work for the Cholesky decomposition version (someone can clarify here), so you might have to switch to y[n] ~ multi_normal(mu, Sigma).
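To sketch what that slicing might look like (this is illustrative, not code from this thread: the names n_obs, obs_idx, and y_obs are hypothetical, and missing entries are assumed padded), each year's likelihood uses only the stations observed that year:

```stan
data {
  int<lower=1> ns;                              // stations
  int<lower=1> nt;                              // years
  int<lower=1> np;                              // predictors
  matrix[nt, np] x;                             // complete predictor matrix
  array[nt] int<lower=0, upper=ns> n_obs;       // stations observed per year
  array[nt, ns] int<lower=1, upper=ns> obs_idx; // station indices, padded
  array[nt] vector[ns] y_obs;                   // observed values, padded
}
parameters {
  vector[ns] alphas;
  matrix[np, ns] betas;
  cov_matrix[ns] sigmas_e;
}
model {
  alphas ~ normal(0, 100);
  to_vector(betas) ~ normal(0, 100);
  sigmas_e ~ inv_wishart(ns + 1, diag_matrix(rep_vector(1, ns)));
  for (t in 1:nt) {
    int k = n_obs[t];
    if (k > 0) {
      array[k] int idx = obs_idx[t, 1:k];          // stations seen in year t
      vector[ns] mu_t = alphas + (x[t] * betas)';  // means for all stations
      // slice both the mean vector and the covariance matrix
      y_obs[t, 1:k] ~ multi_normal(mu_t[idx], sigmas_e[idx, idx]);
    }
  }
}
```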

With that in mind, I'd recommend that you get your model running with complete data and then figure out the slicing later. So restructure Y either as a matrix (matrix[N, M] Y) or as an array of vectors (array[N] vector[M] Y), as described above. You may still have other errors to resolve (I haven't looked closely at the rest of your model), but that should get you past this initial error.
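For reference, the two complete-data shapes suggested above would be declared like this (a sketch; the matrix alternative is shown as a comment, since a program would use one or the other):

```stan
data {
  int<lower=1> N;        // years
  int<lower=1> M;        // stations
  array[N] vector[M] y;  // array form: y[n] is a vector[M]
  // matrix[N, M] Y;     // matrix form: Y[n] is a row_vector[M]
}
```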


Thank you so much