# Multivariate Linear Regression and Seemingly Unrelated Regression SUR

HI

I have that looks like the Seemingly Unrelated Regression. I want to write this code in Stan.

The data that I have is:

I have 6 Reaches, each reach has 6 dependent variables (y’s) and each y has different indecent variables like this:
R1(Reach1)
y1 = x1 + x2 +x3
y2 = x1+x2 + x3 + x4
y3 = x5
y4 = x4
y5 = x1 + x3
y6 = x1 + x2

The rest of reaches has the same models in reach 1, how to specify the in Stan and R, does anyone has idea how to do that? I’ve checked the manual but it is not help.

The second problem is
Multivariate regression one reach and four y’s like this:
y1 = x1 + x2 + x3 +x4 +x5
y2 = x1 + x2 + x3 +x4 +x5
y3 = x1 + x2 + x3 +x4 +x5
y4 = x1 + x2 + x3 +x4 +x5

How to declare the N, J, K …etc inside the Stan code and match with R code?

For some guidance on setting up these multivariate/seemingly-unrelated regressions see this section of the Users Guide: https://mc-stan.org/docs/2_24/stan-users-guide/multivariate-outcomes.html

Thank you. I know that but could you explicit to me some symbols
In the same example that you recommended to me what N, J, K represent for?
N = nrow(dataframe) ### in R cod
J ?
K?
which of these are represent ncol(y), ncol(x)?

Hi Ali,

• `N` is the number of individuals (`nrow(y)`)
• `J` is the number of covariates (`ncol(x)`)
• `K` is the number of outcomes per individual (`ncol(y)`)

Hi Andrew
Thank you
I did this code in multivariate linear regression but I couldn’t get the results and when I check the Stan code, the code is syntactically correct. I don’t know why doesn’t work. The code is:

#### R code

dat <- within(list(), {
y <- as.matrix(dplyr::select(df, F1, F2, F3, F4))
X <- model.matrix(~ 0 + Total_Fines + TOC + Total_solids, data = df)
N <- nrow(y)
K <- ncol(y)
P <- ncol(X)
alpha_loc <- rep(0, K)
alpha_scale <- rep(10, K)
beta_loc <- matrix(0, K, P)
beta_scale <- matrix(2.5, K, P)
Sigma_corr_shape <- 2
Sigma_scale_scale <- 5

})

### Stan code

data {
// multivariate outcome
int N;
int K;
vector[K] y[N];
// covariates
int P;
vector[P] X[N];
// prior
vector[K] alpha_loc;
vector[K] alpha_scale;
vector[P] beta_loc[K];
vector[P] beta_scale[K];
real Sigma_corr_shape;
real Sigma_scale_scale;
}
parameters {
// regression intercept
vector[K] alpha;
// regression coefficients
vector[P] beta[K];
// Cholesky factor of the correlation matrix
cholesky_factor_corr[K] Sigma_corr_L;
vector[K] Sigma_scale;
// student-T degrees of freedom

}
transformed parameters {
vector[K] mu[N];
matrix[K, K] Sigma;
// covariance matrix
Sigma = crossprod(diag_pre_multiply(Sigma_scale, Sigma_corr_L));
for (i in 1:N) {
for (k in 1:K) {
mu[i, k] = alpha[k] + dot_product(X[i], beta[k]);
}
}
}
model {
for (k in 1:K) {
alpha[k] ~ normal(alpha_loc[k], alpha_scale[k]);
beta[k] ~ normal(beta_loc[k], beta_scale[k]);
}

Sigma_scale ~ cauchy(0., Sigma_scale_scale);
Sigma_corr_L ~ lkj_corr_cholesky(Sigma_corr_shape);
y ~ multi_normal_cholesky(mu, Sigma);

}

There are two issues that I see here. First, you haven’t declared a lower bound for your variance parameter (`Sigma_scale`), so your model is giving negative variances. You need to change the declaration to:

``````vector<lower=0>[K] Sigma_scale;
``````

Additionally, you’re taking the cross-product for `Sigma` when you shouldn’t, since you just need the cholesky factor. You only need:

``````Sigma = diag_pre_multiply(Sigma_scale, Sigma_corr_L);
``````

Thank Andrew

I still have the same issue after modification.

No results, the code just run for 3 seconds and then stop. When I summarize the model, I got this:
summary(fit.model)
Length Class Mode
1 stanmodel S4

Can you run:

``````library(rstan)
example(stan_model,run.dontrun=T,verbose=T)
``````

And post any lines that start with `error:`?

No error messages, just warning message:In find.package(package, lib.loc, verbose = verbose) :
package ‘base’ found more than once, using the first from
“/Library/Frameworks/R.framework/Resources/library/base”,
“/Library/Frameworks/R.framework/Versions/4.0/Resources/library/base”

Can you post the command that you’re using to run your model?

fit.model <- stan_model(“stanTest.stan”)

That would be the issue. `stan_model` just compiles the syntax to a model, it doesn’t do any sampling. You need to use the `stan` command and specify the list of data.

``````fit.model <- stan(“stanTest.stan”, data = your_data)
``````

This is the error message:
fit.model <- stan(“stanTest.stan”, data = dat)
hash mismatch so recompiling; make sure Stan code ends with a blank line
Error in mod\$fit_ptr() :
Exception: mismatch in dimension declared and found in context; processing stage=data initialization; variable name=X; position=0; dims declared=(171,3); dims found=(162,3) (in ‘model5f6f630714d2_stanTest’ at line 8)

failed to create the sampler; sampling not done

This errors means that you declared `X` has having 171 rows, but the data that you passed to Stan only has 162 rows, so you need to check your data

I have NA values inside the data frame may be that the reason.

It works right now, thank you so much.
I have some technical questions, please

1. When can I do scale() function for the predictors?
2. How can I take log for Ys, should I do that in Stan code or R code?
3. What is the criteria to select n value in LKJ distribution, what a pre-calculation should I do to select the n value, and values of Cauchy distribution?
4. What do you think, should I use LKJ or Wishart as prior distribution, in the same time, should I use
cholesky_factor_corr or cholesky_factor_cov?
5. Is this command correct if I want to calculate the predicted y to fit between predicted vs measured
``````generated quantities {
real y_rep[N];

for (n in 1:N) {
y_rep[n] = multi_normal_rng(X[n], sigma);
}``````

I have some technical questions, please

1. When can I do scale() function for the predictors?
2. How can I take log for Ys, should I do that in Stan code or R code?
3. What is the criteria to select n value in LKJ distribution, what a pre-calculation should I do to select the n value, and values of Cauchy distribution?
4. What do you think, should I use LKJ or Wishart as prior distribution, in the same time, should I use
cholesky_factor_corr or cholesky_factor_cov?
5. Is this command correct if I want to calculate the predicted y to fit between predicted vs measured
``````generated quantities {
real y_rep[N];

for (n in 1:N) {
y_rep[n] = multi_normal_rng(X[n], sigma);
}
``````