Multivariate Linear Regression and Seemingly Unrelated Regression SUR

alialsarraji · September 21, 2020, 4:05am

HI

I have that looks like the Seemingly Unrelated Regression. I want to write this code in Stan.

The data that I have is:

I have 6 Reaches, each reach has 6 dependent variables (y’s) and each y has different indecent variables like this:
R1(Reach1)
y1 = x1 + x2 +x3
y2 = x1+x2 + x3 + x4
y3 = x5
y4 = x4
y5 = x1 + x3
y6 = x1 + x2

The rest of reaches has the same models in reach 1, how to specify the in Stan and R, does anyone has idea how to do that? I’ve checked the manual but it is not help.

The second problem is
Multivariate regression one reach and four y’s like this:
y1 = x1 + x2 + x3 +x4 +x5
y2 = x1 + x2 + x3 +x4 +x5
y3 = x1 + x2 + x3 +x4 +x5
y4 = x1 + x2 + x3 +x4 +x5

How to declare the N, J, K …etc inside the Stan code and match with R code?

andrjohns · September 21, 2020, 7:19am

For some guidance on setting up these multivariate/seemingly-unrelated regressions see this section of the Users Guide: https://mc-stan.org/docs/2_24/stan-users-guide/multivariate-outcomes.html

alialsarraji · September 22, 2020, 2:14am

Thank you. I know that but could you explicit to me some symbols
In the same example that you recommended to me what N, J, K represent for?
N = nrow(dataframe) ### in R cod
J ?
K?
which of these are represent ncol(y), ncol(x)?

andrjohns · September 22, 2020, 2:21am

Hi Ali,

N is the number of individuals (nrow(y))
J is the number of covariates (ncol(x))
K is the number of outcomes per individual (ncol(y))

alialsarraji · September 22, 2020, 10:20am

Hi Andrew
Thank you
I did this code in multivariate linear regression but I couldn’t get the results and when I check the Stan code, the code is syntactically correct. I don’t know why doesn’t work. The code is:

R code

dat <- within(list(), {
y <- as.matrix(dplyr::select(df, F1, F2, F3, F4))
X <- model.matrix(~ 0 + Total_Fines + TOC + Total_solids, data = df)
N <- nrow(y)
K <- ncol(y)
P <- ncol(X)
alpha_loc <- rep(0, K)
alpha_scale <- rep(10, K)
beta_loc <- matrix(0, K, P)
beta_scale <- matrix(2.5, K, P)
Sigma_corr_shape <- 2
Sigma_scale_scale <- 5

})

Stan code

data {
// multivariate outcome
int N;
int K;
vector[K] y[N];
// covariates
int P;
vector[P] X[N];
// prior
vector[K] alpha_loc;
vector[K] alpha_scale;
vector[P] beta_loc[K];
vector[P] beta_scale[K];
real Sigma_corr_shape;
real Sigma_scale_scale;
}
parameters {
// regression intercept
vector[K] alpha;
// regression coefficients
vector[P] beta[K];
// Cholesky factor of the correlation matrix
cholesky_factor_corr[K] Sigma_corr_L;
vector[K] Sigma_scale;
// student-T degrees of freedom

}
transformed parameters {
vector[K] mu[N];
matrix[K, K] Sigma;
// covariance matrix
Sigma = crossprod(diag_pre_multiply(Sigma_scale, Sigma_corr_L));
for (i in 1:N) {
for (k in 1:K) {
mu[i, k] = alpha[k] + dot_product(X[i], beta[k]);
}
}
}
model {
for (k in 1:K) {
alpha[k] ~ normal(alpha_loc[k], alpha_scale[k]);
beta[k] ~ normal(beta_loc[k], beta_scale[k]);
}

Sigma_scale ~ cauchy(0., Sigma_scale_scale);
Sigma_corr_L ~ lkj_corr_cholesky(Sigma_corr_shape);
y ~ multi_normal_cholesky(mu, Sigma);

}

andrjohns · September 22, 2020, 12:02pm

There are two issues that I see here. First, you haven’t declared a lower bound for your variance parameter (Sigma_scale), so your model is giving negative variances. You need to change the declaration to:

vector<lower=0>[K] Sigma_scale;

Additionally, you’re taking the cross-product for Sigma when you shouldn’t, since you just need the cholesky factor. You only need:

Sigma = diag_pre_multiply(Sigma_scale, Sigma_corr_L);

alialsarraji · September 22, 2020, 1:34pm

Thank Andrew

I still have the same issue after modification.

andrjohns · September 22, 2020, 1:36pm

Can you give me more information about the issue? Are the results wrong or something else?

alialsarraji · September 22, 2020, 1:57pm

No results, the code just run for 3 seconds and then stop. When I summarize the model, I got this:
summary(fit.model)
Length Class Mode
1 stanmodel S4

andrjohns · September 22, 2020, 2:06pm

Can you run:

library(rstan)
example(stan_model,run.dontrun=T,verbose=T)

And post any lines that start with error:?

alialsarraji · September 22, 2020, 2:12pm

No error messages, just warning message:In find.package(package, lib.loc, verbose = verbose) :
package ‘base’ found more than once, using the first from
“/Library/Frameworks/R.framework/Resources/library/base”,
“/Library/Frameworks/R.framework/Versions/4.0/Resources/library/base”

andrjohns · September 22, 2020, 2:14pm

Can you post the command that you’re using to run your model?

alialsarraji · September 22, 2020, 2:14pm

fit.model <- stan_model(“stanTest.stan”)

andrjohns · September 22, 2020, 2:19pm

That would be the issue. stan_model just compiles the syntax to a model, it doesn’t do any sampling. You need to use the stan command and specify the list of data.

Your call should be:

fit.model <- stan(“stanTest.stan”, data = your_data)

alialsarraji · September 22, 2020, 2:46pm

This is the error message:
fit.model <- stan(“stanTest.stan”, data = dat)
hash mismatch so recompiling; make sure Stan code ends with a blank line
Error in mod$fit_ptr() :
Exception: mismatch in dimension declared and found in context; processing stage=data initialization; variable name=X; position=0; dims declared=(171,3); dims found=(162,3) (in ‘model5f6f630714d2_stanTest’ at line 8)

failed to create the sampler; sampling not done

andrjohns · September 23, 2020, 12:32am

This errors means that you declared X has having 171 rows, but the data that you passed to Stan only has 162 rows, so you need to check your data

alialsarraji · September 23, 2020, 12:59am

I have NA values inside the data frame may be that the reason.

alialsarraji · September 23, 2020, 6:34am

It works right now, thank you so much.
I have some technical questions, please

When can I do scale() function for the predictors?
How can I take log for Ys, should I do that in Stan code or R code?
What is the criteria to select n value in LKJ distribution, what a pre-calculation should I do to select the n value, and values of Cauchy distribution?
What do you think, should I use LKJ or Wishart as prior distribution, in the same time, should I use
cholesky_factor_corr or cholesky_factor_cov?
Is this command correct if I want to calculate the predicted y to fit between predicted vs measured

generated quantities {
 real y_rep[N];

 for (n in 1:N) {
 y_rep[n] = multi_normal_rng(X[n], sigma);
 }

alialsarraji · September 24, 2020, 4:18am

I have some technical questions, please

When can I do scale() function for the predictors?
How can I take log for Ys, should I do that in Stan code or R code?
What is the criteria to select n value in LKJ distribution, what a pre-calculation should I do to select the n value, and values of Cauchy distribution?
What do you think, should I use LKJ or Wishart as prior distribution, in the same time, should I use
cholesky_factor_corr or cholesky_factor_cov?
Is this command correct if I want to calculate the predicted y to fit between predicted vs measured

generated quantities {
 real y_rep[N];

 for (n in 1:N) {
 y_rep[n] = multi_normal_rng(X[n], sigma);
 }

Topic		Replies	Views
Multivariate Model Modeling	2	455	September 24, 2020
Seemingly Unrelated Regression example stan manual Modeling fitting-issues , specification	1	488	February 25, 2023
Feedback requested on possible subtle pathology of typical multivariate specification (or maybe bug?) Modeling	20	1538	March 31, 2021
Divergent transitions in a multivariate multilevel regression model Modeling fitting-issues	17	1403	August 25, 2018
Stan_mvmer for seemingly unrelated regressions? rstanarm	3	1710	April 19, 2018

Multivariate Linear Regression and Seemingly Unrelated Regression SUR

R code

Stan code

Related topics