Multivariate Normal Distribution with different values for N1 and N2

i am new to RStan and trying to code this multivariate model up on RStan but it takes very long to load so i am assuming that there is probably a better code.

Y1 = (Y1.1,Y1.2 . . . , Y1.n1).
Y1.i ∼ N (θ1, 1) for all 1 ≤ i ≤ n1.
A prior distribution for θ1 is N (0, λ1^-1).
Y1 = (Y2.1,Y2.2 . . . , Y2.n2). Y2.i~ N (θ1 + θ2, 1) for all 1 ≤ i ≤ n2.
The prior distribution on θ2 is N (0, λ2^-1).
The value of θ1* is set as 0, the value of θ2* is set as 1.
n1 is set as 100 and n2 = 1000,
Furthermore, we use λ1 = 1 and λ2 = 100.
Mean of Y1 is 0.06 and a mean of Y2 is to 1.04.

This is my R code so far:

options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

N1 <-100
N2 <-1000
Y1 <- rnorm(N1, mean = 0, sd = 1)
Y2 <- rnorm(N2, mean = 1, sd = 1)
lambda1 <- 1
lambda2 <- 100
model_data <- list(N1=N1, N2 = N2, Y1 = Y1,Y2 = Y2, lambda1 = lambda1, lambda2 = lambda2)
fit1 <- stan(file = ‘model2.stan’, data = model_data)

And this is my code for model2.stan

data { int<lower=0> N1;
int<lower=0> N2;
vector[N1] Y1;
vector[N2] Y2;
real<lower=0> lambda1;
real<lower=0> lambda2;

transformed data{

matrix[2,2] lambda = [[lambda1,0],[0,lambda2]];

parameters { vector[2]theta;


transformed parameters {
row_vector[2]x1_row = [1,0];
matrix[N1,2]X1 = rep_matrix(x1_row,N1);
row_vector[2]x2_row = [1,1];
matrix[N2,2]X2 = rep_matrix(x2_row,N2);
vector[2]mu = [0,0]’;
vector[N1]I1 = rep_vector(1,N1);
vector[N2]I2 = rep_vector(1,N2);


model { theta ~ multi_normal(mu, lambda);
Y1 ~ multi_normal(X1 theta,diag_matrix(I1));
Y2 ~ multi_normal(X2

Is there a better/more effective way i can do this? Thanks for your help!

Can I just clarify your model? So you’re modelling:

Y_1 \sim N(\theta_1,1)
Y_2 \sim N(\theta_1 + \theta_2,1)
\theta_1 \sim N(0,\lambda_1)
\theta_2 \sim N(0,\lambda_2)

Is that right?


Its (λ1)^-1 and ((λ2)^-1 for the priors of θ1 and θ2, sorry for the formatting! Other than that the rest is correct

No worries! Just to double-check, Stan models the normal distribution as N(mean, SD), is \lambda_1^{-1} the SD or is this the precision?

Hi Andrew!
λ1^-1 is the variance, λ1 is the precision!

Great! So a couple things before I post the full model. As I mentioned above, Stan parameterises the normal distribution using the standard deviation, so the priors for theta will now look like:

\theta_1 \sim N(0,\sqrt{\lambda_1^{-1}})

Additionally, if you don’t have covariances then it is much more efficient to just use the normal distribution, rather than the multi_normal distribution (which is likely where a lot of your slowdown came from).

The Stan code for your model would look something like:

data {
  int<lower=0> N1;
  int<lower=0> N2;
  vector[N1] Y1;
  vector[N2] Y2;
  real<lower=0> lambda1;
  real<lower=0> lambda2;

transformed data {
  vector[2] lambda = sqrt(inv([lambda1, lambda2]'));

parameters {
  vector[2] theta;

model {
  theta ~ normal(0, lambda);

  Y1 ~ normal(theta[1], 1);
  Y2 ~ normal(theta[1] + theta[2], 1);

Let me know if you’d like me to explain any of the model specification, or if you have issues running it with your data

Hi Andrew, thank you so much for your help, the model looks good!

Can i also ask, how do i plot the density curve of θ1 and θ2?

This is what i have tried:
fit1 <- stan(file = ‘model3.stan’, data = model_data)
my_samples1 <- extract(fit1)

However, i got this:
Error in density.default(my_samples1$theta[1]) :
need at least 2 points to select a bandwidth automatically

plot(density(my_samples1$theta)) gives me the density curve for θ2 only.

Apologies for any inconvenience caused!

Great! This is built into the rstan package and can be called using stan_dens(fit1)

For more flexibility and plotting options, you can also look at the bayesplot package:

Alright, thank you so much for your help i really appreciate it! God bless and take care!

1 Like