Hello!
I am presenting a correlation model with some extra generated statistics in the generated quantities box. The model runs fine if the sample sizes are equal and I present the data as a 2 column vector in the data blox. However, I would now like to run the model with unbalanced sample sizes. Prior to posting this, I read chapter 11 and 16 in the Stan manual and I think the model is close; however, I can’t quite put my finger as to wear it is getting caught up.
I also read this post on missing data and used it as a guide.
I understand what the error message (see below) means; however, I can’t seem to figure out a way to fix it. If I turn x into a vector I can no longer add a_obs and a_mis in the transformed parameter block.
A found another post suggestion to make x a row vector, which also did not end up with a solution.
If i declare x as vector[2] x[n] in the data block the row assignments for x in transformed parameters results in errors d/t illegal expressions in the transformed parameter box again.
Below is my model and some simulated data.
use
library(rstan)
library(MASS)
#making up data
a <- mvrnorm(n=50, 0, 4)
b <- mvrnorm(n=45, 0, 3.3)
a_mis <- mvrnorm(n=5,0, 4)
standata <- list(a_obs <- as.vector(a),
b_obs <- as.vector(b),
N_a_obs <- nrow(a),
N_a_mis <- nrow(a_mis),
N_b_obs <- nrow(b),
a_mis <- a_mis)
#initial start values
myinits <- list(
list(r=.5, mu=c(1, 2), sigma=c(1, 0.1)),
list(r=-.5, mu=c(0, -2), sigma=c(.1, 1)),
list(r=.9, mu=c(.5, -0.5), sigma=c(0.5, 5)),
list(r=-.9, mu=c(-1, 3), sigma=c(0.001, 3)))
# parameters to be monitored:
parameters <- c("r", "mu", "sigma","T", "bias", "varErr", "rmsd", "diffsigma")
library(rstan)
model <- "
// Pearson Correlation
data {
int<lower=0> N_a_obs; //number of observed a
int<lower=0> N_a_mis; //number of missing a
int<lower=0> N_b_obs;
vector [N_a_obs] a_obs;
vector [N_b_obs] b_obs;
}
transformed data {
int<lower=0> N = N_a_obs + N_a_mis;
}
parameters {
vector[N_a_mis] a_mis;
vector[2] mu;
vector<lower=0>[2] sigma;
real<lower=-1, upper=1> r;
}
transformed parameters {
matrix[2,48] x;
cov_matrix[2] T;
x[,1] = a_obs;
x[,1] = a_mis;
x[,2] = b_obs;
// Reparameterization
T[1,1] <- square(sigma[1]);
T[1,2] <- r * sigma[1] * sigma[2];
T[2,1] <- r * sigma[1] * sigma[2];
T[2,2] <- square(sigma[2]);
}
model {
// Priors
mu ~ normal(0, 100);
sigma ~ cauchy(0,3);
r ~ uniform(-1, 1);
// Data
x ~ multi_normal(mu, T);
}
generated quantities{
vector<lower=-5>[1] bias;
vector<lower=-5>[1] varErr;
vector<lower=-5>[1] rmsd;
vector<lower=-5>[1] diffsigma;
bias[1] = mu[1] - mu[2];
varErr[1] = sqrt(T[2,2] + T[1,1] - 2*T[1,2]);
rmsd[1] = sqrt(bias[1]^2 + varErr[1]^2);
diffsigma[1] = sigma[1] - sigma[2];
}"
samples<- stan(model_code=model,
data=standata,
init=myinits,
pars=parameters,
iter=1000,
chains=4,
cores = 4,
seed = 100
)
The error I am getting is:
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Warning (non-fatal): assignment operator <- deprecated in the Stan language; use = instead.
Warning (non-fatal): assignment operator <- deprecated in the Stan language; use = instead.
Warning (non-fatal): assignment operator <- deprecated in the Stan language; use = instead.
Warning (non-fatal): assignment operator <- deprecated in the Stan language; use = instead.
No matches for:
matrix ~ multi_normal(vector, matrix)
Available argument signatures for multi_normal:
vector ~ multi_normal(vector, matrix)
vector ~ multi_normal(row vector, matrix)
row vector ~ multi_normal(vector, matrix)
row vector ~ multi_normal(row vector, matrix)
vector ~ multi_normal(vector[], matrix)
vector ~ multi_normal(row vector[], matrix)
row vector ~ multi_normal(vector[], matrix)
row vector ~ multi_normal(row vector[], matrix)
vector[] ~ multi_normal(vector, matrix)
vector[] ~ multi_normal(row vector, matrix)
row vector[] ~ multi_normal(vector, matrix)
row vector[] ~ multi_normal(row vector, matrix)
vector[] ~ multi_normal(vector[], matrix)
vector[] ~ multi_normal(row vector[], matrix)
row vector[] ~ multi_normal(vector[], matrix)
row vector[] ~ multi_normal(row vector[], matrix)
require real scalar return type for probability function.
error in 'modelb4d467c49a9b_3a768e768399d96b5f12d9161ad86093' at line 44, column 25
-------------------------------------------------
42: r ~ uniform(-1, 1);
43: // Data
44: x ~ multi_normal(mu, T);
^
45: }
-------------------------------------------------
Error in stanc(file = file, model_code = model_code, model_name = model_name, :
failed to parse Stan model '3a768e768399d96b5f12d9161ad86093' due to the above error.
I understand the correlation model is a simple one, so I am hoping it will also be a good model to learn about missing data in Stan.
Thank you,
Alex Swiderski