# Passing matrix as initial values

In my fairly complex model (see here), I have parameters which are 2 x k matrices, and for which I need to provide initial conditions. Specifically:

parameters{
// lots of other parameters
real
matrix[2,k] mu1;
matrix[2,k] mu2;
}

model{
// lots of stuff
for(i in 1:k){
col(mu1, i) ~ multi_normal(mu1_mean, mu1_Sigma);
col(mu2, i) ~ multi_normal(mu2_mean, mu2_Sigma);
}
}


Using rstan, I include them in the inits as follows:

mu1.init <- cbind(x1s, y1s)
mu2.init <- cbind(x1s, y1s)

sampling(mymodel,  inits = list(mu1 = mu1.init, mu2 = mu2.init, ... [other parameters]))


Which doesn’t seem to work … i.e., all the other initial values are fed into it, but for mu1 and mu2 I get the [-2,2] default sampling, which is very far from a good value.

Any ideas why or how to fix?

Finally … I have seen it noted that

vector[k] mu1[2];
vector[k] mu2[2];


is a “better” (slightly more efficient?) way to store and sample with these parameters. In that case, I guess the comparable sampling instruction be:

for(i in 1:k){
mu1[i] ~ multi_normal(mu1_mean, mu1_Sigma);
mu2[i] ~ multi_normal(mu2_mean, mu2_Sigma);
}


(Yes?)

But how would you provide the initial value for that? As a 2 x k matrix? A k x 2 matrix? A list with k 2-length vectors?

Separately, the values of these two matrices are drawn from hyper-parameters mu1_mean and mu1_Sigma via the bivariate normal above. The initial values for those parameters are fine (e.g. mu1_mean ~ [-200, 800]). I’m curious why it is that it even initializes those values so far away from the given distribution? I mean - I understand that each parameter gets initialized uniquely, but why not draw those from the distribution that its supposed to be drawn from anyway?

Thanks,
Elie

Hard to comment without seeing your R code as well, but I would suspect code like: mu1.init <- cbind(x1s, y1s) produces n \times 2 matrices, you might want rbind instead.

When you have statements like vector[k] mu1[2], I find it easiest to think about this as an "array of 2 things, where each of the thing is a vector of length k, and when one is subsetting mu1, the fastest running index subsets the array, and the slower running index subsets the thing in each element of the array. So mu1[1] is the first vector of length k. In terms of initialising this, use array in R to build the same size and shape array, something like: mu1_init <- array(data = your_data_here, dim = c(2, k)).

For a specific example, I have a Stan model with parameter vector[2] beta_midpoint[n_times], and in my initial value list I have:

init_list <- list(
... ,
beta_midpoint = array(c(rep(77, n_times), rep(150, n_times)), dim = c(n_times, 2)),
... )


This loop:

Will try and access mu[3:k], which do not exist. You just need to loop over 1:2.

Hi hhau,

Thanks for responses - very helpful, especially your array example. The cbind was a typo in the question - I did try rbind (so the correct dimensions) in the model, so it’s still a mystery why they’re not passing, but I’ve experimented more (with simpler models - below the break) and should be able to figure it out.

But the take home is: matrix[2,k] mu object can be fed from R with a 2 x k matrix. Whereas a vector[2] mu[k] can be fed with a k x 2 matrix.

Or, in more detail, here’s a table of equivalences:

element matrix version vector version
data: matrix[2,n] xy vector[2] xy[n]
parameters: matrix[2,k] mu vector[2] mu[k]
model: for(i in 1:n) col(xy, i) ~ normal(col(mu, id[i]), sigma) for(i in 1:n) xy[i] ~ normal(mu[id[i]], sigma)
data (from R): xy = rbind(x, y) xy = cbind(x, y)
initial values (from R): mu = rbind(mu_x, mu_y) mu = cbind(mu_x, mu_y)

Any computational reason to choose one version over the other - other than the somewhat neater syntax in v. 2?

Elie

ps. Below (for the record) two versions of the stan code for a model where coordinates
{\bf Z} \sim N(\mu_{ind}, \sigma_{ind}) and \mu_{ind} \sim BivarN(\mu_{pop}, \Sigma_{pop})

version 1:

data {
int<lower=0> n;               // n. total observations
int k;                        // n. individuals
matrix[2,n] xy;               // xy-coordinate
int<lower=1,upper=k> id[n];   // ID of individual
real inits[4];                // initial values for priors
int printpars;
}

parameters {
// population parameters
vector[2] mu_mean;
real<lower = 0> mux_sigma;
real<lower = 0> muy_sigma;
real<lower=-1, upper=1> rho;
// individual  parameters
real<lower = 0> sigma_ind;
matrix[2,k] mu_ind;
}

transformed parameters {
cov_matrix[2] mu_Sigma;
mu_Sigma[1,1] = mux_sigma^2;
mu_Sigma[2,2] = muy_sigma^2;
mu_Sigma[1,2] = mux_sigma*muy_sigma * rho;
mu_Sigma[2,1] = mux_sigma*muy_sigma * rho;
}

model {
//priors
mu_mean[1] ~ normal(inits[1], 30.0);
mu_mean[2] ~ normal(inits[2], 30.0);
mux_sigma ~ lognormal(log(inits[3]), log(30.0));
muy_sigma ~ lognormal(log(inits[4]), log(30.0));
//sampling
for(i in 1:k) col(mu_ind, i) ~ multi_normal(mu_mean, mu_Sigma);
for(i in 1:n) col(xy, i) ~ normal(col(mu_ind, id[i]), sigma_ind);
}


version 2

data {
int<lower=0> n;               // n. total observations
int k;                        // n. individuals
vector[2] xy[n];              // xy-coordinate
int<lower=1,upper=k> id[n];   // ID of individual
real inits[4];                // initial values for priors
int printpars;
}

parameters {
// population mean
vector[2] mu_mean;

real<lower = 0> mux_sigma;
real<lower = 0> muy_sigma;
real<lower=-1, upper=1> rho;

// individual  parameters
real<lower = 0> sigma_ind;
//matrix[2,k] mu_ind;
vector[2] mu_ind[k];
}

transformed parameters {
cov_matrix[2] mu_Sigma;
// convert sigmas to usable Variance-Covariance matrix
mu_Sigma[1,1] = mux_sigma^2;
mu_Sigma[2,2] = muy_sigma^2;
mu_Sigma[1,2] = mux_sigma*muy_sigma * rho;
mu_Sigma[2,1] = mux_sigma*muy_sigma * rho;
}

model {
//priors
mu_mean[1] ~ normal(inits[1], 30.0);
mu_mean[2] ~ normal(inits[2], 30.0);
mux_sigma ~ lognormal(log(inits[3]), log(30.0));
muy_sigma ~ lognormal(log(inits[4]), log(30.0));

// sampling
for(i in 1:k)	mu_ind[i] ~ multi_normal(mu_mean, mu_Sigma);
for(i in 1:n) xy[i] ~ normal(mu_ind[id[i]], sigma_ind);
}


And the R code to simulate data to test:

mu_x <- 50; mu_y <- 100;
sigma_x <- 30; sigma_y <- 50; rho <- -0.6
Sigma <- rbind(c(sigma_x^2, rho*sigma_x*sigma_y), c(rho*sigma_x*sigma_y, sigma_y^2))

k <- 10L; n <- 200L
sigma_ind <- 5

mu <- rmvnorm(k, c(mu_x, mu_y), sigma = Sigma) %>% t
simdata <- adply(mu, 2, function(mu) data.frame(x = rnorm(n/k, mu[1], sigma_ind),
y = rnorm(n/k, mu[2], sigma_ind))) %>% rename(c(X1 = "id"))

require(ggplot2)
ggplot(simdata, aes(x,y, col = id)) + coord_fixed() + geom_point()

simdata.stan <- with(simdata, list(n=length(x), k=length(unique(id)),
xy = cbind(x,y),
id = as.numeric(id),
inits = c(mean(x), mean(y), sd(x), sd(y)),
printpars = 1))


Update: I found out how to make my initial matrix pass correctly, but have no idea why.
It has something to do with R’s plyr::daply function, but whatever is going on is awfully mysterious. Not quite a bug - but weird.

Basically, here are two different ways to generate the same matrix - some numbers around 200, and some numbers around 500:

simdata <- data.frame(id = rep(1:k, each = n/k),
x = rnorm(n) + 200,
y = rnorm(n) + 500)
mu_init1 <- daply(simdata %>% mutate(id = as.integer(id)), "id",
function(df) df[1,c("x", "y")])
mu_init2 <- daply(simdata %>% mutate(id = as.integer(id)), "id",
function(df) df[1,c("x", "y")]) %>% as.numeric %>% matrix(ncol=2)


Both are definitely matrices, with the same values and dimensions:

> is(mu_init1)
[1] "matrix"    "array"     "structure" "vector"
> is(mu_init2)
[1] "matrix"    "array"     "structure" "vector"
> mu_init1 == mu_init2

id      x    y
1  TRUE TRUE
2  TRUE TRUE
3  TRUE TRUE
4  TRUE TRUE
5  TRUE TRUE
6  TRUE TRUE
7  TRUE TRUE
8  TRUE TRUE
9  TRUE TRUE
10 TRUE TRUE


But option 1 does not work as an initial value! Compare a stan run (of the model above - printing the parameters)

SAMPLING FOR MODEL 'test4' NOW (CHAIN 1).
Parameters:
mu_mean: [50,100]
mu_Sigma: [[900,-900],[-900,2500]]
mu_ind: [1.9506,-0.128294],[-1.16584,1.9416],[-0.224931,-1.57477],[-0.0749533,-0.67237],[-1.03259,-1.45989],[0.917404,-0.214269],[-1.89901,0.318395],[-0.170434,1.59988],[1.07838,-1.50049],[1.37672,-1.1417]]


Clearly sampling by default from [-2,2].

Compared to version 2:

SAMPLING FOR MODEL 'test4' NOW (CHAIN 1).
Parameters:
mu_mean: [50,100]
mu_Sigma: [[900,-900],[-900,2500]]
mu_ind: [[200.596,500.14],[200.232,500.321],[198.502,499.32],[198.395,501.109],[201.202,499.859],[198.483,500.505],[199.446,500.206],[201.751,499.874],[199.075,499.813],[198.634,500.465]]


This was a VERY WEIRD (and frustrating) thing! Aside from row names and column names in one that the other doesn’t have, I can’t tell (from within R) what the difference might be.

Anyways - be warned.

I think they are pretty much equivalent here. In general choose the one that lets you vectorise the code / loop over the smaller number of things.

Hard to say, but I think that hunch that == is doing an element wise comparison. The documentation for daply suggests that it returns a data.frame, so I reckon you could probably just get away with an as.matrix on the end of your piping for mu_init2. I don’t know what package is providing that is functionality, but it looks like it’s hiding type information from you that you might want to know.

daply starts with a data frame and returns an array (hence “da”), and yes == does pairwise comparisons. In R, a data frame is a list of equal length elements regardless of the class, and a matrix is a 2-dimensional array where all elements are of the same class - an important distinction.

But - yeah - it’s strange. Even if you add the attributes (column names and row names), v2 is somehow different from v1 - in some very subtle way I don’t know how to detect that lets R recognize both as a matrix, but not STAN:

> require(magrittr); require(plyr)
>
> n <- 4; k <- 2
> simdata <- data.frame(id = rep(1:k, each = n/k), x = rnorm(n), y = rnorm(n))
> v1 <- daply(simdata %>% mutate(id = as.integer(id)), "id",
+                  function(df) df[1,c("x", "y")])
> v2 <- v1 %>% as.numeric %>% matrix(ncol=2)
>
> attributes(v2) <- attributes(v1)
> v1

id  x          y
1 -0.7416209 1.517454
2 0.7167674  0.1886133
> v2

id           x         y
1 -0.7416209 1.5174542
2  0.7167674 0.1886133
> v1 == v2

id     x    y
1 TRUE TRUE
2 TRUE TRUE
> identical(v1, v2)
[1] FALSE
> identical(is(v1), is(v2))
[1] TRUE


If you know the location and scale of mu1, mu2, you can reparameterize to mu1_raw and define mu = loc1 + scale1 * mu1_raw;