Passing matrix as initial values

In my fairly complex model (see here), I have parameters which are 2 x k matrices, and for which I need to provide initial conditions. Specifically:

parameters{
 // lots of other parameters
  real
  matrix[2,k] mu1;  
  matrix[2,k] mu2; 
}

model{
 // lots of stuff
 for(i in 1:k){
	  col(mu1, i) ~ multi_normal(mu1_mean, mu1_Sigma);
	  col(mu2, i) ~ multi_normal(mu2_mean, mu2_Sigma);
  }
}

Using rstan, I include them in the inits as follows:

mu1.init <- cbind(x1s, y1s)
mu2.init <- cbind(x1s, y1s)

sampling(mymodel,  inits = list(mu1 = mu1.init, mu2 = mu2.init, ... [other parameters]))

Which doesn’t seem to work … i.e., all the other initial values are fed into it, but for mu1 and mu2 I get the [-2,2] default sampling, which is very far from a good value.

Any ideas why or how to fix?

Finally … I have seen it noted that

vector[k] mu1[2];
vector[k] mu2[2];

is a “better” (slightly more efficient?) way to store and sample with these parameters. In that case, I guess the comparable sampling instruction be:

for(i in 1:k){
	  mu1[i] ~ multi_normal(mu1_mean, mu1_Sigma);
	  mu2[i] ~ multi_normal(mu2_mean, mu2_Sigma);
  }

(Yes?)

But how would you provide the initial value for that? As a 2 x k matrix? A k x 2 matrix? A list with k 2-length vectors?

Separately, the values of these two matrices are drawn from hyper-parameters mu1_mean and mu1_Sigma via the bivariate normal above. The initial values for those parameters are fine (e.g. mu1_mean ~ [-200, 800]). I’m curious why it is that it even initializes those values so far away from the given distribution? I mean - I understand that each parameter gets initialized uniquely, but why not draw those from the distribution that its supposed to be drawn from anyway?

Thanks,
Elie

Hard to comment without seeing your R code as well, but I would suspect code like: mu1.init <- cbind(x1s, y1s) produces n \times 2 matrices, you might want rbind instead.

When you have statements like vector[k] mu1[2], I find it easiest to think about this as an "array of 2 things, where each of the thing is a vector of length k, and when one is subsetting mu1, the fastest running index subsets the array, and the slower running index subsets the thing in each element of the array. So mu1[1] is the first vector of length k. In terms of initialising this, use array in R to build the same size and shape array, something like: mu1_init <- array(data = your_data_here, dim = c(2, k)).

For a specific example, I have a Stan model with parameter vector[2] beta_midpoint[n_times], and in my initial value list I have:

init_list <- list(
... ,
beta_midpoint = array(c(rep(77, n_times), rep(150, n_times)), dim = c(n_times, 2)),
... )

This loop:

Will try and access mu[3:k], which do not exist. You just need to loop over 1:2.

Hi hhau,

Thanks for responses - very helpful, especially your array example. The cbind was a typo in the question - I did try rbind (so the correct dimensions) in the model, so it’s still a mystery why they’re not passing, but I’ve experimented more (with simpler models - below the break) and should be able to figure it out.

But the take home is: matrix[2,k] mu object can be fed from R with a 2 x k matrix. Whereas a vector[2] mu[k] can be fed with a k x 2 matrix.

Or, in more detail, here’s a table of equivalences:

element matrix version vector version
data: matrix[2,n] xy vector[2] xy[n]
parameters: matrix[2,k] mu vector[2] mu[k]
model: for(i in 1:n) col(xy, i) ~ normal(col(mu, id[i]), sigma) for(i in 1:n) xy[i] ~ normal(mu[id[i]], sigma)
data (from R): xy = rbind(x, y) xy = cbind(x, y)
initial values (from R): mu = rbind(mu_x, mu_y) mu = cbind(mu_x, mu_y)

Any computational reason to choose one version over the other - other than the somewhat neater syntax in v. 2?

Elie

ps. Below (for the record) two versions of the stan code for a model where coordinates
{\bf Z} \sim N(\mu_{ind}, \sigma_{ind}) and \mu_{ind} \sim BivarN(\mu_{pop}, \Sigma_{pop})

version 1:

data {
	int<lower=0> n;               // n. total observations
	int k;                        // n. individuals
	matrix[2,n] xy;               // xy-coordinate
	int<lower=1,upper=k> id[n];   // ID of individual
	real inits[4];                // initial values for priors
	int printpars;
}

parameters {
	// population parameters
	vector[2] mu_mean;
  real<lower = 0> mux_sigma;
	real<lower = 0> muy_sigma;
	real<lower=-1, upper=1> rho;
  // individual  parameters	
	real<lower = 0> sigma_ind;
	matrix[2,k] mu_ind;  
}

transformed parameters {
	cov_matrix[2] mu_Sigma;
	mu_Sigma[1,1] = mux_sigma^2;
	mu_Sigma[2,2] = muy_sigma^2;
	mu_Sigma[1,2] = mux_sigma*muy_sigma * rho;
	mu_Sigma[2,1] = mux_sigma*muy_sigma * rho;
}  

model {
//priors
  mu_mean[1] ~ normal(inits[1], 30.0);
  mu_mean[2] ~ normal(inits[2], 30.0);
  mux_sigma ~ lognormal(log(inits[3]), log(30.0));
  muy_sigma ~ lognormal(log(inits[4]), log(30.0));
//sampling
 for(i in 1:k) col(mu_ind, i) ~ multi_normal(mu_mean, mu_Sigma); 
 for(i in 1:n) col(xy, i) ~ normal(col(mu_ind, id[i]), sigma_ind); 
 }

version 2

data {
	int<lower=0> n;               // n. total observations
	int k;                        // n. individuals
	vector[2] xy[n];              // xy-coordinate
	int<lower=1,upper=k> id[n];   // ID of individual
	real inits[4];                // initial values for priors
	int printpars;
}

parameters {
	// population mean
	vector[2] mu_mean;

  real<lower = 0> mux_sigma;
	real<lower = 0> muy_sigma;
	real<lower=-1, upper=1> rho;

  // individual  parameters	
	real<lower = 0> sigma_ind;
	//matrix[2,k] mu_ind;  
	vector[2] mu_ind[k];  
}

transformed parameters {
	cov_matrix[2] mu_Sigma;
  // convert sigmas to usable Variance-Covariance matrix
	mu_Sigma[1,1] = mux_sigma^2;
	mu_Sigma[2,2] = muy_sigma^2;
	mu_Sigma[1,2] = mux_sigma*muy_sigma * rho;
	mu_Sigma[2,1] = mux_sigma*muy_sigma * rho;
}  

model {
	//priors
  mu_mean[1] ~ normal(inits[1], 30.0);
  mu_mean[2] ~ normal(inits[2], 30.0);
 	mux_sigma ~ lognormal(log(inits[3]), log(30.0));
  muy_sigma ~ lognormal(log(inits[4]), log(30.0));

	// sampling
  for(i in 1:k)	mu_ind[i] ~ multi_normal(mu_mean, mu_Sigma);
  for(i in 1:n) xy[i] ~ normal(mu_ind[id[i]], sigma_ind); 
 }

And the R code to simulate data to test:

mu_x <- 50; mu_y <- 100; 
sigma_x <- 30; sigma_y <- 50; rho <- -0.6
Sigma <- rbind(c(sigma_x^2, rho*sigma_x*sigma_y), c(rho*sigma_x*sigma_y, sigma_y^2))

k <- 10L; n <- 200L
sigma_ind <- 5

mu <- rmvnorm(k, c(mu_x, mu_y), sigma = Sigma) %>% t
simdata <- adply(mu, 2, function(mu) data.frame(x = rnorm(n/k, mu[1], sigma_ind),
                                                y = rnorm(n/k, mu[2], sigma_ind))) %>% rename(c(X1 = "id"))

require(ggplot2)
ggplot(simdata, aes(x,y, col = id)) + coord_fixed() + geom_point()

simdata.stan <- with(simdata, list(n=length(x), k=length(unique(id)),
                                   xy = cbind(x,y),
                                   id = as.numeric(id),
                                   inits = c(mean(x), mean(y), sd(x), sd(y)),
                                   printpars = 1))

Update: I found out how to make my initial matrix pass correctly, but have no idea why.
It has something to do with R’s plyr::daply function, but whatever is going on is awfully mysterious. Not quite a bug - but weird.

Basically, here are two different ways to generate the same matrix - some numbers around 200, and some numbers around 500:

simdata <- data.frame(id = rep(1:k, each = n/k), 
                      x = rnorm(n) + 200, 
                      y = rnorm(n) + 500)
mu_init1 <- daply(simdata %>% mutate(id = as.integer(id)), "id", 
                 function(df) df[1,c("x", "y")]) 
mu_init2 <- daply(simdata %>% mutate(id = as.integer(id)), "id", 
                  function(df) df[1,c("x", "y")]) %>% as.numeric %>% matrix(ncol=2)

Both are definitely matrices, with the same values and dimensions:

> is(mu_init1)
[1] "matrix"    "array"     "structure" "vector"   
> is(mu_init2)
[1] "matrix"    "array"     "structure" "vector"   
> mu_init1 == mu_init2
    
id      x    y
  1  TRUE TRUE
  2  TRUE TRUE
  3  TRUE TRUE
  4  TRUE TRUE
  5  TRUE TRUE
  6  TRUE TRUE
  7  TRUE TRUE
  8  TRUE TRUE
  9  TRUE TRUE
  10 TRUE TRUE 

But option 1 does not work as an initial value! Compare a stan run (of the model above - printing the parameters)

SAMPLING FOR MODEL 'test4' NOW (CHAIN 1).
Parameters:
mu_mean: [50,100]
mu_Sigma: [[900,-900],[-900,2500]]
mu_ind: [1.9506,-0.128294],[-1.16584,1.9416],[-0.224931,-1.57477],[-0.0749533,-0.67237],[-1.03259,-1.45989],[0.917404,-0.214269],[-1.89901,0.318395],[-0.170434,1.59988],[1.07838,-1.50049],[1.37672,-1.1417]]

Clearly sampling by default from [-2,2].

Compared to version 2:

SAMPLING FOR MODEL 'test4' NOW (CHAIN 1).
Parameters:
mu_mean: [50,100]
mu_Sigma: [[900,-900],[-900,2500]]
mu_ind: [[200.596,500.14],[200.232,500.321],[198.502,499.32],[198.395,501.109],[201.202,499.859],[198.483,500.505],[199.446,500.206],[201.751,499.874],[199.075,499.813],[198.634,500.465]]

This was a VERY WEIRD (and frustrating) thing! Aside from row names and column names in one that the other doesn’t have, I can’t tell (from within R) what the difference might be.

Anyways - be warned.

I think they are pretty much equivalent here. In general choose the one that lets you vectorise the code / loop over the smaller number of things.

Hard to say, but I think that hunch that == is doing an element wise comparison. The documentation for daply suggests that it returns a data.frame, so I reckon you could probably just get away with an as.matrix on the end of your piping for mu_init2. I don’t know what package is providing that is functionality, but it looks like it’s hiding type information from you that you might want to know.

daply starts with a data frame and returns an array (hence “da”), and yes == does pairwise comparisons. In R, a data frame is a list of equal length elements regardless of the class, and a matrix is a 2-dimensional array where all elements are of the same class - an important distinction.

But - yeah - it’s strange. Even if you add the attributes (column names and row names), v2 is somehow different from v1 - in some very subtle way I don’t know how to detect that lets R recognize both as a matrix, but not STAN:

> require(magrittr); require(plyr)
> 
> n <- 4; k <- 2
> simdata <- data.frame(id = rep(1:k, each = n/k), x = rnorm(n), y = rnorm(n))
> v1 <- daply(simdata %>% mutate(id = as.integer(id)), "id", 
+                  function(df) df[1,c("x", "y")]) 
> v2 <- v1 %>% as.numeric %>% matrix(ncol=2)
>   
> attributes(v2) <- attributes(v1)
> v1
   
id  x          y        
  1 -0.7416209 1.517454 
  2 0.7167674  0.1886133
> v2
   
id           x         y
  1 -0.7416209 1.5174542
  2  0.7167674 0.1886133
> v1 == v2
   
id     x    y
  1 TRUE TRUE
  2 TRUE TRUE
> identical(v1, v2)
[1] FALSE
> identical(is(v1), is(v2))
[1] TRUE

If you know the location and scale of mu1, mu2, you can reparameterize to mu1_raw and define mu = loc1 + scale1 * mu1_raw;