Possible Mistake in Stan User Guide Section 1.13

I think there are a two typos in Section 1.13 of the Stan User Guide, version 2.25.

For an older User Guide there was an open issue on GitHub as discussed here, but it’s closed now so I don’t really know where to post this.

In the Optimization through Cholesky Factorization section, there is this code in a code block:

transformed parameters {
  matrix[J, K] beta;
  beta = u * gamma + z * diag_post_multiply(L_Omega,tau);
}

Here, z is a matrix, so one row of z is a row vector (later defined to be standard (mv) normally distributed), thus we define a new variable \beta:=\Omega_L\mathrm{diag}(\tau), thus in column vector form: \beta'=\mathrm{diag}(\tau)\Omega_L'z'. But if we the figure out the variance of \beta, we find that \mathbb{V}[\beta']=\mathrm{diag}(\tau)\Omega_L'\Omega_L\mathrm{diag}(\tau). However, \Omega_L'\Omega_L\neq\Omega_L\Omega_L'=\Omega. Thus I think this should read

transformed parameters {
  matrix[J, K] beta;
  beta = u * gamma + z * diag_post_multiply(L_Omega', tau);
}

Another one of those comes a bit later in the lines of

Sigma_beta
  = quad_form_diag(Omega, tau)
  = diag_post_multiply(L_Omega,tau) * diag_post_multiply(L_Omega,tau)'

Here I think this should be
quad_form_diag(Omega, tau)=\mathrm{diag}(\tau)\Omega\mathrm{diag}(\tau)=\mathrm{diag}(\tau)\Omega_L\Omega_L'\mathrm{diag}(\tau)=diag_pre_multiply(tau, Omega_L) * diag_pre_multiply(tau, Omega_L)'

I didn’t test this specific code, but I had the wrong versions in a model of mine and spend too long trying to find out why my estimates were off ever-so-slightly and then I applied the correction and everything works fine.

4 Likes

Shoot, that’s possibly on me. I think I did the most recent edit, just a sec…

(Here’s where my motivation came from; heading over to the docs to see what I messed up…)

1 Like

@jonah Can you take a look? I made my suggestion from a very math naive background, more out of coding aesthetics, so it’s entirely possible I got the transform incorrect.

1 Like

Wow that was quick. Yes this is where the mistake stems from, sorry I didn’t look well enough to find this post.

The term (diag_pre_multiply(tau,L_Omega) * z)'multiplied out has to have the transpose everywhere, so (diag_pre_multiply(tau,L_Omega) * z)'=(\mathrm{diag}(\tau)\Omega_Lz)'=z'\Omega_L'\mathrm{diag}(\tau)'=z'\Omega_L'\mathrm{diag}(\tau)=z' * diag_post_multiply(L_Omega', tau). (and \mathrm{diag}(\tau)'=\mathrm{diag}(\tau) since it is diagonal of course).

Edit to make it more precise: This is for matrix[K, J] z;, which you then transformed to matrix[J, K] z;, thus the new z is the old z' and therefore we’re left with
z * diag_post_multiply(L_Omega', tau).

2 Likes

If there’s no open issue then I think it’s fine to just open a new issue for something like this. Or even better a PR with the fix (no obligation though of course!).

At first glance I think @abeeisnotabug is correct here, although I said the same thing in the thread that @mike-lawrence linked to ;) @bbbales2 I see you liked the initial post. Does this look right to you too?

The more I think about it now, the more I would transpose everything from the beginning. I guess the confusions arise because we usually think about data in columns with observations in rows, which renders one row a multivariate vector, but in linear algebra we think of vectors as columns.

So a more “linear algebra-friendly” version would be:

parameters {
  matrix[K, J] z;
  cholesky_factor_corr[K] L_Omega;
  vector<lower=0,upper=pi()/2>[K] tau_unif;
  matrix[L, K] gamma;                         // group coeffs
  real<lower=0> sigma;                       // prediction error scale
}
transformed parameters {
  matrix[K, J] beta;
  vector<lower=0>[K] tau;     // prior scale
  for (k in 1:K) tau[k] = 2.5 * tan(tau_unif[k]);
  beta = u * gamma + diag_pre_multiply(tau, L_Omega) * z;
}
model {
  to_vector(z) ~ std_normal();
  L_Omega ~ lkj_corr_cholesky(2);
  to_vector(gamma) ~ normal(0, 5);
  y ~ normal(columns_dot_product(beta[, jj] , x), sigma);
}

But of course this now goes a bit beyond simple errata correction (and probably I made new ones as well), just as a suggestion.

1 Like

Double-checking: this bit works with x defined as a N \times K matrix, right?

2 Likes

My bad, you’re right, I very much doubt it. So either this gets ‘ugly’ again via columns_dot_product(beta[, jj] , x') or we need matrix[K, N] x, at least the latter would be concise with having entries that ‘belong together’ in column vectors.

I’m sorry I opened this box, we might as well just correct the errata when you guys have double checked it. I can do a PR if it is easier for you. Thank you

Yeah a PR would be great if you’re up for it, thank you! I think your correction is right (although I’d be happy for someone else to also check it).

For a PR I think the right file to edit is this one:

1 Like

Thank you. I made two pull requests now, a simple one where I just added three transposes to correct what I think is erroneous and a second one where I reworked the section a little bit in the spirit of @mike-lawrence to have as little transposes as possible and to make the matrix algebra more readable by switching from rows to columns as the organizing structure.

3 Likes

@jonah Idea for recruiting new contributors: sprinkle errors in the user docs. Actually, that’s precisely what motivated my original PR on this, of course … :P

4 Likes