Possible Mistake in Stan User Guide Section 1.13

abeeisnotabug · November 19, 2020, 11:53am

I think there are a two typos in Section 1.13 of the Stan User Guide, version 2.25.

For an older User Guide there was an open issue on GitHub as discussed here, but it’s closed now so I don’t really know where to post this.

In the Optimization through Cholesky Factorization section, there is this code in a code block:

transformed parameters {
  matrix[J, K] beta;
  beta = u * gamma + z * diag_post_multiply(L_Omega,tau);
}

Here, z is a matrix, so one row of z is a row vector (later defined to be standard (mv) normally distributed), thus we define a new variable \beta:=\Omega_L\mathrm{diag}(\tau), thus in column vector form: \beta'=\mathrm{diag}(\tau)\Omega_L'z'. But if we the figure out the variance of \beta, we find that \mathbb{V}[\beta']=\mathrm{diag}(\tau)\Omega_L'\Omega_L\mathrm{diag}(\tau). However, \Omega_L'\Omega_L\neq\Omega_L\Omega_L'=\Omega. Thus I think this should read

transformed parameters {
  matrix[J, K] beta;
  beta = u * gamma + z * diag_post_multiply(L_Omega', tau);
}

Another one of those comes a bit later in the lines of

Sigma_beta
  = quad_form_diag(Omega, tau)
  = diag_post_multiply(L_Omega,tau) * diag_post_multiply(L_Omega,tau)'

Here I think this should be
quad_form_diag(Omega, tau)=\mathrm{diag}(\tau)\Omega\mathrm{diag}(\tau)=\mathrm{diag}(\tau)\Omega_L\Omega_L'\mathrm{diag}(\tau)=diag_pre_multiply(tau, Omega_L) * diag_pre_multiply(tau, Omega_L)'

I didn’t test this specific code, but I had the wrong versions in a model of mine and spend too long trying to find out why my estimates were off ever-so-slightly and then I applied the correction and everything works fine.

mike-lawrence · November 19, 2020, 11:54am

Shoot, that’s possibly on me. I think I did the most recent edit, just a sec…

(Here’s where my motivation came from; heading over to the docs to see what I messed up…)

mike-lawrence · November 19, 2020, 12:02pm

@jonah Can you take a look? I made my suggestion from a very math naive background, more out of coding aesthetics, so it’s entirely possible I got the transform incorrect.

abeeisnotabug · November 19, 2020, 12:03pm

Wow that was quick. Yes this is where the mistake stems from, sorry I didn’t look well enough to find this post.

The term (diag_pre_multiply(tau,L_Omega) * z)'multiplied out has to have the transpose everywhere, so (diag_pre_multiply(tau,L_Omega) * z)'=(\mathrm{diag}(\tau)\Omega_Lz)'=z'\Omega_L'\mathrm{diag}(\tau)'=z'\Omega_L'\mathrm{diag}(\tau)=z' * diag_post_multiply(L_Omega', tau). (and \mathrm{diag}(\tau)'=\mathrm{diag}(\tau) since it is diagonal of course).

Edit to make it more precise: This is for matrix[K, J] z;, which you then transformed to matrix[J, K] z;, thus the new z is the old z' and therefore we’re left with
z * diag_post_multiply(L_Omega', tau).

jonah · November 19, 2020, 4:44pm

If there’s no open issue then I think it’s fine to just open a new issue for something like this. Or even better a PR with the fix (no obligation though of course!).

At first glance I think @abeeisnotabug is correct here, although I said the same thing in the thread that @mike-lawrence linked to ;) @bbbales2 I see you liked the initial post. Does this look right to you too?

abeeisnotabug · November 19, 2020, 5:55pm

The more I think about it now, the more I would transpose everything from the beginning. I guess the confusions arise because we usually think about data in columns with observations in rows, which renders one row a multivariate vector, but in linear algebra we think of vectors as columns.

So a more “linear algebra-friendly” version would be:

parameters {
  matrix[K, J] z;
  cholesky_factor_corr[K] L_Omega;
  vector<lower=0,upper=pi()/2>[K] tau_unif;
  matrix[L, K] gamma;                         // group coeffs
  real<lower=0> sigma;                       // prediction error scale
}
transformed parameters {
  matrix[K, J] beta;
  vector<lower=0>[K] tau;     // prior scale
  for (k in 1:K) tau[k] = 2.5 * tan(tau_unif[k]);
  beta = u * gamma + diag_pre_multiply(tau, L_Omega) * z;
}
model {
  to_vector(z) ~ std_normal();
  L_Omega ~ lkj_corr_cholesky(2);
  to_vector(gamma) ~ normal(0, 5);
  y ~ normal(columns_dot_product(beta[, jj] , x), sigma);
}

But of course this now goes a bit beyond simple errata correction (and probably I made new ones as well), just as a suggestion.

mike-lawrence · November 19, 2020, 6:42pm

Double-checking: this bit works with x defined as a N \times K matrix, right?

abeeisnotabug · November 19, 2020, 7:06pm

My bad, you’re right, I very much doubt it. So either this gets ‘ugly’ again via columns_dot_product(beta[, jj] , x') or we need matrix[K, N] x, at least the latter would be concise with having entries that ‘belong together’ in column vectors.

I’m sorry I opened this box, we might as well just correct the errata when you guys have double checked it. I can do a PR if it is easier for you. Thank you

jonah · November 19, 2020, 10:32pm

Yeah a PR would be great if you’re up for it, thank you! I think your correction is right (although I’d be happy for someone else to also check it).

For a PR I think the right file to edit is this one:

github.com

stan-dev/docs/blob/d32e1e64c64018ebd33c56a85e47752bf3041ab3/src/stan-users-guide/regression.Rmd#L1447-L1483


      
          ```
          data {
            matrix[J, L] u;
            ...
          parameters {
            matrix[J, K] z;
            cholesky_factor_corr[K] L_Omega;
            ...
          transformed parameters {
            matrix[J, K] beta;
            beta = u * gamma + z * diag_post_multiply(L_Omega,tau);
          }
          model {
            to_vector(z) ~ std_normal();
            L_Omega ~ lkj_corr_cholesky(2);
            ...
          ```
          
          The data variable `u` was originally an array of vectors, which
          is efficient for access; here it is redeclared as a matrix in order to

This file has been truncated. show original

abeeisnotabug · November 20, 2020, 9:23am

Thank you. I made two pull requests now, a simple one where I just added three transposes to correct what I think is erroneous and a second one where I reworked the section a little bit in the spirit of @mike-lawrence to have as little transposes as possible and to make the matrix algebra more readable by switching from rows to columns as the organizing structure.

mike-lawrence · November 20, 2020, 3:38pm

@jonah Idea for recruiting new contributors: sprinkle errors in the user docs. Actually, that’s precisely what motivated my original PR on this, of course … :P

Topic		Replies	Views
Multivariate Regression and Optimization through Cholesky Factorization - expression is ill formed Modeling	2	365	February 23, 2020
Strange behavior in cmdstan 2.18 with matrices General	9	531	August 18, 2018
Possible error in user guide? "statistically equivalent" Modeling techniques	2	298	July 17, 2023
Why exactly is alpha + beta * x ~ normal(y, sigma) wrong? Is it even wrong? Modeling specification	4	955	January 16, 2023
Tweaking SUG 11.3 ("Multivariate priors for Hierarchical Models") General specification	2	525	October 8, 2020

Possible Mistake in Stan User Guide Section 1.13

Related topics