Efficiently sampling a matrix of independent parameters

jayrobwilliams · May 12, 2018, 2:11am

What is the most efficient way to sample a matrix of parameters drawn independently from the same distribution? I’m working on a multinomial logit model where I have a matrix of coefficients \beta that includes one column for each of K-1 outcomes in the data. I have \beta declared as a D \times K-1 matrix where each element is distributed \mathcal{N}(\mu_\beta, \sigma_\beta), where \mu_\beta and \sigma_\beta are hyperpriors.

I’m currently using to_vector to draw \beta because you can’t have a matrix on the left hand side of a normal_lpdf. A MWE of the model is included below. Is this the most efficient approach to sampling these parameters?

data {
  
  int<lower=1> N; // number of observations
  int<lower=1> D; // number of predictors (p + 1)
  int<lower=2> K; // number of alternatives
  int<lower=1,upper=K> Y[N]; // response variable
  vector[D] X[N]; // design matrix

}

parameters {
  
  matrix[K-1,D] beta_free_raw; // regression coefficeients for K - 1 alternatives
  real mu_beta;
  real<lower=.001> sigma_beta;
  
}

transformed parameters {
  
  matrix[K-1,D] beta_free;
  matrix[K,D] beta;
  beta_free = mu_beta + sigma_beta * beta_free_raw; // re-center parameters
  beta = append_row(rep_row_vector(0, D), beta_free); // append row of 0s to top
  
}

model {
  
  mu_beta ~ normal(0, 5);
  sigma_beta ~ cauchy(0, 2.5);
  to_vector(beta_free_raw) ~ normal(0, 1);
  
  for (n in 1:N) Y[n] ~ categorical_logit(to_vector(beta * X[n]));
  
}

bgoodri · May 12, 2018, 3:25am

Using to_vector(beta_free_raw) ~ normal(0, 1); but you should not read that as beta_free_raw is drawn from a standard normal distribution. Rather, the standard normal PDF in log form is being tacked on to the kernel for each element of beta_free_raw. That is why it is clearer to write target += normal_lpdf(to_vector(beta_free_raw) | 0, 1);.

jayrobwilliams · May 13, 2018, 3:16pm

Got it. What are the practical differences between those two statements? Also, are there any other ways to write this model more efficiently?

bgoodri · May 13, 2018, 4:09pm

The practical difference is that it does not confuse yourself or others as to what Stan is doing. That and the target += version calculates constants, which is somewhat slower but necessary in some situations.

jayrobwilliams · May 13, 2018, 4:42pm

Sorry if I wasn’t clear. I meant why is it important to remind yourself/others that you’re incrementing the log probability instead of drawing from a standard normal?

bgoodri · May 13, 2018, 4:49pm

Because in order to sample from a posterior distribution in Stan, the Stan program has to commit to a log-kernel of a density function. And the shape of that log-kernel affects how well NUTS can draw from that posterior distribution. All of the tricks to improve the effective sample size are based on manipulating the log-kernel to get better geometry. When you ask “Also, are there any other ways to write this model more efficiently?” that is equivalent to asking “Also, are there any other ways to manipulate the log-kernel to get better geometry?”

Bob_Carpenter · June 15, 2018, 10:39pm

It’s often possible (and useful) to think of a model generatively, but that’s not what the computation is doing. The computation is just computing a log density and its gradient and the geometry of that log density is what will determine statistical sampling efficiency.

I’d qualify that. There are ways to make the iterations go faster (usually precomputation and vectorization) and there are ways to make the geometry better so that mixing is better (usually with reparameterizations).

Topic		Replies	Views
Assign priors for parameters in a matrix General techniques	3	815	June 11, 2022
Answers conditions inside model block Modeling rstan , multinomial-response	4	452	September 27, 2021
Correct way of sampling correlated multidimensional vectors Modeling	3	639	February 9, 2019
Non-centered sampling of multivariate normal distribution - precision matrix Modeling specification	1	490	November 25, 2021
Efficient computation of multi-logit Modeling	4	90	December 16, 2024

Efficiently sampling a matrix of independent parameters

Related topics