Vectorizing prior sampling and normalizing prior samples after sampling

I want to normalize my parameters after sampling them from the prior distribution and before using them to perform posterior sampling. I do not need a second, normalized set of parameters; rather, I need to ensure that my parameters are always normalized, because otherwise they do not make sense. My code looks something like this:

data {
  real zero;
}

transformed data {
  matrix[5, 3] n = [[ 0.7660444, zero, 0.7660444],
                    [-0.7660444, zero, 0.6427876],
                    [zero, 0.7660444, -0.6427876],
                    [zero, -0.6427876, -0.7660444],
                    [zero, 0.6427876, 0.6427876]];
  for (i in 1:5) {
    n[i, ] = n[i, ]./norm2(n[i, ]);
  }
}

parameters {
  matrix[5, 3] m;
}

model {
  for (i in 1:3){
    for (j in 1:5) {
      m[j, i] ~ uniform(m[j, i]-0.01, m[j, i]+0.01);
    }
  }

  for (i in 1:5) {
    m[i, ] = m[i, ]./norm2(m[i, ]);
  }
  target += sum(m + n)^2;
}

The issue is that

  for (i in 1:5) {
    m[i, ] = m[i, ]./norm2(m[i, ]);
  }

doesn’t work, i.e. I cannot perform calculations like this in the model block. How can I solve this? I was considering using a transformed parameters block, but I don’t think that I understand it enough to implement it here. Specifically, I am not sure whether my normalized parameters would be used for posterior sampling.

A side issue I have here is using a for-loop to sample from a prior distribution for a matrix parameter. Is it possible to do this more efficiently, for example with some kind of vectorized prior sampling statement (for a uniform or normal distribution)?

The normalization in question converts each row vector to unit length by dividing it by its norm, i.e. m[i, ] / norm2(m[i, ]).

This won’t work in Stan because you’re reassigning to a parameter, and parameters are set from the outside by the sampling algorithm. Instead, you need to define a new variable m_norm and set it equal to the normalized value.
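
For instance, a minimal sketch of that approach (the name m_norm and the target term are purely illustrative, and the identification problem discussed next still applies):

parameters {
  matrix[5, 3] m;
}
transformed parameters {
  matrix[5, 3] m_norm;
  for (i in 1:5) {
    m_norm[i, ] = m[i, ] / norm2(m[i, ]);  // rescale each row to unit length
  }
}
model {
  target += sum(m_norm + n)^2;  // use the normalized copy; never reassign m itself
}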

This is going to be a problem for HMC in that it doesn’t identify the actual parameters, because (-100, 100) and (-1, 1) normalize to the same value, (-1/sqrt(2), 1/sqrt(2)). A better approach is to define m as an array of unit vectors,

array[5] unit_vector[3] m;

This will make sure m[k] is a unit vector of size 3 for k in 1:5.
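
For context, here is a minimal sketch of how that declaration slots into a program, with n as in your transformed data block (the model-block use of m[k] is only a placeholder):

parameters {
  array[5] unit_vector[3] m;  // each m[k] has unit length by construction
}
model {
  for (k in 1:5) {
    target += dot_product(m[k], n[k, ]);  // m[k] behaves like an ordinary vector[3]
  }
}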

I’m not sure what you’re trying to do with m[j, i] ~ uniform(m[j, i]-0.01, m[j, i]+0.01);, because m shows up on both the right and the left. Do you want to declare a new parameter that’s sampled very close to m? That would work in Stan’s language, though it’s an unusual approach (hard-coded boundaries that are not physical are usually a bad idea in Bayesian models on both statistical and computational grounds). Assuming you declare a new parameter matrix[5, 3] almost_m, you can write

to_vector(almost_m) ~ uniform(to_vector(m) - 0.01, to_vector(m) + 0.01);

The real problem here is that this will only work if you’ve declared almost_m with matching bounds, and we’re not quite set up to do that with matrices. What you’d really want is something like:

matrix<lower=m - 0.01, upper=m + 0.01>[5, 3] almost_m;

but I don’t think Stan supports that (you can try). If not, you need to declare almost_m as a vector and then convert back to a matrix.


Thanks for the normalization response! I didn’t know about the unit_vector declaration. That is useful; however, in the real code I need to perform a rows_dot_product() operation on n and m during posterior sampling, so would it be possible to enforce the constraint while using only linear-algebra data types? Matrices work great here because I can pass them to rows_dot_product() and Stan figures out what to do. I was thinking of just declaring 5 unit vectors separately, but that would require for-loops and the posterior function would need to take more arguments (10 instead of 2), and I am concerned that would significantly decrease computational efficiency. Is there any elegant way to both have unit-vector constraints and perform linear algebra operations?

Also, just to confirm, does subsetting array[5] unit_vector[3] m; work the same way as subsetting array[5, 3] real;, i.e. is m[3, 3] the third element of the third unit vector? (Sorry, I couldn’t find this specific case in the docs.)

What I meant by m[j, i] ~ uniform(m[j, i]-0.01, m[j, i]+0.01); is that the current iteration’s value of m is sampled from a uniform distribution centered on the previous iteration’s value of m. I hope my sampling statement achieves that; please correct me if I’m wrong!

What I wanted was to improve efficiency by removing the for-loops. Similar to your suggestion, I have tried m ~ uniform(m - 0.01, m + 0.01);, but it tells me that the arguments of uniform() must be reals. I was hoping there is something like a vectorized sampling statement, as I believe that would not be too niche to implement. I searched through the docs, but I could only find the “regular” uniform and Gaussian sampling statements.

Loops are super-fast in Stan if there’s no autodiff going on. So you can declare things this way:

array[5] unit_vector[3] m;

matrix[5, 3] mm;
for (k in 1:5) {
  mm[k] = m[k]';  // note that ' is used to transpose here to assign to a row vector
}

Now mm is a matrix with rows that are unit vectors. If you want columns that are unit vectors instead, that’d be:

matrix[3, 5] mm;
for (j in 1:5) {
  mm[ , j] = m[j];  // no transposition necessary to assign to column vector
}

With only 15 entries this is going to be super fast and won’t exert a lot of memory pressure.
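
And to connect this back to the rows_dot_product() question: once the unit vectors are packed into mm row-wise, mm can be passed to linear-algebra functions like any other matrix (the target increment here is only a placeholder):

model {
  target += sum(rows_dot_product(mm, n));  // dot product of each row of mm with the matching row of n
}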

As for m[j, i] ~ uniform(m[j, i]-0.01, m[j, i]+0.01);: this can’t be what you intend, because m shows up on both the left and the right. What I was suggesting is that if you really, really want a jittered version of m (I can’t see where that’d ever be useful, but there are a lot of models out there), then you need to declare a new parameter:

parameters {
  real<lower=m - 0.01, upper=m + 0.01> m_jittered;  // lower and upper are critical here
...
model {
  m_jittered ~ uniform(m - 0.01, m + 0.01);
...

Then you need to take this up to vectors with either a loop or vectorization. If your initial m is a matrix, then you can do this:

parameters {
  vector[A * B] m_flat;  // A and B are the dimensions of the original matrix, declared as data
  vector<lower=m_flat - 0.01, upper=m_flat + 0.01>[A * B] m_flat_jitter;
  ...
transformed parameters {
  matrix[A, B] m = to_matrix(m_flat, A, B);
  matrix[A, B] m_jitter = to_matrix(m_flat_jitter, A, B);
  ...
model {
  m_flat_jitter ~ uniform(m_flat - 0.01, m_flat + 0.01);
  ...
}