Here again with another question. For some context, I am implementing a Hidden Markov Model, which in its simplest form is just order-1 transition probabilities between the states and Gaussian emissions whose parameters change depending on the current state. Since each of the time series comes from a different group, I also have covariates associated with these groups, and I want to test whether they influence the transition probabilities between the states.
Basically, all I want to know is the correct way to make the covariates influence the transition probabilities between the states. Up to now I have had the covariates affect the log-probability like this:
transformed parameters {
  simplex[M] meta_trans_perPat[P, M]; // Patient-specific transition probabilities

  // Hierarchical patient-specific transition probabilities with covariate effects
  for (p in 1:P) {
    for (m_from in 1:M) {
      vector[M] adjusted_transitions_logit;
      // Matrix multiplication: X[p] is [1, C] and meta_beta_trans[m_from] is [C, M], giving [1, M]
      adjusted_transitions_logit = log(meta_trans_group[m_from]) + to_vector(X[p] * meta_beta_trans[m_from]);
      // Apply softmax to get patient-specific transition probabilities
      meta_trans_perPat[p, m_from] = softmax(adjusted_transitions_logit);
    }
  }
}
However, at some point I was debugging with print and noticed that different values of meta_beta_trans can lead to the same probabilities in meta_trans_perPat[p, m_from] if they shift adjusted_transitions_logit by the same amount (see the screenshot for the toy example). I understand that this is because softmax is invariant to adding a constant to its input, but doesn't that make it almost impossible for the model to estimate meta_beta_trans, since different values produce the same results? Should I then be applying the covariates in a different way?
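The invariance you are describing is easy to check outside Stan. A minimal NumPy sketch (toy numbers, not your model):

```python
import numpy as np

def softmax(v):
    # subtracting the max is for numerical stability; it also shows
    # directly why softmax is invariant to constant shifts
    e = np.exp(v - np.max(v))
    return e / e.sum()

logits = np.array([0.5, -1.0, 2.0])
shifted = logits + 3.7  # add the same constant to every element

# Both inputs map to the identical simplex
print(np.allclose(softmax(logits), softmax(shifted)))  # True
```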
In softmax regression (i.e. multinomial logistic regression) it is typical to pin one of the arguments to the softmax function to zero, removing the non-identifiability. One way to think about this is that the softmax doesn’t have a unique inverse, so we need a way to pick which particular inverse-softmax we are working with. Pinning an element to zero identifies the unique inverse-softmax.
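To see why pinning picks out a unique inverse, here is a NumPy sketch: with the first element fixed to zero, the log-probabilities determine the remaining logits exactly.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))
    return e / e.sum()

# Free logits for categories 2..K; category 1 is pinned to 0
free = np.array([1.2, -0.4])
logits = np.concatenate(([0.0], free))
p = softmax(logits)

# Inverse-softmax under the pinning convention:
# subtract the first log-probability, which restores the zero pin
recovered = np.log(p) - np.log(p[0])
print(np.allclose(recovered[1:], free))  # True
```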
Oh alright, that makes sense. So just to make sure I understood correctly: I should pick a reference element and center the entire input vector on it (i.e. subtract that element from every entry), so that the reference element becomes zero and all the others are shifted by the same amount. Something like:
int m_ref = 2; // reference element to center adjusted effects

// Hierarchical patient-specific transition probabilities with covariate effects
for (p in 1:P) {
  for (m_from in 1:M) {
    vector[M] adjusted_transitions_logit;
    // Matrix multiplication: X[p] is [1, C] and meta_beta_trans[m_from] is [C, M], giving [1, M]
    adjusted_transitions_logit = log(meta_trans_group[m_from]) + to_vector(X[p] * meta_beta_trans[m_from]);
    // Reference centering
    adjusted_transitions_logit = adjusted_transitions_logit - adjusted_transitions_logit[m_ref];
    // Apply softmax to get patient-specific transition probabilities
    meta_trans_perPat[p, m_from] = softmax(adjusted_transitions_logit);
  }
}
EDIT: well, actually, thinking about this some more, this does not work, because the same problem remains. The centering does force the values to be expressed relative to the reference, but multiple combinations of adjusted_transitions_logit (before reference centering) can still lead to the same final adjusted_transitions_logit. Were you suggesting that I just set one of the values in adjusted_transitions_logit to 0? So basically every element is estimated against that reference, I guess.
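That observation can be checked numerically. In this NumPy sketch (made-up numbers), two coefficient vectors that differ by a constant produce identical probabilities even after the centering step, because centering only removes a shift that softmax already ignored:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))
    return e / e.sum()

base = np.log(np.array([0.2, 0.5, 0.3]))  # log baseline transition probs
effect_a = np.array([0.4, -0.1, 0.7])
effect_b = effect_a + 2.0  # a *different* effect vector, shifted by a constant

def centered_probs(effect, ref=1):
    logit = base + effect
    logit = logit - logit[ref]  # reference centering after adding the effect
    return softmax(logit)

# Same probabilities from different effects: centering did not identify them
print(np.allclose(centered_probs(effect_a), centered_probs(effect_b)))  # True
```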
Typical in stats but rare in ML. The other way to identify the model is with a sum-to-zero constraint, which is more symmetric in the priors, but that's pretty much impossible to do with covariates.
Not quite. There's nothing to center. You just fix one of the linear predictors to 0 in the input to softmax, and that identifies the beta values.
data {
  ...
  matrix[M, N] x; // covariates (transposed: one column per observation)
}
parameters {
  matrix[K - 1, M] beta; // regression coefficients
}
model {
  matrix[K - 1, N] beta_x = beta * x;
  for (n in 1:N) {
    vector[K] theta = softmax(append_row(0, beta_x[, n])); // fix first linear predictor to 0
  }
}
[edit: get the indexing right and make the matrix multiply efficient]
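For reference, here is a NumPy translation of that parameterization (dimension names follow the snippet; the data values are made up). Only K - 1 coefficient rows are estimated, and because the first row is pinned at zero, each row of beta is interpretable as a log-odds contrast against category 1:

```python
import numpy as np

rng = np.random.default_rng(0)
K, M, N = 3, 2, 5                   # categories, covariates, observations
x = rng.normal(size=(M, N))         # covariates, one column per observation
beta = rng.normal(size=(K - 1, M))  # free coefficients; category 1 is pinned at 0

beta_x = beta @ x                                # [K-1, N] linear predictors
logits = np.vstack([np.zeros((1, N)), beta_x])   # fix first linear predictor to 0

# Column-wise softmax: each column becomes a simplex over the K categories
theta = np.exp(logits - logits.max(axis=0))
theta = theta / theta.sum(axis=0)

print(np.allclose(theta.sum(axis=0), 1.0))  # True: each column sums to 1
```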