Hi @Bob_Carpenter,
Thank you for your insights on this problem. I would like to clarify my issue and ensure that I understand each of the points you mentioned.
I want to use the covariates (de, rei in this case) to inform the group means. I attempted this by using ordered intercepts and calculate group means in likelihood section. However, this approach led to the non-identifiability problem I mentioned earlier. My original code below:
parameter{
ordered[K] alpha;
...
}
model{
...
// likelihood
for(n in 1:N){
target += log_sum_exp(log(theta[1]) +
lognormal_lpdf(LW[n] | alpha[1] + de[n] * bde1 + rei[n] * brei1 , sigma1),
log(theta[2]) +
lognormal_lpdf(LW[n] | alpha[2] + de[n] * bde2 + rei[n] * brei2 , sigma2));
}
}
Regarding your suggestion to use a more general regression.
tried a similar approach by using covariates to imply group means. However, I encountered several issues and would like to clarify a few points with you.
- I tried the softmax function you suggested. I understand that it transforms a vector of values to sum to 1. However, the X[n]*beta expression returns a real number, resulting in an error saying ‘Ill-typed arguments supplied to function softmax.’ Could you clarify what you meant?
I have seen others use softmax after declaring a theta vector. Is there a difference between directly declaring a theta simplex and using softmax as in my code?
- I tried using the covariates matrix (including a column of 1 to represent the intercept), but I don’t understand how to use your trick of appending a row of 0 in the coefficients, beta. Does N refer to the number of observations or the number of covariates in the model?"
The stan code I have at the moment is below. I try the data matrix and beta (as intercept plus number of covariates). However, I am definitely mis-specify something in the code. Please do not hesitate to point out the errors. Thank you very much.
data {
int<lower=1> K; //number of mixture component (clusters)
int<lower=1> N; // number of observations
int<lower=1> V; // intercept plus number of covariates
vector[N] LW; //measured leaf width
matrix[N, V] CO; // variates matrix
}
parameters {
array[K] vector[V-1] beta_prefix; // Coefficients for intercepts and covariates
simplex[K] theta; // mixture proportion
real<lower=0> sigma1; // dispersion parameter
real<lower=0> sigma2; // dispersion parameter
}
transformed parameters{
// simplex[K] theta; // mixture proportion
// for (n in 1:N){
// for(i in 1:K){
// theta = softmax(CO[n, ] * beta[i]);
// }
// }
array[K] vector[V] beta;
for(i in 1:K){
beta[i] = append_row(beta_prefix[i], 0); // append a 0 to fix identifiability issue
}
}
model {
// priors
for(i in 1:K){
target += normal_lpdf(beta[i] | 0, 1);
}
target += lognormal_lpdf(sigma1 | 0, 2)-
normal_lccdf(0 | 0, 1);
target += lognormal_lpdf(sigma2 | 0, 2)-
normal_lccdf(0 | 0, 1);
target += beta_lpdf(theta | 1, 1);
// likelihood
for(n in 1:N){
target += log_sum_exp(log(theta[1]) +
lognormal_lpdf(LW[n] | CO * beta[1], sigma1),
log(theta[2]) +
lognormal_lpdf(LW[n] | CO * beta[2], sigma2));
}
}
Finally, subtracting a lcdf from a lpdf is a suggestion to deal with non-identifiability from chapter 19.1.1 in this book. I am not using it anymore because I try to solve this with ordered intercept method as applied in brms and in this blog post.
Chieh