Person i is asked to choose among a set of options C_i, whose size varies across people. Our model includes the softmax probability of the chosen option c': \frac{\exp(\eta_{ic'})}{\sum_{c \in C_i} \exp(\eta_{ic})}
What is the best way to code this? How should we store the C_i? Stan doesn’t have ragged arrays, so perhaps we could store each C_i as a vector of 0s and 1s with length equal to the total number of options?
I work with this type of data quite a bit. Typically I’ll stack all choices and individuals on top of each other, then have an index that tells me which individual the row belongs to.
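A minimal sketch of what such a long layout could look like in Stan. The names `X`, `beta`, `chosen`, `row_start`, and `row_end` are hypothetical and not from the post, and the start/end rows are one assumed way of using the per-row individual index. Each person's block of stacked rows gets its own log-softmax. The `array[...]` declarations assume a recent Stan version.

```stan
data {
  int<lower=1> I;                            // number of individuals
  int<lower=1> N;                            // total stacked rows (sum of the sizes of the C_i)
  int<lower=1> P;                            // number of option attributes (hypothetical)
  matrix[N, P] X;                            // one row per (person, option) pair
  vector<lower=0, upper=1>[N] chosen;        // 1 on the row the person chose, else 0
  array[I] int<lower=1, upper=N> row_start;  // first row belonging to person i
  array[I] int<lower=1, upper=N> row_end;    // last row belonging to person i
}
parameters {
  vector[P] beta;
}
model {
  vector[N] eta = X * beta;                  // linear predictor for every stacked row
  beta ~ normal(0, 1);                       // placeholder prior, an assumption
  for (i in 1:I) {
    int a = row_start[i];
    int b = row_end[i];
    // log-softmax over person i's own options only; picks out the chosen row
    target += dot_product(chosen[a:b], log_softmax(eta[a:b]));
  }
}
```

Because every person's options sit in a contiguous slice, the varying size of C_i never has to be declared as a ragged structure.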
Thanks, @James_Savage! Very helpful. We are debating between your “long” format and a “wide” format that uses asked, a vector of 0s and 1s with length equal to the total number of possible options. The wide format is elegant for the softmax denominator, sum(exp(eta[i]) .* asked[i]), but might involve less elegant subsetting of the data?
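One detail to watch in the wide format: the 0/1 mask has to multiply exp(eta), not eta itself, because exp(eta * 0) equals 1 rather than 0. A hedged sketch of a helper for the log of the masked denominator follows. The function name `log_denominator` is made up for illustration, and the shift by `max(eta)` is a common stabilization trick, not something from the thread.

```stan
functions {
  // Log of the sum of exp(eta[c]) over the options with asked[c] == 1.
  // Subtracting max(eta) before exponentiating keeps exp() from overflowing.
  real log_denominator(vector eta, vector asked) {
    real m = max(eta);
    return m + log(dot_product(asked, exp(eta - m)));
  }
}
```

With this helper, the log probability that person i picks an asked option c would be `eta[i][c] - log_denominator(eta[i], asked[i])`. If every asked option sits far below the largest unasked eta, the masked sum can underflow to zero, so this is a sketch rather than a fully robust implementation.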
functions {
  /* Return what R computes as x[cond] = subset(x, cond, count) */
  vector subset(vector x, vector cond, int count) {
    vector[count] result;
    int pos = 1;
    for (n in 1:rows(x)) {
      if (cond[n]) {
        result[pos] = x[n];
        pos = pos + 1;
      }
    }
    return result;
  }
}
...
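model {
  ...
  // softmax with the denominator summed over the asked options only
  exp(eta[i]) / sum(exp(eta[i]) .* asked[i])
  ...
  for (n in 1:N) {
    // keep only the entries of y[n] for options person n was asked about
    vector[num_asked[n]] y_asked = subset(y[n], asked[n], num_asked[n]);
    y_asked ~ ...
  }
}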
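The model block above relies on `num_asked`, which is not defined in the snippet. One way it could be precomputed, assuming `asked` is stored as an array of N vectors of length K (both the storage layout and the name `K` are assumptions here), is to count the nonzero indicators once in `transformed data`:

```stan
data {
  int<lower=1> N;                               // number of people
  int<lower=1> K;                               // total number of possible options
  array[N] vector<lower=0, upper=1>[K] asked;   // 0/1 indicators, 1 if the option was offered
}
transformed data {
  array[N] int<lower=0, upper=K> num_asked;     // size of each person's choice set
  for (n in 1:N) {
    num_asked[n] = 0;
    for (k in 1:K) {
      if (asked[n][k] > 0.5) {                  // treat anything above 0.5 as "asked"
        num_asked[n] += 1;
      }
    }
  }
}
```

Counting in `transformed data` means the loop runs once when the data are read, not at every evaluation of the log density.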