Softmax with varying subsets

Person i is asked to choose among a set of options C_i, whose size varies across people. Our model includes the softmax probability of choosing option c':
\frac{\exp(\eta_{ic'})}{\sum_{c \in C_i} \exp(\eta_{ic})}

What is the best way to code this? How should we store the C_i? Stan doesn’t have ragged arrays, so perhaps we could store each C_i as a 0/1 vector whose length equals the total number of options?

exp(eta[ii[n],cc[n]])/sum(exp(eta[ii[n],C_i]))
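
To make the 0/1 idea concrete, the same softmax can be written with an indicator. This is only a restatement under assumed notation: a_{ic} and C (the total number of options) are not in the original post, and c' is assumed to be in C_i.

```latex
\frac{\exp(\eta_{ic'})}{\sum_{c \in C_i} \exp(\eta_{ic})}
  = \frac{\exp(\eta_{ic'})}{\sum_{c=1}^{C} a_{ic} \exp(\eta_{ic})},
\qquad
a_{ic} = \begin{cases} 1 & \text{if } c \in C_i \\ 0 & \text{otherwise} \end{cases}
```

Note that the indicator multiplies \exp(\eta_{ic}), not \eta_{ic} itself, since \exp(0) = 1 would still add to the sum.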

Hey Shira –

I work with this type of data quite a bit. Typically I’ll stack all choices and individuals on top of each other, then have an index that tells me which individual each row belongs to.

Here is an example (each market has a varying number of choices). https://github.com/khakieconomics/rrcl/blob/master/src/stan_files/vassavage.stan
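
For readers who want the idea without opening the link, here is a minimal sketch of a long layout. It is not taken from the linked model. It assumes rows are sorted by individual, so each person's options form a contiguous block, and it uses hypothetical names (X, K, beta, row_start, row_end, choice) along with the newer array[...] declaration syntax.

```stan
data {
  int<lower=1> I;                            // number of individuals
  int<lower=1> N;                            // total rows: sum of all choice-set sizes
  int<lower=1> K;                            // covariates per row (assumed)
  matrix[N, K] X;                            // one row per (individual, option) pair
  array[I] int<lower=1, upper=N> row_start;  // first row of individual i's block
  array[I] int<lower=1, upper=N> row_end;    // last row of individual i's block
  array[I] int<lower=1> choice;              // chosen option, counted within the block
}
parameters {
  vector[K] beta;
}
model {
  vector[N] eta = X * beta;
  beta ~ normal(0, 1);
  for (i in 1:I) {
    // the softmax denominator runs over individual i's own rows only
    choice[i] ~ categorical_logit(eta[row_start[i]:row_end[i]]);
  }
}
```

Because each slice eta[row_start[i]:row_end[i]] has its own length, no ragged array is needed; the only extra bookkeeping is computing the block boundaries before passing the data in.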

Thanks, @James_Savage! Very helpful. We are debating between your “long” format and a “wide” format that uses asked, a vector of 0s and 1s whose length is the total number of possible options. The wide format is elegant for the softmax denominator, sum(exp(eta[i]) .* asked[i]), but might involve less elegant subsetting of the data?

functions {
/* Return the elements of x where cond is nonzero, like R's x[cond == 1].
   count must equal the number of nonzero entries of cond. */
vector subset(vector x, vector cond, int count) { 
   vector[count] result; 
   int pos = 1; 
   for (n in 1:rows(x)) { 
     if (cond[n]) { 
       result[pos] = x[n]; 
       pos = pos + 1; 
     } 
   } 
   return result; 
 } 
}
...
model {
...
exp(eta[i]) / sum(exp(eta[i]) .* asked[i])  // mask exp(eta), not eta: exp(0) = 1
...
for (n in 1:N) {
  vector[num_asked[n]] y_asked = subset(y[n], asked[n], num_asked[n]);
  y_asked ~ ...
}
}
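
For comparison, here is a minimal sketch of how the wide format could handle the choice likelihood itself with the 0/1 mask and no subsetting. The names X, K, beta, and choice are hypothetical, and it assumes the chosen option is always one that was asked (asked[i][choice[i]] is 1).

```stan
data {
  int<lower=1> I;                              // number of individuals
  int<lower=2> C;                              // total number of possible options
  int<lower=1> K;                              // covariates per option (assumed)
  array[I] matrix[C, K] X;                     // covariates for every option (assumed)
  array[I] vector<lower=0, upper=1>[C] asked;  // asked[i][c] is 1 if option c was offered to i
  array[I] int<lower=1, upper=C> choice;       // chosen option; must be one that was asked
}
parameters {
  vector[K] beta;
}
model {
  beta ~ normal(0, 1);
  for (i in 1:I) {
    vector[C] eta_i = X[i] * beta;
    // log softmax with the denominator summed over asked options only;
    // the mask multiplies exp(eta_i), so unasked options contribute 0
    target += eta_i[choice[i]] - log(dot_product(asked[i], exp(eta_i)));
  }
}
```

The trade-off is that every individual carries a full length-C row, including options they were never shown, and exp(eta_i) can overflow if the linear predictor gets very large.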