Discrete choice model an I need help to understand how to crunch my data

Hello,
I am trying to infer about the preference of N=100 individuals to choose k=2 products out of a basket of n=7 (I have a data set of 200 rows).
So I have:

C_{n,k} = \binom{7}{2} = \frac{7!}{2!(7-2)!} = 21

possible couples of products.

Now, I have a strong suspicion that the individuals have preferences, so I would like to infer - using my dataset- the probability \theta_i (i=1\ldots n) of each product to be chosen, and

\sum_{i=1}^{n} \theta_i = 1

I am interested in just \theta, not about the “first” and “second” choices.

If k=1, I would go with a multinomial model, but with k>1 I can’t figure out a way to make an inference about \theta.

I tag @James_Savage because I found his blog about discrete choice models, but any help is appreciated to help me to head in the right direction.
Thanks in advance!

One solution is to treat each of the 21 pairs as a different outcome in a multinomial model with one trial. With unlimited data, you could then sum the estimated probabilities of all the pairs containing element i to get the probability that somebody chooses i. However, with 100 rows in the data, you might not be able to sufficiently constrain the multinomial probabilities for 21 outcomes. To ameliorate this, you might consider some additional assumptions (e.g. that the probability of choosing any element is independent of the other element chosen) that would allow you to construct the probabilities of each of the 21 outcomes from the underlying probabilities-of-choice of each of the seven items. Thus, your model has six free parameters that define probabilities \theta on a seven-dimensional simplex (or if you prefer, \frac{\theta}{2} lives on the simplex). These transform to the 20 (non-independent) parameters of the multinomial distribution for the outcome.

FWIW, it wouldn’t shock me if this turns out to be equivalent to a simpler model–perhaps somebody else will chime in–but even so it’s a neat way to think about what’s going on, and it is general in the sense that there might be multiple sets of assumptions that allow you to specify the multinomial probabilities using the underlying single-choice probabilities.

tanks @jsocolar I drop here some code for me to reproduce some synthetic data to further develop my analyses. I will share any advances in the next posts.

set.seed(123)

N = 100
K = 7

theta = runif(n = K, min = 0, max = 1)
theta = theta/sum(theta)

products = 1:K

data = data.frame()
for (n in 1:N) {
    s = sample(x = products, size = 2, replace = FALSE, prob = theta)
    data = rbind(data, data.frame(PRODUCT = s, REP = n))
}

maybe this is not the best organization of the dataset, but it resembles the dataset I must crunch.

head(data)

   PRODUCT REP
1       2   1
2       4   1
3       4   2
4       3   2
5       4   3
6       5   3

now I’ll try to figure out how to code in stan your advice!

Soooo… I took a few steps in the direction indicated by @jsocolar: the code is same as above: I just added some lines to calculate the combinations and to build the input dataframe

combinations = combn(x = 7, m = 2)
data_combinations = matrix(data = 0L, nrow = N, ncol = ncol(combinations))
for (n in 1:N) {
  chosen = sample(x = products, size = 2, replace = FALSE, prob = theta)  
  data_combinations[n, which(apply(combinations, 2, function(x) {identical(x, chosen) || identical(x, rev(chosen))}))] = 1L
}

data_combination is a matrix with N=100 rows and 21 cols. It’s very sparse and every row has just one element equal to one corresponding to the combination of the two products. This dataset suits the advice from @jsocolar for a multinomial model with one trial:

data {
    int N; // number of observations
    int K; // simplex dimension
    int<lower = 0, upper = K> Y[N,K]; //observations
}

parameters {
    real<lower = 0> alpha;
    simplex[K] theta;
}

model {
    target += std_normal_lpdf(alpha);
    target += dirichlet_lpdf(theta | rep_vector(alpha, K));
    for (n in 1:N) {
        target += multinomial_lpmf(Y[n, ] | theta);
    }
}

This model calculates the \theta on the 21-dimensional simplex. Now I would like to “shrink” to the 7d simplex.

One way, is to consider that the choice of product A and B is independent so that \theta(A \cap B) = \theta(A) \theta(B)

but how to code this in the stan model in terms of target += log(prior*likelihood)?

After many unsuccessful attempts, I realized that the approach I had in mind is wrong.
This is not a Discrete Choice Model, but an Urn Model and so the Wallenius’ Noncentral Hypergeometric Distribution is needed.
AFAIK, this distribution is not implemented in stan.
I mark this topic as “solved” but, in fact, it is not.
Thanks, @jsocolar for your help so far!