# Problems modeling choice probabilities directly with a Dirichlet distribution instead of multinomial data

Hey all,

I’m working with a model with a multinomial likelihood to estimate parameters from responses in 5 different categories of some task. The data look like this (a frequency table):

> data
ID Freetime    PC IIP IOP DIP DOP NPL
[1,]   1      0.0 0.476 238  86 107  38  31
[2,]   1      0.2 0.674 337  73  16  47  27
[3,]   1      0.4 0.710 355  76  15  33  21
[4,]   1      0.8 0.790 395  49   9  26  21
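(For reference: each row sums to 500 retrievals, and PC appears to just be IIP divided by that row total. A quick numpy check:)

```python
import numpy as np

# Counts per response category (IIP, IOP, DIP, DOP, NPL), one row per Freetime condition
counts = np.array([
    [238, 86, 107, 38, 31],
    [337, 73,  16, 47, 27],
    [355, 76,  15, 33, 21],
    [395, 49,   9, 26, 21],
])

row_totals = counts.sum(axis=1)       # retrievals per row: 500 each
props = counts / row_totals[:, None]  # observed choice proportions

print(row_totals)   # [500 500 500 500]
print(props[:, 0])  # matches the PC column: 0.476, 0.674, 0.710, 0.790
```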


IIP to NPL are frequencies of given responses per category. As the parameter recovery of this particular model is not that good, I wanted to try modeling the data at a higher “resolution” and use the probabilities directly to inform the parameters. I wanted to do this with a Dirichlet distribution. As \alpha for the Dirichlet, I wanted to use the general counts of the response set. For example, the response set for the 5 categories is

     IIP IOP DIP DIOP NPL
[1,]   1   4   1    4  16


Which means that there is 1 possible alternative for the first category IIP, 4 for IOP, and so on. This seemed pretty straightforward to me, but the model is not performing at all, meaning that the parameter estimates for the hyperparameters are totally off. Here is my model so far, with the Dirichlet:

I use a probability matrix as data instead of the pure counts:

        IIP   IOP   DIP   DOP   NPL
[1,] 0.476 0.172 0.214 0.076 0.062
[2,] 0.674 0.146 0.032 0.094 0.054
[3,] 0.710 0.152 0.030 0.066 0.042
[4,] 0.790 0.098 0.018 0.052 0.042


Here’s the model:

functions{
..... // elided; includes flatten_lower_tri() used in generated quantities
}

data {
int<lower=0> N;    // number of subjects
int<lower=0> K;    // number of response categories
int<lower=0> J;    // dimension of the covariance matrix
int<lower=0> Con;  // number of Freetime conditions
vector[K] R;       // number of response alternatives per category
vector[K] p[N*Con];  // observed choice proportions
real scale_b;      // scaling for background noise
vector[Con] Freetime;  // free time after distractor encoding (e.g. 500 ms, 1000 ms or 1500 ms)
int retrievals;    // retrieval attempts per condition

}

parameters {

// Defining vector for hyper and subject parameters

cholesky_factor_corr[J] L_Omega;
vector<lower=0>[J] sigma;
vector[J] hyper_pars;
matrix[J,N] theta;

}

transformed parameters {

// non-centered multivariate
matrix[J,N] subj_pars =  (
diag_pre_multiply( sigma, L_Omega )
* theta
+ rep_matrix(hyper_pars,N)
) ;

// Transform f Parameter
real mu_f = inv_logit(hyper_pars[3]);
row_vector[N] f = inv_logit(subj_pars[3,]);

// activations
real acts_IIP[N*Con];
real acts_IOP[N*Con];
real acts_DIP[N*Con];
real acts_DIOP[N*Con];
real acts_NPL[N*Con];

// probabilities
vector[K] probs[N*Con];
real SummedActs[N*Con];

// loop over subjects and conditions to compute activations and probabilities

for (i in 1:N){ // for each subject

for(j in 1:Con) {

acts_IIP[j + (i-1)*Con] = scale_b +  ((1+subj_pars[4,i]*Freetime[j])* subj_pars[1,i]) + subj_pars[2,i]; // Item in Position
acts_IOP[j + (i-1)*Con] = scale_b + subj_pars[2,i];        // Item in Other Position
acts_DIP[j + (i-1)*Con] = scale_b + f[i]*((exp(-subj_pars[5,i]*Freetime[j])*subj_pars[1,i]) + subj_pars[2,i]); // Distractor in Position
acts_DIOP[j + (i-1)*Con] = scale_b + f[i]*subj_pars[2,i]; // Distractor in other Position
acts_NPL[j + (i-1)*Con] = scale_b; // non presented Lure

SummedActs[j + (i-1)*Con] = R[1] * acts_IIP[j + (i-1)*Con] + R[2] * acts_IOP[j + (i-1)*Con] + R[3] * acts_DIP[j + (i-1)*Con] + // account for different numbers of DIP / DIOP
R[4] * acts_DIOP[j + (i-1)*Con] + R[5] * acts_NPL[j + (i-1)*Con];

probs[j + (i-1)*Con,1] = (R[1] * acts_IIP[j + (i-1)*Con]) / SummedActs[j + (i-1)*Con];
probs[j + (i-1)*Con,2] = (R[2] * acts_IOP[j + (i-1)*Con]) / SummedActs[j + (i-1)*Con];
probs[j + (i-1)*Con,3] = (R[3] * acts_DIP[j + (i-1)*Con]) / SummedActs[j + (i-1)*Con];
probs[j + (i-1)*Con,4] = (R[4] * acts_DIOP[j + (i-1)*Con]) / SummedActs[j + (i-1)*Con];
probs[j + (i-1)*Con,5] = (R[5] * acts_NPL[j + (i-1)*Con]) / SummedActs[j + (i-1)*Con];
}
}
}

model {

// priors for hyper parameters

hyper_pars[1] ~ normal(20,10); // c
hyper_pars[2] ~ normal(2,10); // a
hyper_pars[3] ~ normal(0,10); // f
hyper_pars[4] ~ normal(1,10); // EE
hyper_pars[5] ~ normal(1,10); // r

//
L_Omega ~ lkj_corr_cholesky(2);
sigma ~ gamma(1,0.01);

// Loop over subjects

for (i in 1:N)
{

theta[,i] ~ normal(0,1);

}

for (j in 1:Con) {
for (i in 1:N) {
// draw data from probabilities determined by MMM parms

p[j + (i-1)*Con] ~ dirichlet(R);

}
}
}

generated quantities{

vector[(J*(J-1))%/%2] cor_mat_lower_tri;
//real p_rep[N*Con,K];

cor_mat_lower_tri = flatten_lower_tri(multiply_lower_tri_self_transpose(L_Omega));

}



I’m using Luce’s choice rule to normalize the activations arising from the estimated parameters into probabilities (end of the transformed parameters block), where R is the composite or “weight” of the response category, which is already taken into account in this normalization. I don’t know whether I’m totally off here, but as I understood it, the Dirichlet should do the trick; I just can’t get it to work. Or should I in general try to use a multivariate beta instead?
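For what it’s worth, the normalization at the end of the transformed parameters block amounts to this (a numpy sketch; the activation values here are made up for illustration):

```python
import numpy as np

R = np.array([1, 4, 1, 4, 16])              # response-set sizes per category
acts = np.array([5.0, 2.0, 3.0, 1.5, 0.1])  # hypothetical activations (IIP, IOP, DIP, DIOP, NPL)

# Luce's choice rule with category weights R:
#   p_k = R_k * A_k / sum_j(R_j * A_j)
probs = R * acts / np.sum(R * acts)

assert np.isclose(probs.sum(), 1.0)  # a proper probability vector
```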

jan

Hi, @jangoe. I think it may take a couple of questions to get on the same page with terminology.

I don’t know what PC or Freetime or ID are, but if you look at the last 5 columns, that looks like multinomial data. In which case, letting y_1 = (238, 86, 107, 38, 31) and so on, the natural way to model this is:

\alpha \sim ...

\theta \sim \textrm{Dirichlet}(\alpha)

y_n \sim \textrm{Multinomial}(\theta) \textrm{ for } n \in 1{:}N

This lets you take the counts into consideration as well as the proportions. You usually don’t want to use a Dirichlet regression unless you don’t have the counts, since modeling only the proportions loses information.
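In generative form, that model looks like this (a numpy simulation sketch; the \alpha here is arbitrary, just to show the structure):

```python
import numpy as np

rng = np.random.default_rng(0)

alpha = np.array([1.0, 4.0, 1.0, 4.0, 16.0])  # Dirichlet concentration (illustrative values)
n_trials = 500                                 # retrievals per condition

theta = rng.dirichlet(alpha)            # latent choice probabilities
y = rng.multinomial(n_trials, theta)    # observed category counts

assert y.sum() == n_trials              # counts respect the trial total
```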

So I couldn’t figure out what was going on in the rest of the code and discussion. Are you perhaps trying to build a logistic multivariate normal instead of a Dirichlet? That works exactly the same way, only with a multivariate normal prior instead of the Dirichlet. That potentially lets you fit correlations as well as concentration, but you need a fair amount of data, or good priors, to do this well.
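Concretely, the logistic multivariate normal replaces the Dirichlet draw with a softmax of a multivariate normal (numpy sketch, identity covariance just for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
K = 5

mu = np.zeros(K)
Sigma = np.eye(K)  # identity covariance for illustration; in practice this is estimated

z = rng.multivariate_normal(mu, Sigma)  # latent values on the log-odds scale
theta = np.exp(z - z.max())             # softmax, shifted for numerical stability
theta /= theta.sum()                    # simplex: positive, sums to 1

assert np.isclose(theta.sum(), 1.0) and (theta > 0).all()
```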

I don’t know what a response set is. I thought the responses were the five headers here. Or is each of these broken up into subcategories? If they are broken up into subcategories, do you have the subcategory counts?

Hey Bob, thanks for reaching out and explaining the logic !

Yes, it is multinomial data; as you already pointed out, y_1 are category counts. These counts come from a set of alternatives that have to be chosen from.

The “response set” (sorry for not addressing it in the original question) is the set of initially available alternatives for one choice process - that means that in each response set (or set of choice alternatives) there are n alternatives for each category. So here we have

1 alternative for category A
4 for category B
1 for category C
4 for category D
and 16 for category E

So in total one has to choose 1 answer out of 26 possibilities on every attempt. So the prior guessing probability of choosing the correct answer for one position would be 1/26, and the probability of choosing category D would be 4/26. As this is close to (at least) my understanding of \alpha in the Dirichlet, I thought I could (or had to) use this information as well in the model.
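In numbers, the guessing baseline I mean is just:

```python
import numpy as np

R = np.array([1, 4, 1, 4, 16])  # alternatives per category (A..E)
guess = R / R.sum()             # prior guessing probability per category

print(R.sum())   # 26 alternatives in total
print(guess[0])  # 1/26 for the single correct item
print(guess[3])  # 4/26 for category D
```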

In your explanation, you wrote

\alpha ~ ...

what exactly would \alpha be here - the given probability vector from the data, or some general prior? The rest is totally clear to me.