Hey all,
I’m working with a multinomial-likelihood model to estimate parameters from responses in 5 different categories of some task. The data look like this (a frequency table):
> data
     ID Freetime    PC IIP IOP DIP DOP NPL
[1,]  1      0.0 0.476 238  86 107  38  31
[2,]  1      0.2 0.674 337  73  16  47  27
[3,]  1      0.4 0.710 355  76  15  33  21
[4,]  1      0.8 0.790 395  49   9  26  21
IIP to NPL are the frequencies of given responses per category. As parameter recovery for this particular model is not that good, I wanted to try modeling the data at a higher “resolution” and use the probabilities directly to inform the parameters. I wanted to do this with a Dirichlet distribution. As \alpha for the Dirichlet, I wanted to use the overall counts of the response set. For example, the response set for the 5 categories is
     IIP IOP DIP DIOP NPL
[1,]   1   4   1    4  16
which means that there is 1 possible alternative for the first category IIP, 4 for IOP, and so on. This seemed pretty straightforward to me, but the model is not performing at all, meaning that the estimates for the hyperparameters are totally off.
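For intuition, here is a minimal numpy sketch (illustrative only, not part of the model) of what using these response-set counts directly as a fixed concentration \alpha implies: the mean of a Dirichlet(\alpha) is \alpha / sum(\alpha).

```python
import numpy as np

# Response-set counts used as a fixed Dirichlet concentration alpha
# (IIP, IOP, DIP, DIOP, NPL) -- values taken from the table above.
alpha = np.array([1.0, 4.0, 1.0, 4.0, 16.0])

# The implied mean of Dirichlet(alpha) is alpha / sum(alpha)
mean = alpha / alpha.sum()
```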
Instead of the raw counts, I use a probability matrix as data:
       IIP   IOP   DIP   DOP   NPL
[1,] 0.476 0.172 0.214 0.076 0.062
[2,] 0.674 0.146 0.032 0.094 0.054
[3,] 0.710 0.152 0.030 0.066 0.042
[4,] 0.790 0.098 0.018 0.052 0.042
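This probability matrix is just the frequency table above with each row normalized by its row total; a small numpy sketch:

```python
import numpy as np

# Frequencies of responses per category (rows of the frequency table above)
counts = np.array([
    [238, 86, 107, 38, 31],
    [337, 73,  16, 47, 27],
    [355, 76,  15, 33, 21],
    [395, 49,   9, 26, 21],
], dtype=float)

# Divide each row by its total to get per-row response probabilities
probs = counts / counts.sum(axis=1, keepdims=True)
```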
Here’s the model:
functions {
  .....
}
data {
  int<lower=0> N;         // number of subjects
  int<lower=0> K;         // number of response categories
  int<lower=0> J;         // dimension of the covariance matrix
  int<lower=0> Con;       // number of free-time conditions
  vector[K] R;            // number of response alternatives per category
  vector[K] p[N*Con];     // observed data (rows of the probability matrix)
  real scale_b;           // scaling for background noise
  vector[Con] Freetime;   // free-time conditions after distractor encoding (e.g. 500 ms, 1000 ms or 1500 ms)
  int retrievals;
}
parameters {
  // hyper- and subject-level parameters
  cholesky_factor_corr[J] L_Omega;
  vector<lower=0>[J] sigma;
  vector[J] hyper_pars;
  matrix[J,N] theta;
}
transformed parameters {
  vector[K] count[N*Con];
  // non-centered multivariate parameterization
  matrix[J,N] subj_pars = diag_pre_multiply(sigma, L_Omega) * theta
                          + rep_matrix(hyper_pars, N);
  // transform f parameter
  real mu_f = inv_logit(hyper_pars[3]);
  row_vector[N] f = inv_logit(subj_pars[3]);
  // activations
  real acts_IIP[N*Con];
  real acts_IOP[N*Con];
  real acts_DIP[N*Con];
  real acts_DIOP[N*Con];
  real acts_NPL[N*Con];
  // probabilities
  vector[K] probs[N*Con];
  real SummedActs[N*Con];
  // loop over subjects and conditions to compute activations and probabilities
  for (i in 1:N) {          // for each subject
    for (j in 1:Con) {      // for each free-time condition
      int idx = j + (i-1)*Con;
      acts_IIP[idx]  = scale_b + (1 + subj_pars[4,i]*Freetime[j]) * subj_pars[1,i] + subj_pars[2,i];          // item in position
      acts_IOP[idx]  = scale_b + subj_pars[2,i];                                                              // item in other position
      acts_DIP[idx]  = scale_b + f[i] * (exp(-subj_pars[5,i]*Freetime[j]) * subj_pars[1,i] + subj_pars[2,i]); // distractor in position
      acts_DIOP[idx] = scale_b + f[i] * subj_pars[2,i];                                                       // distractor in other position
      acts_NPL[idx]  = scale_b;                                                                               // non-presented lure
      // weight by R to account for the different numbers of DIP / DIOP alternatives
      SummedActs[idx] = R[1]*acts_IIP[idx] + R[2]*acts_IOP[idx] + R[3]*acts_DIP[idx]
                        + R[4]*acts_DIOP[idx] + R[5]*acts_NPL[idx];
      probs[idx,1] = R[1]*acts_IIP[idx]  / SummedActs[idx];
      probs[idx,2] = R[2]*acts_IOP[idx]  / SummedActs[idx];
      probs[idx,3] = R[3]*acts_DIP[idx]  / SummedActs[idx];
      probs[idx,4] = R[4]*acts_DIOP[idx] / SummedActs[idx];
      probs[idx,5] = R[5]*acts_NPL[idx]  / SummedActs[idx];
    }
  }
}
model {
  // priors for hyperparameters
  hyper_pars[1] ~ normal(20, 10); // c
  hyper_pars[2] ~ normal(2, 10);  // a
  hyper_pars[3] ~ normal(0, 10);  // f
  hyper_pars[4] ~ normal(1, 10);  // EE
  hyper_pars[5] ~ normal(1, 10);  // r
  L_Omega ~ lkj_corr_cholesky(2);
  sigma ~ gamma(1, 0.01);
  // standard normal for the non-centered parameterization
  to_vector(theta) ~ normal(0, 1);
  for (j in 1:Con) {
    for (i in 1:N) {
      // draw data from probabilities determined by MMM parms
      p[j + (i-1)*Con] ~ dirichlet(R);
    }
  }
}
generated quantities {
  vector[(J*(J-1)) %/% 2] cor_mat_lower_tri;
  // real p_rep[N*Con,K];
  cor_mat_lower_tri = flatten_lower_tri(multiply_lower_tri_self_transpose(L_Omega));
}
I’m using Luce’s choice rule to normalize the activations that arise from the estimated parameters into probabilities (end of the transformed parameters block), where R is the composite or “weight” for each response category, which is already taken into account in this normalization. I don’t know whether I’m totally off here, but as I understood it, the Dirichlet should do the trick; I just can’t get it to work. Or should I try to use a multivariate beta instead?
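For reference, here is a small numpy sketch of that Luce normalization, with made-up activation values (R and acts are illustrative stand-ins for the Stan variables of the same names):

```python
import numpy as np

# Luce's choice rule: weighted activations normalized by their weighted sum
R = np.array([1.0, 4.0, 1.0, 4.0, 16.0])    # response-set size per category
acts = np.array([2.0, 1.0, 1.5, 0.8, 0.1])  # hypothetical positive activations

# Each category's probability is its weighted activation over the total
probs = R * acts / np.sum(R * acts)
```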
Thanks in advance,
jan