Ordered_logistic in Item response model with no constant category

Good evening,

I am from Colombia and need some help with my Stan code. I am fitting an ordered logistic model in which the number of categories is not constant across items. Could any of you give me some ideas for building a model with a variable number of categories per item? For example: item 1 has 4 categories, item 2 has 5 categories, … .

So far I have working code for the case where every item has the same number of category boundaries.

I would be very grateful,


data {
  int<lower=1> N; // number of answers
  int<lower=10> N_fami; // number of entrepreneurs
  int<lower=2> N_item; // number of items
  int<lower=2,upper=3> N_cat; // number of categories per item
  int<lower=1,upper=N_cat> y[N]; // responses; y[n] is the n-th response
  int<lower=1,upper=N_fami> famiempresas[N]; // entrepreneur who gave response n
  int<lower=1,upper=N_item> item[N]; // item answered in response n
}//end data

parameters {
  vector[N_fami] theta; // latent traits of the micro-entrepreneurs
  ordered[N_cat-1] beta[N_item]; // cutpoints (category difficulties) of each item
  real mu_beta; // mean of the beta parameters: item difficulty
  real<lower=0> sigma_beta; // sd of the prior on the category difficulties
  real pho_theta; // skewness parameter of the latent trait
  vector<lower=0>[N_item] alpha; // item discriminations
}//end parameters

model {
  theta ~ skew_normal(0, 1, pho_theta); // latent traits
  for (j in 1:N_item) {
    beta[j] ~ normal(mu_beta, sigma_beta); // prior of the item parameters
    alpha[j] ~ normal(0, 1); // prior on the discrimination
  }
  mu_beta ~ normal(0, 2); // hyperprior for mu_beta
  sigma_beta ~ cauchy(0, 2); // hyperprior for sigma_beta
  pho_theta ~ normal(0, 1); // hyperprior for the skewness of theta
  for (n in 1:N) {
    y[n] ~ ordered_logistic(alpha[item[n]] * theta[famiempresas[n]], beta[item[n]]);
  }
}//end model

generated quantities {
  vector[N] log_lik;
  for (n in 1:N) {
    // _lpmf replaces the deprecated _log suffix
    log_lik[n] = ordered_logistic_lpmf(y[n] | alpha[item[n]] * theta[famiempresas[n]], beta[item[n]]);
  }
}//generated quantities

The model

$$P[u_{ij}=g \mid \theta_j,\xi] = \operatorname{logit}^{-1}(\eta_{g-1}) - \operatorname{logit}^{-1}(\eta_{g}),$$

where $\operatorname{logit}^{-1}(\eta_{g-1}) = (1+e^{-\eta_{g-1}})^{-1}$ with $\eta_{g-1} = \alpha_i\theta_j - \alpha_i\beta_{i,g-1}$, and $\operatorname{logit}^{-1}(\eta_{g}) = (1+e^{-\eta_{g}})^{-1}$ with $\eta_{g} = \alpha_i\theta_j - \alpha_i\beta_{i,g}$.

$$P[u_{ij}=g \mid \theta_j,\xi] = \begin{cases} 1-(1+e^{-\eta_{1}})^{-1} & \text{if } g=1 \\ (1+e^{-\eta_{g-1}})^{-1}-(1+e^{-\eta_{g}})^{-1} & \text{if } 1 < g < k \\ (1+e^{-\eta_{k-1}})^{-1} & \text{if } g = k \end{cases}$$

$$\begin{aligned} \theta_j &\sim SN(0, 1, \rho), \\ \rho &\sim N(0,1), \\ \alpha_i &\sim N^{+}(0,1), \\ \beta_{ig} &\sim N(\mu_{\beta},\sigma_{\beta}), \\ \mu_{\beta} &\sim N(0,2), \\ \sigma_{\beta} &\sim \text{Cauchy}(0,2) \end{aligned}$$
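
As a sanity check on the category probabilities, consider a hypothetical item with k = 3 categories. Writing out the three cases (same symbols as above) shows that they sum to one:

```latex
\begin{aligned}
P[u_{ij}=1 \mid \theta_j,\xi] &= 1 - \operatorname{logit}^{-1}(\eta_1), \\
P[u_{ij}=2 \mid \theta_j,\xi] &= \operatorname{logit}^{-1}(\eta_1) - \operatorname{logit}^{-1}(\eta_2), \\
P[u_{ij}=3 \mid \theta_j,\xi] &= \operatorname{logit}^{-1}(\eta_2), \\
\sum_{g=1}^{3} P[u_{ij}=g \mid \theta_j,\xi] &= 1 .
\end{aligned}
```

The middle probability is non-negative only if $\eta_1 \ge \eta_2$, that is $\beta_{i,1} \le \beta_{i,2}$ when $\alpha_i > 0$. This is what the ordered declaration of beta and the lower bound on alpha enforce in the Stan code.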

Hi, Henry!

Just a few questions and some remarks:

You say you have functional code, but what exactly do you mean? Does it compile? What estimates and diagnostics are you getting when you run it? What help do you want from the community?

Looking quickly over the code, and not knowing much about your data, I don’t immediately see a problem (but I’m no expert). Is your half-normal (0,1) prior on alpha working well? I’d think it places a lot of weight near zero discrimination, which is unlikely given that the items are designed to measure something, and it is perhaps a little tight for high discrimination?

Hi Erling, thanks for answering.

This code compiles fine. I am working with a graded logistic model (Samejima). My data are in long format in R, like this:

List of 7
 $ famiempresas: int [1:3072] 1 2 3 4 5 6 7 8 9 10 …
 $ item        : int [1:3072] 1 1 1 1 1 1 1 1 1 1 …
 $ y           : int [1:3072] 1 2 2 1 2 2 1 1 2 2 …
 $ N           : int 3072
 $ N_fami      : int 384
 $ N_cat       : int 3
 $ N_item      : num 8

I want to change N_cat, which in this code is constant and equal to 3. That works very well when every item in the test has the same number of possible answers. But I now have a test where the number of possible answers differs by item. I am thinking of building a vector with the number of categories per item, but I do not know how to include it in the parameters and model blocks.

ordered[N_cat-1] beta[N_item]; // cutpoints of each item

for (n in 1:N) {
  y[n] ~ ordered_logistic(alpha[item[n]] * theta[famiempresas[n]], beta[item[n]]);
}


I would very much appreciate your help,


Please note: I am not a programmer by trade. There may very well be more efficient ways of doing this.

You could do it with a vector of the number of categories per item, as you suggest, and then insert some if-statements in your for-loop. You’ll also need a separate vector that indexes each item within the group of items that share the same number of response alternatives. Say you have 2 questions (items 1 and 2) with three alternatives and 2 questions with four alternatives (items 3 and 4), and three respondents:

Respondent vector            famiempresas  1 1 1 1 2 2 2 2 3 3 3 3
Item vector                  item          1 2 3 4 1 2 3 4 1 2 3 4
Categories-per-item vector   n_cat_ind     3 3 4 4 3 3 4 4 3 3 4 4
Item index within its group  cat_nr        1 2 1 2 1 2 1 2 1 2 1 2

You’ll then declare separate beta parameters with different numbers of cutpoints, say beta_3 (two cutpoints per item) and beta_4 (three cutpoints per item).

In the model block, you could then do:

for (n in 1:N) {
  if (n_cat_ind[n] == 3) {
    y[n] ~ ordered_logistic(alpha[item[n]] * theta[famiempresas[n]], beta_3[cat_nr[n]]);
  }
  if (n_cat_ind[n] == 4) {
    y[n] ~ ordered_logistic(alpha[item[n]] * theta[famiempresas[n]], beta_4[cat_nr[n]]);
  }
}
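
Put together, the if-statement approach could look roughly like the sketch below. It assumes exactly the two item groups from the example. N_item3 and N_item4 (the number of items in each group) are hypothetical names that do not appear earlier in the thread, and the priors are simply copied from the original model. Arrays are declared in the current `array[N] int` syntax; on older Stan versions, write `int y[N]` as in the original code.

```stan
data {
  int<lower=1> N;                                   // number of answers
  int<lower=1> N_fami;                              // number of respondents
  int<lower=2> N_item;                              // total number of items
  int<lower=0> N_item3;                             // hypothetical: items with 3 categories
  int<lower=0> N_item4;                             // hypothetical: items with 4 categories
  array[N] int<lower=1> y;                          // responses
  array[N] int<lower=1, upper=N_fami> famiempresas; // respondent of answer n
  array[N] int<lower=1, upper=N_item> item;         // item of answer n
  array[N] int<lower=3, upper=4> n_cat_ind;         // categories of that item
  array[N] int<lower=1> cat_nr;                     // index of that item within its group
}
parameters {
  vector[N_fami] theta;
  vector<lower=0>[N_item] alpha;
  array[N_item3] ordered[2] beta_3;                 // 3 categories -> 2 cutpoints
  array[N_item4] ordered[3] beta_4;                 // 4 categories -> 3 cutpoints
  real mu_beta;
  real<lower=0> sigma_beta;
  real pho_theta;
}
model {
  theta ~ skew_normal(0, 1, pho_theta);
  alpha ~ normal(0, 1);
  mu_beta ~ normal(0, 2);
  sigma_beta ~ cauchy(0, 2);
  pho_theta ~ normal(0, 1);
  for (j in 1:N_item3) beta_3[j] ~ normal(mu_beta, sigma_beta);
  for (j in 1:N_item4) beta_4[j] ~ normal(mu_beta, sigma_beta);
  for (n in 1:N) {
    real eta = alpha[item[n]] * theta[famiempresas[n]];
    if (n_cat_ind[n] == 3) {
      y[n] ~ ordered_logistic(eta, beta_3[cat_nr[n]]);
    } else {
      y[n] ~ ordered_logistic(eta, beta_4[cat_nr[n]]);
    }
  }
}
```

Each additional group of items with a different number of categories would need its own beta array and its own branch in the loop.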

But again, there may be more efficient ways of coding this. If ordered_logistic was vectorised (but I don’t think it is?) it would perhaps be better to split the data into separate long format vectors for the item sets with different numbers of categories, and just have one vectorised sampling statement for each set, all sharing the same theta parameter.
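
If your Stan version accepts a vector of linear predictors together with an array of cutpoint vectors in ordered_logistic (recent versions of the functions reference list such a signature, but please check yours), the split-data idea might be sketched as follows. Every name ending in 3 or 4 is a hypothetical name introduced for this illustration: the long-format data are split by item group before being passed to Stan, item3/item4 keep the global item index used for alpha, and cat_nr3/cat_nr4 index the item within its group. Arrays use the current `array[N] int` syntax.

```stan
data {
  int<lower=1> N_fami;                              // number of respondents
  int<lower=2> N_item;                              // total number of items
  int<lower=0> N_item3;                             // items with 3 categories
  int<lower=0> N_item4;                             // items with 4 categories
  int<lower=0> N3;                                  // answers to 3-category items
  array[N3] int<lower=1, upper=3> y3;
  array[N3] int<lower=1, upper=N_fami> fami3;
  array[N3] int<lower=1, upper=N_item> item3;
  array[N3] int<lower=1, upper=N_item3> cat_nr3;
  int<lower=0> N4;                                  // answers to 4-category items
  array[N4] int<lower=1, upper=4> y4;
  array[N4] int<lower=1, upper=N_fami> fami4;
  array[N4] int<lower=1, upper=N_item> item4;
  array[N4] int<lower=1, upper=N_item4> cat_nr4;
}
parameters {
  vector[N_fami] theta;                             // shared by both item groups
  vector<lower=0>[N_item] alpha;
  array[N_item3] ordered[2] beta_3;
  array[N_item4] ordered[3] beta_4;
  real mu_beta;
  real<lower=0> sigma_beta;
  real pho_theta;
}
model {
  theta ~ skew_normal(0, 1, pho_theta);
  alpha ~ normal(0, 1);
  mu_beta ~ normal(0, 2);
  sigma_beta ~ cauchy(0, 2);
  pho_theta ~ normal(0, 1);
  for (j in 1:N_item3) beta_3[j] ~ normal(mu_beta, sigma_beta);
  for (j in 1:N_item4) beta_4[j] ~ normal(mu_beta, sigma_beta);
  // one vectorised statement per item group; beta_3[cat_nr3] is an
  // array holding one cutpoint vector per answer
  y3 ~ ordered_logistic(alpha[item3] .* theta[fami3], beta_3[cat_nr3]);
  y4 ~ ordered_logistic(alpha[item4] .* theta[fami4], beta_4[cat_nr4]);
}
```

Whether this is faster than the loop with if-statements depends on the Stan version and the data, so it is worth timing both.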

Also, if you are using the calculated log_lik for the loo package, keep in mind that you are computing the log_lik of the response to each item, not of each respondent's full set of responses. That choice determines how the cross-validation should be interpreted: it approximates leaving out single answers rather than whole respondents.
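
If leaving out whole respondents is the comparison of interest, one possible sketch is to sum the pointwise terms per respondent. The block below is meant as a replacement for the generated quantities block of the original constant-N_cat model and relies on that model's data and parameter names; log_lik_resp is a hypothetical name.

```stan
generated quantities {
  // one log-likelihood entry per respondent instead of one per answer
  vector[N_fami] log_lik_resp = rep_vector(0, N_fami);
  for (n in 1:N) {
    log_lik_resp[famiempresas[n]] += ordered_logistic_lpmf(
      y[n] | alpha[item[n]] * theta[famiempresas[n]], beta[item[n]]);
  }
}
```

Passing log_lik_resp instead of log_lik to loo would then treat each respondent, rather than each answer, as the unit that is left out.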

Thank you so much. I will do it.