I’ve been working with ordinal regression models that have category inflation, i.e. increased responses for certain categories beyond the normal response process. For instance, in some surveys, participants might choose the first value of the ordinal response scale to rush through the survey quickly, leading to inflated category 1 values.
I’ve recovered the parameters for more involved models, but the simple case below has been tripping me up.
The likelihood function is:
P( y | \kappa, \boldsymbol{\theta}) = \begin{cases} \kappa + (1-\kappa) \text{Categorical}(1 | \boldsymbol{\theta}) & \text{if } y = 1 \\ (1-\kappa)\text{Categorical}(y | \boldsymbol{\theta}) & \text{if } 1 < y \leq 3 \end{cases}
where y = {1,2,3} is the ordinal data, \kappa is the probability of drawing a category 1 from the inflation component (false 1), and \boldsymbol{\theta} is the vector of ordinal category probabilities for the true process.
I simulate data as:
set.seed(2020)
N <- 1000
# mean of latent scale generating the ordinal data
alpha <- 1
# sd of latent scale generating the ordinal data
sigma <- 1
# inflation probability
kappa <- 0.1
# simulate some inflated responses
is_inflated <- rbinom(N, 1, prob = kappa)
# probabilities of ordinal categories from the true process
theta <- rep(NA, 3)
theta[1] <- pnorm( (1.5 - alpha)/sigma )
theta[2] <- pnorm( (2.5 - alpha)/sigma ) - pnorm( (1.5 - alpha)/sigma )
theta[3] <- 1 - pnorm( (2.5 - alpha)/sigma )
# simulate the data
y <- rep(NA, N)
for(i in 1:N){
# if inflated, y == 1
if(is_inflated[i]){
y[i] <- 1
}
if(!is_inflated[i]){ # not inflated
# sample ordinal category
y[i] <- sample(1:3, size = 1, replace=T, prob = theta)
}
}
I Stan model code is:
data{
int<lower=1> N;
int J;
int K;
int<lower=1,upper=J> y[N];
vector[K] thresh;
}
parameters{
real alpha;
real<lower=0> sigma;
real<lower=0,upper=1> kappa;
}
model{
vector[J] theta;
alpha ~ normal(0, 5);
sigma ~ normal(0, 1);
kappa ~ beta(2, 2);
theta[1] = Phi( (thresh[1] - alpha)/sigma);
theta[2] = Phi( (thresh[2] - alpha)/sigma) - Phi( (thresh[1] - alpha)/sigma);
theta[3] = 1 - Phi( (thresh[2] - alpha)/sigma);
// likelihood
for(n in 1:N){
if(y[n] == 1){
vector[2] lp;
lp[1] = log(kappa);
lp[2] = log1m(kappa) + categorical_lpmf(1 | theta);
target += log_sum_exp(lp);
}
if(y[n] > 1){
target += log1m(kappa) + categorical_lpmf(y[n] | theta);
}
}
}
However, running this model does not return the correct parameter values. In particular, it looks like the inflation component, kappa
just draws from the prior.
Does anyone see an obvious mistake?
Thanks!
Code: ordinal-inflation.R (953 Bytes) ordinal-one-inflation.stan (752 Bytes)