# Trouble transitioning from BUGS to Stan with a particular categorical distribution

Hi, I am new to Stan but a little familiar with JAGS and BUGS, and I want to move some code from BUGS to Stan. Recently I ran into a problem: I cannot figure out how to handle a particular categorical distribution.

The parameter delta_i (i as a subscript) can take any integer from 1 to J, and it follows a discrete (categorical) distribution with probabilities (p1, p2, …, pJ).
To reduce the number of parameters, only pJ and a hyperparameter omega are specified, so p1, p2, …, p(J-1) can be expressed as
p_j = \frac{j^{\omega} - (j-1)^{\omega}}{(J-1)^{\omega}} (1 - p_J), \quad j = 1, \dots, J-1
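As a quick sanity check that these probabilities form a valid simplex, here is a small Python sketch (the values J = 5, omega = 1.5, and pJ = 0.2 are made-up illustration values):

```python
# Sketch: compute the categorical probabilities p_1..p_J from omega and p_J.
# J, omega, and pJ below are made-up illustration values.
J = 5
omega = 1.5
pJ = 0.2

# p_j = (j^omega - (j-1)^omega) / (J-1)^omega * (1 - pJ) for j = 1..J-1
p = [(j**omega - (j - 1)**omega) / (J - 1)**omega * (1 - pJ) for j in range(1, J)]
p.append(pJ)

# The first J-1 terms telescope to (1 - pJ), so the whole vector sums to 1.
print(sum(p))  # 1.0 (up to floating-point error)
```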

So I specified this in BUGS as

model {
  for (i in 1:I) {

    # some prior
    theta[i] ~ dnorm(0, 1)
    delta[i] ~ dcat(p[1:J])

    # model
    for (j in 1:J) {
      eta[i,j] <- step(j - delta[i] - 0.5)
      logit(p1[i,j]) <- theta[i] - b[j]
      pp[i,j] <- eta[i,j] * 0.25 + (1 - eta[i,j]) * p1[i,j]
      Y[i,j] ~ dbern(pp[i,j])
    }
  }

  # the categorical distribution part
  p[J] ~ dunif(0, 1)
  temp <- (1 - p[J]) / pow(J-1, omega)
  for (j in 1:(J-1)) {
    p[j] <- (pow(j, omega) - pow(j-1, omega)) * temp
  }
  omega ~ dgamma(1, 1)

  # some prior
  for (j in 1:J) {
    b[j] ~ dunif(-1, 3)
  }
}


When translating the above code into Stan, I cannot figure out how to specify pJ. Can it be

parameters {
real<lower=0,upper=1> pJ;
}


or just

parameters {
real<lower=0,upper=1> p[J];
}


I think the second specification must be wrong, but I can’t work out how to build a categorical distribution from the parameters pJ and omega.

Thanks for your answer, and please forgive my ignorance if this question is foolish.


Hi and welcome to the forums! To declare a parameter p with J elements, p should be declared either as a vector

vector[J] p;


or as an array of reals. The syntax for declaring arrays in Stan changed somewhat recently. If you are using an up-to-date version, the correct syntax is

array[J] real p;


However, note that if delta is truly a parameter (and not data or transformed data) then you are going to run into further issues, because Stan does not support discrete parameters. If delta is a parameter, then you will need to find a way to marginalize it out in order to fit this model in Stan. Since the elements of delta are conditionally independent given the other parameters, each delta[i] can be marginalized separately; this is feasible as long as the support of each delta[i] (here J) is small and the total cost, which grows with I, stays manageable.
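To illustrate what marginalizing out means here, a minimal Python sketch (the support size, prior, and likelihood values are toy numbers, not from your model) of summing out a discrete parameter on the log scale:

```python
import math

# Sketch: marginalizing a discrete parameter delta with support {1, 2}
# out of a likelihood, working on the log scale. All numbers are toy values.
log_p_delta = [math.log(0.3), math.log(0.7)]   # log P(delta = k)
log_lik = [math.log(0.5), math.log(0.1)]       # log P(y | delta = k)

# log P(y) = log sum_k P(delta = k) * P(y | delta = k),
# computed stably via the log-sum-exp trick (Stan's log_sum_exp).
terms = [lp + ll for lp, ll in zip(log_p_delta, log_lik)]
m = max(terms)
log_marginal = m + math.log(sum(math.exp(t - m) for t in terms))

print(math.exp(log_marginal))  # 0.3*0.5 + 0.7*0.1 = 0.22
```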

Otherwise, if you can describe your modeling problem in a bit more detail, we might be able to figure out an alternative trick or a good slight variation on the model that can be fit with Stan. In my experience, it is relatively rare to encounter a modeling scenario where a solution with Stan isn’t possible due to the restriction on discrete parameters (and when such a scenario does arise, it’s often a good idea to be at least a little bit skeptical of whether JAGS or other MCMC engines supporting discrete parameters actually reliably yield correct inference).


Thanks for your kind response!

I found this model in the article “Bayesian IRT Guessing Models for Partial Guessing Behaviors”. I simplified the model to test the code. It is designed to evaluate the performance or ability of a person by means of a set of items (a test). Item response theory is the usual way to describe this type of analysis. Suppose there are I persons answering J items. The probability of person i with ability theta_i getting item j correct is, for example,
expit(\theta_i - b_j)
(the one-parameter logistic model, 1PLM). b_j is an item difficulty parameter indicating the amount of effort needed to get the corresponding item correct.

In the above article, an indicator function is proposed to decide the item location delta_i after which person i just responds randomly to the remaining items. Up to and including item delta_i, person i answers items with success probability given by the 1PLM. delta is the parameter discussed in this topic; it can take any integer from 1 to J and follows a discrete distribution with probabilities (p1, p2, …, pJ).

So a person has two parameters, theta_i and delta_i. J is usually no more than 60; I, in this case, can be as large as 10000 or as small as 1000. The article uses WinBUGS to estimate this model. I think this discrete distribution might be the tricky part.
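To make the generative process concrete, here is a rough simulation sketch in Python (the function name, parameter values, and seed are mine for illustration, not from the article): items 1..delta follow the 1PLM, and the remaining items are random guesses with success probability 0.25.

```python
import math
import random

random.seed(1)

def simulate_person(theta, b, delta):
    """Simulate one person's responses under the partial-guessing model:
    items 1..delta succeed with probability expit(theta - b_j) (1PLM),
    items delta+1..J succeed with guessing probability 0.25."""
    J = len(b)
    y = []
    for j in range(1, J + 1):
        if j <= delta:
            prob = 1.0 / (1.0 + math.exp(-(theta - b[j - 1])))
        else:
            prob = 0.25
        y.append(1 if random.random() < prob else 0)
    return y

# Toy example: 4 items, the person stops trying after item 2.
y = simulate_person(theta=0.5, b=[-0.5, 0.0, 0.5, 1.0], delta=2)
print(len(y))  # 4 responses, one per item
```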

Thanks for your help!

This looks like a change point model.

Here’s an attempt at translating your BUGS model to Stan

data {
  int<lower=1> I;
  int<lower=2> J;
  array[I,J] int<lower=0, upper=1> Y;
}
parameters {
  vector[I] theta;
  row_vector<lower=-1, upper=3>[J] b;
  real<lower=0> omega;
  real<lower=0, upper=1> pJ;
}
model {
  // priors
  b ~ uniform(-1, 3);
  omega ~ gamma(1, 1);
  theta ~ std_normal();
  pJ ~ uniform(0, 1);

  vector[J] p;
  p[J] = pJ;
  real temp = (1 - pJ) / pow(J-1, omega);
  for (j in 1:(J-1)) {
    p[j] = (pow(j, omega) - pow(j-1, omega)) * temp;
  }

  matrix[I,J] p1 = inv_logit(rep_matrix(theta, J) - rep_matrix(b, I));

  for (i in 1:I) {
    // marginalize delta[i]
    vector[J] lps;
    for (delta in 1:J) {
      lps[delta] = categorical_lpmf(delta | p);
      for (j in 1:J) {
        if (j < delta + 0.5) {
          // eta == 0
          lps[delta] += bernoulli_lpmf(Y[i,j] | p1[delta]);
        } else {
          // eta == 1
          lps[delta] += bernoulli_lpmf(Y[i,j] | 0.25);
        }
      }
    }
    target += log_sum_exp(lps);
  }
}


Thanks! I will certainly dive into it!

Note that for J = 60 and I = 10000 as indicated in your question, the code inside the inner loop that @nhuurre provided would execute over thirty million times per leapfrog step, which I tentatively guess will yield a prohibitive runtime. If you run into prohibitive runtimes, check back here and we can have a think about alternatives.
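A quick back-of-envelope check of that count in Python:

```python
# Back-of-envelope: executions of the innermost loop body per leapfrog step,
# for the nested marginalization above (I persons x J values of delta x J items).
I, J = 10000, 60
inner_iterations = I * J * J
print(inner_iterations)  # 36,000,000
```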

Thanks for the advice. It took approximately 30 hours to run the JAGS code when I = 1000 and J = 60. So I returned to Stan for more efficient estimation. I will make an effort to work this out. : )

Oh whoops. The workload inside the innermost loop can be vectorized, which should help a little bit.
Also, my code had a nasty indexing error: that p1[delta] should have been p1[i,j].

data {
  int<lower=1> I;
  int<lower=2> J;
  array[I,J] int<lower=0, upper=1> Y;
}
parameters {
  vector[I] theta;
  row_vector<lower=-1, upper=3>[J] b;
  real<lower=0> omega;
  real<lower=0, upper=1> pJ;
}
model {
  // priors
  b ~ uniform(-1, 3);
  omega ~ gamma(1, 1);
  theta ~ std_normal();
  pJ ~ uniform(0, 1);

  vector[J] p;
  p[J] = pJ;
  real temp = (1 - pJ) / pow(J-1, omega);
  for (j in 1:(J-1)) {
    p[j] = (pow(j, omega) - pow(j-1, omega)) * temp;
  }

  vector[J] cat_delta_p;
  for (j in 1:J) {
    cat_delta_p[j] = categorical_lpmf(j | p);
  }
  matrix[I,J] logit_p1 = rep_matrix(theta, J) - rep_matrix(b, I);

  for (i in 1:I) {
    vector[J] lps = cat_delta_p;
    vector[J] Y_ll;
    for (j in 1:J) {
      Y_ll[j] = bernoulli_logit_lpmf(Y[i,j] | logit_p1[i,j]);
    }
    for (delta in 1:J) {
      lps[delta] += sum(Y_ll[:delta]) + bernoulli_lpmf(Y[i,delta+1:] | 0.25);
    }
    target += log_sum_exp(lps);
  }
}
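As an aside, the repeated prefix sums sum(Y_ll[:delta]) can be precomputed once per person with a cumulative sum, making the marginalization loop O(J) instead of O(J^2). A sketch of the idea in Python (the per-item log-likelihood values are toy numbers, and the categorical prior term is omitted for brevity):

```python
import itertools

# Sketch: replacing repeated prefix/suffix sums with one cumulative sum.
# Y_ll[j] is the per-item log-likelihood under the 1PL branch (toy values);
# guess_ll[j] is the per-item log-likelihood under the guessing branch.
Y_ll = [-0.2, -1.1, -0.4, -0.9]
guess_ll = [-0.3, -0.3, -1.4, -1.4]
J = len(Y_ll)

cum_model = [0.0] + list(itertools.accumulate(Y_ll))      # prefix sums of Y_ll
cum_guess = [0.0] + list(itertools.accumulate(guess_ll))  # prefix sums of guess_ll
total_guess = cum_guess[J]

# lps[delta] = sum(Y_ll[:delta]) + sum(guess_ll[delta:]), each in O(1)
lps = [cum_model[d] + (total_guess - cum_guess[d]) for d in range(1, J + 1)]

# Same values as the naive O(J^2) double loop
naive = [sum(Y_ll[:d]) + sum(guess_ll[d:]) for d in range(1, J + 1)]
print(all(abs(a - b) < 1e-9 for a, b in zip(lps, naive)))  # True
```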
