Vectorized `ordered_logistic` compiles but then does not sample correctly

I just noticed that the ordered_logistic function compiles (rstan 2.19.2) when the LHS is an integer array but then does something weird in the sampler. I’m a bit confused as to what is going on here so any info would be appreciated, I wonder if stan should fail during compilation in this case? It is very easy for a user to make a mistake here, I think.

I’ve attached simplified examples below, with a small number of iterations since the vectorized case is so slow.

Model with vectorized ordered_logit:

library(rstan)
library(rethinking)

data(Trolley)

model_code1 <- "
data {
    int<lower = 1> n;
    int<lower = 2> num_responses;
    int<lower = 1, upper = num_responses> response[n];
}

parameters {
    ordered[num_responses - 1] cut_points;
}

model {
    vector[n] linear_predictor = rep_vector(0.0, n);
    response ~ ordered_logistic(linear_predictor, cut_points);
}
"

mod1 <- stan(model_code = model_code1,
             iter = 200,
             warmup = 100,
             data = list(n = nrow(Trolley),
                         num_responses = length(unique(Trolley$response)),
                         response = Trolley$response))

> print(mod1)

Inference for Stan model: bb983badb3bce0d3a8dd5481b8a453c5.
4 chains, each with iter=200; warmup=100; thin=1; 
post-warmup draws per chain=100, total post-warmup draws=400.

                   mean se_mean      sd      2.5%       25%       50%       75%     97.5% n_eff   Rhat
cut_points[1]     -1.11    0.65    0.93     -2.09     -1.58     -1.40     -0.90      0.42     2 114.56
cut_points[2]     -0.73    0.51    0.72     -1.27     -1.25     -1.07     -0.56      0.49     2  69.00
cut_points[3]     -0.19    0.47    0.67     -1.06     -0.68     -0.28      0.41      0.70     2  26.16
cut_points[4]      0.29    0.56    0.80     -0.99      0.00      0.40      0.87      1.20     2  32.86
cut_points[5]      0.96    0.62    0.89     -0.48      0.59      1.27      1.55      1.99     2  27.77
cut_points[6]      1.61    0.56    0.80      0.26      1.40      1.89      2.20      2.34     2  25.48
lp__          -22604.89 1796.57 2553.30 -25697.12 -24600.76 -23104.69 -20417.95 -19087.92     2  34.34

Samples were drawn using NUTS(diag_e) at Thu Sep  5 16:14:40 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Model with for loop:

model_code2 <- "
data {
    int<lower = 1> n;
    int<lower = 2> num_responses;
    int<lower = 1, upper = num_responses> response[n];
}

parameters {
    ordered[num_responses - 1] cut_points;
}

model {
    vector[n] linear_predictor = rep_vector(n, 0.0);
    for (i in 1:n) {
         response[i] ~ ordered_logistic(linear_predictor[i], cut_points);
    }
}
"

mod2 <- stan(model_code = model_code2,
             iter = 200,
             warmup = 100,
             data = list(n = nrow(Trolley),
                         num_responses = length(unique(Trolley$response)),
                         response = Trolley$response))

> print(mod2)

Inference for Stan model: a276a6f50d33f30b0a0984159954e96a.
4 chains, each with iter=200; warmup=100; thin=1; 
post-warmup draws per chain=100, total post-warmup draws=400.

                   mean se_mean   sd      2.5%       25%       50%       75%     97.5% n_eff Rhat
cut_points[1]     -1.92    0.00 0.03     -1.97     -1.94     -1.92     -1.90     -1.85   226 1.00
cut_points[2]     -1.26    0.00 0.03     -1.32     -1.28     -1.27     -1.24     -1.21   470 1.00
cut_points[3]     -0.72    0.00 0.02     -0.76     -0.73     -0.72     -0.70     -0.67   767 0.99
cut_points[4]      0.25    0.00 0.02      0.21      0.23      0.25      0.26      0.29   454 1.00
cut_points[5]      0.89    0.00 0.02      0.85      0.88      0.89      0.91      0.94   388 1.01
cut_points[6]      1.77    0.00 0.03      1.71      1.75      1.77      1.79      1.83   591 1.00
lp__          -18926.01    0.13 1.79 -18930.46 -18926.93 -18925.65 -18924.69 -18923.60   183 1.01

Samples were drawn using NUTS(diag_e) at Thu Sep  5 16:15:46 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
1 Like

Hi @jeffpollock9

This is related to this bug that was resolved for the 2.20 release.
In the meantime (until Rstan 2.20) you can use the solution given by @andrjohns here: Ordered_logistic_lpmf

PS: It was nice meeting you at Stancon. Its always nice to put a face to the name :) Too bad I was kind of in a hurry.

3 Likes

Hi @rok_cesnovar thanks for the reply and apologies I missed those links while searching around.

Was good to meet you too - will hopefully be able to grab a beer at stancon 2020!

1 Like

Hey all. I have a somewhat related issue with these data. I’m fitting a simplified version of a model from the 2nd edition of Statistical Rethinking. In Section 12.4, McElreath introduced an ordered categorical predictor, education, which has 8 levels ranging from "Elementary School" to "Graduate Degree". In McElreath’s style of notation, the full version of the model was

\begin{align} R_i & \sim \operatorname{Ordered-logit} (\phi, \kappa) \\ \phi_i & = \beta_E \sum_{j=0}^{E_i - 1} \delta_j + \beta_A A_i + \beta_I I_i + \beta_C C_i \\ \kappa_k & \sim \operatorname{Normal}(0, 1.5) \\ \beta_A, \beta_I, \beta_C, \beta_E & \sim \operatorname{Normal}(0, 1) \\ \delta & \sim \operatorname{Dirichlet}(\{ 2, 2, 2, 2, 2, 2, 2 \}), \end{align}

where R is the ordinal response variable, E is our new ordered categorical predictor, and A through C are other predictors. I’m dropping those other predictors here to streamline the model and the code, which simplifies my model for \phi_i to

\phi_i = \beta_E \sum_{j=0}^{E_i - 1} \delta_j.

I’m currently working through the rstan code, with this version running successfully:

library(tidyverse)
library(tidybayes)
library(rstan)

data(Trolley, package = "rethinking")

edu_levels_vec <- c("Elementary School", "Middle School", "Some High School", "High School Graduate",
                    "Some College", "Bachelor's Degree", "Master's Degree", "Graduate Degree")

stan_data <- Trolley |> 
  mutate(edu_new = factor(edu, levels = edu_levels_vec)) |> 
  select(response, edu_new) |> 
  compose_data(alpha = rep(2, times = 7))

model_code <- '
data {
  int n;
  array[n] int response;
  array[n] int edu_new;
  vector[7] alpha;  // For the `delta` prior
}
parameters {
  ordered[6] kappa;
  real bE;
  simplex[7] delta;
}
model {
  vector[n] phi;
  vector[8] delta_j;
  delta_j = append_row(0, delta);
  
  for (i in 1:n) phi[i] = bE * sum(delta_j[1:edu_new[i]]);
  response ~ ordered_logistic(phi, kappa);
  
  kappa ~ normal(0, 1.5);
  bE ~ normal(0, 1);
  delta ~ dirichlet(alpha);
}
'

fit1 <- stan(
  model_code = model_code, 
  data = stan_data,
  cores = 3, chains = 3, seed = 12)

Now we get to the heart of my problem. I’m trying to vectorize the code for the phi, and can’t seem to get it right. Here’s my current attempt:

'
model {
  vector[n] phi;
  vector[8] delta_j;
  delta_j = append_row(0, delta);
  
  phi = bE * sum(delta_j[1:edu_new]);  // Only this line has chaged.
  response ~ ordered_logistic(phi, kappa);
  
  kappa ~ normal(0, 1.5);
  bE ~ normal(0, 1);
  delta ~ dirichlet(alpha);
}
'

When I try running the model this way, I get this error message:

Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  0
Semantic error in 'string', line 17, column 27 to column 34:
   -------------------------------------------------
    15:    delta_j = append_row(0, delta);
    16:  
    17:    phi = bE * sum(delta_j[1:edu_new]);  // Only this line has chaged.
                                    ^
    18:    response ~ ordered_logistic(phi, kappa);
    19:  
   -------------------------------------------------

Range bound must be of type int. Instead found type array[] int.

I don’t understand what this is prompting me to do. Is there a way to vectorize phi?

It’s because you’ve declared the range as 1:edu_new, where edu_new is an array[] int - Stan doesn’t have a concept for the kind of value to return here.

For multiple indexing, you can either specify a range as int:int or pass an array of indexes as array[] int, but you can’t combine the two

I still don’t quite follow. Could you show me exactly what you mean in code?

https://mc-stan.org/docs/reference-manual/expressions.html#language-multi-indexing.section

So is it the case then that I cannot vectorize phi?

So is it the case then that I cannot vectorize phi?

Not using this indexing, at least.

Since the goal (appears to be) a person-wise summation of a subset of the delta_j vector, you can implement this using a matrix multiplication with an indicator matrix:

...
transformed data {
  matrix[n, 8] Xi = rep_matrix(0, n, 8);

  for (i in 1:n) {
    Xi[i, 1:edu_new[i]] = rep_row_vector(1.0, edu_new[i]);
  }
}
...
model {
  vector[n] phi = bE * Xi * append_row(0, delta);
  ...
}
1 Like

Okay. Thanks @andrjohns!