Error in model for ordinal data - probabilities parameter is not a valid simplex

Hello STAN experts,
I’m running a hierarchical model for predicting categorically distributed data – subjective ratings made at different time points on a 0-10 scale, before and after receiving social feedback. In the model I aim to estimate the effect the social feedback had on changes in participants’ ratings.
In the model the probabilities for each possible categorical outcome (i.e. 11 possible outcomes and 10 thresholds) are computed in the transformed parameters block. The model was primarily based on the examples presented in chapter 23 in John Kruschke’s book (Doing Bayesian Data Analysis) on predicting ordinal variables. Following Kruschke, I used a normal_cdf function for estimating the probabilities of each outcome. The first and last thresholds are fixed, and the non-end thresholds are free parameters. However, Kruschke’s example is written in JAGS, so I had to make two main adaptations for STAN:

  1. In JAGS each categorical outcome is distributed normally as follows – pr[i,K] <- pnorm( thresh[1] , mu , 1/sigmaˆ2 ). In my model variance was transformed based on this formula:
  2. In the JAGS model only one vector is used for the thresholds. This vector consists of fixed thresholds at its ends, and NaNs are placed in the non-end positions. I had to split the threshold vector into two separate components: a data vector for the fixed values, and a parameter vector for the non-end thresholds which are estimated.
    The code compiles successfully, but I’m constantly receiving the following error message:
    Chain 1: Rejecting initial value:
    Chain 1: Error evaluating the log probability at the initial value.
    Chain 1: Exception: categorical_lpmf: Probabilities parameter is not a valid simplex. sum(Probabilities parameter) = 3.111144369, but should be 1 (in ‘model1068159777f0_2f29cae781c6aa44e4c61751dc2681b3’ at line 139)
    Chain 1: Rejecting initial value:
    Chain 1: Error evaluating the log probability at the initial value.
    Chain 1: Exception: categorical_lpmf: Probabilities parameter is not a valid simplex. sum(Probabilities parameter) = 2.786957506, but should be 1 (in ‘model1068159777f0_2f29cae781c6aa44e4c61751dc2681b3’ at line 139)
    […]
    (this repeats several times).

The model is very long because of the loop for the theta probabilities computation – but note that the same calculation repeats in this part. I also attached the R code and data for 2 participants:

data{
int <lower=1> Nsubj ; // number of participants
int <lower=1> Ntotal ; // number of trials
int <lower=1> nYlevels ; // number of responses on scale (0-10 scale, so 11 responses)
int <lower=1> s[Ntotal] ; // index of subject at a given trial
real <lower=1> fixedThresh[nYlevels-9] ; // lower and upper thresholds are fixed, not stochastic

int yPreA[Ntotal] ; // participants’ ratings before feedback
int yJud[Ntotal] ; // social feedback
int yPost[Ntotal] ; // participants’ ratings after feedback
}

parameters {

vector[nYlevels-3] thresh ; // non-fixed thresholds

vector[Nsubj] muMuPreA ;
real mu_muMuPreAGroup ;
real <lower=0> sigma2_muMuPreAGroup ;

vector[Nsubj] muMuJud ;
real mu_muMuJudGroup ;
real <lower=0> sigma2_muMuJudGroup ;

vector<lower=0>[Nsubj] sigma2MuPreA ;
real <lower=0> alphaSigmaMuPreAGroup ;
real <lower=0> betaSigmaMuPreAGroup ;

vector<lower=0>[Nsubj] sigma2MuJud ;
real<lower=0> alphaSigmaMuJudGroup ;
real<lower=0> betaSigmaMuJudGroup ;

vector<lower=0>[Nsubj] tau2ThetaPreA ;
real<lower=0> alphaTauThetaPreA ;
real<lower=0> betaTauThetaPreA ;

vector<lower=0>[Nsubj] tau2ThetaJud ;
real<lower=0> alphaTauThetaJud ;
real<lower=0> betaTauThetaJud ;

}

transformed parameters {

vector[Nsubj] precisPreA = inv(sigma2MuPreA+tau2ThetaPreA) ;
vector[Nsubj] precisJud = inv(sigma2MuJud+tau2ThetaJud) ;

vector[nYlevels] thetaPreA[Ntotal] ;
vector[nYlevels] thetaJud[Ntotal] ;
vector[nYlevels] thetaPost[Ntotal] ;

for (i in 1:Ntotal) {

thetaPreA[i,1] <- normal_cdf(fixedThresh[1], muMuPreA[s[i]], (sqrt(precisPreA[s[i]]))^-1) ;
thetaJud[i,1] <- normal_cdf(fixedThresh[1], muMuJud[s[i]], (sqrt(precisJud[s[i]]))^-1) ;
thetaPost[i,1] <- normal_cdf(fixedThresh[1], (precisPreA[s[i]]*muMuPreA[s[i]]+precisJud[s[i]]*muMuJud[s[i]])/
(precisPreA[s[i]]+precisJud[s[i]]), sqrt(1/(precisPreA[s[i]]+precisJud[s[i]]))) ;

thetaPreA[i,2] <- fmax(0, normal_cdf(thresh[1], muMuPreA[s[i]], (sqrt(precisPreA[s[i]]))^-1) -
normal_cdf(fixedThresh[1], muMuPreA[s[i]], (sqrt(precisPreA[s[i]]))^-1)) ;
thetaJud[i,2] <- fmax(0, normal_cdf(thresh[1],muMuJud[s[i]], (sqrt(precisJud[s[i]]))^-1) -
normal_cdf(fixedThresh[1],muMuJud[s[i]], (sqrt(precisJud[s[i]]))^-1)) ;
thetaPost[i,2] <- fmax(0, normal_cdf(thresh[1],(precisPreA[s[i]]*muMuPreA[s[i]]+precisJud[s[i]]*muMuJud[s[i]])/
(precisPreA[s[i]]+precisJud[s[i]]), sqrt(1/(precisPreA[s[i]]+precisJud[s[i]]))) -
normal_cdf(fixedThresh[1],(precisPreA[s[i]]*muMuPreA[s[i]]+precisJud[s[i]]*muMuJud[s[i]])/
(precisPreA[s[i]]+precisJud[s[i]]), sqrt(1/(precisPreA[s[i]]+precisJud[s[i]])))) ;

for (k in 3:nYlevels-2) {
thetaPreA[i,k] <- fmax(0, normal_cdf(thresh[k-1], muMuPreA[s[i]], (sqrt(precisPreA[s[i]]))^-1) -
normal_cdf(thresh[k-2], muMuPreA[s[i]], (sqrt(precisPreA[s[i]]))^-1)) ;
thetaJud[i,k] <- fmax(0, normal_cdf(thresh[k-1],muMuJud[s[i]], (sqrt(precisJud[s[i]]))^-1) -
normal_cdf(thresh[k-2],muMuJud[s[i]], (sqrt(precisJud[s[i]]))^-1)) ;
thetaPost[i,k] <- fmax(0, normal_cdf(thresh[k-1],(precisPreA[s[i]]*muMuPreA[s[i]]+precisJud[s[i]]*muMuJud[s[i]])/
(precisPreA[s[i]]+precisJud[s[i]]), sqrt(1/(precisPreA[s[i]]+precisJud[s[i]]))) -
normal_cdf(thresh[k-2],(precisPreA[s[i]]*muMuPreA[s[i]]+precisJud[s[i]]*muMuJud[s[i]])/
(precisPreA[s[i]]+precisJud[s[i]]), sqrt(1/(precisPreA[s[i]]+precisJud[s[i]])))) ;
}

thetaPreA[i,10] <- fmax(0, normal_cdf(fixedThresh[2], muMuPreA[s[i]], (sqrt(precisPreA[s[i]]))^-1) -
normal_cdf(thresh[nYlevels-3], muMuPreA[s[i]], (sqrt(precisPreA[s[i]]))^-1)) ;
thetaJud[i,10] <- fmax(0, normal_cdf(fixedThresh[2],muMuJud[s[i]], (sqrt(precisJud[s[i]]))^-1) -
normal_cdf(thresh[nYlevels-3],muMuJud[s[i]], (sqrt(precisJud[s[i]]))^-1)) ;
thetaPost[i,10] <- fmax(0, normal_cdf(fixedThresh[2],(precisPreA[s[i]]*muMuPreA[s[i]]+precisJud[s[i]]*muMuJud[s[i]])/
(precisPreA[s[i]]+precisJud[s[i]]), sqrt(1/(precisPreA[s[i]]+precisJud[s[i]]))) -
normal_cdf(thresh[nYlevels-3],(precisPreA[s[i]]*muMuPreA[s[i]]+precisJud[s[i]]*muMuJud[s[i]])/
(precisPreA[s[i]]+precisJud[s[i]]), sqrt(1/(precisPreA[s[i]]+precisJud[s[i]])))) ;

thetaPreA[i,nYlevels] <- normal_cdf(fixedThresh[2], muMuPreA[s[i]], (sqrt(precisPreA[s[i]]))^-1) ;
thetaJud[i,nYlevels] <- normal_cdf(fixedThresh[2], muMuJud[s[i]], (sqrt(precisJud[s[i]]))^-1) ;
thetaPost[i,nYlevels]<-normal_cdf(fixedThresh[2],(precisPreA[s[i]]*muMuPreA[s[i]]+precisJud[s[i]]*muMuJud[s[i]])/
(precisPreA[s[i]]+precisJud[s[i]]), sqrt(1/(precisPreA[s[i]]+precisJud[s[i]]))) ;
}
}

model {

// priors:

// thresholds:

for (k in 1:nYlevels-3) {
thresh[k] ~ normal(k+1.5, 2) ;
}

// Sigma for Mu of Mu-Trials:

alphaSigmaMuPreAGroup ~ gamma(0.01, 0.01) ;
betaSigmaMuPreAGroup ~ gamma(0.01, 0.01) ;
sigma2MuPreA ~ inv_gamma(alphaSigmaMuPreAGroup, betaSigmaMuPreAGroup) ;

alphaSigmaMuJudGroup ~ gamma(0.01, 0.01) ;
betaSigmaMuJudGroup ~ gamma(0.01, 0.01) ;
sigma2MuJud ~ inv_gamma(alphaSigmaMuJudGroup, betaSigmaMuJudGroup) ;

// Mu for Mu of Mu-Trials:

sigma2_muMuPreAGroup ~ inv_gamma(1, 1) ;
mu_muMuPreAGroup ~ normal((1+nYlevels)/2 , nYlevels) ; //
muMuPreA ~ normal(mu_muMuPreAGroup, sqrt(sigma2_muMuPreAGroup)) ;

sigma2_muMuJudGroup ~ inv_gamma(1, 1) ;
mu_muMuJudGroup ~ normal((1+nYlevels)/2 , nYlevels) ; //
muMuJud ~ normal(mu_muMuJudGroup, sqrt(sigma2_muMuJudGroup)) ;

// tau:

betaTauThetaPreA ~ gamma(0.01, 0.01) ; //
alphaTauThetaPreA ~ gamma(0.01, 0.01) ;
tau2ThetaPreA ~ inv_gamma(alphaTauThetaPreA, betaTauThetaPreA) ;

betaTauThetaJud ~ gamma(0.01, 0.01) ;
alphaTauThetaJud ~ gamma(0.01, 0.01) ;
tau2ThetaJud ~ inv_gamma(alphaTauThetaJud, betaTauThetaJud) ;

for (i in 1:Ntotal) {
yPreA[i] ~ categorical(thetaPreA[i, 1:nYlevels]) ;
yJud[i] ~ categorical(thetaJud[i, 1:nYlevels]) ;
yPost[i] ~ categorical(thetaPost[i, 1:nYlevels]) ;
}
}

I would greatly appreciate your thoughts about the possible source of thisyPreAJudPost_forForum.R (7.1 KB) yPreAJudPost_forForum.stan (5.7 KB) yPreAPreFbJudPostPreBOrd_GROUP_1-11_testSubj.csv (1.1 KB) error.
Thanks,
Ofir Shany

What those errors mean is that the arguments passed to at least one of these categorical distribution calls:

do not define valid probability distributions. How the categorical distribution is defined in Stan, the parameters (thetaPreA[i, 1:nYlevels], etc.) need to be simplexes (vectors with non-negative elements summing to one).

Check here for docs on this: https://mc-stan.org/docs/2_19/functions-reference/categorical-distribution.html

It’s entirely possible Jags uses a different parameterization. Look up what Jags is doing and see what it matches in Stan, if anything.

You should also start with a simpler model :D, especially if this is a simplified version of what you’re working on. Maybe there’s something from earlier in the book you can build on?

1 Like

Thanks for the advice! I have simplified the model as much as I could - so now it estimates the parameters for only one set of scores (this is similar to the most basic example presented by Kruschke). I’m still getting the same error. What could I be missing?

Here’s the simplified model (still not that simple…):

data{

int <lower=1> Nsubj ; // number of participants

int <lower=1> Ntotal ; // number of trials

int <lower=1> nYlevels ; // number of responses on scale (0-10 scale, so 11 responses)

int <lower=1> s[Ntotal] ; // index of subject at a given trial

real <lower=1> fixedThresh[nYlevels-9] ; // lower and upper thresholds are fixed, not stochastic

int yPreA[Ntotal] ; // participants’ ratings before feedback

}

parameters {

vector[nYlevels-2] thresh ; // non-fixed thresholds

vector[Nsubj] muMuPreA ;

real mu_muMuPreAGroup ;

real <lower=0> sigma2_muMuPreAGroup ;

vector<lower=0>[Nsubj] sigma2MuPreA ;

real <lower=0> alphaSigmaMuPreAGroup ;

real <lower=0> betaSigmaMuPreAGroup ;

vector<lower=0>[Nsubj] tau2ThetaPreA ;

real<lower=0> alphaTauThetaPreA ;

real<lower=0> betaTauThetaPreA ;

}

transformed parameters {

vector[Nsubj] precisPreA = inv(sigma2MuPreA+tau2ThetaPreA) ;

vector[nYlevels] thetaPreA[Ntotal] ;

for (i in 1:Ntotal) {

thetaPreA[i,1] <- normal_cdf(fixedThresh[1], muMuPreA[s[i]], (sqrt(precisPreA[s[i]]))^-1) ;

thetaPreA[i,2] <- fmax(0, normal_cdf(thresh[2], muMuPreA[s[i]], (sqrt(precisPreA[s[i]]))^-1) -

normal_cdf(fixedThresh[1], muMuPreA[s[i]], (sqrt(precisPreA[s[i]]))^-1)) ;

for (k in 3:nYlevels-2) {

thetaPreA[i,k] <- fmax(0, normal_cdf(thresh[k], muMuPreA[s[i]], (sqrt(precisPreA[s[i]]))^-1) -

normal_cdf(thresh[k-1], muMuPreA[s[i]], (sqrt(precisPreA[s[i]]))^-1)) ;

}

thetaPreA[i,10] <- fmax(0, normal_cdf(fixedThresh[2], muMuPreA[s[i]], (sqrt(precisPreA[s[i]]))^-1) -

normal_cdf(thresh[nYlevels-2], muMuPreA[s[i]], (sqrt(precisPreA[s[i]]))^-1)) ;

thetaPreA[i,nYlevels] <- normal_cdf(fixedThresh[2], muMuPreA[s[i]], (sqrt(precisPreA[s[i]]))^-1) ;

}

}

model {

// thresholds:

for (k in 2:nYlevels-2) {

thresh[k] ~ normal(k+0.5, 2) ;

}

// Sigma for Mu of Mu-Trials:

alphaSigmaMuPreAGroup ~ gamma(0.01, 0.01) ;

betaSigmaMuPreAGroup ~ gamma(0.01, 0.01) ;

sigma2MuPreA ~ inv_gamma(alphaSigmaMuPreAGroup, betaSigmaMuPreAGroup) ;

// Mu for Mu of Mu-Trials:

sigma2_muMuPreAGroup ~ inv_gamma(1, 1) ;

mu_muMuPreAGroup ~ normal((1+nYlevels)/2 , nYlevels) ; //

muMuPreA ~ normal(mu_muMuPreAGroup, sqrt(sigma2_muMuPreAGroup)) ;

// tau:

betaTauThetaPreA ~ gamma(0.01, 0.01) ; //

alphaTauThetaPreA ~ gamma(0.01, 0.01) ;

tau2ThetaPreA ~ inv_gamma(alphaTauThetaPreA, betaTauThetaPreA) ;

for (i in 1:Ntotal) {

yPreA[i] ~ categorical(thetaPreA[i, 1:nYlevels]) ;

}

}

The model fails because you need to constrain your thresholds so that they are always ordered. If you don’t fix the first and last threshold you can declare thresh as ordered[nYLevels] instead of vector[nYlevels]. However, when thresholds aren’t fixed sigmas become nonidentifable.
Ordering the thresholds while keeping the bounds fixed is a bit trickier but the simplex type helps here.

data {
  ...
  real fixedThreshMin;
  real fixedThreshMax;
} parameters {
  ...
  simplex[nLevels-1] thresh_raw;
} transformed parameters {
  ...
  vector[nLevels] thresh;
  thresh[1] = fixedThreshMin;
  for (i in 1:(nLevels-1)) {
    thresh[i+1] = thresh[i] + (fixedThreshMax-fixedThreshMin)*thresh_raw[i];
  }
}

Stan has a builtin ordered probit distribution. You can simplify your code by using that instead of computing theta probabilities by hand and then using categorical distribution.

for (i in 1:Ntotal) {
  real sigmaPreA = sqrt(sigma2MuPreA[s[i]] + tau2ThetaPreA[s[i]]);
  yPre[i] ~ ordered_probit(muMuPreA[s[i]]/sigmaPreA, thresholds/sigmaPreA);
}

The simplified model still has an identifiability problem. The values of sigma2MuPreA and tau2ThetaPreA are not constrained individually by the data, only their sum is. If you use this model it’s better to omit those and just make sigmaPreA a parameter instead.

1 Like

Now it works. Thanks a lot!