Stan segfaulted and idk really what to do

Here is the data I input to the model (it’s mostly junk from testing my exam, ignore the weird fact that no one got a question right):

{"infoPairCount":0,"infoPairQuestion":[],"infoPairStudent":[],"promptCount":44,"promptGrade":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"promptQuestion":[1,2,2,3,4,7,10,12,20,21,22,27,27,27,28,31,32,35,38,50,51,57,59,61,68,70,77,80,91,92,96,104,107,111,116,117,119,122,131,144,147,148,152,159],"promptStudent":[5,1,5,5,5,3,4,4,3,1,3,3,4,1,2,1,4,4,4,3,2,3,3,4,4,3,4,1,3,4,3,2,4,3,4,4,1,3,4,4,3,3,4,3],"questionCount":159,"questionExpertDifficultyRating":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"questionFreeResponse":[1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,0,0,0,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1],"questionOptionsAvailable":[1,1,1,1,1,1,1,1,1,1,1,1,5,5,5,5,1,1,1,1,1,1,1,6,1,4,1,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,5,5,4,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,5,5,1,1,2,3,4,1,5,5,5,5,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1,1,4,5,5,5,5,6,4,7,5,1,2,5,1,5,5,5,5,5,5,5,5,5,5,5,1],"studentCount":5}

And here is the model. Basic item response theory, with some eccentricities:

functions {
  real questionGuessRate(int optionsAvailable, int freeResponse) {
    if(freeResponse == 1) {
      return 0.1;
    } else {
      return (1.0/optionsAvailable);
    }
  }
  real probCorrectAnswer(real ability, real difficulty, real discrimination, real guessRate) {
    return guessRate + (1 - guessRate)*inv_logit(discrimination * (ability - difficulty));
  }
}

//GIVE ME REAL DATA STRUCTURES REEEEEEEEEE
data {
  int<lower=0> studentCount;

  int<lower=0> questionCount;
  int<lower=0,upper=1> questionFreeResponse[questionCount]; //Boolean indicating whether or not the question is free response
  int<lower=0> questionOptionsAvailable[questionCount]; //Number of options the user can pick from. Question could still have a free response component.
  real<lower=0,upper=1> questionExpertDifficultyRating[questionCount]; //Pre-estimated difficulty rating (by The Experts)

  int<lower=0> promptCount;
  int<lower=0,upper=1> promptGrade[promptCount]; //correctness of answer 'n'
  int<lower=1,upper=studentCount> promptStudent[promptCount]; //student for answer 'n'
  int<lower=1,upper=questionCount> promptQuestion[promptCount];  //underlying question that generated the prompt 'n'
  //note: the prompt could have different text or starting values, but we're treating it as the same for analysis

  int<lower=0> infoPairCount; //Number of information pairs we need generated.
  int<lower=0, upper=studentCount> infoPairStudent[infoPairCount]; //Student in info pair 'n'
  int<lower=0, upper=questionCount> infoPairQuestion[infoPairCount]; //Question in info pair 'n'
}

parameters {
  real studentAbility[studentCount]; //ability of student j
  real questionDifficulty[questionCount]; //difficulty of question k
  real<lower=0> questionDiscrimination[questionCount]; //discrimination of question k
}

model {
  studentAbility ~ normal(1, 1);
  questionDifficulty ~ normal(1, 3);
  questionDiscrimination ~ lognormal(1, 1);
  for (n in 1:promptCount) {
    int question = promptQuestion[n];
    real guessRate = questionGuessRate(questionOptionsAvailable[question], questionFreeResponse[question]);
    promptGrade[n] ~ bernoulli(probCorrectAnswer(studentAbility[n], questionDifficulty[question], questionDiscrimination[question], guessRate));
  }
}

generated quantities {
  real infoPairVal[infoPairCount]; //Information in question x for student y
  for (n in 1:infoPairCount) {
    real diff = questionDifficulty[infoPairQuestion[n]];
    real disc = questionDiscrimination[infoPairQuestion[n]];
    real guessR = questionGuessRate(questionOptionsAvailable[infoPairQuestion[n]], questionFreeResponse[infoPairQuestion[n]]);
    real chanceRight = probCorrectAnswer(studentAbility[infoPairStudent[n]], diff, disc, guessR);
    real chanceWrong = 1 - chanceRight;
    infoPairVal[n] = (square(disc)*square(chanceRight - guessR)*chanceWrong*inv_square(1-guessR))/chanceRight;
  }
}

Output:

  sample
    num_samples = 1000 (Default)
    num_warmup = 1000 (Default)
    save_warmup = 0 (Default)
    thin = 1 (Default)
    adapt
      engaged = 1 (Default)
      gamma = 0.050000000000000003 (Default)
      delta = 0.80000000000000004 (Default)
      kappa = 0.75 (Default)
      t0 = 10 (Default)
      init_buffer = 75 (Default)
      term_buffer = 50 (Default)
      window = 25 (Default)
    algorithm = hmc (Default)
      hmc
        engine = nuts (Default)
          nuts
            max_depth = 10 (Default)
        metric = diag_e (Default)
        metric_file =  (Default)
        stepsize = 1 (Default)
        stepsize_jitter = 0 (Default)
id = 0 (Default)
data
  file = /tmp/56777d5c-0359-43ac-a242-2828ec045dfa.json
init = 2 (Default)
random
  seed = -1 (Default)
output
  file = /tmp/e5801f94-9ae1-4fe2-b1c4-49f15652ed31.csv
  diagnostic_file =  (Default)
  refresh = 100 (Default)

Rejecting initial value:
  Log probability evaluates to log(0), i.e. negative infinity.
  Stan can't start sampling from this initial value.
: signal: segmentation fault (core dumped)

I hate posting production Stan code on forums, even when it's basic linear regression stuff like this, but I really have no easy way of solving this problem and I also really want to report bugs.

what interface and version are you running?

@mitzimorris
CmdStan. I run this from a Dockerfile, and it normally grabs the latest version of Stan and CmdStan on each run. So, whatever version “git clone cmdstan” on the master branch gets me.

OK, that segfault shouldn’t happen - will try and figure how it’s happening.

nonetheless, the error message before the segfault is significant:

 Log probability evaluates to log(0), i.e. negative infinity.
  Stan can't start sampling from this initial value.

there’s a problem with the model as written. for basic IRT models with discrimination, take a look at the Stan User’s Guide.

I reverted back to the old stanc2 compiler and it gave me a more helpful warning; I was reading past the end of the studentAbility array. Instead of accessing studentAbility[n] I needed to access studentAbility[promptStudent[n]].
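For anyone landing here later, a minimal sketch of that fix, trimmed to the parts of the posted model needed for the likelihood (the generated quantities block and the unused data fields are left out). The local variable `student` is introduced here only for illustration; everything else reuses the names from the original post. It keeps the older array declaration syntax used above, so newer Stan releases would need the `array[N] type name` form instead.

```stan
functions {
  // Guess rate: fixed 0.1 for free response, otherwise 1 / number of options.
  real questionGuessRate(int optionsAvailable, int freeResponse) {
    if (freeResponse == 1) {
      return 0.1;
    } else {
      return 1.0 / optionsAvailable;
    }
  }
  // Probability of a correct answer with a guessing floor.
  real probCorrectAnswer(real ability, real difficulty, real discrimination, real guessRate) {
    return guessRate + (1 - guessRate) * inv_logit(discrimination * (ability - difficulty));
  }
}
data {
  int<lower=0> studentCount;
  int<lower=0> questionCount;
  int<lower=0,upper=1> questionFreeResponse[questionCount];
  int<lower=0> questionOptionsAvailable[questionCount];
  int<lower=0> promptCount;
  int<lower=0,upper=1> promptGrade[promptCount];
  int<lower=1,upper=studentCount> promptStudent[promptCount];
  int<lower=1,upper=questionCount> promptQuestion[promptCount];
}
parameters {
  real studentAbility[studentCount];
  real questionDifficulty[questionCount];
  real<lower=0> questionDiscrimination[questionCount];
}
model {
  studentAbility ~ normal(1, 1);
  questionDifficulty ~ normal(1, 3);
  questionDiscrimination ~ lognormal(1, 1);
  for (n in 1:promptCount) {
    // Look up the student and question for prompt n. Indexing studentAbility
    // by n directly runs past the end once n exceeds studentCount.
    int student = promptStudent[n];
    int question = promptQuestion[n];
    real guessRate = questionGuessRate(questionOptionsAvailable[question],
                                       questionFreeResponse[question]);
    promptGrade[n] ~ bernoulli(probCorrectAnswer(studentAbility[student],
                                                 questionDifficulty[question],
                                                 questionDiscrimination[question],
                                                 guessRate));
  }
}
```

With the data above, promptCount is 44 while studentCount is 5, so the original loop read studentAbility[6] through studentAbility[44], none of which exist.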

yup, the indexing is tricky.

having looked at your model, I think you should be using bernoulli_logit in the model block and maybe you don’t need that function you’ve written - you probably don’t need to compute the inverse logit.
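A rough sketch of what the bernoulli_logit version might look like, reusing the names from the posted model (the local `student` variable is added for illustration). One caveat, stated as an assumption: this version drops the guess-rate term, so it is a plain two-parameter model and only matches the posted model when the guess rate is zero. With a nonzero guess rate the success probability is no longer a pure inverse logit, and the bernoulli call with probCorrectAnswer would still be needed.

```stan
data {
  int<lower=0> studentCount;
  int<lower=0> questionCount;
  int<lower=0> promptCount;
  int<lower=0,upper=1> promptGrade[promptCount];
  int<lower=1,upper=studentCount> promptStudent[promptCount];
  int<lower=1,upper=questionCount> promptQuestion[promptCount];
}
parameters {
  real studentAbility[studentCount];
  real questionDifficulty[questionCount];
  real<lower=0> questionDiscrimination[questionCount];
}
model {
  studentAbility ~ normal(1, 1);
  questionDifficulty ~ normal(1, 3);
  questionDiscrimination ~ lognormal(1, 1);
  for (n in 1:promptCount) {
    int student = promptStudent[n];
    int question = promptQuestion[n];
    // bernoulli_logit takes the value on the logit scale directly,
    // so there is no explicit inv_logit call.
    promptGrade[n] ~ bernoulli_logit(questionDiscrimination[question]
                                     * (studentAbility[student] - questionDifficulty[question]));
  }
}
```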