Stan segfaulted and idk really what to do

llc · March 10, 2020, 12:46am

Here is the data I input to the model (it’s mostly junk from testing my exam, ignore the weird fact that no one got a question right):

{"infoPairCount":0,"infoPairQuestion":[],"infoPairStudent":[],"promptCount":44,"promptGrade":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"promptQuestion":[1,2,2,3,4,7,10,12,20,21,22,27,27,27,28,31,32,35,38,50,51,57,59,61,68,70,77,80,91,92,96,104,107,111,116,117,119,122,131,144,147,148,152,159],"promptStudent":[5,1,5,5,5,3,4,4,3,1,3,3,4,1,2,1,4,4,4,3,2,3,3,4,4,3,4,1,3,4,3,2,4,3,4,4,1,3,4,4,3,3,4,3],"questionCount":159,"questionExpertDifficultyRating":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"questionFreeResponse":[1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,0,0,0,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1],"questionOptionsAvailable":[1,1,1,1,1,1,1,1,1,1,1,1,5,5,5,5,1,1,1,1,1,1,1,6,1,4,1,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,5,5,4,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,5,5,1,1,2,3,4,1,5,5,5,5,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1,1,4,5,5,5,5,6,4,7,5,1,2,5,1,5,5,5,5,5,5,5,5,5,5,5,1],"studentCount":5}

And here is the model. Basic item response theory, with some eccentricities:

functions {
  real questionGuessRate(int optionsAvailable, int freeResponse) {
    if(freeResponse == 1) {
      return 0.1;
    } else {
      return (1.0/optionsAvailable);
    }
  }
  real probCorrectAnswer(real ability, real difficulty, real discrimination, real guessRate) {
    return guessRate + (1 - guessRate)*inv_logit(discrimination * (ability - difficulty));
  }
}

//GIVE ME REAL DATA STRUCTURES REEEEEEEEEE
data {
  int<lower=0> studentCount;

  int<lower=0> questionCount;
  int<lower=0,upper=1> questionFreeResponse[questionCount]; //Boolean indicating whether or not the question is free response
  int<lower=0> questionOptionsAvailable[questionCount]; //Number of options the user can pick from. Question could still have a free response component.
  real<lower=0,upper=1> questionExpertDifficultyRating[questionCount]; //Pre-estimated difficulty rating (by The Experts)

  int<lower=0> promptCount;
  int<lower=0,upper=1> promptGrade[promptCount]; //correctness of answer 'n'
  int<lower=1,upper=studentCount> promptStudent[promptCount]; //student for answer 'n'
  int<lower=1,upper=questionCount> promptQuestion[promptCount];  //underlying question that generated the prompt 'n'
  //note: the prompt could have different text or starting values, but we're treating it as the same for analysis

  int<lower=0> infoPairCount; //Number of information pairs we need generated.
  int<lower=1, upper=studentCount> infoPairStudent[infoPairCount]; //Student in info pair 'n' (used as a 1-based index)
  int<lower=1, upper=questionCount> infoPairQuestion[infoPairCount]; //Question in info pair 'n' (used as a 1-based index)
}

parameters {
  real studentAbility[studentCount]; //ability of student j
  real questionDifficulty[questionCount]; //difficulty of question k
  real<lower=0> questionDiscrimination[questionCount]; //discrimination of question k
}

model {
  studentAbility ~ normal(1, 1);
  questionDifficulty ~ normal(1, 3);
  questionDiscrimination ~ lognormal(1, 1);
  for (n in 1:promptCount) {
    int question = promptQuestion[n];
    real guessRate = questionGuessRate(questionOptionsAvailable[question], questionFreeResponse[question]);
    promptGrade[n] ~ bernoulli(probCorrectAnswer(studentAbility[n], questionDifficulty[question], questionDiscrimination[question], guessRate));
  }
}

generated quantities {
  real infoPairVal[infoPairCount]; //Information in question x for student y
  for (n in 1:infoPairCount) {
    real diff = questionDifficulty[infoPairQuestion[n]];
    real disc = questionDiscrimination[infoPairQuestion[n]];
    real guessR = questionGuessRate(questionOptionsAvailable[infoPairQuestion[n]], questionFreeResponse[infoPairQuestion[n]]);
    real chanceRight = probCorrectAnswer(studentAbility[infoPairStudent[n]], diff, disc, guessR);
    real chanceWrong = 1 - chanceRight;
    infoPairVal[n] = (square(disc)*square(chanceRight - guessR)*chanceWrong*inv_square(1-guessR))/chanceRight;
  }
}

Output:

  sample
    num_samples = 1000 (Default)
    num_warmup = 1000 (Default)
    save_warmup = 0 (Default)
    thin = 1 (Default)
    adapt
      engaged = 1 (Default)
      gamma = 0.050000000000000003 (Default)
      delta = 0.80000000000000004 (Default)
      kappa = 0.75 (Default)
      t0 = 10 (Default)
      init_buffer = 75 (Default)
      term_buffer = 50 (Default)
      window = 25 (Default)
    algorithm = hmc (Default)
      hmc
        engine = nuts (Default)
          nuts
            max_depth = 10 (Default)
        metric = diag_e (Default)
        metric_file =  (Default)
        stepsize = 1 (Default)
        stepsize_jitter = 0 (Default)
id = 0 (Default)
data
  file = /tmp/56777d5c-0359-43ac-a242-2828ec045dfa.json
init = 2 (Default)
random
  seed = -1 (Default)
output
  file = /tmp/e5801f94-9ae1-4fe2-b1c4-49f15652ed31.csv
  diagnostic_file =  (Default)
  refresh = 100 (Default)

Rejecting initial value:
  Log probability evaluates to log(0), i.e. negative infinity.
  Stan can't start sampling from this initial value.
: signal: segmentation fault (core dumped)

I hate posting production Stan code on forums, even when it's basic linear regression stuff like this, but I really have no easy way of solving this problem, and I also really want to report bugs.

mitzimorris · March 10, 2020, 3:55am

what interface and version are you running?

llc · March 10, 2020, 4:20am

@mitzimorris
CmdStan. I run this in a Docker container, and it normally grabs the latest version of Stan and CmdStan each time it runs. So, whatever version “git clone cmdstan” on the master branch gets me.

mitzimorris · March 11, 2020, 5:05am

OK, that segfault shouldn’t happen - will try and figure how it’s happening.

nonetheless, the error message before the segfault is significant:

 Log probability evaluates to log(0), i.e. negative infinity.
  Stan can't start sampling from this initial value.

there’s a problem with the model as written. for basic IRT models with discrimination, take a look at the Stan User’s manual:

llc · March 12, 2020, 7:27pm

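I switched back to the old stanc2 compiler and it gave me a more helpful warning: I was reading past the end of the studentAbility array. The loop index n runs up to promptCount (44 in the data above), but studentAbility only has studentCount (5) elements. Instead of accessing studentAbility[n] I needed to access studentAbility[promptStudent[n]].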
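
For reference, the corrected loop might look like the following. This is a minimal sketch of the model block only, assuming the functions, data, and parameters blocks from the original post are unchanged; the local variable `student` is a name introduced here for illustration.

```stan
model {
  studentAbility ~ normal(1, 1);
  questionDifficulty ~ normal(1, 3);
  questionDiscrimination ~ lognormal(1, 1);
  for (n in 1:promptCount) {
    // n counts prompts, not students, so both indices are looked up
    // through the prompt arrays.
    int question = promptQuestion[n];
    int student = promptStudent[n];  // illustrative local, not in the original
    real guessRate = questionGuessRate(questionOptionsAvailable[question],
                                       questionFreeResponse[question]);
    promptGrade[n] ~ bernoulli(probCorrectAnswer(studentAbility[student],
                                                 questionDifficulty[question],
                                                 questionDiscrimination[question],
                                                 guessRate));
  }
}
```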

mitzimorris · March 12, 2020, 7:52pm

yup, the indexing is tricky.

having looked at your model, I think you should be using bernoulli_logit in the model block and maybe you don’t need that function you’ve written - you probably don’t need to compute the inverse logit.
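
A minimal sketch of what the bernoulli_logit suggestion could look like. It assumes the guess rate is dropped, so the success probability is a plain inverse logit of discrimination * (ability - difficulty). That is a simplification of the posted model, not an equivalent rewrite: with a nonzero guess rate the probability is no longer a pure inverse logit, and bernoulli_logit would not apply directly. The sketch reuses the data and parameters blocks from the original post and the promptStudent[n] indexing fix; `student` is an illustrative local name.

```stan
model {
  studentAbility ~ normal(1, 1);
  questionDifficulty ~ normal(1, 3);
  questionDiscrimination ~ lognormal(1, 1);
  for (n in 1:promptCount) {
    int question = promptQuestion[n];
    int student = promptStudent[n];  // illustrative local, not in the original
    // bernoulli_logit takes the linear predictor on the logit scale,
    // so inv_logit is never computed explicitly.
    promptGrade[n] ~ bernoulli_logit(questionDiscrimination[question]
                                     * (studentAbility[student]
                                        - questionDifficulty[question]));
  }
}
```

Working on the logit scale tends to be more numerically stable than computing a probability first and passing it to bernoulli.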
