Simple linear predictor model crashing immediately

I’m attempting to fit a relatively straightforward hierarchical model on this dataset.
(See small snapshot below)

00

Basically, we want to estimate BA and notBA counts from category, person, and session as simple linear predictors.

The data variables are created in R like this:

NumMarkers <- length(unique(long_present_simple$category))
NumSubPops <- length(unique(long_present_simple$person))
NumObservations <- dim(long_present_simple)[1]
SpeakerSubPop <- long_present_simple$person
SpeakerAge <- long_present_simple$session
MarkerType <- long_present_simple$category
NumUtterancesAB <- rep(sum(long_present_simple$notBA, na.rm=TRUE) + sum(long_present_simple$notBnotA, na.rm=TRUE), NumObservations)
NumUtterancesNotAB <- rep(sum(long_present_simple$BA, na.rm=TRUE) + sum(long_present_simple$BnotA, na.rm=TRUE), NumObservations)
CountsAB <- rep(sum(long_present_simple$notBA, na.rm=TRUE), NumObservations)
CountsNotAB <- rep(sum(long_present_simple$BA, na.rm = TRUE),NumObservations)
StdDev <- 1

And the Stan model is here:

data {
  int<lower=0> NumMarkers;                              //Number of marker categories in data
  int<lower=0> NumSubPops;                              //Number of groups in the data
  int<lower=0> NumObservations;                         //Number of marker-dyad observations
  int<lower=0> SpeakerSubPop[NumObservations];          //Group number for each observation
  int<lower=0> SpeakerAge[NumObservations];             //Group number for each observation
  int<lower=0> MarkerType[NumObservations];             //Marker number for each observation
  int<lower=0> NumUtterancesAB[NumObservations];        //Number of times the first message didn't contain the marker
  int<lower=0> NumUtterancesNotAB[NumObservations];     //Number of times the first message did contain the marker
  int<lower=0> CountsAB[NumObservations];               //Number of times the first message didn't contain the marker but the reply did
  int<lower=0> CountsNotAB[NumObservations];            //Number of times the first message did contain the marker and the reply did
  real<lower=0> StdDev;                                 //SD for each normal dist in the hierarchy (sole free parameter)
}

parameters {      
  real eta_ab_pop;
  real eta_ab_age;
  real eta_pop_Marker[NumMarkers];                      //linear predictor for each marker's baseline

  real eta_ab_subpop[NumSubPops];
  real eta_ab_age_subpop[NumSubPops];

  vector[NumMarkers] eta_subpop_Marker[NumSubPops];            //lin. pred. for each marker+group baseline
  vector[NumMarkers] eta_ab_subpop_Marker[NumSubPops];         //lin. pred. for each marker+group alignment
  vector[NumObservations] eta_observation;              //lin. pred. for each marker+dyad baseline
  vector[NumObservations] eta_ab_observation;           //lin. pred. for each marker+dyad alignment
} 

transformed parameters {
  vector<lower=0,upper=1>[NumObservations] mu_notab;    //inv-logit transform of lin. pred. into probability space
  vector<lower=0,upper=1>[NumObservations] mu_ab;

  for (Observation in 1:NumObservations) {
    mu_notab[Observation] = inv_logit(eta_observation[Observation]);
    mu_ab[Observation] = inv_logit(eta_ab_observation[Observation] + eta_observation[Observation] + 
                                    eta_ab_age_subpop[SpeakerSubPop[Observation]] * SpeakerAge[Observation]);
  }

}

model {
  //top level alignment
  eta_ab_pop ~ normal(0, StdDev);
  eta_ab_age ~ normal(0, StdDev);

  eta_pop_Marker ~ uniform(-5,5);     //Note that this distribution may be changed if baselines are expected to be very high/low

  eta_ab_subpop ~ normal(eta_ab_pop, StdDev);
  //marker-group level distributions
  for(SubPop in 1:NumSubPops) {
    eta_subpop_Marker[SubPop] ~ normal(eta_pop_Marker, StdDev);
    eta_ab_subpop_Marker[SubPop] ~ normal(eta_ab_subpop[SubPop], StdDev);
    eta_ab_age_subpop[SubPop] ~ normal(eta_ab_age, StdDev);
  }

  //marker-dyad level distributions
  for(Observation in 1:NumObservations) {
    eta_observation[Observation] ~ normal(eta_subpop_Marker[SpeakerSubPop[Observation], MarkerType[Observation]], StdDev);
    eta_ab_observation[Observation] ~ normal(eta_ab_subpop_Marker[SpeakerSubPop[Observation], MarkerType[Observation]], StdDev);
  }

  //drawing reply usage counts given number of msg-reply pairs
  CountsAB ~ binomial(NumUtterancesAB, mu_ab);
  CountsNotAB ~ binomial(NumUtterancesNotAB, mu_notab);
}

We figure this model should be easy enough to fit, but R crashes almost immediately after beginning sampling on the full dataset (~6.7m observations), and takes >4 hours for a tiny subset (~150k observations). Is there something glaring in the model’s architecture that would give Stan / rstan trouble?

Thanks very much for the help!

Stan supports vectorized code, such as:

   eta_observation ~ normal(eta_subpop_Marker[SpeakerSubPop, MarkerType], StdDev);
   eta_ab_observation ~ normal(eta_ab_subpop_Marker[SpeakerSubPop, MarkerType], StdDev);

This is will gives a significant speedup. However there is a more problematic part of the code.
The model is called hierarchical, but the variance are “free”, precisely assigned to a StdDev parameter.
However each hierarchy in this type of model requires a hyper-prior parameter on the variances to restrict these. But they are missing in the model.
I’d recommend to use rstanarm, brms in your case. Both packages having implemented in a sophistic way the math, to avoid problems likes this.

Note the memory consumption of the autodiff stack: 24bytes + 8 bytes * N where N is the number of operands in each expression tree stack ( Carpenter, 2017). Since you have 6.7m observations it could be crashing due to memory consumption (please check that this is correct).

I might run in cmdstan and monitory the memory consumption of stan as a quick check to make sure that’s not the issue (using top in linux or whatever). Also R could limit the memory consumption as an application, not so sure, but these are two quick checks to see why it’s crashing.

Thanks for the input!

Re: vectorized code
Good to know. I’ve tried implementing some vectorized versions of the loops I previously had, but I’m getting the following error:

    SYNTAX ERROR, MESSAGE(S) FROM PARSER:

No matches for: 

  vector ~ normal(vector[], real)

Available argument signatures for normal:

  real ~ normal(real, real)
  real ~ normal(real, real[])
  real ~ normal(real, vector)
  real ~ normal(real, row vector)
  real ~ normal(real[], real)
  real ~ normal(real[], real[])
  real ~ normal(real[], vector)
  real ~ normal(real[], row vector)
  real ~ normal(vector, real)
  real ~ normal(vector, real[])
  real ~ normal(vector, vector)
  real ~ normal(vector, row vector)
  real ~ normal(row vector, real)
  real ~ normal(row vector, real[])
  real ~ normal(row vector, vector)
  real ~ normal(row vector, row vector)
  real[] ~ normal(real, real)
  real[] ~ normal(real, real[])
  real[] ~ normal(real, vector)
  real[] ~ normal(real, row vector)
  real[] ~ normal(real[], real)
  real[] ~ normal(real[], real[])
  real[] ~ normal(real[], vector)
  real[] ~ normal(real[], row vector)
  real[] ~ normal(vector, real)
  real[] ~ normal(vector, real[])
  real[] ~ normal(vector, vector)
  real[] ~ normal(vector, row vector)
  real[] ~ normal(row vector, real)
  real[] ~ normal(row vector, real[])
  real[] ~ normal(row vector, vector)
  real[] ~ normal(row vector, row vector)
  vector ~ normal(real, real)
  vector ~ normal(real, real[])
  vector ~ normal(real, vector)
vector ~ normal(real, row vector)
  vector ~ normal(real[], real)
  vector ~ normal(real[], real[])
  vector ~ normal(real[], vector)
  vector ~ normal(real[], row vector)
  vector ~ normal(vector, real)
  vector ~ normal(vector, real[])
  vector ~ normal(vector, vector)
  vector ~ normal(vector, row vector)
  vector ~ normal(row vector, real)
  vector ~ normal(row vector, real[])
  vector ~ normal(row vector, vector)
  vector ~ normal(row vector, row vector)
  row vector ~ normal(real, real)
  row vector ~ normal(real, real[])
  row vector ~ normal(real, vector)
  row vector ~ normal(real, row vector)
  row vector ~ normal(real[], real)
  row vector ~ normal(real[], real[])
  row vector ~ normal(real[], vector)
  row vector ~ normal(real[], row vector)
  row vector ~ normal(vector, real)
  row vector ~ normal(vector, real[])
  row vector ~ normal(vector, vector)
  row vector ~ normal(vector, row vector)
  row vector ~ normal(row vector, real)
  row vector ~ normal(row vector, real[])
  row vector ~ normal(row vector, vector)
  row vector ~ normal(row vector, row vector)

require real scalar return type for probability function.
  error in 'model57116ebfc172_twitter_align_redux' at line 69, column 82
  -------------------------------------------------
    67:   // }
    68:   
    69:   eta_observation ~ normal(eta_subpop_Marker[SpeakerSubPop, MarkerType], StdDev);
                                                                                         ^
    70:   eta_ab_observation ~ normal(eta_ab_subpop_Marker[SpeakerSubPop, MarkerType], StdDev);
  -------------------------------------------------

Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model 'twitter_align_redux' due to the above error.
In addition: Warning message:
In readLines(file, warn = TRUE) :
  incomplete final line found on '/Users/josephdenby/ldp-alignment/twitter_align_redux.stan'

I assume this has something to do with Stan’s data structures, which I admit I’m not super familiar with. Is this an easy fix?

Re: rstanarm, brms
Thanks for the suggestion - I will look into these options. My only concern is with transforming parameters - will I able to do the transformation I require with those simplified packages? Also, I’m not totally sure what you mean when you say that the variance being ‘free’ is an issue. Is that something I could implement easily within the stan code?