Getting a runtime initialization failure error on a Poisson GLM

Hello,

I’m trying to reproduce in PyStan a Poisson GLM that I originally fit with the Python statsmodels package. Once I add more than two variables, I start getting a RuntimeError: Initialization failed. error.

Below are the model code and the full error traceback.

I’m not sure why this is happening.

liability_code = """
data {
    int<lower=0> N;
    vector[N] BIlmt_model2;
    vector[N] multi_policy_count_model;
    vector[N] unit_value_model2;
    vector[N] yrs_owned_model2;
    vector[N] credit_c5_score;
    vector[N] credit_c6_score;
    vector[N] c6_2;
    vector[N] c6_3;
    vector[N] credit_52778_score;
    vector[N] nohit_52778;
    vector[N] thinfile_52778;
    vector[N] class_model2_Sport;
    vector[N] class_model2_Street;
    vector[N] class_model2_other;
    vector[N] Single_Driver_Age_model;
    vector[N] Driver_Age_model;
    int<lower=0> y1[N];
}
parameters {            
    real BIlmt_coeff;      
    real unit_value_model2_coeff;
    real yrs_owned_model2_coeff;
    real credit_c5_score_coeff;
    real credit_c6_score_coeff;
    real c6_2_coeff;
    real c6_3_coeff;
    real credit_52778_score_coeff;
    real thinfile_52778_coeff;
    real nohit_52778_coeff;
    real class_model2_Sport_coeff;
    real class_model2_Street_coeff;
    real class_model2_other_coeff;
    real Single_Driver_Age_model_coeff;
    real Driver_Age_model_coeff;
    real intercept;        // parameter for mean or intercept   
}
model {
    BIlmt_coeff ~ normal(0, 1);  
    unit_value_model2_coeff ~ normal(0, 1); 
    yrs_owned_model2_coeff ~ normal(0, 1); 
    credit_c5_score_coeff ~ normal(0, 1); 
    credit_c6_score_coeff ~ normal(0, 1); 
    c6_2_coeff ~ normal(0, 1); 
    c6_3_coeff ~ normal(0, 1); 
    credit_52778_score_coeff ~ normal(0,1);
    thinfile_52778_coeff ~ normal(0,1);
    nohit_52778_coeff ~ normal(0,1);
    class_model2_Sport_coeff ~ normal(0, 1); 
    class_model2_Street_coeff ~ normal(0, 1); 
    class_model2_other_coeff ~ normal(0, 1); 
    Single_Driver_Age_model_coeff ~ normal(0, 1); 
    Driver_Age_model_coeff ~ normal(0, 1); 
    intercept ~ normal(0,5);
    
    y1 ~ poisson(BIlmt_model2*BIlmt_coeff + multi_policy_count_model*.9 +
                  unit_value_model2*unit_value_model2_coeff +
                  yrs_owned_model2*yrs_owned_model2_coeff + credit_c5_score*credit_c5_score_coeff +
                  credit_c6_score*credit_c6_score_coeff + c6_2*c6_2_coeff + c6_3*c6_3_coeff +
                  class_model2_Sport*class_model2_Sport_coeff + class_model2_Street*class_model2_Street_coeff +
                  class_model2_other*class_model2_other_coeff +
                  Single_Driver_Age_model*Single_Driver_Age_model_coeff +
                  Driver_Age_model*Driver_Age_model_coeff + credit_52778_score*credit_52778_score_coeff +
                  thinfile_52778*thinfile_52778_coeff + nohit_52778*nohit_52778_coeff + intercept);
}
"""

from pystan import StanModel

sm_liab = StanModel(model_code=liability_code)

sm_fit = sm_liab.sampling(data=data1, warmup=500, verbose=True)

print(sm_fit)

ERROR

---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\Users\JORDAN.HOWELL.GITDIR\AppData\Local\Continuum\anaconda3\envs\stan_env\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\JORDAN.HOWELL.GITDIR\AppData\Local\Continuum\anaconda3\envs\stan_env\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "stanfit4anon_model_ca67a6fa396bdaf9992be09ffdcd3b1c_5735096991081996350.pyx", line 373, in stanfit4anon_model_ca67a6fa396bdaf9992be09ffdcd3b1c_5735096991081996350._call_sampler_star
  File "stanfit4anon_model_ca67a6fa396bdaf9992be09ffdcd3b1c_5735096991081996350.pyx", line 406, in stanfit4anon_model_ca67a6fa396bdaf9992be09ffdcd3b1c_5735096991081996350._call_sampler
RuntimeError: Initialization failed.
"""

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
<ipython-input-146-403e5a612b5d> in <module>
----> 1 sm_fit = sm_liab.sampling(data=data1, warmup=500, verbose=True)
      2 print(sm_fit)

~\AppData\Roaming\Python\Python37\site-packages\pystan\model.py in sampling(self, data, pars, chains, iter, warmup, thin, seed, init, sample_file, diagnostic_file, verbose, algorithm, control, n_jobs, **kwargs)
    811         call_sampler_args = izip(itertools.repeat(data), args_list, itertools.repeat(pars))
    812         call_sampler_star = self.module._call_sampler_star
--> 813         ret_and_samples = _map_parallel(call_sampler_star, call_sampler_args, n_jobs)
    814         samples = [smpl for _, smpl in ret_and_samples]
    815 

~\AppData\Roaming\Python\Python37\site-packages\pystan\model.py in _map_parallel(function, args, n_jobs)
     83         try:
     84             pool = multiprocessing.Pool(processes=n_jobs)
---> 85             map_result = pool.map(function, args)
     86         finally:
     87             pool.close()

~\AppData\Local\Continuum\anaconda3\envs\stan_env\lib\multiprocessing\pool.py in map(self, func, iterable, chunksize)
    266         in a list that is returned.
    267         '''
--> 268         return self._map_async(func, iterable, mapstar, chunksize).get()
    269 
    270     def starmap(self, func, iterable, chunksize=None):

~\AppData\Local\Continuum\anaconda3\envs\stan_env\lib\multiprocessing\pool.py in get(self, timeout)
    655             return self._value
    656         else:
--> 657             raise self._value
    658 
    659     def _set(self, i, obj):

RuntimeError: Initialization failed.

Hey! I think you meant to fit your model on the log scale, as is conventional for a Poisson GLM, right? In that case you need to change y1 ~ poisson(...) to y1 ~ poisson_log(...). With poisson(...), the linear predictor is the rate itself and has to stay positive, so random initial values easily produce a negative rate and initialization fails; poisson_log(...) exponentiates the linear predictor, so any real value works. That should help Stan find appropriate initial values.
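Concretely, only the distribution name changes; the linear predictor stays exactly as you wrote it and is now interpreted as the log of the Poisson rate:

y1 ~ poisson_log(BIlmt_model2*BIlmt_coeff + multi_policy_count_model*.9 + ... + intercept);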


Thank you. It’s running and hasn’t errored out yet. I will let you know what happens. I assume this will take a long time with 251,000 observations.

I’m afraid so, yes. You could try fitting subsets of your data (a few thousand observations) until you have everything working.
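For example, something along these lines (just a sketch; it assumes data1 is a plain dict of NumPy arrays keyed by the names in your data block, and the 5,000-row subset size is arbitrary):

import numpy as np

n_sub = 5000  # arbitrary subset size for test runs
idx = np.random.choice(data1['N'], size=n_sub, replace=False)

# subset every length-N array, keep scalars (like N itself) as they are
data_sub = {k: (v[idx] if np.ndim(v) == 1 else v) for k, v in data1.items()}
data_sub['N'] = n_sub

sm_fit = sm_liab.sampling(data=data_sub, warmup=500, verbose=True)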

For speed-ups, you can also look into using y ~ poisson_log_glm(x, alpha, beta). You’d have to put all predictors into a matrix and all coefficients into a vector, though.
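Roughly like this (only a sketch; K is the number of predictor columns, x is the N-by-K predictor matrix you would assemble on the Python side, and I’ve collapsed all the normal(0, 1) priors onto the beta vector):

data {
    int<lower=0> N;
    int<lower=1> K;
    matrix[N, K] x;       // all predictors as columns
    int<lower=0> y1[N];
}
parameters {
    vector[K] beta;       // one coefficient per predictor column
    real intercept;
}
model {
    beta ~ normal(0, 1);
    intercept ~ normal(0, 5);
    y1 ~ poisson_log_glm(x, intercept, beta);
}

If you want to keep the fixed multi_policy_count_model * .9 term, poisson_log_glm also accepts a vector for the alpha argument, so you could pass intercept + .9 * multi_policy_count_model there instead of putting that column into x.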

Also, if your predictors are correlated, you can probably greatly increase the speed of your model by using a QR-decomposed predictor matrix.
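Building on the matrix version above, the QR reparameterization from the Stan user's guide would look roughly like this (again only a sketch; it assumes your Stan version has qr_thin_Q / qr_thin_R):

transformed data {
    matrix[N, K] Q_ast = qr_thin_Q(x) * sqrt(N - 1);
    matrix[K, K] R_ast = qr_thin_R(x) / sqrt(N - 1);
    matrix[K, K] R_ast_inverse = inverse(R_ast);
}
parameters {
    vector[K] theta;      // coefficients on the Q_ast scale
    real intercept;
}
model {
    intercept ~ normal(0, 5);
    y1 ~ poisson_log_glm(Q_ast, intercept, theta);
}
generated quantities {
    vector[K] beta = R_ast_inverse * theta;   // back-transform to the original predictor scale
}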

Cheers,
Max
