Runtime error for multinomial mixture model


#1

I’m trying to adapt the Stan LDA example into a simpler multinomial mixture model. Sampling fails with “RuntimeError: Initialization failed.” and gives no indication of what is failing to initialize.

Here is the model I’m working on. I’m fairly new to Stan, so it would be great if somebody could give me a hint as to where I’m going wrong. I think the main difference between what I’m doing and the LDA example is that I take the dot_product of two vectors rather than working element-wise.

Model:

data = """
data {
   int<lower=1> K;               // num topics
   int<lower=2> V;               // num words
   int<lower=1> M;               // num docs
   vector<lower=0>[V] beta;     // fm prior
}
"""

transformed_data = """
transformed data {
   vector[V] y[M];
}
"""

parameters = """
parameters {
   simplex[V] phi[K];     // word dist for topic k
} 
"""

model = """
model {
    for (k in 1:K)
        phi[k] ~ dirichlet(beta);     // prior
    for (m in 1:M) {
        real gamma[K];
        for (k in 1:K)
            gamma[k] = log(dot_product(phi[k], y[m]));
        increment_log_prob(log_sum_exp(gamma));  // likelihood
    }
}
"""

Running the model:

num_topics = 2
input_data = X
data_dict = {'K': num_topics,
             'V': input_data.shape[1],
             'M': input_data.shape[0],
             'beta': np.ones(input_data.shape[1]),
             'y': input_data}

fit = pystan.stan(model_code=data + transformed_data + parameters + model,
                  data=data_dict, iter=1000, chains=2)
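
For reference, the quantity that the model block as posted adds to the log density can be sketched in plain Python. This is only an illustration under stated assumptions, not part of the Stan program: `mixture_log_lik` and `log_sum_exp` are hypothetical helper names, `phi` is a list of K probability vectors, and `y` is a list of M count vectors.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    if math.isinf(m):
        # All terms are -inf (or one is +inf): the result is m itself.
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))

def mixture_log_lik(phi, y):
    """Sum over documents m of log_sum_exp over k of log(phi[k] . y[m])."""
    total = 0.0
    for y_m in y:
        gamma = []
        for phi_k in phi:
            dot = sum(p * c for p, c in zip(phi_k, y_m))
            # log(0) is treated as -inf; a NaN dot product stays NaN.
            gamma.append(math.log(dot) if dot != 0 else -math.inf)
        total += log_sum_exp(gamma)
    return total

if __name__ == "__main__":
    phi = [[0.5, 0.5], [0.9, 0.1]]   # K = 2 topics, V = 2 words
    y = [[2, 0], [1, 3]]             # M = 2 documents
    print(mixture_log_lik(phi, y))   # finite for valid counts
    print(mixture_log_lik(phi, [[float("nan")] * 2]))  # NaN input gives NaN
```

Stan rejects an initial point whose log density is not finite, so one thing worth checking is whether `y` actually reaches the model. As far as I understand, only variables declared in the data block are read from `data_dict`, and a transformed data variable that is declared but never assigned holds NaN, which would make every term above NaN.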

Error:

---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\Users\xxx\AppData\Local\Continuum\anaconda3\lib\multiprocessing\pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\xxx\AppData\Local\Continuum\anaconda3\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "stanfit4anon_model_8967c01b6b0e586b4fad8569d19c6e56_5276136166146796457.pyx", line 368, in stanfit4anon_model_8967c01b6b0e586b4fad8569d19c6e56_5276136166146796457._call_sampler_star
  File "stanfit4anon_model_8967c01b6b0e586b4fad8569d19c6e56_5276136166146796457.pyx", line 401, in stanfit4anon_model_8967c01b6b0e586b4fad8569d19c6e56_5276136166146796457._call_sampler
RuntimeError: Initialization failed.
"""

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
<ipython-input-80-f439a8def651> in <module>()
      9 
     10 fit = pystan.stan(model_code=data + transformed_data + parameters + model, 
---> 11                          data=data_dict, iter=1000, chains=2)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pystan\api.py in stan(file, model_name, model_code, fit, data, pars, chains, iter, warmup, thin, init, seed, algorithm, control, sample_file, diagnostic_file, verbose, boost_lib, eigen_lib, n_jobs, **kwargs)
    400                      sample_file=sample_file, diagnostic_file=diagnostic_file,
    401                      verbose=verbose, algorithm=algorithm, control=control,
--> 402                      n_jobs=n_jobs, **kwargs)
    403     return fit

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pystan\model.py in sampling(self, data, pars, chains, iter, warmup, thin, seed, init, sample_file, diagnostic_file, verbose, algorithm, control, n_jobs, **kwargs)
    724         call_sampler_args = izip(itertools.repeat(data), args_list, itertools.repeat(pars))
    725         call_sampler_star = self.module._call_sampler_star
--> 726         ret_and_samples = _map_parallel(call_sampler_star, call_sampler_args, n_jobs)
    727         samples = [smpl for _, smpl in ret_and_samples]
    728 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pystan\model.py in _map_parallel(function, args, n_jobs)
     79         try:
     80             pool = multiprocessing.Pool(processes=n_jobs)
---> 81             map_result = pool.map(function, args)
     82         finally:
     83             pool.close()

~\AppData\Local\Continuum\anaconda3\lib\multiprocessing\pool.py in map(self, func, iterable, chunksize)
    264         in a list that is returned.
    265         '''
--> 266         return self._map_async(func, iterable, mapstar, chunksize).get()
    267 
    268     def starmap(self, func, iterable, chunksize=None):

~\AppData\Local\Continuum\anaconda3\lib\multiprocessing\pool.py in get(self, timeout)
    642             return self._value
    643         else:
--> 644             raise self._value
    645 
    646     def _set(self, i, obj):

RuntimeError: Initialization failed.

#2

It appears that my issue was trying to:

  1. Use Jupyter Notebook (which is not compatible with PyStan)
  2. Use PyStan on Windows (which should be compatible, but I think my compiler is set up incorrectly)

I got this model to work by switching to RStan.


#3

You need to make sure the C++ toolchain is installed and available to your Jupyter notebooks. It is possible: several people have created Jupyter notebooks for PyStan and even put them in Docker containers.

This setup can be problematic on Windows in some circumstances because of an incompatibility between the MSVC compiler, on which Python depends, and the Eigen matrix library, on which we depend (the compiler is non-standard; this isn’t Eigen’s fault!).
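
A quick way to see whether a C++ compiler is visible from the same Python process that runs the notebook is to look it up on the PATH. This is a minimal sketch using only the standard library; the list of compiler names is an assumption, and `find_compilers` is a hypothetical helper, not part of PyStan.

```python
import shutil

# Candidate compiler executables; this list is an assumption, adjust it for your setup.
CANDIDATES = ("g++", "clang++", "cl")

def find_compilers(names=CANDIDATES):
    """Return {name: full path or None} for each compiler name looked up on PATH."""
    return {name: shutil.which(name) for name in names}

if __name__ == "__main__":
    for name, path in find_compilers().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

Running this in a notebook cell and in a terminal and comparing the output shows whether the two environments see the same toolchain. Finding a compiler on the PATH does not by itself mean PyStan can build models with it.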