I am developing a meta-analytic model that implements a new inference procedure for count-based culpability crash studies. However, I have run into what I think may be a numerical underflow issue, and I hope someone can point me in the right direction.

**The research question**

Briefly put, a culpability study involves a sample of crash-involved drivers split into four groups along two dimensions: those who were and were not measured as positive on some factor (e.g. positive blood alcohol concentration), and those who were and were not judged "culpable" for the accident (scored by researchers who are given information on the crash, but not on the driver's substance use). The assumption is that the non-culpable drivers are a random sample of drivers on the road.

If substance use has no impact on the risk of culpable accidents, the probability distribution across the four cells is easy to write down: define a culpability share (culp_share) and a "substance-positive" share of drivers on the road (road_share), and the cell probabilities are culp_share * road_share, (1 - culp_share) * road_share, and so on. Now assume that the average accident risk in the substance-positive group (culpable and non-culpable combined) is the baseline risk multiplied by a factor alpha. This increases the share of culpable, substance-positive counts and reduces the probabilities in the other three cells: those three cells have their "original" shares multiplied by a factor 1/(alpha * road_share + (1 - road_share)), and the substance-positive, culpable share becomes 1 - sum(remaining three cells).
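To make the arithmetic concrete, here is a minimal Python sketch of the cell construction, plugging in the values used for the simulated data below (alpha = 1.4, road_share = 0.03, culp_share = 0.6):

```python
# Cell probabilities for one culpability study, following the construction above.
# Column order matches the model: nonculp/neg, nonculp/pos, culp/neg, culp/pos.
alpha, road_share, culp_share = 1.4, 0.03, 0.6

# Denominator applied to the three "unaffected" cells
denom = alpha * road_share + (1 - road_share)

p = [
    (1 - culp_share) * (1 - road_share) / denom,  # non-culpable, substance-negative
    (1 - culp_share) * road_share / denom,        # non-culpable, substance-positive
    culp_share * (1 - road_share) / denom,        # culpable, substance-negative
]
p.append(1 - sum(p))                              # culpable, substance-positive

print(p)
```

Note that the fourth cell simplifies algebraically to road_share * (alpha - 1 + culp_share) / denom, which is what the 1 - sum() shortcut computes.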

The model draws parameter values alpha, road_share and culp_share, calculates the cell probabilities, and models the counts from actual studies as a multinomial draw with these probabilities. With multiple studies, I use parameter vectors where each element has the same prior distribution. (I am working towards a hierarchical model, but reverted to this simpler model to first figure out the error I am encountering.)

**The Stan model error**

To test the model I use simulated data with multiple random "studies" drawn from the same multinomial probabilities. If I give it a single study, the model estimates nicely, but usually prints a couple of error messages before each chain gets started ("Rejecting initial value: Error evaluating the log probability at the initial value."). On my work computer (a Windows PC) it seems to manage three or four studies at a time, but anything more and initialization fails. On my home laptop (a MacBook Pro) it manages more (once in a while it is able to take 20 studies). When it fails, I get:

```
Initialization between (-2, 2) failed after 100 attempts.
Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
```

I think the problem may be that the share of substance-positive drivers on the road is very low (for cannabis, the crude rates implied by the "non-culpable" counts across 6 studies range from 1.3% to 4.2%). My hunch is that multiplying low probabilities together causes an underflow that gives some cells zero probability, messing up the multinomial likelihood, and that the probability of this happening to *at least one study* is high enough to stop the model from working.
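To get a feel for the magnitudes, I put together a rough Python check (assuming Stan's usual transforms: exp for a lower-bounded parameter, inv_logit for an interval-bounded one, with unconstrained initial values drawn in (-2, 2)):

```python
import math

inv_logit = lambda x: 1 / (1 + math.exp(-x))

def cells(road_share, crash_risk, culp_share):
    # Same construction as in the transformed parameters block of the model
    denom = 1 - road_share * (1 - crash_risk)
    p = [(1 - culp_share) * (1 - road_share) / denom,
         (1 - culp_share) * road_share / denom,
         culp_share * (1 - road_share) / denom]
    p.append(1 - sum(p))
    return p

# At the prior means the smallest cell is about 0.012 -- nowhere near underflow
print(min(cells(0.03, 1.4, 0.6)))

# Over the corners of the (-2, 2) initialization box, however, the fourth
# cell can go negative (small crash_risk with a large road_share), which
# violates the simplex declaration just as badly as a zero would
worst = min(min(cells(rs, cr, cs))
            for rs in (inv_logit(-2), inv_logit(2))
            for cr in (math.exp(-2), math.exp(2))
            for cs in (inv_logit(-2), inv_logit(2)))
print(worst)
```

So underflow per se may not be the whole story; this check suggests the random initial draws can land where the fourth cell is negative, which would also explain the "Rejecting initial value" messages.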

Is there some standard "trick" for working around or avoiding this issue? I have thought about using some inv_logit expression for the probabilities, but I have never used a Jacobian adjustment for a transform and don't know how I would then define the prior.

**Model code**

```
data {
  int<lower=0> study_count_culp;
  int<lower=0> culp_studies[study_count_culp, 4]; // Study counts. Column order: nonculp & thc-neg, nonculp & thc-pos, culp & thc-neg, culp & thc-pos
}
parameters {
  vector<lower=0, upper=1>[study_count_culp] road_share;
  vector<lower=0>[study_count_culp] crash_risk;
  vector<lower=0, upper=1>[study_count_culp] culp_share;
}
transformed parameters {
  real temp_denom; // denominator for adjusting cell probabilities
  simplex[4] culp_study_shares[study_count_culp];
  // Calculate cell probabilities for culpability crash studies.
  for (i in 1:study_count_culp) {
    temp_denom = 1 - road_share[i] * (1 - crash_risk[i]);
    culp_study_shares[i, 1] = (1 - culp_share[i]) * (1 - road_share[i]) / temp_denom;
    culp_study_shares[i, 2] = (1 - culp_share[i]) * road_share[i] / temp_denom;
    culp_study_shares[i, 3] = culp_share[i] * (1 - road_share[i]) / temp_denom;
    culp_study_shares[i, 4] = 1 - sum(culp_study_shares[i, 1:3]);
  }
}
model {
  // Set the priors
  road_share ~ normal(0.03, 0.02);
  crash_risk ~ normal(1.4, 0.5);
  culp_share ~ normal(0.6, 0.2);
  for (i in 1:study_count_culp) {
    culp_studies[i, ] ~ multinomial(culp_study_shares[i]);
  }
}
```

**Simulated data**

Repeated draws from the same multinomial distribution, with probabilities based on alpha = 1.4, road_share = 0.03 and culp_share = 0.6 (for now, the same values as the means of the priors). I have included 50 "studies" (each study is one line with four counts). The matrix is attached as a csv file; just choose the number of lines to pass into matrix form and feed into the model as data.
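For reference, the simulation amounts to something like this (a plain-Python sketch; the 500 drivers per study is a placeholder of mine, the attached csv has its own totals):

```python
import random
from collections import Counter

random.seed(1)
alpha, road_share, culp_share = 1.4, 0.03, 0.6

# Cell probabilities at the prior means, as described above
denom = alpha * road_share + (1 - road_share)
p = [(1 - culp_share) * (1 - road_share) / denom,
     (1 - culp_share) * road_share / denom,
     culp_share * (1 - road_share) / denom]
p.append(1 - sum(p))

def one_study(n):
    # One multinomial draw: n crash-involved drivers sorted into the 4 cells
    counts = Counter(random.choices(range(4), weights=p, k=n))
    return [counts.get(cell, 0) for cell in range(4)]

# 50 "studies", one row of four counts each
sim_data = [one_study(500) for _ in range(50)]
```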

**Set-up**

Using the latest versions of Stan and R, updated a couple of days ago. The issue appears on both Mac and Windows (but the Mac handles more studies before failing).

sim_data.csv (691 Bytes)