I am creating a model to predict end-of-year colorectal cancer screening rates in certain populations (markets). The model uses end-of-month snapshots of the numerators and denominators (variable `month_number`, values 1–11, 1/31…12/31), plus a "final" snapshot with `month_number = 12`, which is not equidistant in time but simply represents a sort of post-year effort to get the numbers up. Each market has two years of historical data as well as the current year-to-date snapshots.
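For concreteness, here is the layout I have in mind for `df` — a hypothetical sketch, with column names taken from the formula below (one row per market × year × monthly snapshot):

```
# Hypothetical layout of df; values are made up for illustration.
df <- data.frame(
  market       = factor(c("A", "A", "B")),
  year         = c(2022, 2022, 2022),
  month_number = c(1, 2, 1),        # 1-12; 12 = the post-year "final" snapshot
  Numerator    = c(120, 150, 80),   # members screened as of the snapshot
  Denominator  = c(1000, 1000, 600) # members eligible as of the snapshot
)
```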

So the basic model form is

```
fit <- brm(
  Numerator | trials(Denominator) ~
    0 + mo(month_number) +          # the global accumulation curve
    (mo(month_number) | market) +   # market-level variation from that curve
    (1 | market:year),              # market-year variation
  data = df, family = binomial()
)
```

This question is a bit open-ended, but what I'm having trouble reconciling are the various ways I can handle differences between markets, whose curves can have drastically different shapes. In fact, the default Dirichlet prior with alpha = 1 on the simplex can be too restrictive to identify some markets that have very large month-12 jumps.
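To illustrate what loosening that prior would look like — a sketch only, and the `simo` coefficient name `momonth_number1` is assumed from brms's naming convention for `mo(month_number)`:

```
library(brms)

# Put more prior mass on a large final (month-12) step by raising the
# concentration of the last Dirichlet element. alpha = 1 everywhere is
# the default and treats all step patterns as equally likely a priori.
p <- set_prior("dirichlet(c(1,1,1,1,1,1,1,1,1,1,3))",
               class = "simo", coef = "momonth_number1")
```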

For example, we have:

```
0 + mo(month_number) * market + (mo(month_number) | market) + ...
```

In this case we create an interaction term. I have over 100 markets, so this involves manually grouping them into ones with similar profiles. There doesn't seem to be a way to set a universal prior for all 'simo' terms, though, so I have to set each one manually. I'm also a bit confused about how the interaction terms work alongside the varying `mo` terms, and whether I could be creating identifiability issues there. In practice, though, this model seems to capture the variation between markets better than the first one.
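For what it's worth, the per-term prior setting can at least be scripted rather than written by hand. This is a sketch under the assumption that the `simo` rows returned by `get_prior()` carry coefficient names that `set_prior()` accepts back:

```
library(brms)

# Apply one identical Dirichlet prior to every simplex term.
f  <- Numerator | trials(Denominator) ~ 0 + mo(month_number) * market
pr <- get_prior(f, data = df, family = binomial())
simo_coefs <- unique(pr$coef[pr$class == "simo" & pr$coef != ""])
priors <- do.call(c, lapply(simo_coefs, function(cf)
  set_prior("dirichlet(2)", class = "simo", coef = cf)))
```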

We also have groupings. For example:

```
0 + mo(month_number, id = "market:year") + ...
```

I'm a little confused about whether I should be adding this, especially when I'm creating varying terms or interaction terms. I would like to ensure that if my current data run through `month_number = 5`, for example, my predictions for months 6–12 will be monotonic and no lower than month 5. Is this what 'conditional monotonicity' means? In my models without the `id` term, for example, it's possible to get predictions below the current data point due to model calibration (though this doesn't happen much).

Also, am I right in understanding that any prior I put on the intercept affects the scale parameter b? Since my monotonic simplex is length 11, would it be correct to divide by 11, given that this parameter represents the average difference between adjacent categories (using the logit link for the binomial family here)?
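My current understanding of the parameterization — which may be exactly what I'm getting wrong, hence the question — is that the linear-predictor contribution is b · D · Σ_{i≤x} ζ_i, so that b is the average per-step difference and b · D the full month-1-to-month-12 range. Numerically:

```
# Sketch of the monotonic parameterization as I understand it (assumed,
# not verified): contribution of mo(x) is b * D * sum(zeta[1:x]).
D    <- 11               # 12 ordered categories -> simplex of length 11
zeta <- rep(1 / D, D)    # flat simplex: equal steps between months
b    <- 0.2              # average logit-scale step between adjacent months
mo_x <- function(x) D * sum(zeta[seq_len(x)])
b * mo_x(11)             # full range: b * D * 1, i.e. 2.2 on the logit scale
```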

Again, these aren't exactly coding questions, and they're a bit open-ended. I'm not expecting anyone to address all of them, but any clarification on things I may be misunderstanding will help me make progress on this problem.