Hello,
I am currently working on a model to estimate clicks based on a combination of ads and audiences. The dependent variable is the number of clicks per ad and audience, while the independent variable is the associated cost per ad and audience. Given the large number of ads and audiences, and limited data points, I am concerned about data sparsity and would like to pool the estimates to shrink them toward the mean. Specifically, I want to have one hyperprior for each audience and one for each ad. These hyperpriors should then be combined in some way to form the actual prior for the ad-audience pair (at least that is my idea).
In the context I am modeling, the coefficient b_{ad, audience} must be strictly positive, and the exponent sat_{ad, audience} must be bounded between 0 and 1, which leaves several options for combining the hyperpriors. Currently, I average them (multiplying each by 0.5 and summing), though I could also multiply them, or potentially use a Dirichlet hyperprior to weight the contribution of each individual prior. As for the likelihood function, I have not yet settled on one. The output is a non-negative, discrete count unless I scale it, so I am considering a Poisson likelihood.
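To make the combination options concrete, here is a small prior-draw sketch of averaging versus multiplying per-ad and per-audience components. The truncated-normal locations/scales and the Beta shapes for the 0–1 bounded saturation components are placeholder assumptions for illustration, not the actual hyperprior choices:

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)

def positive_truncnorm(loc, scale, size, rng):
    # Normal truncated below at 0; a is the standardized lower bound.
    a = (0.0 - loc) / scale
    return truncnorm.rvs(a, np.inf, loc=loc, scale=scale, size=size, random_state=rng)

n_ads, n_audiences = 4, 3

# Hypothetical per-ad and per-audience draws for the positive coefficient b.
b_ad = positive_truncnorm(loc=1.0, scale=0.5, size=n_ads, rng=rng)
b_audience = positive_truncnorm(loc=1.0, scale=0.5, size=n_audiences, rng=rng)

# Option 1 (the approach described above): 0.5-weighted sum of the components.
b_pair_avg = 0.5 * b_ad[:, None] + 0.5 * b_audience[None, :]

# Option 2: multiply the components instead.
b_pair_prod = b_ad[:, None] * b_audience[None, :]

# For sat in (0, 1): averaging two (0, 1)-bounded draws (here Beta) stays in
# (0, 1), whereas a product would pull values toward 0.
sat_ad = rng.beta(2.0, 2.0, size=n_ads)
sat_audience = rng.beta(2.0, 2.0, size=n_audiences)
sat_pair = 0.5 * sat_ad[:, None] + 0.5 * sat_audience[None, :]
```

Both options preserve positivity of b, but they imply different prior scales: the average keeps b on the scale of its components, while the product inflates the prior variance, which may matter for sampler geometry.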
I am looking for feedback from more experienced Bayesian modelers about any potential issues with this approach. While I plan to check R-hat values, divergences, and energy plots, I’m curious whether there are any structural challenges with this setup that might present problems for an HMC sampler, or any inherent pathologies I might be overlooking. Any intuition on this would be greatly appreciated.
The model is as follows:

y_{ad, audience} = b_{ad, audience} \cdot \text{cost}_{ad, audience}^{\,sat_{ad, audience}}

Where:
- y_{ad, audience} is the number of outcomes for a given ad and audience
- \text{cost}_{ad, audience} is the associated cost for that ad and audience
- b_{ad, audience} is a parameter for the relationship between the outcomes and cost
- sat_{ad, audience} is the saturation parameter for each ad and audience
The priors I am using are:
where \mathcal{N}^+ represents a truncated normal.
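One cheap sanity check before worrying about sampler geometry is a prior-predictive simulation of the saturating mean together with the candidate Poisson likelihood. The b, sat, and cost values below are purely illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical pair-level parameters; in the full model these would come
# from the combined per-ad and per-audience hyperpriors.
b = 2.0      # strictly positive coefficient
sat = 0.6    # saturation exponent in (0, 1)
cost = np.array([10.0, 50.0, 200.0, 1000.0])

# Mean clicks: increasing in cost, with diminishing returns since 0 < sat < 1.
mu = b * cost ** sat

# Candidate Poisson likelihood: discrete, non-negative counts.
clicks = rng.poisson(mu)
```

Plotting simulated clicks against cost for many prior draws would show whether the implied click counts are on a plausible scale before any data are involved, and whether overdispersion (which a Poisson cannot absorb, unlike a negative binomial) is likely to be an issue.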