Writing an "Explainable" Bernoulli model?

Hello Stanimals! I have a generic question I am trying to solve.

I have a retail dataset that consists of client-item pairs and whether or not the client purchased the item (Bernoulli). Furthermore, I have k types of feedback from each purchase:

  • price loved, price okay, price too high
  • fit too small, fit okay, fit perfect
  • not my style, style okay, loved style
  • etc.

The typical feedback scale is: poor, okay, great.

So my question/ask is: I would like to build an “explainable” model that not only tells me the client-item Bernoulli probability of sale, but why they might have/have not purchased it.

I was thinking something like:

target += Bernoulli(sold| \theta[client-item_s])
target += Bernoulli(not_sold|1-\theta[client-item_ns])
target += Dirichlet(\theta[client-item_s] | \prob_price_s, \prob_fit_s, \prob_style_s,...)
target += Dirichlet(1-\theta[client-item_ns] | \prob_price_ns, \prob_fit_ns, \prob_style_ns,...)
target += Bernoulli(feedback_price[client-item_s] | \prob_price_s)
target += Bernoulli(feedback_price[client-item_ns] | \prob_price_ns)
...

where the _s and _ns suffixes refer to “sold” and “not sold”, respectively. Here, I decomposed the data into sold and not-sold events and partitioned the client-item pairings accordingly.

I am modeling the feedback above separately: that is, for a non-purchase we expect the feedback for price to be either “poor” or “good”, and we would bucket it accordingly. Similarly for the other feedback types.

The full model will also learn embeddings for the client and items so as to allow for unseen items and clients – this will help estimate if a client would buy an item in the future.

My question is: does this look right? I can’t remember ever seeing a Dirichlet used like this before, but it kinda looks like a multivariate Beta-Binomial but trying to “explain” things.

Of course, there will be priors on the prob_price, prob_fit, ...

Any ideas would be super helpful. This will be a HUGE model to fit, so getting this generative part completed will go a long way to helping implement it.

Thanks!

The pseudocode is inconsistent, so I’m not sure what you intend. For example, the Dirichlet distribution is over simplexes, but the Bernoulli distribution requires a probability, so I can’t figure out what you mean by these two lines:

target += Bernoulli(sold| \theta[client-item_s])
target += Dirichlet(\theta[client-item_s] | \prob_price_s, \prob_fit_s, \prob_style_s,...)

Also, the Dirichlet takes a single vector as the parameter, but you’re providing variadic arguments of unclear length.
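To make the mismatch concrete, here is a minimal Python sketch (standard library only) of what each distribution works with. The concentration values and the 0.62 sale probability are hypothetical, and `dirichlet_sample` is a helper written for this illustration, not part of Stan or of the original model.

```python
import random

def dirichlet_sample(alpha, rng):
    """Draw one Dirichlet(alpha) sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(0)

# The Dirichlet takes ONE concentration vector (hypothetical values here).
alpha = [2.0, 3.0, 5.0]
theta_vec = dirichlet_sample(alpha, rng)

# theta_vec is a simplex: len(alpha) non-negative entries that sum to 1.
print(theta_vec, sum(theta_vec))

# A Bernoulli needs a single probability in [0, 1], not a vector.
theta_scalar = 0.62  # hypothetical sale probability
sold = 1 if rng.random() < theta_scalar else 0
print(sold)
```

So the same quantity cannot be both the argument of a Bernoulli and the variate of a Dirichlet unless it is clear which component of the simplex is meant.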

I assume client-item_ns is not intended to be a subtraction even though that’s what you wrote?

Why isn’t the probability of not selling equal to one minus the probability of selling? I don’t get what you mean by these two lines:

target += Bernoulli(sold| \theta[client-item_s])
target += Bernoulli(not_sold|1-\theta[client-item_ns])
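As a small illustration of the point, the Python sketch below uses one sale probability per client-item pair and lets a single Bernoulli cover both outcomes. The value 0.62 and the helper name `bernoulli_lpmf` are assumptions made for this example only.

```python
import math

def bernoulli_lpmf(y, theta):
    """Log probability of outcome y (0 or 1) under Bernoulli(theta)."""
    return math.log(theta) if y == 1 else math.log1p(-theta)

theta = 0.62  # hypothetical sale probability for one client-item pair

p_sold = math.exp(bernoulli_lpmf(1, theta))
p_not_sold = math.exp(bernoulli_lpmf(0, theta))

# The "not sold" probability is 1 - theta automatically; no second
# parameter is needed for the not-sold events.
print(p_sold, p_not_sold, p_sold + p_not_sold)
```

With this setup there is no need to split the data into sold and not-sold partitions with separate parameters; the observed 0/1 outcome selects the right term.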

If you only have the feedback for purchased items, I don’t see how you can fit the model. Do you have any feedback on items that were not purchased?

Also, rather than treating each purchase decision as Bernoulli, don’t you want to link them somehow? If I’m buying a new phone, I don’t make a bunch of independent Bernoulli decisions, I make one categorical decision.
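A rough Python sketch of the difference, assuming three competing items with made-up scores: independent Bernoulli probabilities are computed item by item and need not sum to one, whereas a categorical (softmax) choice spreads a total probability of one across the alternatives.

```python
import math

def inv_logit(x):
    """Logistic function: maps a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    """Map a list of scores to a probability vector that sums to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

utilities = [1.0, 0.5, -0.2]  # hypothetical scores for three competing phones

independent = [inv_logit(u) for u in utilities]  # one Bernoulli per item
categorical = softmax(utilities)                 # one choice among the items

print(independent, sum(independent))  # the sum can exceed 1
print(categorical, sum(categorical))  # the sum is 1
```

Whether the categorical view applies depends on whether the items actually compete with each other for the same purchase.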

Apologies, I realize I did not explain this very well. Let me try again:

I have data for each client-item pair.
I know whether it was sold (0 or 1).
Regardless of whether it was sold, I have feedback on whether:

  • the price was too high or not
  • it fit correctly or not
  • the quality of the clothing was okay or not
  • the size was correct or not
  • the style was appropriate or not

I have a set of client features for each client and a set of features for each item.

I would like to determine both:

  • the probability that a client would purchase an item
  • an “explanation” of why they would (or would not) purchase it

For the first part I will use a Bernoulli (since I want granular probabilities at the client-item level). For the latter I want to determine “why” they buy it or not based on a “mixture” of probabilities.

Like: the probability that client c will purchase item i is 62%, and that is made up of

  • price = 34%
  • size = 12%
  • quality = 40%
  • fit = 8%
  • style = 6%

From that I surmise that it is a high-quality item that is appropriately priced.

After fitting the model, I should be able to estimate the probability of sale, and the explanation for it, for client-item pairings that weren’t in the original dataset (provided the client and the item themselves were in the training set).

Does this seem reasonable?

Sorry for not responding earlier—I just saw this now.

Did you mean something like a logistic regression based on the covariates you listed?

I don’t know how you’re going to get the contributions in the form of probabilities like this. If you use a logistic regression, as would be standard here, this isn’t quite the right interpretation because inv_logit(a + b) != inv_logit(a) + inv_logit(b).
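A quick numeric check of that inequality in Python, using two made-up linear-predictor terms `a` and `b` (say, a price effect and a quality effect; both values are hypothetical):

```python
import math

def inv_logit(x):
    """Logistic function: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Inverse of inv_logit: maps a probability to log-odds."""
    return math.log(p / (1.0 - p))

a, b = 0.5, 1.0  # hypothetical contributions on the log-odds scale

joint = inv_logit(a + b)                 # probability implied by the model
summed = inv_logit(a) + inv_logit(b)     # naive "sum of probabilities"

print(joint)   # a valid probability, below 1
print(summed)  # exceeds 1, so it cannot be a probability

# The contributions are additive on the log-odds scale, not the probability scale.
print(logit(joint), a + b)
```

In other words, the coefficients can be read as additive contributions to the log-odds of a sale, but they do not split the sale probability into percentage shares.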

Yeah, the more I think about it, the more I realize I need to rethink it!