Yes this clarifies a lot. As expected, it doesn’t make things simpler haha.

From your explanation, I’m distilling the following unobserved variable: the true stock available for the wholesale buyer to buy. This poses a challenge: modelling an integer latent variable requirea integrating over all potential states of thentlatent variable, which is expensive if there are many latent states. And in this case, there are many many many, I imagine even for a modest total available stock N. In Stan you can’t integrate by sampling this latent state, but you have to considtr each potential state and assign some probability to it, either based on a prior or a more elaborate model. So that’s the first challenge.

The second challenge is that you then want to have a model for 1) the wholesales buyer’s / fruit stand salesman’s preference for buying fruit types, conditional on available stock, and 2) the customers’ preference conditional on what the wholesale buyer / fruit stand has on offer. Now, if I understand correctly, you know exactly what the wholesale buyer / fruit stand bought. So far so good.

Then third, you want to make inference about the wholesale buyer’s / fruit stand salesman’s preference or strategy for buying fruit types to depens on sales during the last two weeks. If I understand correctly, the actual (true) sales are also known. That would be great so as not to have also integrate over all possible states of some unknown sales pattern.

Have I got it right so far? If you want to pursue this, I’d start “simple” with a toy model assuming you know the true stock available to the fruit stand, code up everything from that point forward in the sales process. And if that makes sense, relax the assumption of knowing the true stock by integrating over all potential stock configurations. I guess the latter will be very computationally expensive.

Because preferential buying is involved when the fruit stand stocks, you can’t use a multivariate hypergeometric distribution, as it allows no preference (it’s purely random). But it should be possible to generalise this by weighting certain types of fruits by means of a simplex vector from a dirichlet distribution. You’d have to figure out the PMF yourself though. I wouldn’t be surprised if it turns out to be very similar to a compound dirichlet-multinomial distribution.

I now also realise that this last step in the sales process (ie customers buying) does not have any parameters because you basically have perfect knowledge of the situation. The only role of the data on sales to customers is to inform the fruit stands patterns in stocking preferences, correct?