Using Horseshoe priors for unknown censored variables


I have a supply chain model where our historical data are sales from a store, S(t). We do not know how much inventory the store had at any given time, so we cannot compute the demand D(t), which is what we want; we only know that D(t) >= S(t) for all t. Note: we do NOT know which of the sales were censored.

I was wondering: could I use a horseshoe prior on each point to denote whether it was censored? My prior for each point would encode the belief that there wasn't enough inventory to meet demand on that day.
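The horseshoe is a continuous shrinkage prior (a scale mixture of normals with half-Cauchy local and global scales), not a prior on a binary flag. One way to read the idea above, offered as an assumption rather than a recommendation, is to give each period a nonnegative latent "excess demand" so that D(t) = S(t) + excess(t), with a half-horseshoe prior that shrinks most excesses toward zero while letting a few be large. The sketch below only draws from that prior; `draw_excess_demand` and the global scale `tau0` are hypothetical, and the excess is treated as continuous even though demand is a count.

```python
import numpy as np

def draw_excess_demand(n, tau0=0.1, rng=None):
    """Prior draws of nonnegative excess demand under a half-horseshoe (hypothetical setup)."""
    rng = np.random.default_rng() if rng is None else rng
    tau = tau0 * abs(rng.standard_cauchy())    # global shrinkage scale (half-Cauchy)
    lam = np.abs(rng.standard_cauchy(n))       # per-period local scales (half-Cauchy)
    return np.abs(rng.normal(0.0, lam * tau))  # half-normal excess given the scales

rng = np.random.default_rng(42)
sales = rng.poisson(5, 100).astype(float)      # stand-in for observed sales S(t)
excess = draw_excess_demand(len(sales), rng=rng)
demand = sales + excess                        # D(t) >= S(t) holds by construction
```

Most draws of `excess` sit near zero and a handful are large, which is the "most periods uncensored, a few censored" pattern the question is after.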

I think the actual prior would be time-based: given that the previous sales were censored, the current sales are more likely to be censored too, but that is another topic.
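A time-based prior like this is often written as a two-state Markov chain on the censoring indicator. The sketch below simulates such a chain; `p_stay` (probability of staying censored) and `p_enter` (probability of becoming censored from an uncensored period) are hypothetical parameters, and the first state is drawn from the chain's stationary distribution.

```python
import numpy as np

def simulate_censoring_chain(n, p_stay, p_enter, rng):
    """Two-state Markov chain: True = censored period, False = uncensored."""
    pi1 = p_enter / (1.0 - p_stay + p_enter)   # stationary P(censored)
    c = np.empty(n, dtype=bool)
    c[0] = rng.random() < pi1
    for t in range(1, n):
        p = p_stay if c[t - 1] else p_enter    # censoring is "sticky" when p_stay > p_enter
        c[t] = rng.random() < p
    return c

rng = np.random.default_rng(0)
c = simulate_censoring_chain(20_000, p_stay=0.8, p_enter=0.1, rng=rng)
prev, curr = c[:-1], c[1:]
print("P(censored | previous censored)   ~", curr[prev].mean())
print("P(censored | previous uncensored) ~", curr[~prev].mean())
```

In a full model the indicators would be latent and marginalized (as in a hidden Markov model) rather than simulated directly.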




I think that to help you we would need to know more details about your situation. Does “demand” represent the number of items that would have been sold had the inventory been unlimited? Do you have some other data to ground your predictions?

In general, I guess you would need to add more knowledge about the situation to be able to infer demand: either more types of data (do you have inventory data for at least some stores?) or more assumptions (e.g., that the demand curve has a certain shape or that it is correlated across all stores).

A good way to think about this is to imagine you want to simulate realistic data for your model: how would you do that? Once you have a simulator, it is usually straightforward to move to Stan code. Simulators are also important for verifying that your model works, so it's not wasted effort.
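A minimal version of the simulate-then-fit check might look like the following. It assumes, purely for illustration, that inventory is an independent Poisson draw each period, and it fits the naive Poisson MLE (the sample mean) to both the true demand and the observed sales across many replicate datasets. A model that works should recover the true `mu`; the naive fit to sales does not.

```python
import numpy as np

def simulate(mu, n, rng):
    """Hypothetical simulator: Poisson demand, independent Poisson inventory."""
    demand = rng.poisson(mu, n)
    inventory = rng.poisson(mu, n)             # assumption, not from the original post
    return demand, np.minimum(demand, inventory)

def fit_poisson_mean(y):
    return float(np.mean(y))                   # MLE of a Poisson rate is the sample mean

def recovery_check(mu, n=100, n_reps=200, seed=0):
    """Average estimates over replicate datasets simulated with a known mu."""
    rng = np.random.default_rng(seed)
    est_demand, est_sales = [], []
    for _ in range(n_reps):
        demand, sales = simulate(mu, n, rng)
        est_demand.append(fit_poisson_mean(demand))
        est_sales.append(fit_poisson_mean(sales))
    return float(np.mean(est_demand)), float(np.mean(est_sales))

for mu in (2.0, 5.0, 10.0):
    d, s = recovery_check(mu)
    print(f"true mu={mu}: from demand {d:.2f}, from sales {s:.2f}")
```

The same harness can be pointed at any candidate model: swap in a different `fit` and see whether it recovers the parameter you simulated with.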


So I came up with a very simple example of what happens in practice (in Python):

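import numpy as np
import matplotlib.pyplot as plt

N = 100                            # number of time periods
mu = 5                             # true mean demand per period
tt = np.arange(N)                  # time index, used for plotting
demand = np.random.poisson(mu, N)  # Poisson demand draws, one per period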

This generates the “demand” for a product (random draws from a Poisson distribution). Now suppose the inventory is given by

inventory = 2.5*mu - demand + replenishment

where replenishment is the new stock we receive. Let us assume it arrives every period and equals the mean demand, mu:

replenishment = mu * np.ones_like(demand)

Sales are then the smaller of demand and inventory, floored at zero:

sales = np.maximum(np.minimum(demand,inventory),np.zeros_like(demand))

And if we plot sales and demand together, we see (depending on the Poisson draws) that sometimes there isn't enough inventory to meet demand in that period:

[Figure: demand and sales over time]

The red dots denote demand, and the blue line shows actual sales.
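Collected into one runnable function, the toy simulation might look like this. It follows the formulas in the post, with two small changes made for convenience: it uses NumPy's `default_rng` with a seed (instead of `np.random.poisson`) so runs are reproducible, and it also returns a boolean `censored` flag marking periods where sales fell short of demand. The name `simulate_toy` is hypothetical.

```python
import numpy as np

def simulate_toy(N=100, mu=5, seed=None):
    """Toy simulator from the post: Poisson demand, per-period inventory, censored sales."""
    rng = np.random.default_rng(seed)
    demand = rng.poisson(mu, N)
    replenishment = mu * np.ones_like(demand)              # constant restock equal to mean demand
    inventory = 2.5 * mu - demand + replenishment          # inventory rule from the post
    sales = np.maximum(np.minimum(demand, inventory), 0)   # cannot exceed stock or go negative
    censored = sales < demand                              # periods where demand went unmet
    return demand, sales, censored

demand, sales, censored = simulate_toy(N=100, mu=5, seed=0)
print(f"{censored.sum()} of {len(demand)} periods censored")
```

With mu = 5 this inventory rule works out to 17.5 - demand, so a period is censored exactly when demand is 9 or more. In this toy, censoring is therefore driven entirely by high demand rather than by a separate inventory process.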

If we compute the MLE for each series over time, we see that the two estimates differ:

[Figure: MLE over time for the demand and sales series]



We hope the demand estimate converges to 5 (our true mean); the sales-based estimate settles lower, because sales never exceed demand.
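The "MLE over time" can be computed as a running mean, since the Poisson MLE of the rate from the first t observations is just their average. The sketch below regenerates data with the same toy inventory rule (seeded for reproducibility) and computes the running estimate for both series; the helper name `running_poisson_mle` is hypothetical.

```python
import numpy as np

def running_poisson_mle(y):
    """Poisson rate MLE using the first t+1 observations, for every t."""
    y = np.asarray(y, dtype=float)
    return np.cumsum(y) / np.arange(1, len(y) + 1)   # Poisson MLE = sample mean

rng = np.random.default_rng(0)
mu = 5
demand = rng.poisson(mu, 100)
inventory = 2.5 * mu - demand + mu                   # toy inventory rule from the post
sales = np.maximum(np.minimum(demand, inventory), 0)

mle_demand = running_poisson_mle(demand)
mle_sales = running_poisson_mle(sales)
print(mle_demand[-1], mle_sales[-1])
```

Because sales are at most demand in every period, the running sales estimate can never exceed the running demand estimate.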

So my question is: could we put a sparse prior on each data point to denote the probability that the sales in that period were censored?


I should add: we do NOT know the inventory series or the replenishment schedule. We only “see” the sales, but we want to estimate the mean of the “demand”.


Without strong priors, if you only know b and that a > b, there’s not much you can say about a.

If the quantities are continuous, you can take a latent parameter a (for the true demand here) and define it as

real<lower = b> a;

If you have a whole array, you’ll have to look up the manual’s description of how to deal with varying interval bounds; we haven’t vectorized that construction in the language yet.
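For intuition about what a per-element lower bound does, here is the lower-bound transform Stan's reference manual describes, a = b + exp(u) with log Jacobian u, applied elementwise in NumPy. It is only an illustration of the math: in a Stan program the manual approach would use an unconstrained vector parameter, build `a` in transformed parameters, and add the Jacobian term to `target` yourself. The function names are hypothetical.

```python
import numpy as np

def lower_bound_transform(u, b):
    """Map unconstrained u to a = b + exp(u) > b elementwise; return a and log|Jacobian|."""
    u = np.asarray(u, dtype=float)
    a = np.asarray(b, dtype=float) + np.exp(u)
    log_jacobian = float(np.sum(u))   # da_i/du_i = exp(u_i), so log|J| = sum of u_i
    return a, log_jacobian

def lower_bound_inverse(a, b):
    """Inverse map: u = log(a - b), defined only when a > b."""
    return np.log(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))

sales = np.array([3.0, 5.0, 4.0])      # observed lower bounds b, i.e. S(t)
u = np.array([-1.0, 0.0, 0.5])         # unconstrained parameter values
demand, log_j = lower_bound_transform(u, sales)
print(demand, log_j)
```

If the bound `b` is data, an alternative is to declare a nonnegative offset and add it to `b`; that shift has a constant Jacobian, so no adjustment is needed.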


Not knowing the inventory leaves you with a tough row to hoe. The problem is that for any set of observations you can easily come up with an inventory-limited scenario and a demand-limited scenario that fit the data (almost) equally well. The only thing you have going for you is your assumption that demand is Poisson distributed, while inventory is deterministic. Under these assumptions, sales with Poisson-like variability suggest that demand is the limiting factor, while sales with much less variability suggest that inventory is the limiting factor.

If I were working on this problem, I’d consider two models, both with a quasi-Poisson likelihood. One of them would have a prior that demand is strictly less than inventory, and the dispersion parameter is close to 1 (or maybe strictly greater than 1). The other would have a prior that demand is strictly greater than inventory, and the dispersion parameter is close to zero. If you’re lucky, one of these models will fit a lot better than the other.
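The variability argument can be checked with the Pearson estimate of the quasi-Poisson dispersion for an intercept-only model, which reduces to the sample variance divided by the mean. The sketch below applies it to two hypothetical simulated series: one where inventory almost never binds (dispersion should be near 1) and one where a fixed inventory almost always binds (dispersion near 0). The inventory levels 50 and 3 are arbitrary choices for illustration.

```python
import numpy as np

def quasi_poisson_dispersion(y):
    """Pearson estimate of the quasi-Poisson dispersion for an intercept-only model."""
    y = np.asarray(y, dtype=float)
    mu_hat = y.mean()
    pearson_chi2 = np.sum((y - mu_hat) ** 2 / mu_hat)
    return pearson_chi2 / (len(y) - 1)                 # n minus one fitted parameter

rng = np.random.default_rng(0)
n = 1000
demand_limited = np.minimum(rng.poisson(5, n), 50)     # inventory of 50 rarely binds
inventory_limited = np.minimum(rng.poisson(20, n), 3)  # inventory of 3 almost always binds
phi_demand = quasi_poisson_dispersion(demand_limited)
phi_inventory = quasi_poisson_dispersion(inventory_limited)
print(phi_demand, phi_inventory)
```

These are the two regimes the two models above would encode through their priors on the dispersion parameter.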

Of course, that still leaves a scenario in which inventory and demand are so well matched that the time series switches back and forth between inventory-limited and demand-limited regimes. Coming up with a model for that scenario would be tricky. Probably the best you can hope for is to do some simulations to at least come up with some diagnostics to alert you if that scenario might be plausible.
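One possible diagnostic for regime switching, offered as a sketch rather than a validated method, is a rolling version of the dispersion estimate: windows where variance/mean drops well below 1 look inventory-limited. The window length of 30 and threshold of 0.5 are hypothetical and would need tuning by simulation, as suggested above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_dispersion(y, window):
    """Variance/mean ratio (ddof=1) over each sliding window; NaN for zero-mean windows."""
    windows = sliding_window_view(np.asarray(y, dtype=float), window)
    means = windows.mean(axis=1)
    variances = windows.var(axis=1, ddof=1)
    return np.divide(variances, means, out=np.full_like(means, np.nan), where=means > 0)

def flag_inventory_limited(y, window=30, threshold=0.5):
    """Flag windows whose dispersion is far below the Poisson value of 1."""
    disp = rolling_dispersion(y, window)
    return disp < threshold   # NaN compares False, so zero-mean windows are not flagged

rng = np.random.default_rng(3)
# Hypothetical series: 200 demand-limited periods, then 200 periods stuck at inventory 4
y = np.concatenate([rng.poisson(5, 200), np.full(200, 4)])
flags = flag_inventory_limited(y)
print(flags.sum(), "of", len(flags), "windows flagged")
```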


Thanks, Bob. I guess I would like to know whether a > b. If it is, then I could use that information (similar to how survival analysis uses censored observations). The other, naive option would be to drop the point (as I don’t want it contaminating my model).
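If you could answer "is a > b" for each point, the survival-analysis approach would be a right-censored likelihood: uncensored periods contribute P(D = s) and censored periods contribute P(D > s). The sketch below assumes the censoring flags are KNOWN and that inventory is independent of demand (both assumptions, neither holds in the original problem), and maximizes the Poisson log-likelihood over a grid of rates using NumPy only. All function names are hypothetical.

```python
import numpy as np

def poisson_log_tables(lam, kmax=200):
    """log P(D = k) and log P(D > k) for k = 0..kmax under D ~ Poisson(lam)."""
    k = np.arange(1, kmax + 1)
    log_pmf = -lam + np.concatenate(([0.0], np.cumsum(np.log(lam / k))))
    pmf = np.exp(log_pmf)
    tail = np.cumsum(pmf[::-1])[::-1]           # tail[k] = P(D >= k)
    sf = np.concatenate((tail[1:], [0.0]))      # sf[k] = P(D > k)
    with np.errstate(divide="ignore"):
        return log_pmf, np.log(sf)

def censored_loglik(lam, sales, censored):
    log_pmf, log_sf = poisson_log_tables(lam)
    s = np.asarray(sales, dtype=int)
    # uncensored: demand equals sales; censored: demand exceeded sales
    return log_pmf[s[~censored]].sum() + log_sf[s[censored]].sum()

def censored_mle(sales, censored, grid=None):
    grid = np.linspace(1.0, 15.0, 1401) if grid is None else grid
    ll = np.array([censored_loglik(lam, sales, censored) for lam in grid])
    return float(grid[np.argmax(ll)])

rng = np.random.default_rng(1)
mu, n = 5.0, 2000
demand = rng.poisson(mu, n)
inventory = rng.poisson(mu, n)     # assumption: inventory independent of demand
sales = np.minimum(demand, inventory)
censored = demand > inventory      # assumed known here, unlike the original problem
lam_hat = censored_mle(sales, censored)
print(f"naive mean of sales: {sales.mean():.2f}, censored MLE: {lam_hat:.2f}")
```

Compared with dropping censored points, this keeps the information that demand was at least as large as the observed sales.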


Thanks, yeah, I am reaching the same conclusions. I am now looking at a LOO-CV analysis for each point and a “hacky” anomaly-detection method: if three (or k) successive draws all have low probability, then those 3 (k) are treated as censored.
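The "k successive low-probability points" rule can be sketched as run-length flagging. This assumes you already have one probability per time point (for example pointwise predictive probabilities from a LOO analysis, which is not computed here) and interprets "successive draws" as consecutive time points; `flag_low_runs`, `threshold`, and `k` are hypothetical.

```python
import numpy as np

def flag_low_runs(p, threshold, k):
    """Flag points in runs of at least k consecutive values of p below threshold."""
    low = np.asarray(p) < threshold
    flags = np.zeros(len(low), dtype=bool)
    run_start = None
    for i, is_low in enumerate(low):
        if is_low and run_start is None:
            run_start = i                     # a new low-probability run begins
        run_ends = run_start is not None and (not is_low or i == len(low) - 1)
        if run_ends:
            end = i + 1 if is_low else i      # exclusive end index of the run
            if end - run_start >= k:
                flags[run_start:end] = True
            run_start = None
    return flags

# Hypothetical per-point probabilities
p = [0.50, 0.01, 0.02, 0.03, 0.40, 0.01, 0.02, 0.50]
flags = flag_low_runs(p, threshold=0.05, k=3)
print(flags)
```

Here only the first run (three consecutive low values) is flagged; the second run has length two and falls short of k = 3.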

Something like that. I will post when I have done more work.