CNN determining priors for Bayesian Hierarchical model

Hello all. At the well-known fashion design company where I work, we currently use a CNN to determine which manufacturers in our database are the “best fits” to manufacture a given new design.

This was done by embedding the images of manufacturers’ clothing samples with a CNN. Now, when we create a new design, we simply input an image of it, and the CNN outputs the manufacturer image that is most similar to the input image. This is the status quo in this industry.

NOW they have tasked me with expanding this system to make the manufacturer recommendations more accurate. Besides the manufacturers’ sample images, we also have many other data columns (see attached CSV) such as minimum order requirement, average price per unit, geographic location, level of customization offered, etc.

They want me to create a Bayesian hierarchical model (BHM) that somehow uses the original CNN manufacturer prediction (and obviously the manufacturer’s associated data) as a “prior”. In addition to our primary data (organized the same way as the attached sample data), we also have a MUCH larger (and much less descriptive) dataset of manufacturers that shares only a fraction of the same information as the primary dataset (note: all manufacturer names in the smaller dataset can be cross-referenced to the larger, less descriptive dataset).
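
To illustrate what the "hierarchical" part could buy here, below is a minimal sketch of partial pooling: manufacturers are grouped (the grouping by region is an assumption for illustration), and groups with few observations are shrunk toward the overall mean. The group names, prices, `sigma`, and `tau` are all hypothetical, and the variances are treated as known, which a real BHM would instead estimate.

```python
def partial_pool(group_values, sigma, tau):
    """Shrink each group's mean toward the grand mean.

    sigma: assumed within-group standard deviation (hypothetical, fixed).
    tau:   assumed between-group standard deviation (hypothetical, fixed).
    """
    all_values = [v for vals in group_values.values() for v in vals]
    grand_mean = sum(all_values) / len(all_values)
    pooled = {}
    for group, vals in group_values.items():
        n = len(vals)
        group_mean = sum(vals) / n
        # Precision weighting: more data in a group means less shrinkage.
        weight = (n / sigma**2) / (n / sigma**2 + 1 / tau**2)
        pooled[group] = weight * group_mean + (1 - weight) * grand_mean
    return grand_mean, pooled


# Hypothetical average price per unit, grouped by a made-up region label.
prices = {
    "region_a": [10.0, 11.0, 9.0, 10.0, 10.0, 10.0],
    "region_b": [20.0],
}
grand_mean, pooled = partial_pool(prices, sigma=2.0, tau=2.0)
print(grand_mean, pooled)
```

The single observation in `region_b` is pulled strongly toward the overall mean, while the well-observed `region_a` barely moves. A large, sparse dataset could plausibly play a similar role by informing the group-level distribution.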

I have never built a BHM before, so I am struggling to decide which hierarchy architecture is best for this task, and what the best way is to organize and determine the priors based on the output of an adaptable CNN. I have been considering so many different options that I am in a state of decision paralysis. I need your advice on which model architecture to go with. Note that I have been given complete freedom in how I design the model, so there is lots of flexibility to deviate.

But it must be a prediction system. The only other requirement is that they be able to tweak exactly what they want for a given design, such as order size, preferred cost per unit, geographical region, amount of customization, etc. All of these can be derived from the data described above. All ideas are greatly appreciated.
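
To make the "CNN output as a prior" idea concrete, here is a minimal sketch of a plain Bayes update (not yet hierarchical): CNN similarity scores are turned into a prior over manufacturers with a softmax, and each manufacturer's attributes are scored against the requested preferences with a Gaussian log-likelihood. The field names, scores, preference values, and scales are all hypothetical, and the softmax and Gaussian choices are assumptions rather than anything the existing system is known to use.

```python
import math


def softmax(scores):
    """Convert raw similarity scores into probabilities."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_loglik(value, target, scale):
    """Log-likelihood (up to a constant) of an attribute given a preference."""
    return -0.5 * ((value - target) / scale) ** 2


def posterior_over_manufacturers(cnn_similarity, manufacturers, prefs, scales):
    """Combine the CNN-based prior with attribute likelihoods."""
    prior = softmax(cnn_similarity)
    log_post = []
    for p, m in zip(prior, manufacturers):
        loglik = sum(gaussian_loglik(m[k], prefs[k], scales[k]) for k in prefs)
        log_post.append(math.log(p) + loglik)
    # Normalize in log space for numerical stability.
    top = max(log_post)
    unnorm = [math.exp(lp - top) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical inputs: three manufacturers, CNN similarity to the new design.
cnn_similarity = [0.9, 0.8, 0.3]
manufacturers = [
    {"price_per_unit": 12.0, "min_order": 5000},
    {"price_per_unit": 8.0, "min_order": 500},
    {"price_per_unit": 7.5, "min_order": 400},
]
prefs = {"price_per_unit": 8.0, "min_order": 500}     # what the user asks for
scales = {"price_per_unit": 2.0, "min_order": 1000}   # how strict each preference is

posterior = posterior_over_manufacturers(cnn_similarity, manufacturers, prefs, scales)
print(posterior)
```

In this toy example the manufacturer with the highest CNN similarity is a poor match on price and minimum order, so the posterior favors the second one. Widening a value in `scales` loosens that preference, which is one possible way to expose the "tweaking" described above.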

The pipeline I am thinking of right now is:
Input > CNN > Postgres > BHM > Postgres (as a vector, to do nearest-neighbor search on the manufacturer vector space) > recommendation (the nearest neighbors)
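
The final nearest-neighbor step of that pipeline could look roughly like the sketch below, done in plain Python rather than inside Postgres. The manufacturer names and vectors are hypothetical, and cosine similarity is an assumed choice of metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_manufacturers(query, vectors, k=2):
    """Return the k manufacturer names whose vectors are most similar to query."""
    ranked = sorted(
        vectors,
        key=lambda name: cosine_similarity(query, vectors[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical manufacturer vectors (for example, produced by the model upstream).
vectors = {
    "maker_a": [1.0, 0.0, 0.2],
    "maker_b": [0.9, 0.1, 0.3],
    "maker_c": [0.0, 1.0, 0.0],
}
query = [1.0, 0.05, 0.25]

top = nearest_manufacturers(query, vectors, k=2)
print(top)
```

A brute-force scan like this is only practical for small catalogs; for a large manufacturer table, an indexed vector search in the database would be the more likely fit.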

manufacturers_data_06_10.csv (668.0 KB)

This sounds like a really interesting problem!

You mention you have total flexibility about model architecture, but from how you describe the setup, it sounds like the decision to go with a BHM approach is already locked in.

Are you trying to solve only a prediction problem, or are you also attempting to draw causal inferences?

Is accurately quantifying the tails of the distribution (for example, estimating edge cases) the motivating business case?

What is it about the BHM approach that makes it the best way forward vs just throwing more hidden layers and GPUs at it? Is it anything more than upper management hearing that BHMs are more “accurate”? (I’ve been there and that’s not fun)

I ask all these questions as a friendly neighborhood Bayesian, not trying to discourage, just trying to make sure our drug is the one you need before we dive in lol