Continuing the discussion from "A different kind of non-central hypergeometric distribution?":
I generate data for a "Sequential Categorical Model" as follows (in Python):
```python
import numpy as np
from scipy.special import softmax

num_skus = 10
true_weights = softmax(np.random.normal(0, 1, size=(num_skus,)))
num_selected_events = 100000
available_sku_indicies_this_selection = np.zeros((num_selected_events, num_skus), dtype=int)
number_available_skus_this_selection = []
selected_indicies_array = []
number_times_available = np.zeros((num_skus,), dtype=int)
number_times_selected = np.zeros((num_skus,), dtype=int)
for i in range(num_selected_events):
    # pick the number of skus that are available - at least 2
    n = max(2, int(num_skus * np.random.beta(20, 10)))
    number_available_skus_this_selection.append(n)
    # pick which skus are available
    skus = np.sort(np.random.choice(num_skus, n, replace=False))
    number_times_available[skus] += 1
    # store 1-based indices; unused slots stay padded with zeros
    available_sku_indicies_this_selection[i, :n] = skus + 1
    # reweight probabilities over the available subset
    p = softmax(true_weights[skus])
    s = np.random.choice(skus, p=p)
    number_times_selected[s] += 1
    selected_indicies_array.append(np.where(skus == s)[0][0] + 1)
prior_vector = np.log(number_times_selected / np.sum(number_times_selected))
```
This selects a subset of the 10 items for each event and renormalizes `true_weights` over that subset to determine the selection probabilities.
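For reference, the renormalization that `categorical_logit` performs on a subset follows from an identity about logits: softmaxing a subset of logits is the same as restricting the full softmax to that subset and renormalizing. A minimal sketch of the identity (illustrative values, not the data above):

```python
import numpy as np
from scipy.special import softmax

logits = np.array([0.3, -1.2, 0.8, 2.0, -0.5])
full_probs = softmax(logits)      # probabilities over all items
subset = np.array([0, 2, 4])      # an available subset

# restrict-and-renormalize vs. softmax of the subset's logits
renormalized = full_probs[subset] / full_probs[subset].sum()
subset_softmax = softmax(logits[subset])

assert np.allclose(renormalized, subset_softmax)
```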
With this data
```python
stan_data = {
    'num_skus': num_skus,
    'num_selected_events': num_selected_events,
    'available_sku_indicies_this_selection': available_sku_indicies_this_selection,
    'number_available_skus_this_selection': number_available_skus_this_selection,
    'selected_indicies': selected_indicies_array,
}
```
and the following Stan model:
```stan
data {
  int<lower=1> num_skus;
  int<lower=1> num_selected_events;
  // padded with zeros
  array[num_selected_events, num_skus] int<lower=0, upper=num_skus> available_sku_indicies_this_selection;
  array[num_selected_events] int<lower=1> number_available_skus_this_selection;
  array[num_selected_events] int<lower=0, upper=num_skus> selected_indicies; // padded with zeros
}
parameters {
  vector[num_skus] log_weights;
}
model {
  log_weights ~ std_normal();
  for (n in 1:num_selected_events) {
    target += categorical_logit_lpmf(
      selected_indicies[n] |
      log_weights[available_sku_indicies_this_selection[n, 1:number_available_skus_this_selection[n]]]
    );
  }
}
```
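As a cross-check outside Stan, the same subset-categorical likelihood can be maximized directly with SciPy on a smaller synthetic dataset. This is my own illustrative generator (it draws choices from a softmax over subset *logits*) rather than the code above, and it compares centered logits since the likelihood is invariant to an additive constant:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(1)
num_skus = 5
true_logits = rng.normal(0, 1, size=num_skus)

# generate subset-choice events: random available subset, softmax over its logits
num_events = 20000
subsets, selections = [], []
for _ in range(num_events):
    n = int(rng.integers(2, num_skus + 1))
    skus = rng.choice(num_skus, size=n, replace=False)
    logit_sub = true_logits[skus]
    p = np.exp(logit_sub - logsumexp(logit_sub))
    selections.append(rng.choice(skus, p=p))
    subsets.append(skus)

# pad subsets into a matrix; a -inf mask removes padded slots from the logsumexp
idx = np.zeros((num_events, num_skus), dtype=int)
mask = np.full((num_events, num_skus), -np.inf)
for i, skus in enumerate(subsets):
    idx[i, :len(skus)] = skus
    mask[i, :len(skus)] = 0.0
sel = np.array(selections)

def neg_log_lik(w):
    # log-probability of each selection under the subset-renormalized softmax
    return -(w[sel] - logsumexp(w[idx] + mask, axis=1)).sum()

fit = minimize(neg_log_lik, np.zeros(num_skus), method="L-BFGS-B")
# compare centered logits: only differences of logits are identified
recovered = fit.x - fit.x.mean()
target = true_logits - true_logits.mean()
```

If `recovered` matches `target`, the likelihood itself identifies the generative logits and any discrepancy lies in the data-generation or comparison step, not in the fitting machinery.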
Everything fits fine, but the weights that are recovered are not the generative `true_weights` but instead `softmax(prior_vector)`.
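Note that `softmax(prior_vector)` is exactly the empirical selection-frequency vector, since softmax inverts the log of a probability vector; a quick sketch with illustrative counts:

```python
import numpy as np
from scipy.special import softmax

counts = np.array([5.0, 20.0, 75.0])  # illustrative selection counts
freqs = counts / counts.sum()
prior_vector = np.log(freqs)

# softmax(log(p)) = p / sum(p) = p for any probability vector p
assert np.allclose(softmax(prior_vector), freqs)
```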
Can anyone see a discrepancy in the model? I am doing inference on the exact data-generating process but not recovering `true_weights`. Is this a problem with the model, or something else?