This is a silly question, but I have tried different reparametrizations and priors without any success.
I have an ordered data vector as input. This ordered data vector is scaled by an unknown parameter theta. When I give Stan the true value of theta, I can correctly infer the parameters T associated with the distribution of the unscaled data vector (divided by theta):
vector<lower = 0.0>[n] sorted_vector_scaled_by_theta;
// log_theta ~ uniform(1, 1000);
// target += -log_theta;
target += log_likelihood(n, delta, T, (1.0 / theta) * sorted_vector_scaled_by_theta);
where log_likelihood is a custom likelihood function of sorted_vector_scaled_by_theta given delta, n, and T.
I have tried different priors for theta: uniform, log-uniform, log-normal, … but none of them can recover the true values of delta and T. Can anyone help me with this modeling problem?
P.S.: I have thought of modeling log(sorted_vector_scaled_by_theta) instead; then every element is shifted by the same value.
Assuming that domain expertise does not permit you to place a highly informative prior on
theta, the solution almost surely depends on what your function
log_likelihood is. Perhaps when
theta is a free parameter the model is not well identified?
log_likelihood is just a loop over the unscaled
sorted_vector_scaled_by_theta (that is why I divided by
theta before passing the vector). The funny thing is that the model runs and converges:
Inference for Stan model: stan_model_one_population_theta.
4 chains, each with iter=5000; warmup=2500; thin=1;
post-warmup draws per chain=2500, total post-warmup draws=10000.
                   mean se_mean     sd   2.5%    25%    50%    75%   97.5% n_eff Rhat
hyper_parameter  229.38    5.70 424.95   0.95  16.80  68.38 248.46 1446.66  5562    1
delta              0.02    0.00   0.05   0.00   0.00   0.01   0.03    0.16  4297    1
theta              7.84    0.01   0.67   6.58   7.38   7.82   8.28    9.24  4947    1
T                 56.24    2.71 167.52   5.31  12.46  23.17  48.05  320.94  3831    1
lp__             -87.94    0.02   1.47 -91.70 -88.64 -87.59 -86.86  -86.18  3691    1
Samples were drawn using NUTS(diag_e) at Tue May 4 16:36:48 2021.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
But the values are wrong: from my true simulated data, theta = 200, delta = 100, T = 0.042.
If I use other prior choices I get parameter values like 10^30 and an Rhat of NaN.
Does this imply non identifiability?
I still don’t understand what
log_likelihood is doing.
In general, if changing the priors causes huge swings in your inference, it suggests that you might have an identifiability problem.
Maybe I can illustrate the sort of non-identifiability I’m talking about with a very simplified example. Suppose we have a data vector
x. And suppose that we wish to model
x as normally distributed with mean
mu and standard deviation
sigma, where both mu and sigma are to be estimated from the data:
mu ~ ... // prior for mu
sigma ~ ... //prior for sigma
x ~ normal(mu, sigma);
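This base model is identified: with x observed directly, the data alone pin down mu and sigma. A quick Python sketch (the simulated numbers mu = 5, sigma = 2 are my own, not from the thread):

```python
import math
import random

random.seed(0)
mu_true, sigma_true = 5.0, 2.0  # hypothetical true values
x = [random.gauss(mu_true, sigma_true) for _ in range(10000)]

# maximum-likelihood estimates computed from the directly observed x
mu_hat = sum(x) / len(x)
sigma_hat = math.sqrt(sum((xi - mu_hat) ** 2 for xi in x) / len(x))
print(abs(mu_hat - mu_true) < 0.1, abs(sigma_hat - sigma_true) < 0.1)
```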
But now suppose that instead of data
x I have data
x_scaled, which are scaled by some unknown multiplicative constant theta:
real<lower = 0> inv_theta; // I parameterize by the inverse of theta
                           // to avoid a nontrivial Jacobian adjustment
x = x_scaled * inv_theta;
inv_theta ~ ... // prior for inv_theta
mu ~ ...
sigma ~ ...
x ~ normal(mu, sigma);
Notice that for any value of inv_theta, I can get a likelihood equivalent to the previous model by scaling both mu and sigma accordingly. The likelihood is completely agnostic about what value of inv_theta I pick. ALL of the information in the model about the correct value to choose for inv_theta is contained in the priors on inv_theta, mu, and sigma. The data have absolutely nothing to say about this choice. And so for this model, you would never expect to be able to infer the correct values of
inv_theta from data. Maybe something similar is going on in your model, but again it depends on the actual likelihood function that you’re using.
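The scaling argument can be checked numerically. Below is a Python sketch (the simulated data and all numbers are illustrative, not from the model in the thread) showing that multiplying inv_theta by any constant c, while rescaling mu and sigma by the same c, changes the log likelihood only by the constant n * log(c), which depends neither on the data nor on (mu, sigma):

```python
import math
import random

def normal_logpdf(z, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - 0.5 * ((z - mu) / sigma) ** 2

random.seed(1)
theta_true = 200.0  # hypothetical true scale
x_scaled = [random.gauss(5.0, 2.0) * theta_true for _ in range(50)]  # observed data

def loglik(inv_theta, mu, sigma):
    # target increment implied by `x ~ normal(mu, sigma)` with x = x_scaled * inv_theta
    return sum(normal_logpdf(xi * inv_theta, mu, sigma) for xi in x_scaled)

# candidate A: the "true" scale
a = loglik(1.0 / theta_true, 5.0, 2.0)
# candidate B: a wildly wrong scale, with mu and sigma rescaled to compensate
c = 37.0
b = loglik(c / theta_true, 5.0 * c, 2.0 * c)

# the two log likelihoods differ by exactly n * log(c), a constant independent
# of the data and of (mu, sigma): the data cannot tell the two scales apart
n = len(x_scaled)
print(abs((a - b) - n * math.log(c)) < 1e-8)
```

Because the difference is a constant, the likelihood surface over (mu, sigma) has exactly the same shape for every choice of inv_theta, which is what makes the scale unidentifiable without an informative prior.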
I fully understand your example: if
x ~ normal(mu, sigma) then this is equivalent to
x ~ (1.0/inv_theta) * normal(inv_theta*mu, inv_theta*sigma). In my case it is more complex, since there is a one-to-one function g(.) such that
g(sorted_vector_scaled_by_theta[i+1]) - g(sorted_vector_scaled_by_theta[i]) ~ exponential(.). I will check whether there is a change of variables that produces the same likelihood (a non-identifiability problem). Thanks.