This is a silly question, but I have tried different reparametrizations and priors without any success.
I have an ordered data vector as an input. This ordered data vector is scaled by an unknown parameter theta. When I give Stan the true value of theta, I can correctly infer the parameters delta and T associated with the distribution of the unscaled data vector (divided by theta).
data {
  int<lower=2> n;
  vector<lower=0>[n] sorted_vector_scaled_by_theta;
}
parameters {
  real<lower=0.001> theta;
  real<lower=0> hyper_parameter;
  real<lower=0> delta;
  real<lower=0> T;
}
model {
  theta ~ exponential(10);
  // log_theta = log(theta);
  // log_theta ~ uniform(1, 1000);
  // theta ~ lognormal(20, 100);
  // log_theta ~ student_t(3, 1, 3);
  // target += -log_theta;
  hyper_parameter ~ gamma(0.001, 0.001);
  delta ~ exponential(hyper_parameter);
  target += log_likelihood(n, delta, T, (1.0 / theta) * sorted_vector_scaled_by_theta);
}
Here log_likelihood is a custom log-likelihood function of sorted_vector_scaled_by_theta given delta, n, and T.
I have tried different priors for theta (uniform, log-uniform, log-normal, …), but none of them recovers the true values of delta and T. Can anyone help me with this modeling problem?
P.S.: I have thought about modeling log(sorted_vector_scaled_by_theta) instead; then every element is shifted by the same amount, log(theta).
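A quick numeric check of that idea (a toy sketch with made-up exponential data, just to illustrate the point): on the log scale the unknown multiplicative scaling becomes a common additive shift, so the spacings between sorted log-values do not depend on theta at all:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 200.0
x = np.sort(rng.exponential(scale=1.0, size=10))  # unscaled sorted data
x_scaled = theta * x                              # what Stan actually sees

# log(theta * x) = log(theta) + log(x): every element shifts by the same constant
shift = np.log(x_scaled) - np.log(x)
print(np.allclose(shift, np.log(theta)))  # prints True

# consequently, differences of sorted log-values are theta-free
print(np.allclose(np.diff(np.log(x_scaled)), np.diff(np.log(x))))  # prints True
```

Note that this also means the log-spacings alone carry no information about theta, which already hints at the identifiability question raised below.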
Assuming that domain expertise does not permit you to place a highly informative prior on theta, the solution almost surely depends on what your function log_likelihood is. Perhaps when theta is a free parameter the model is not well identified?
The function log_likelihood is just a loop over the unscaled sorted_vector_scaled_by_theta (that is why I divided by theta before passing the vector). The funny thing is that the model runs and converges:
Inference for Stan model: stan_model_one_population_theta.
4 chains, each with iter=5000; warmup=2500; thin=1;
post-warmup draws per chain=2500, total post-warmup draws=10000.
                  mean se_mean     sd   2.5%    25%    50%    75%   97.5% n_eff Rhat
hyper_parameter 229.38    5.70 424.95   0.95  16.80  68.38 248.46 1446.66  5562    1
delta             0.02    0.00   0.05   0.00   0.00   0.01   0.03    0.16  4297    1
theta             7.84    0.01   0.67   6.58   7.38   7.82   8.28    9.24  4947    1
T                56.24    2.71 167.52   5.31  12.46  23.17  48.05  320.94  3831    1
lp__            -87.94    0.02   1.47 -91.70 -88.64 -87.59 -86.86  -86.18  3691    1
Samples were drawn using NUTS(diag_e) at Tue May 4 16:36:48 2021.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
But the values are wrong: my true simulated values are theta = 200, delta = 100, T = 0.042.
If I use other prior choices I get parameter values like 10^30 and an Rhat of NaN.
Does this imply non identifiability?
I still don’t understand what log_likelihood is doing.
In general, if changing the priors causes huge swings in your inference, it suggests that you might have an identifiability problem.
Maybe I can illustrate the sort of non-identifiability I’m talking about with a very simplified example. Suppose we have a data vector x, and suppose that we wish to model x as normally distributed with mean mu and standard deviation sigma, where both mu and sigma are to be estimated from the data:
data {
  int<lower=0> N;
  vector[N] x;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  mu ~ ... // prior for mu
  sigma ~ ... // prior for sigma
  x ~ normal(mu, sigma);
}
Easy enough.
But now suppose that instead of data x I have data x_scaled, which are scaled by some unknown multiplicative constant inv_theta.
data {
  int<lower=0> N;
  vector[N] x_scaled;
}
parameters {
  real mu;
  real<lower=0> sigma;
  real<lower=0> inv_theta; // I parameterize by the inverse of theta to
                           // avoid a nontrivial Jacobian adjustment
}
model {
  vector[N] x;
  x = x_scaled * inv_theta;
  inv_theta ~ ... // prior for inv_theta
  mu ~ ...
  sigma ~ ...
  x ~ normal(mu, sigma);
}
Notice that for any value of inv_theta, I can get a likelihood equivalent to the previous model by scaling both mu and sigma by inv_theta. The likelihood is completely agnostic about what value of inv_theta I pick. ALL of the information in the model about the correct value to choose for inv_theta is contained in the priors on mu, sigma, and inv_theta. The data have absolutely nothing to say about this choice. And so for this model, you would never expect to be able to infer the correct values of mu, sigma, or inv_theta from data. Maybe something similar is going on in your model, but again it depends on the actual likelihood function that you’re using.
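To see that ridge numerically (a minimal sketch, not part of the Stan models above; it writes the density of x_scaled directly with location mu/inv_theta and scale sigma/inv_theta, i.e., with the Jacobian of the rescaling folded in): jointly rescaling mu, sigma, and inv_theta by any constant c leaves the log-likelihood exactly unchanged:

```python
import numpy as np

rng = np.random.default_rng(1)
x_scaled = rng.normal(loc=5.0, scale=2.0, size=50)  # toy "observed" data

def norm_logpdf(x, mu, sigma):
    # log density of normal(mu, sigma), elementwise
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

def loglik(mu, sigma, inv_theta):
    # density of x_scaled implied by x = x_scaled * inv_theta ~ normal(mu, sigma)
    return norm_logpdf(x_scaled, mu / inv_theta, sigma / inv_theta).sum()

base = loglik(5.0, 2.0, 1.0)
for c in (0.1, 3.0, 42.0):
    print(np.isclose(loglik(5.0 * c, 2.0 * c, c), base))  # prints True each time
```

Only the ratios mu/inv_theta and sigma/inv_theta enter the likelihood, which is exactly why the data cannot pin down inv_theta on its own.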
I fully understand your example: if x ~ normal(mu, sigma), then equivalently inv_theta * x ~ normal(inv_theta * mu, inv_theta * sigma). In my case it is more complex, since there is a one-to-one function g(.) such that g(sorted_vector_scaled_by_theta[i+1]) - g(sorted_vector_scaled_by_theta[i]) ~ exponential(.). I will check whether there is a change of variables that produces the same likelihood (a non-identifiability problem). Thanks.
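For what it’s worth, if g were the identity (purely an illustrative assumption, not a claim about the actual model), the same ridge appears: the spacings of the theta-scaled data are exponential with rate delta/theta, so only the ratio delta/theta is identified, and (c*delta, c*theta) gives an identical likelihood for any c > 0:

```python
import numpy as np

rng = np.random.default_rng(2)
delta, theta = 100.0, 200.0
gaps = rng.exponential(scale=1.0 / delta, size=30)  # unscaled spacings, rate delta
gaps_scaled = theta * gaps                          # spacings of the scaled data

def loglik(delta, theta):
    # scaled spacings are exponential with rate delta / theta
    rate = delta / theta
    return np.sum(np.log(rate) - rate * gaps_scaled)

base = loglik(delta, theta)
for c in (0.01, 7.0, 1000.0):
    print(np.isclose(loglik(c * delta, c * theta), base))  # prints True each time
```

Whether the real g breaks this invariance is exactly the change-of-variables check described above.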