Hello everyone,
In essence, I am trying to make a massive model comparison job tractable.
I have a model, “myModel”, that I want to use to describe some data y. I want the model to behave in a piecewise fashion, changing its behaviour at breakpoints defined in u. However, I do not know in advance how many breakpoints there are (their number, as well as their positions, is part of what I am interested in).
Hence, I have an integer parameter n (the number of breakpoints) that can take any value between 1 and max_n. Since n is the number of breakpoints, only the first n values of u are relevant.
As n is an integer, it has to be marginalized out, hence the log_sum_exp over the per-n log-likelihoods.
My question is whether this is a valid way of conditionally using the values of u (based on n), or whether it is going to cause havoc with sampling and convergence, because for a given n the values of u at indices above n have no impact on that term of the likelihood even though they are still sampled.
data {
int<lower=1> max_n; // Maximum possible value of n
int<lower=1> n_data; // Number of data points
vector[n_data] y; // Observed data
}
parameters {
  vector<lower=0, upper=1>[max_n] u; // Candidate breakpoints, one per possible index
  real<lower=0, upper=10> sigma;     // Observation noise
}
transformed parameters {
  vector[max_n] log_lik; // Log-likelihood for each possible n
  for (n in 1:max_n) {
    // Only the first n entries of u are active; the rest get a placeholder
    vector[max_n] full_vector;
    for (i in 1:max_n) {
      if (i <= n) {
        full_vector[i] = u[i];
      } else {
        full_vector[i] = 301; // Placeholder for infinity
      }
    }
    // Compute the likelihood given the data and full_vector:
    // normal likelihood with mean from my model, which takes the breakpoint vector
    log_lik[n] = 0;
    for (i in 1:n_data) {
      log_lik[n] += normal_lpdf(y[i] | myModel(full_vector), sigma); // Example likelihood
    }
  }
}
model {
  // Uniform priors (proper because of the bounds declared above)
  u ~ uniform(0, 1);
  sigma ~ uniform(0, 10);
  // Marginalize over all possible n values using the per-n log-likelihoods
  target += log_sum_exp(log_lik);
}