Custom Likelihood function with hierarchical model

Hi, I’m trying to run a hierarchical model. The model involves fitting a line to a set of observed x and y data. In this case x is not a predictor variable, but x and y covary linearly (their errors are assumed to be independent). The likelihood function and number of parameters for this problem can be significantly reduced with some analytical integration (see https://bayes.wustl.edu/etj/articles/leapz.pdf). As such, I want to have my model use a custom likelihood function.

The idea is that my data should predict some group level ‘intensity’ parameter (the slope of the line multiplied by a constant). This is informed by the individual level parameters i.e. I_{individual} ~ StudentT(\nu, \mu=I_{site}, \sigma).

My code runs but gives completely incorrect results for a test dataset. I think this might have to do with the syntax I’ve used. The line:
likelihood_lpdf(S2xx|S2xy,S2yy,xbar,ybar,n,vari,theta,b_scaled,p_fail)
isn’t really what I want it should be more like
likelihood_lpdf(S2xx,S2xy,S2yy,xbar,ybar,n,vari|theta,b_scaled,p_fail)
but this isn’t good syntax in stan.

I’m also not entirely sure that I’m dealing with correctly in the likelihood function itself. Technically each of the elements in the vector should be a different parameter in the hierarchical model. I’m trying to vectorize the problem by taking each of these parameters and taking the product of the resulting vector of log probabilities (i.e. the probability of all of these parameters at the same time given the data should be equivalent to an AND statement). My code is below:


functions {
  real likelihood_lpdf(data vector S2xx, data vector S2xy, data vector S2yy, data vector xbar, data vector ybar, data vector n, data vector vari, vector theta, vector b, vector p_fail) {
    return log(sqrt(dot_product((1 - p_fail) .* exp(-(n ./ (2 * vari)) .* ((S2xx .* sin(theta) .* sin(theta)) - (2 * S2xy .* cos(theta) .* sin(theta)) + (S2yy .* sin(theta) .* sin(theta))) + (cos(theta) .* (b - ybar + xbar .* tan(theta)) .* (b - ybar + xbar .* tan(theta)))) + p_fail * 0.00001, (1 - p_fail) .* exp(-(n ./ (2 * vari)) .* ((S2xx .* sin(theta) .* sin(theta)) - (2 * S2xy .* cos(theta) .* sin(theta)) + (S2yy .* sin(theta) .* sin(theta))) + (cos(theta) .* (b - ybar + xbar .* tan(theta)) .* (b - ybar + xbar .* tan(theta)))) + p_fail * 1e-12)));
  }
}
data {
    int<lower=1> N;
    vector[N] S2xx;
    vector[N] S2yy;
    vector[N] S2xy;
    vector[N] xbar;
    vector[N] ybar;
    vector[N] B_lab;
    vector[N] vari;
    vector[N] NRM0;
    vector[N] n;
}

parameters {
    vector<lower=-pi()/2,upper=pi()/2>[N] theta;
    vector<lower=0,upper=2>[N] b;
    vector<lower=0,upper=1>[N] p_fail;
    real <lower=-pi()/2,upper=pi()/2> theta_site;
}

transformed parameters {
    vector[N] intensity = -tan(theta) .*B_lab;
    vector[N] b_scaled = b .* NRM0;
    real intensity_site= -tan(theta_site) * mean(B_lab);
}

model {
    intensity ~ student_t(3,intensity_site,6);
    target += likelihood_lpdf(S2xx|S2xy,S2yy,xbar,ybar,n,vari,theta,b_scaled,p_fail);
}

On the syntax, the conditional notation requirement is just notation - it has no effect on calculation. It occurs when you name a function _lpdf. If you don’t like this notation, change the function name.

I can’t penetrate what you’re trying to do with the likelihood calculation but you say:

While it is (imho) far more likely that you have made a mistake in your likelihood function, it is possible that you are not getting the expected results due to computational issues. Recall that an AND statement in probability space is multiplication, however when using log_probabilities AND is addition because log(ab)=log(a)+log(b). Similarly you might be helped computationally by recalling log(sqrt(c))=0.5log(c). When you reformulate in terms of log_probabilities, it is also a really good idea to look at the built-in functions in the STAN reference manual that computationally optimize some common operations (“composed functions”), like lmultiply: log(sqrt(c))=0.5*log(c)=lmulitply(0.5,c)