How can we extract the pointwise log-likelihood for the loo package from a user-defined log-likelihood function in Stan? I can already sample from my user-defined function. If we have written a function for the log likelihood, why doesn't the model output save the pointwise log-likelihood values? Can't we save the pointwise log likelihood by using target += function_lpdf(), or do we need to write a generated quantities block in our Stan model to collect it?

The latter. You still need to use generated quantities to save the pointwise contributions to the log likelihood, because the target syntax doesn't result in anything getting saved; it just increments the total log probability computed in the model block.
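For a concrete picture, here is a minimal sketch of that pattern for a hypothetical normal model (the names y, mu, and sigma are placeholders, not from your model):

```stan
data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  // increments target with the total log likelihood; nothing pointwise is saved
  y ~ normal(mu, sigma);
}
generated quantities {
  // one log-likelihood value per observation, saved in the output as log_lik,
  // which is the layout the loo package expects to extract
  vector[N] log_lik;
  for (n in 1:N) {
    log_lik[n] = normal_lpdf(y[n] | mu, sigma);
  }
}
```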

Thanks @jonah. If my function itself calculates the log likelihood and outputs the final answer, how can I save pointwise estimates in the generated quantities block? Here is my function:

functions {
  real tenismodel_lpdf(matrix dat, vector[] theta, vector[] alpha) {
    vector[rows(dat)] prob;
    real t;
    real x;
    real a;
    real f;
    real w;
    real r;
    real m;
    a = theta[1, 1];
    f = theta[1, 2];
    w = alpha[1, 1];
    r = alpha[1, 2];
    m = alpha[1, 3];
    for (i in 1:rows(dat)) {
      t = dat[i, 1];
      x = dat[i, 2];
      if (x == -1) {
        prob[i] = f;
      } else if (t == 1) {
        prob[i] = a^x * f^(1 - x);
      } else if (fmod(t, 2) == 0) {
        prob[i] = (1 - a - f) * r^(t / 2.0 - 1.0) * r^(t / 2.0 - 1.0) * m^x * w^(1 - x);
      } else {
        prob[i] = (1 - a - f) * r^(t / 2.0 - 0.5) * r^(t / 2.0 - 1.5) * m^(1 - x) * w^x;
      }
    }
    return sum(log(prob));
  }
}

Hi @jonah, I have a small question related to this. When we save the pointwise log likelihood while running our model, does it take more time than without saving it? For me it takes a long time.

Yes, it does. I have around 1 million data points, so getting the log likelihood at each iteration is computationally intensive. I am curious why Stan does not output any statistic that we can use for model comparison. As far as I have read, we do not have any statistic like WAIC to compare different models.

The loo package, which is part of the Stan ecosystem in R, provides this functionality, but in general it is recommended to use PSIS-LOO rather than WAIC for model comparison based on out-of-sample predictive performance.

Yes, I read about the loo package. But in order to use it I need the pointwise log likelihood, which in my problem is highly computationally intensive: it will take a long time to collect. Instead of that, do you think we can use the "lp__" quantity provided in the rstan model fit output?

No. You need the pointwise log likelihood. Note that if your model block builds the target density by evaluating the log likelihood pointwise, then recomputing the pointwise log likelihood in generated quantities is a comparatively trivial amount of computation, assuming it's coded as efficiently as the model block. In the model block, the log likelihood needs to be evaluated and autodiffed hundreds of times per iteration. The generated quantities don't need to be autodiffed, and they execute once per iteration rather than at every leapfrog step.

With that said, there might be scenarios where Stan provides vectorized (or even GPU-optimized) functions to compute the target that are faster than the equivalents you'll need to use in generated quantities.
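As a sketch of that contrast (again with a placeholder normal model, not the thread's tennis model): the model block can use one vectorized density call over all observations, while generated quantities has to loop with scalar calls to get one value per observation:

```stan
model {
  // single vectorized call over all N observations
  target += normal_lpdf(y | mu, sigma);
}
generated quantities {
  vector[N] log_lik;
  // pointwise version: one scalar call per observation
  for (n in 1:N) {
    log_lik[n] = normal_lpdf(y[n] | mu, sigma);
  }
}
```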

As @jsocolar says, PSIS-LOO is recommended over WAIC. That said, I think there are special cases where they give pretty much comparable results for model comparison, as described in the last paragraph of section 21 of the Cross-validation FAQ.
However, I think you would need to make sure that you had many times more observations than parameters. In my limited experience with larger data (>200,000 rows), when this is the case, the two methods yield comparable results (in my particular case, for a variety of hierarchical logistic regression models, WAIC was pretty close to -2*elpd_loo; but that is just one example dataset). Another problem with WAIC, though, is that it doesn't give you good diagnostics for when it fails; it fails silently. It's also worse for more flexible models. For those reasons, I have always thought it was better to use LOO, even with large amounts of data.

@jd_c thanks for this explanation; it helps. In my case I do have far more data than parameters. Anyway, I still need to collect the pointwise log likelihood to do the comparison.

You can also run PSIS-LOO using the loo.function method, which avoids computing the pointwise log likelihood in the Stan program, but you have to give loo a function that computes the log likelihood, along with your data and your posterior draws. This will be slower than doing it with a log-likelihood matrix/array but less memory intensive. So if your issue is speed this won't help, but if the issue is memory it can help a lot. It's definitely more of a pain to code up, though (there's an example in the loo package documentation).

@jonah thank you very much for this explanation. I was thinking about my problem: I defined a user-defined log-likelihood function, but it returns sum(log(prob)). Can't I save the pointwise log likelihood inside my defined function in Stan, and then return it along with the sum? Is this possible?

This is my function (I removed middle part to make it simple):

functions {
  real model_lpdf(matrix dat, vector[] theta, vector[] alpha) {
    // (declarations and setup removed by the poster for brevity)
    for (i in 1:rows(dat)) {
      t = dat[i, 1];
      x = dat[i, 2];
      s = ServerID[i];
      re = ReceiverID[i];
      if (x == -1) {
        prob[i] = f;
      } else if (t == 1) {
        prob[i] = a^x * f^(1 - x);
      } else if (fmod(t, 2) == 0) {
        prob[i] = (1 - a - f) * rr^(t / 2.0 - 1.0) * rs^(t / 2.0 - 1.0) * mr^x * wr^(1 - x);
      } else {
        prob[i] = (1 - a - f) * rr^(t / 2.0 - 0.5) * rs^(t / 2.0 - 1.5) * ms^(1 - x) * ws^x;
      }
    }
    return sum(log(prob));
  }
}

Those values need to be saved, which means using a transformed parameter or generated quantity. So even if you have a function that returns the pointwise values in the model block, you'd unfortunately still need to call it again in transformed parameters or generated quantities (preferably the latter, because of what @jsocolar said about autodiff).
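One way to avoid maintaining the formulas twice is to factor out a single-observation helper and call it from both the _lpdf function and generated quantities. Here is a sketch based on the earlier tenismodel_lpdf in this thread (the helper name tenismodel_pointwise is made up here, not existing code):

```stan
functions {
  // log probability contribution of one row of dat
  real tenismodel_pointwise(real t, real x, vector[] theta, vector[] alpha) {
    real a = theta[1, 1];
    real f = theta[1, 2];
    real w = alpha[1, 1];
    real r = alpha[1, 2];
    real m = alpha[1, 3];
    real prob;
    if (x == -1) {
      prob = f;
    } else if (t == 1) {
      prob = a^x * f^(1 - x);
    } else if (fmod(t, 2) == 0) {
      prob = (1 - a - f) * r^(t / 2.0 - 1.0) * r^(t / 2.0 - 1.0) * m^x * w^(1 - x);
    } else {
      prob = (1 - a - f) * r^(t / 2.0 - 0.5) * r^(t / 2.0 - 1.5) * m^(1 - x) * w^x;
    }
    return log(prob);
  }
  real tenismodel_lpdf(matrix dat, vector[] theta, vector[] alpha) {
    real lp = 0;
    for (i in 1:rows(dat)) {
      lp += tenismodel_pointwise(dat[i, 1], dat[i, 2], theta, alpha);
    }
    return lp;
  }
}
// ... data and parameters blocks unchanged ...
model {
  dat ~ tenismodel(theta, alpha);
}
generated quantities {
  vector[rows(dat)] log_lik;
  for (i in 1:rows(dat)) {
    log_lik[i] = tenismodel_pointwise(dat[i, 1], dat[i, 2], theta, alpha);
  }
}
```

The formulas then live in one place, and the duplicated work in generated quantities is only the cheap, non-autodiffed recomputation discussed above.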

But I think with 1 million data points you're probably going to run into memory issues when subsequently running loo() on a pointwise log-likelihood matrix with dimensions posterior_sample_size x 1000000. So you might need one of the other options I mentioned in my previous post.