Using _lupmf for multivariate likelihood in reduce_sum

Hello all. I see that Stan doesn’t allow _lupmf functions to be used outside of the model block or a user-defined probability function.

My issue is that I am trying to use multinomial_lupmf in a reduce_sum, but I need to pass in an int[ , ] as the first argument (to be able to loop over the integer observations), whereas any user-defined _lpmf function seems to need type int[] there.

Is there a way around this?

My example follows:

real partial_sum_lpmf(int[ , ] selected_slice, int start, int end, matrix eta) {
  real ret_val = 0;
  for (n in start:end) {
    ret_val += multinomial_lupmf(selected_slice[n - start + 1] | softmax(eta[n]'));
  }
  return ret_val;
}

and I would call it in the model block as
target += reduce_sum_lpmf(partial_sum, observed, grainsize, eta)

Hey,

you need to call it as

target += reduce_sum(partial_sum_lupmf, observed, grainsize, eta);

and I would just use a dummy int[] for the slice argument and pass the int[,] in as one of the other arguments.

Sorry yeah that was a typo (how I called it).

Thanks for the tip! Also, I can pass it in as data. Does that help speed anything up (in terms of the autodiff tape)?

Do you mean for the dummy variable? Unused arguments make no difference to the AD tape. It also makes no difference for int, int[], int[,], … since integer types are never autodiff variables.

Could you clarify what this code would look like? I don’t fully understand the example. Thanks.
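
For what it's worth, here is a minimal sketch of the dummy-slice suggestion above, not a tested model. The names N, K, and dummy, the std_normal prior on eta, and the array[...] spelling (the newer form of int[ , ]) are my assumptions; partial_sum, observed, grainsize, and eta follow the original post.

```stan
functions {
  // dummy is what reduce_sum slices; it is never read. Its length (N)
  // makes start and end run over the rows of observed and eta.
  real partial_sum_lpmf(array[] int dummy, int start, int end,
                        array[,] int observed, matrix eta) {
    real ret_val = 0;
    for (n in start:end) {
      // observed is passed whole, so index it with n, not n - start + 1.
      ret_val += multinomial_lupmf(observed[n] | softmax(eta[n]'));
    }
    return ret_val;
  }
}
data {
  int<lower=1> N;                     // number of observations (assumed name)
  int<lower=1> K;                     // number of categories (assumed name)
  array[N, K] int<lower=0> observed;
  int<lower=1> grainsize;
}
transformed data {
  array[N] int dummy = rep_array(0, N);  // placeholder slice argument
}
parameters {
  matrix[N, K] eta;
}
model {
  to_vector(eta) ~ std_normal();      // illustrative prior, an assumption
  target += reduce_sum(partial_sum_lupmf, dummy, grainsize, observed, eta);
}
```

The only real change from the first post is where the indexing happens: the sliced argument is a throwaway, and the shared arguments are indexed with the absolute position n.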

Yes, the more variables you make primitives (aka data), the faster the autodiff will be. The general rule is that transformed parameters that only depend on data and transformed data should be defined in the transformed data block.
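
As a small illustration of that rule (a sketch with made-up names x, y, x_std, alpha, beta, and sigma, not taken from the model in this thread), a quantity that depends only on data can be computed once in transformed data:

```stan
data {
  int<lower=2> N;
  vector[N] x;
  vector[N] y;
}
transformed data {
  // Depends only on data, so it is computed once and never enters autodiff.
  vector[N] x_std = (x - mean(x)) / sd(x);
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(alpha + beta * x_std, sigma);
}
```

If x_std were instead computed in transformed parameters or the model block, the standardization would be redone on every log density evaluation.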

I’m trying to implement this example and I’m getting a weird error telling me I am missing a parenthesis:

  real partial_sum_lpmf(int[,] selected_slice, int start, int end, matrix ptilde ){
    real ret_val = 0 ;
    for (n in start:end) {
      ret_val += multinomial_lupmf( selected_slice[n-start+1] | ptilde[n] ) ;
    }  
    return (ret_val) ;
  }
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
 error in 'model213066584665_space_partitioning_SDM_v4' at line 5, column 34
  -------------------------------------------------
     3:     real ret_val = 0;
     4:     for (n in start:end) {
     5:       ret_val += multinomial_lupmf(selected_slice[n-start+1] | ptilde[n]) ;
                                         ^
     6:     }
  -------------------------------------------------

PARSER EXPECTED: "("
Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model 'space partitioning SDM v4' due to the above error.

Using the latest CmdStan (2.30) through cmdstanr, I get this more informative error message:

     3:      real ret_val = 0 ;
     4:      for (n in start:end) {
     5:        ret_val += multinomial_lupmf(selected_slice[n-start+1] | ptilde[n] ) ;
                          ^
     6:      }  
     7:      return (ret_val) ;
   -------------------------------------------------

Ill-typed arguments supplied to function 'multinomial_lupmf':
(array[] int, row_vector)
Available signatures:
(array[] int, vector) => real
  The second argument must be vector but got row_vector

The fix is to transpose (I’ve also converted to our standard formatting).

functions {
  real partial_sum_lpmf(int[ , ] slice, int start, int end, matrix ptilde) {
    real ret_val = 0;
    for (n in start:end) {
      ret_val += multinomial_lupmf(slice[n - start + 1] | ptilde[n]');
    }
    return ret_val;
  }
}

If possible, declaring ptilde as an array of vectors rather than a matrix would be more efficient because it doesn’t require the transposition.
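
A rough sketch of that variant, assuming ptilde can be declared as an array of vectors where it is defined (for example as array[N] simplex[K] ptilde, which is my guess at the intended declaration); it uses the newer array[...] spelling of int[ , ]:

```stan
functions {
  real partial_sum_lpmf(array[,] int slice, int start, int end,
                        array[] vector ptilde) {
    real ret_val = 0;
    for (n in start:end) {
      // ptilde[n] is already a vector, so no transpose is needed.
      ret_val += multinomial_lupmf(slice[n - start + 1] | ptilde[n]);
    }
    return ret_val;
  }
}
```

The reduce_sum call itself does not change; only the type of the shared ptilde argument does.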

It’s too bad our multinomial_lupmf isn’t vectorized yet, or this could be coded as a one-liner.
