RStan: generating draws from multinomial distributions (ragged vectors)


#1

I’m a complete novice with RStan, and I’m trying to generate multiple vectors from multinomial distributions. Because the vectors have different lengths (ragged arrays), which Stan does not support as far as I know, I concatenate all of them into one long vector.
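To make the flattening concrete, here is a small base-R sketch (not Stan code) of that layout: the ragged bags are stored back to back in one vector, and each bag is recovered from its start position and length, the same way the Stan model further down walks the vector with `segment(v, pos, len)` and `pos += numInst[jj]`. The R function `segment()` is a hypothetical stand-in for Stan's built-in, and the bag sizes are made up for illustration.

```r
# Hypothetical R stand-in for Stan's segment(v, pos, len): the len elements
# of v starting at 1-based position pos (empty result when len is 0).
segment <- function(v, pos, len) v[pos - 1 + seq_len(len)]

# Illustrative bag sizes (the ragged lengths), fixed instead of sampled.
numInst <- c(3, 2, 4)
m <- sum(numInst)

# Start position of each bag in the flat vector: 1, 1 + 3, 1 + 3 + 2.
starts <- cumsum(c(1, head(numInst, -1)))

# Flat vector of per-bag uniform probabilities, as in the question's init.
hp_pi <- rep(1 / numInst, numInst)

# Walk the bags exactly as the Stan loop does, summing each bag's segment.
bag_sums <- numeric(length(numInst))
pos <- 1
for (jj in seq_along(numInst)) {
  stopifnot(pos == starts[jj])  # running pos matches the cumulative offset
  bag_sums[jj] <- sum(segment(hp_pi, pos, numInst[jj]))
  pos <- pos + numInst[jj]
}
bag_sums  # each bag's probabilities should sum to 1
```

Using `seq_len(len)` rather than `pos:(pos + len - 1)` keeps the helper correct for an empty bag, where the colon form would silently return two elements.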

Here is my current R code to run the Stan file:

    library(rstan)  # provides stan()
    n <- 50
    numInst_train <- sample(10:20, size = n, replace = TRUE)
     
    rstanfit <- stan(file = "rcode/MultiBayes.v1.stan",
                     data = list(n = n,
                                 numInst = numInst_train,
                                 m = sum(numInst_train)), 
                     chains = 1, 
                     iter = 10,
                     init = list(chain1 = list(hp_pi = unlist(rep(1 / numInst_train, numInst_train)),
                                               temp = rep(0, sum(numInst_train)))))

and this is my Stan file:

    data {
      // information of input
      int n; // the number of samples
      int numInst[n]; // the number of instances in a bag
      int m; // Total number of instances
    }
    parameters{
      vector[m] temp;
      vector[m] hp_pi;
    }
    model{
      int delta[m]; // indicator of primary instances
      int pos = 1;
      for(jj in 1:n){
        segment(delta, pos, numInst[jj]) ~ multinomial(segment(hp_pi, pos, numInst[jj]));
        pos += numInst[jj];
      }
      for(jj in 1:m){
        temp[jj] ~ normal(delta[jj], 1);
      }
    }

Running it gives this error message:

    Error evaluating the log probability at the initial value.
    Exception: multinomial_lpmf: Number of trials variable[1] is -2147483648, but must be >= 0!  (in 'model630cdd604c_MultiBayes' at line 15)

I’m using R 3.5.0 on Windows with Stan 2.17.0.

I would appreciate any input.


#2

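You are using `delta` in the model block before its elements have been filled in. An integer that is never assigned keeps Stan's placeholder value, the smallest 32-bit integer (-2147483648), which is the value `multinomial_lpmf` is complaining about.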


#3

Hi Ben Goodrich, thanks for your reply! I tried declaring `delta` in the parameters block, but that gives an error because integer variables cannot be declared there. If I instead declare `delta` in the parameters block as a vector, the multinomial statement is no longer valid, because only an integer array can appear on the left of `~ multinomial()`. How do you think I can handle this?


#4

Your thought process is not consistent with Stan’s language and algorithms. Stan does not allow you to declare an integer unknown in the parameters block because the NUTS algorithm requires the posterior kernel to be differentiable with respect to all the unknowns, and you cannot differentiate with respect to an integer. So trying to evade that error message by declaring the unknown as a vector (of real numbers), or as an integer in the model block, does not change the fact that NUTS cannot draw from the posterior distribution you have in mind.

The actual solution is to marginalize out the discrete unknowns so that the posterior distribution NUTS draws from is differentiable with respect to the remaining parameters. Then, if you want, you can draw the discrete unknowns from their full conditional distribution in the generated quantities block. There is a whole chapter on this (latent discrete parameters) in the manual.
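As a sketch of what marginalizing out `delta` could look like for this model, assume (the question does not state it outright) that each bag has exactly one primary instance, so each bag's `delta` segment is a one-hot vector from a single multinomial trial. Summing over the K possible primary instances then gives, per bag, log p(temp | pi) = log_sum_exp over k of [log pi_k + sum_i log N(temp_i | 1{i = k}, 1)], which is differentiable in `temp` and `hp_pi`. The base-R code below (not Stan; all function names are hypothetical) computes that marginal log likelihood over the flattened bags, plus the normalized full conditional probabilities of which instance is primary.

```r
# Numerically stable log(sum(exp(x))).
log_sum_exp <- function(x) {
  mx <- max(x)
  mx + log(sum(exp(x - mx)))
}

# Entry k: log probs_k + sum_i log N(temp_i | 1[i == k], 1), i.e. the log
# joint density of one bag when instance k is assumed to be the primary one.
bag_lp <- function(temp, probs) {
  K <- length(temp)
  sapply(seq_len(K), function(k) {
    mu <- as.numeric(seq_len(K) == k)  # one-hot mean vector
    log(probs[k]) + sum(dnorm(temp, mean = mu, sd = 1, log = TRUE))
  })
}

# Marginal log likelihood over all bags stored back to back in flat vectors;
# delta has been summed out, so no discrete unknown remains.
marginal_loglik <- function(temp, hp_pi, numInst) {
  total <- 0
  pos <- 1
  for (jj in seq_along(numInst)) {
    idx <- pos - 1 + seq_len(numInst[jj])
    total <- total + log_sum_exp(bag_lp(temp[idx], hp_pi[idx]))
    pos <- pos + numInst[jj]
  }
  total
}

# Full conditional P(instance k is primary | temp, probs) for one bag.
primary_probs <- function(temp, probs) {
  lp <- bag_lp(temp, probs)
  exp(lp - log_sum_exp(lp))
}

# Usage with made-up data: three bags of sizes 3, 2 and 4.
set.seed(1)
numInst <- c(3, 2, 4)
temp <- rnorm(sum(numInst))
hp_pi <- rep(1 / numInst, numInst)
ll <- marginal_loglik(temp, hp_pi, numInst)
p1 <- primary_probs(temp[1:3], hp_pi[1:3])
k1 <- sample(seq_along(p1), size = 1, prob = p1)  # primary instance, bag 1
```

In a Stan program, the same per-bag vector of log terms would be passed to `log_sum_exp` and added to `target`, and a generated-quantities draw could use `categorical_rng(softmax(lp))`. This sketch also presumes each `hp_pi` segment is a proper simplex, which the question's unconstrained `vector[m] hp_pi` does not enforce.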


#5

That was really insightful advice. Thanks a lot!