How to perform basic operations in rstan?

Jonas_Bauer · March 3, 2022, 12:47pm

Hey all,
for a simple Multinomial-Dirichlet model I want to normalize data, but iteratively (i.e. in every iteration). However, I am already failing to tell R/Rstan data / constant appropriately. Since the multinomial_lpmf is defined for int’s only, I want to use the int operators descibes here: 2.1 Integer-valued arithmetic operators | Stan Functions Reference.
R however complains immediately that “operator” is unknown. In summary: How to apply simple arithmetic operations?

// The input data is a vector 'y' of length 'N'.
data {
  int<lower=0> N;
  int<lower=0> G;
  int counts[N,G];
  vector<lower=0>[G] alpha;
  real r;
  real Tstart;
}

// The parameters accepted by the model. Our model
parameters {
  simplex[G] theta;
}
transformed parameters {
 real T = Tstart;
 T = r*T;
}

// the model to be estimated
model {
   for(i in 1:N){
    
    target += dirichlet_lpdf(theta | alpha);
    target += multinomial_lpmf(int operator%/%(counts[i], T) | theta);
   //void print (T); 
  }
}

I have attached a csv that contains the data for counts. r and Tstart shall define a geometric sequence with T[j] = r^j * Tstart (or less costly by saying T = r * T in every iteration).

Bonus question: What about void print (T)? Can I print something within the execution of compiled stan files?
rna_data.csv (1.2 KB)

Your support is highly appreciated! Thanks in advance!

BG Jonas

andrjohns · March 3, 2022, 1:37pm

The int operator%/% notation in the documentation is just to indicate that %/% is an operator, rather than a function to be called. In other words, you just want to use x %/% y.

As for the model, the integer division operator is only compatible when both inputs are integers:

int %/% int

(i.e., type int), however in your model you’re attempting to call:

int[] %/% real

Where counts[i] is one-dimensional array of integers (int[]), and T is a real.

Also, I’m not entirely sure what you’re attempting when you refer to ‘normalising at each iteration’. It looks like at every iteration, you want to divide the integer responses for each observation (i.e., counts[i]), by Tstart * r). But none of these quantities are parameters, and so would not change at all during sampling. It would likely provide a pretty good speed-up of your model if you just pre-computed this division and passed it as data

Jonas_Bauer · March 3, 2022, 1:46pm

First of all, thanks for the loightning-fast answer.

Regarding x %/% y … I thought so. But even with counts[i] %/% 10 I get an “PARSER EXPECTED: <expression>” error. If it was typical normalization, you were of course right. But the idea is simulated anealing with a decrasing normalization to mitigate steep gradients (not for this particular data set though). Hence, I could round the T to an integer if the issue above is clarified.

andrjohns · March 3, 2022, 2:07pm

When you call counts[i] %/% 10 you’re still not using a valid signature, unfortunately the %/% operator is not vectorised, so you’ll have to loop over each integer in counts[i]:

model {
  target += dirichlet_lpdf(theta | alpha);

  for(i in 1:N) {
    int counts_normalised[G];
    for(g in 1:G) {
      counts_normalised[g] = counts[i, g] %/% 10;
    }
    target += multinomial_lpmf(counts_normalised | theta);
  }
}

Note that I’ve also moved the dirichlet_lpdf outside of the loop, as including it within the loop is essentially weighting the posterior N times.

Just to also quickly clarify, the key thing here is that T is constant throughout all iterations (in this model at least), so there is no benefit to performing the division in the transformed parameters/parameters blocks.

If this more of a dummy model, and you’re intending to construct T using something declared in the parameters block, then you’re likely to run into a different set of issues. Taking a continuous parameter and discretising it to return an integer would result in discontinuities in the respective parameters, and result in seriously poor sampling performance

andrjohns · March 3, 2022, 2:21pm

Forgot to include a reference for the issues with sampling! There’s some more information in this section of the manual: 3.7 Step-like functions | Stan Functions Reference

Jonas_Bauer · March 3, 2022, 4:02pm

Okay, this helps a lot! I thought it is vectorized. Is there any other option to avoid the loop?
E.g. normalized = floor(count ./ T) and defining count as matrix before?
This way temperature could be a real defined in parameters and I do not have to discretize it (referring to your warning; here count and normalized represent data and I would say it does not affect the gradients/performance). Below you see what I got so far (btw I still get the error PARSER EXPECTED: <expression>).

One last question until I would happily close this topic: How can I create an iterative (more precisely geometric) parameter adjustment? For instance, can I initialize real temperature = Tstart and then declare in every iteration something like temperature *= r? If not, is there an accessible iteration variable, e.g. i, that counts how many samples have been drawn? It is so embarassing to ask this, but I am not even able to define foo += 1 myself in stan.

data {
  int<lower=0> N;
  int<lower=0> G;
  int counts[N,G];
  vector<lower=0>[G] alpha;
  real<lower=0, upper=1> r; 
  int<lower=1> Tstart; 
}

// The parameters accepted by the model.
parameters {
  simplex[G] theta;
}

transformed parameters {
 //matrix[N,G] normalized;
 //normalized = to_matrix(counts) / T;
 real temperature = r * Tstart;
}

// the model to be estimated
model {
  
  for(i in 1:N) {
    int normalized[G];
    for(g in 1:G) {
      normalized[g] = counts[i,g] %/% 1;
    }
    target += multinomial_lpmf(normalized | theta);
  }
  target += dirichlet_lpdf(theta | alpha);
}

andrjohns · March 3, 2022, 4:44pm

Unfortunately there is no way to access the iteration number within a Stan model.

For your model, the simplest approach to constructing the normalised integer outcomes is to perform the count %/% (Tstart * r) division in R and pass the result as an additional argument to your stan model

Jonas_Bauer · March 4, 2022, 4:23pm

Thanks for all your support. I will leave it like that.
Just to make this clear: For the model gescribed above, normalizing beforehand might work. But I require iteration-specific normalization. Your suggestion would mean:

Compile the stan code
Normalize in R
Run one iteration using rstan
Do 2. and 3. roughly 10.000 times
This is madness.

I try now to implement my own function

real geometric_rng(real temperature,real r){
    return(temperature*r);
  }

and apply it iteratively on the parameter temperature in generated quantities.

Anyway, the marked solution still leads to an parsing error for me. Even without any temperature (trying to divide by 10). Are you sure your recommendation works?

Topic		Replies	Views
Import dataset with NA column as an integer in rstan Modeling rstan	3	397	August 16, 2022
SYNTAX PROBLEM: modifying data in model block or define int in transformed parameter block Modeling rstan	2	440	June 16, 2022
Rstan - generate multinomial distributions (ragged vectors) Modeling	4	830	May 17, 2018
Problem using mulitonial Modeling	5	471	June 27, 2018
Multinomial logit in rstan Modeling	1	507	August 8, 2019

How to perform basic operations in rstan?

Related topics