Observe() statement

How to write observe() statements in STAN for example in the below program at line 9 contains observe statement.

I am beginner to probabilistic programming. Found this example while reading a paper on probabilistic programming.

Ref : Probablistic Programming by Aditya Nori

1: bool c1, c2;
2: int count = 0;
3: c1 = Bernoulli(0.5);
4: if (c1) then
5:   count = count + 1;
6: c2 = Bernoulli(0.5);
7: if (c2) then
8:   count = count + 1;
9: observe(c1 || c2);
10:return(count)

Probably the fastest way to see what Stan is about (and figure out if there’s something like the observe statement) would be to look through either the Rstan or PyStan tutorials.

Rstan: https://cran.r-project.org/web/packages/rstan/vignettes/rstan.html
PyStan: https://pystan.readthedocs.io/en/latest/getting_started.html

I’m not sure what the observe statement does there honestly.

Observe statement blocks runs which do not satisfy the Boolean expression c1||c2 and does not permit those executions to happen.

statement observe(x) is very related to the statement assume(x) used in program verification literature.

observe(x) is equivalent to the while-loop while(!x) skip

Above program can be re-written as below :

1: bool c1, c2;
2: int count = 0;
3: c1 = Bernoulli(0.5);
4: if (c1) then
5:     count = count + 1;
6: c2 = Bernoulli(0.5);
7: if (c2) then
8:     count = count + 1;
9: while !(c1 || c2) {
10:     count = 0;
11:     c1 = Bernoulli(0.5);
12:     if (c1) then
13:          count = count + 1;
14:     c2 = Bernoulli(0.5);
15:     if (c2) then
16:          count = count + 1;
17:}
18:return(count);

Oh okay. I’m assuming this means generated a bernoulli random number and store it in c1? If so I think this is probably a different thing from Stan.

Random number generation is not allowed in the Stan model or transformed parameters blocks. Stan is specifically built for Bayesian inference using HMC. HMC doesn’t allow for randomness in the log density accumulation and we’re limited to sampling continuous random variables.

1 Like

but in generated quantities block it is possible.

Right and also in transformed data I think.

But both of those blocks are sort of optional. They’re more for convenience than anything. You can do the necessary preprocessing (transformed data) or postprocessing (generated quantities) outside of Stan.

It’s not allowed in the model block, which makes me think these two languages have different assumptions behind them.

In the paper I referred above the it is mentioned that any probabilistic language should have support for

1. probabilistic assignment 
       for Eg : x ~ binom(n,p)
2. observe statement
       for Eg : observe(x) 

That’s why I am pretty confident that STAN should have support for it.

x ~ binomial(n,p)

in Stan translates to

target += binomial_lpmf(x | n, p);

which is not an assignment in the traditional sense :D.

Check out: https://mc-stan.org/docs/2_20/reference-manual/sampling-statements-section.html .

Your link won’t load for me but I found this which I think is either the same paper or closely related.

This observe function creates a conditional distribution so I guess the closest equivalent in Stan is a sampling statement. However, observe uses discrete logic that cannot really be expressed in Stan.

Let’s take a look at the Skill Rating In Online Games example found in section 4 of the paper I linked.

The idea is that three players, A, B, and C, play two games, first A verses B and then B versus C, and the we’ll model their skills. The code is

float skillA, skillB,skillC;
float perfA1,perfB1,perfB2,
      perfC2,perfA3,perfC3;
skillA = Gaussian(100,10);
skillB = Gaussian(100,10);
skillC = Gaussian(100,10);
 // first game:A vs B, A won
perfA1 = Gaussian(skillA,15);
perfB1 = Gaussian(skillB,15);
observe(perfA1 > perfB1);
 // second game:B vs C, B won
perfB2 = Gaussian(skillB,15);
perfC2 = Gaussian(skillC,15);
observe(perfB2 > perfC2);
 // third game:A vs C, A won
perfA3 = Gaussian(skillA,15);
perfC3 = Gaussian(skillC,15);
observe(perfA3 > perfC3);

Here the calls to observe tell the model who wins each game.

But performance supposed to be the player’s total score, right? It’s known so why not just model it directly instead of only “observing” the winner. I’d write a Stan model like this:

data {
  real perfA;
  real perfB1;
  real perfB2;
  real perfC;
} parameters {
  real skillA;
  real skillB;
  real skillC;
} model {
  skillA ~ normal(100, 10);
  skillB1 ~ normal(100, 10);
  skillB2 ~ normal(100, 10);
  skillC ~ normal(100, 10);
  perfA ~ normal(skillA, 15);
  perfB1 ~ normal(skillB, 15);
  perfB2 ~ normal(skillB, 15);
  perfC ~ normal(skillC, 15);
}

If performance is not directly observable but a hypothetical latent variable that only exists to recognize the fact that the more skilled player does not always win it should be integrated out. Stan code is now

parameters {
  real skillA;
  real skillB;
  real skillC;
} model {
  skillA ~ normal(100, 10);
  skillB ~ normal(100, 10);
  skillC ~ normal(100, 10);
  // first game, 0 means B won, 1 means A won
  1 ~ bernoulli(normal_cdf(skillA, skillB, sqrt(2)*15));
  // second game, 0 means C won, 1 means B won
  1 ~ bernoulli(normal_cdf(skillB, skillC, sqrt(2)*15));
}

But fine, if you really want to keep perfs as estimated unknowns you can declare the observed constraints in the parameters block

parameters {
  real skillA;
  real skillB;
  real skillC;
  real perfA;
  real<upper=perfA> perfB1; // A won the first game
  real perfB2;
  real<upper=perfB2> perfC; // B won the second game
} model {
  skillA ~ normal(100, 10);
  skillB ~ normal(100, 10);
  skillC ~ normal(100, 10);
  perfA ~ normal(skillA, 15);
  perfB1 ~ normal(skillB, 15);
  perfB2 ~ normal(skillB, 15);
  perfC ~ normal(skillC, 15);
}

The latter two models give almost (but strangely not exactly!) the same results as the paper. Here we see that Stan takes quite different approach to probabilistic modeling.

4 Likes

Probabilistic programming is defined differently in different contexts. A large group driven by theoretical computer scientists in the UK one the last decade or so focuses on languages where probability distributions, as represented by variables, are first class objects. Any such language then needs operations that implement the natural operators on probability distributions, such as expectation, push-forward, and conditioning (which is what that “observe” statement is trying to do). The problem is that these operations cannot be implemented accurately in practice, so the languages designed with these operations in mind are very much limited. Some limit consideration to a small class of distributions where the operations can be done analytically, some use fragile algorithms to try to approximate the operations, etc.

Stan takes a different approach, focusing instead of what operations might be feasible in practice. In that vein Stan is not concerned with manipulating many elementary probability distributions but rather building and working with a single probability distribution. The Stan language defines a single joint probability density function over real-valued parameters and real or integer-valued data, and then the Stan libraries condition that joint density on the observed values before estimating expectation values of that conditional (and pushforwards of that conditional defined in the generated quantities block).

While this may seems like a strong limitation you can implement most of the operations in the more general languages within Stan; you just have to pose your equation so that the answer is the expectation value of a conditional distribution (or pushforward thereof). Really the only thing you lose are the intermediate probabilistic objects – Stan conditions the entire joint distribution at once instead of building the conditional out of many small, intermediate conditionals – and even then most of those intermediate objects can be recovered by specifying appropriate expectation values.

2 Likes

Stan doesn’t work that way, so you can either say Stan’s not a probabilistic programming language or reject the definition in the paper. :-)

The operational equivalent of the “observe” step comes when you specify data input in the data block, model it in the model block, then actually observe it through the sampling() command (in RStan—other interfaces name things slightly differently).