Hi all,
I’m a new Stan user learning the paradigm for expressing models as Stan programs.
Here’s one scenario I’m struggling with: a “privacy algorithm” example from Cameron Davidson-Pilon’s book “Probabilistic Programming & Bayesian Methods for Hackers”. I’m trying to translate this PyMC3 example into Stan, but it doesn’t seem to fit well within the paradigm.
In this scenario, researchers ask each student whether they cheated on an exam. To respond, each student flips a coin:
- If heads, they reply truthfully whether they cheated.
- If tails, they flip the coin again, and reply yes if heads, no if tails.
Given data of number of students and observed probability of “yes” answers, I want to estimate the true frequency of cheating.
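To make sure I understand the mechanism itself before worrying about Stan, I wrote a quick pure-Python simulation (simulate_responses is my own helper and true_p is an arbitrary value I picked; none of this is from the book):

```python
import random

def simulate_responses(n_students, true_p, seed=0):
    """Simulate the randomized-response "privacy algorithm" survey.

    Each student flips a fair coin: on heads they answer truthfully
    (they cheated with probability true_p); on tails they flip again
    and answer "yes" on heads, "no" on tails.
    Returns the observed proportion of "yes" answers.
    """
    rng = random.Random(seed)
    yes_count = 0
    for _ in range(n_students):
        cheated = rng.random() < true_p
        if rng.random() < 0.5:
            # first flip heads: truthful answer
            yes_count += cheated
        else:
            # first flip tails: answer determined by the second flip
            yes_count += rng.random() < 0.5
    return yes_count / n_students

# With many students the observed "yes" proportion settles
# close to 0.5 * true_p + 0.25.
print(simulate_responses(100_000, true_p=0.3))
```

So the observed proportion really is informative about the cheating rate, which is what I want the model to invert.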
Below is how I am approaching this with PyStan, along with the error I am getting: “attempt to assign variable in wrong block. left-hand-side variable origin=data”.
Could you please help me understand how to write such a model with a generated quantity correctly in Stan? How do I model the computation of pobs? My confusion is two-fold:
- Perhaps I should move the pobs declaration out of the data block and into the model block, but then I’m not sure how to pass in this observed frequency of “yes” answers.
- How do I properly model discrete integers, in this case the mixture sum of the students’ flips?
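Incidentally, working through the mechanism by hand, the marginal probability of a “yes” answer appears to be 0.5*p + 0.25 (probability 0.5 of the truthful branch times p, plus probability 0.25 that the first flip is tails and the second is heads), which suggests the discrete flips could be summed out analytically; I just don’t know how to express that in Stan. Here is a small plain-Python sanity check of that algebra (the p_yes helper is my own, not from the book):

```python
from itertools import product

def p_yes(p):
    """Marginal P(answer = "yes") by enumerating the flip outcomes.

    Outcomes: (first flip, second flip), each heads with probability 0.5.
    Heads on the first flip -> truthful answer ("yes" with probability p);
    tails -> the answer is determined by the second flip.
    """
    total = 0.0
    for first, second in product([True, False], repeat=2):
        prob = 0.25  # each joint flip outcome has probability 0.5 * 0.5
        answer_yes_prob = p if first else (1.0 if second else 0.0)
        total += prob * answer_yes_prob
    return total

for p in (0.0, 0.2, 0.5, 1.0):
    print(p, p_yes(p))  # matches 0.5 * p + 0.25
```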
The Python code and the error follow.
Thanks!
flip_code = """
functions {}
data {
  int<lower=0> N; // number of students
  real pobs;      // observed proportion of "yes" answers
  //int x;        // observed number of students who reply yes
  // do we need to pass in an observed value for each student, i.e. an int observed[N]?
}
transformed data {} // only evaluated once
parameters {
  real<lower=0,upper=1> p; // true probability of cheating
}
transformed parameters { // executed once per iteration
}
model {
  // for each student:
  //int<lower=0,upper=1> truths[N]; // can't have variable constraints in model?
  int truths[N];       // true cheating status
  int first_flips[N];  // first coin flip result
  int second_flips[N]; // second coin flip result
  real val[N];         // observed value; real so that we can do floating-point division below
  p ~ uniform(0,1);    // prior on true probability of cheating
  for (i in 1:N) {
    truths[i] ~ bernoulli(p);
    first_flips[i] ~ bernoulli(0.5);  // fair coin flip
    second_flips[i] ~ bernoulli(0.5); // fair coin flip
    val[i] = first_flips[i] * truths[i] + (1 - first_flips[i]) * second_flips[i]; // observed value from privacy algorithm
  }
  pobs = sum(val) / N; // note that we force floating-point division here
  //x = sum(val);
}
generated quantities {}
"""
import pystan

N = 100  # number of students (int, to match the Stan data declaration)
x = 35   # observed number of "yes" answers
pobs = x / N
flip_data = {'N': N, 'pobs': pobs}
flip_fit = pystan.stan(model_code=flip_code, data=flip_data,
                       iter=1000, chains=4)
Output:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-42-beba03c597ad> in <module>()
42
43 flip_fit = pystan.stan(model_code=flip_code, data=flip_data,
---> 44 iter=1000, chains=4)
/home/maxim/anaconda2/envs/infil/lib/python3.6/site-packages/pystan/api.py in stan(file, model_name, model_code, fit, data, pars, chains, iter, warmup, thin, init, seed, algorithm, control, sample_file, diagnostic_file, verbose, boost_lib, eigen_lib, n_jobs, **kwargs)
377 m = StanModel(file=file, model_name=model_name, model_code=model_code,
378 boost_lib=boost_lib, eigen_lib=eigen_lib,
--> 379 obfuscate_model_name=obfuscate_model_name, verbose=verbose)
380 # check that arguments in kwargs are valid
381 valid_args = {"chain_id", "init_r", "test_grad", "append_samples", "enable_random_init",
/home/maxim/anaconda2/envs/infil/lib/python3.6/site-packages/pystan/model.py in __init__(self, file, charset, model_name, model_code, stanc_ret, boost_lib, eigen_lib, verbose, obfuscate_model_name, extra_compile_args)
209 model_name=model_name,
210 verbose=verbose,
--> 211 obfuscate_model_name=obfuscate_model_name)
212
213 if not isinstance(stanc_ret, dict):
/home/maxim/anaconda2/envs/infil/lib/python3.6/site-packages/pystan/api.py in stanc(file, charset, model_code, model_name, verbose, obfuscate_model_name)
127 if result['status'] == -1: # EXCEPTION_RC is -1
128 error_msg = "Failed to parse Stan model '{}'. Error message:\n{}".format(model_name, result['msg'])
--> 129 raise ValueError(error_msg)
130 elif result['status'] == 0: # SUCCESS_RC is 0
131 logger.debug("Successfully parsed Stan model '{}'.".format(model_name))
ValueError: Failed to parse Stan model 'anon_model_9db1e01b1721a4e4e70a05b89eeea5d9'. Error message:
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Warning (non-fatal):
Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform.
Left-hand-side of sampling statement:
truths[i] ~ bernoulli(...)
Warning (non-fatal):
Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform.
Left-hand-side of sampling statement:
first_flips[i] ~ bernoulli(...)
Warning (non-fatal):
Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform.
Left-hand-side of sampling statement:
second_flips[i] ~ bernoulli(...)
attempt to assign variable in wrong block. left-hand-side variable origin=data
ERROR at line 30
28: val[i] = first_flips[i] * truths[i] + (1 - first_flips[i])*second_flips[i];
29: }
30: pobs = sum(val) / N; // note that want to force floating point division here
^
31: //x = sum(val);
PARSER EXPECTED: <expression assignable to left-hand side>