Generate quantities using bernoulli_rng gives real value not int

My_Work · March 4, 2020, 2:33pm

Hi,

I’m confused about what bernoulli_rng function does. Based on Stan manual (p.306), it takes a real value (probability) and returns an integer (0 or 1).

int bernoulli_rng(real theta) Generate a Bernoulli variate with chance of success theta; may only be used
ingenerated quantities block

So I assume if I use it as they do here:

github.com

CCS-Lab/hBayesDM/blob/master/commons/stan_files/ra_noLA.stan#L90


      
                  real evGamble;  // they are left as arrays as an example for RL models.
                  real pGamble;
          
                  // loss[i, t]=absolute amount of loss (pre-converted in R)
                  evSafe     = pow(cert[i, t], rho[i]);
                  evGamble   = 0.5 * (pow(gain[i, t], rho[i]) - pow(loss[i, t], rho[i]));
                  pGamble    = inv_logit(tau[i] * (evGamble - evSafe));
                  log_lik[i] += bernoulli_lpmf(gamble[i, t] | pGamble);
          
                  // generate posterior prediction for current trial
                  y_pred[i, t] = bernoulli_rng(pGamble);
                }
              }
            }
          }

y_pred[i, t] = bernoulli_rng(pGamble);

I should feed it with a probability and get 0 or 1 (nothing or succes). However, I get real numbers \in (0,1). Why is that? What am I missing?

How can I predict the behaviour based on those generated quantities? Ie., how do I get the prediction of the model (which are discrete, in this case accept/reject)?

Thank you.

martinmodrak · March 5, 2020, 7:51am

I just tested it on my computer and I get integer values as expected. Could you post a minimal example where you get the real numbers?

I’ve used expose_stan_functions to be able to work with the function directly in R. Here’s what I get:

library(rstan)
cc <- "
functions {
  int wrap_bernoulli_rng(real x) {
    return bernoulli_rng(x);
  }
}
"

expose_stan_functions(stan_model(model_code = cc))

x <- wrap_bernoulli_rng(0.3)
x
typeof(x)

The output is as expected:

> x
[1] 0

> typeof(x)
[1] "integer"

Does your output differ?

My_Work · March 5, 2020, 8:41am

Yes, I can. I’m using pystan, I have a ntb for that (just do not know how to share it here, so I will post the code).

The STAN code is the one I posted before, ra_noLA.

The python code goes as follow:

# define conditions and path
import os, sys
import pystan
import pickle

import numpy as np
import pandas as pd

import time
from datetime import timedelta


model_folder = "C:/Users/user/stan_files/"
data_path = 'C:/Users/user/data/'


#prepare model
model = 'ra_noLA.stan'
model_path = os.path.join(model_folder, model)

try:
    with open(os.path.join(data_path, 'data_LA_exp1_testSTAN.txt'), "rb") as fp:   # Unpickling
        LA_data_prep_test = pickle.load(fp)
    print('loading data from folder')

except IOError:
    print("File not accessible")

# Time the process
start = time.time()

# Compile the model
start_compile = time.time()
stan_model_noLA = pystan.StanModel(file=model_path)
end_compile = time.time()
print('Duration of compilation: ', timedelta(seconds=end_compile - start_compile))


# run the model
start_model = time.time()
fit_noLA = stan_model_noLA.sampling(data=LA_data_prep_test, 
                    chains=4, iter=2000, warmup=1000, thin=1, init='random', verbose=True, 
                    control = {"adapt_delta":0.95, "stepsize":1, "max_treedepth":10}, n_jobs=-1)
end_model = time.time()
print('Duration of the model run is: ', timedelta(seconds=end_model - start_model))

# Create an ordered dictionary
summary_dict = fit_noLA.summary()
#print(summary_dict)

# Create a data frame out of the output
df_noLA = pd.DataFrame(summary_dict['summary'], 
                  columns=summary_dict['summary_colnames'], 
                  index=summary_dict['summary_rownames'])
    
end = time.time()

print('Duration of the entire process is: ', timedelta(seconds=end - start))

My data example is attached. data_LA_exp1_testSTAN.txt (7.4 KB)

Printing df_noLA.loc['y_pred[1,1]':'y_pred[1,256]', 'mean'] gives the values and type:

y_pred[1,1] 0.00025
y_pred[1,2] 0.00000
y_pred[1,3] 0.00050
y_pred[1,4] 0.00025
y_pred[1,5] 0.00075
…
y_pred[1,252] 0.81200
y_pred[1,253] 0.86200
y_pred[1,254] 0.90425
y_pred[1,255] 0.93550
y_pred[1,256] 0.94850
Name: mean, Length: 256, dtype: float64

martinmodrak · March 5, 2020, 9:02am

I do not use PyStan, but it appears, you are printing the summary of the posterior draws of y_pred, particularly the mean. The mean of bernoulli variables is expected to be real number. If you want the reaw 0,1 values you would need to access the draws and not the summary.

Does that make sense?

My_Work · March 5, 2020, 10:19am

Yes, it does make sense. I’m taking it from the fit_noLA.summary() where they are stored as well.
I checked the fit_noLA.extract('y_pred') and indeed, you are right, those are integers. So that is solved, thank you very much.

So let me clarify it for myself again. y_pred are stored for each trial, ie. y_pred[1,1], ..., y_pred[1,256] and what I get in the summary is an average across all the (n_of_chains)*(iters_without_warm_up)? That’s the mean I’m seeing?

So can I run a simulation within the code (that’ what I thought it is doing), where it takes the estimated parameters and regenerates the behaviour, ie. returns only a vector of 0 and 1 of the same length as the input?

martinmodrak · March 5, 2020, 11:40am

Correct.

You have one such simulation for each chain x iteration. The usual workflow is to compute any quantity of interest for each chain x iteration separately, giving you chain x iteration samples of the quantity letting you judge the uncertainty you have.

Bob_Carpenter · March 19, 2020, 7:27pm

In addition to the posterior mean vs. draws, y_pred is defined as a real value in the program:

  real y_pred[N, T];

Make that int if you want y_pred[i, t] to hold integers.

My_Work · March 20, 2020, 2:15pm

Thanks for that tip. I was just confused by it for a while, now I understand what is going on.

Topic		Replies	Views
Generated quantities from bernoulli_logit_glm model Modeling	3	514	November 12, 2021
Bernoulli_logit_rng not vectorized in RStan 2.18.2? Modeling	2	411	July 29, 2019
Newbie looking for help with creating ppc in generated quantities block with bernoulli_logit_glm_rng Modeling rstan	5	418	March 15, 2023
Available argument signatures for Bernoulli: Real return type required for probability function General rstan	2	435	November 16, 2022
Select a random integer index in Stan General rstan	6	833	August 19, 2020

Generate quantities using bernoulli_rng gives real value not int

Related topics