Simple Bernoulli hierarchical model: Am I doing this right?

I’ve been struggling with this for a week so any help would be greatly appreciated!!!

For data generation I have a very simple model where every row of data is a ‘subobservation’ and every 3 subobservations make up a complete ‘observation’.

For each observation, 1 of it’s 3 subobservations also contains a signal. This is the thing I’m trying to detect.

I finally put together a model that’s actually working but it seems weird though for two reasons:

  1. I’m using the observed data in 2 places, one in the likelihood function and the other in the prior for signal_subobservation_estimate. This seems circular but I can’t figure out any other way to condition this parameter on data. Is this the best way to do this?

  2. I’m having to normalize my likelihood parameters in order to fit into the Bernoulli distribution. Is this right or am I supposed to have another parameter like a Beta prior in-between the likelihood and my other parameters?

Here is the model I’m using:

data {
   int<lower=1> M;   //num observations
   int<lower=2> V;   // num features
   int<lower=1> S;   // num subobservations in each observation
   matrix[M*S,V] y;  // data

transformed data {
   int N = M*S;

   int<lower=0> y_ints[N,V];
   for (n in 1:N) {
      for (v in 1:V) {
         while (y_ints[n,v] < y[n,v]) {  
            y_ints[n,v] = y_ints[n,v] + 1;

parameters {
   vector<lower=0,upper=1>[V] signal_estimate;    
   vector<lower=0,upper=1>[V] noise_estimate;     
   simplex[S] signal_subobservation_estimate[M];      // subobservation dist per observation  

transformed parameters {

   vector[V] p[N];

   for (m in 1:M) {
      for (s in 1:S) {
         vector[V] p_unnormalized = noise_estimate + signal_estimate * signal_subobservation_estimate[m,s];
         p[(m-1)*S+s] = p_unnormalized / sum(p_unnormalized);

model {
   signal_estimate ~ beta(1,1);
   noise_estimate ~ beta(1,1);
   for (m in 1:M) {
      signal_subobservation_estimate[m] ~ dirichlet(y[(m-1)*S+1:(m-1)*S+S] * signal_estimate);
   for (n in 1:N) {
      y_ints[n] ~ bernoulli(p[n]);

Here is the process to generate the data:

observations = 100
subobservations = 3
datapoints = observations*subobservations
features = 5

latent_noise = rep(0.01,features)
latent_noise[1] = 0.9
latent_noise[3] = 0.9
latent_noise[4] = 0.5

latent_signal = rep(0,features)
latent_signal[2] = 1

latent_signal_subobservation = rep(c(0,1,0),observations)

y = matrix(nrow=datapoints, ncol=features)

y = y[,colSums(y)!=0]
features = dim(y)[2]

for (n in 1:dim(y)[1]) {

   y[n,] = rbinom(features,1,latent_noise) + rbinom(features,1,latent_signal * latent_signal_subobservation[n]) 


fake_data <- list(V = features,
                 y = y,
                 M = observations,
                 S = subobservations)

I’m not sure why you’re trying to use the Dirichlet for a Bernoulli model or what ignal_subobservation_estimate is supposed to do or why it’s an array of simplexes.

What are you trying to make hierarchical? There are two standard ways to build hierarchical models for groups of binary/binomial observations. I cover them in my repeated binary observations case study.

Thanks @Bob_Carpenter for taking a look at my problem. I re-read the binary trials case study and I think my problem is a bit different because there is a latent variable I’m trying to detect.

In my data I have 100 groups of 3 datapoints each (N ==300). One of the 3 datapoints contains additional signal, hence my trying to apply a dirichlet, and the other 2 are only noise. So I’m trying to simultaneously model the noise_estimate, the signal_estimate, and which datapoint contains the signal. Does that make sense? Does it sound like any model you’re familiar with?

y ~ signal_estimate * signal_subobservation_estimate+ noise_estimate

I’m not sure waht you mean by “which datapoint cointains the signal”. What’s the generating process for the data? I saw you tried to outline this in the first post, but I don’t know what “1 of it’s 3 subobservations also contains a signal”. What do the other “subobservations” contain? What’s a subobservation?

It sounds like it’s just a classification problem, but I’m not sure.