I’ve been struggling with this for a week so any help would be greatly appreciated!!!
For data generation I have a very simple model where every row of data is a ‘subobservation’ and every 3 subobservations make up a complete ‘observation’.
For each observation, 1 of it’s 3 subobservations also contains a signal. This is the thing I’m trying to detect.
I finally put together a model that’s actually working but it seems weird though for two reasons:

I’m using the observed data in 2 places, one in the likelihood function and the other in the prior for
signal_subobservation_estimate
. This seems circular but I can’t figure out any other way to condition this parameter on data. Is this the best way to do this? 
I’m having to normalize my likelihood parameters in order to fit into the Bernoulli distribution. Is this right or am I supposed to have another parameter like a Beta prior inbetween the likelihood and my other parameters?
Here is the model I’m using:
data {
int<lower=1> M; //num observations
int<lower=2> V; // num features
int<lower=1> S; // num subobservations in each observation
matrix[M*S,V] y; // data
}
transformed data {
int N = M*S;
//CONVERT THE OBSERVATIONS TO INTEGERS SO THEY CAN BE FED INTO THE BERNOULLI AND POISSION FUNCTIONS
int<lower=0> y_ints[N,V];
for (n in 1:N) {
for (v in 1:V) {
y_ints[n,v]=0;
while (y_ints[n,v] < y[n,v]) {
y_ints[n,v] = y_ints[n,v] + 1;
}
}
}
}
parameters {
vector<lower=0,upper=1>[V] signal_estimate;
vector<lower=0,upper=1>[V] noise_estimate;
simplex[S] signal_subobservation_estimate[M]; // subobservation dist per observation
}
transformed parameters {
vector[V] p[N];
for (m in 1:M) {
for (s in 1:S) {
vector[V] p_unnormalized = noise_estimate + signal_estimate * signal_subobservation_estimate[m,s];
p[(m1)*S+s] = p_unnormalized / sum(p_unnormalized);
}
}
}
model {
//priors
signal_estimate ~ beta(1,1);
noise_estimate ~ beta(1,1);
for (m in 1:M) {
//THIS PART SEEMS WEIRD
signal_subobservation_estimate[m] ~ dirichlet(y[(m1)*S+1:(m1)*S+S] * signal_estimate);
}
//model
for (n in 1:N) {
y_ints[n] ~ bernoulli(p[n]);
}
}
Here is the process to generate the data:
observations = 100
subobservations = 3
datapoints = observations*subobservations
features = 5
latent_noise = rep(0.01,features)
latent_noise[1] = 0.9
latent_noise[3] = 0.9
latent_noise[4] = 0.5
latent_signal = rep(0,features)
latent_signal[2] = 1
latent_signal_subobservation = rep(c(0,1,0),observations)
y = matrix(nrow=datapoints, ncol=features)
y = y[,colSums(y)!=0]
features = dim(y)[2]
for (n in 1:dim(y)[1]) {
// THE ACTUAL DATA GENERATION
y[n,] = rbinom(features,1,latent_noise) + rbinom(features,1,latent_signal * latent_signal_subobservation[n])
}
y[y>1]=1
fake_data < list(V = features,
y = y,
M = observations,
S = subobservations)