RSTAN compilation problem

Hi all,

I am a complete newbe to Stan and I have a problem.
I was able to run a model where the line causing trouble was int<lower=0,upper=1> y[N];.
Can somebody help me understanding why a real y does not compile?

data {
  int<lower=0> N;
  real<lower=0,upper=1> y[N];  // line causing trouble worked fine with int<lower=0,upper=1> y[N];.
  real beta_1;
  real beta_2;
}
parameters {
  real<lower=0,upper=1> theta;
}
model {
  theta ~ beta(beta_1,beta_2);
  y ~ binomial(N,theta);
}

I would like to use a real number to get a non dichtomous y.

Best
Simon

The issue is

y ~ binomial(N, theta);

y must be integer because the binomial distribution is only defined for integer values. If you want a non-dichotomous y you need to model it with a non-dichotomous distribution. I dunno, maybe beta distribution? Can you say more about the kind data you’re trying to model?

1 Like

Thanks. That makes sense.

I do have data about a drug treatement. Sometimes it works: 1 and sometimes it doesn’t 0. In the data we observe nearly 50% of working cases. I ran the model with int and an N of 40, but unfortunately Theta is around 0.06. This made me suspicious, as I was expecting a much higher Theta with a Bet(1,1) prior.

As far as I know this implies that the posterior probability of 1 is less than 1%.

Do you have some kind of idea why this is the case?

That does sound very strange. If the prior is Beta(1,1) and you have 20 successes and 20 failures then the posterior should be Beta(21,21).

Oh, wait. y is either zero or one, right? The binomial should be written as

  y ~ binomial(1,theta);

The first argument is the number of outcomes per each patient, i.e. the maximum value ypossible.

1 Like
model = "

// Binomial model with beta(1,1,) prior
data {
  int<lower=0> N;
  int<lower=0,upper=1> y[N];
  real bet_1;
  real beta_2;
}
parameters {
  real<lower=0,upper=1> theta;
}
model {
  theta ~ beta(bet_1,beta_2);
  y ~ binomial(N,theta);
}

"
y = c(1,1,......,0,0)
data_stan <- list("N"=length(y), "y"=y,
                  "bet_1"=1, "beta_2"=1)
posterior <- sampling(model, data = data_stan, warmup = 5000, iter = 10000, 
                      chains = 4, cores = 1, thin = 1, refresh = 0)

I am using RSTAN and this is the code I am running.
Am I missing some integral point here?

I am using RSTAN btw.

Simultaneous posts, ha. The binomial should be binomial(1, theta) not binomial(N, theta).

1 Like

Thank you! Am I getting this right: We have to set N equal to 1 as we observe only one result per patient?

y ~ binomial(1, theta)

is, when y is an array, just shorthand for

for (i in 1:N)
  y[i] ~ binomial(1, theta);

Binomial models the number of events that have probability theta. Each patient has only one (potential) event. If you combined all the data to a single number of events int y_total = sum(y) then you’d have total number of potential events equal to the number of patients y_total ~ binomial(N, theta).

1 Like