Basic model - Estimate variables in probit model and level height + noise simultaneously

As a true beginner to Stan, I've been working on a very basic probit model. It works well; please see the R code and Stan code below.

However, I want to extend the model so that the 0/1 state is latent, reflecting whether a level with a specific height is reached. In addition to estimating the parameters of the probit model, I also want to estimate the height of that level (3 +/- measurement noise in the simulated example below).

How should I extend the current Stan model to efficiently add the “level_height” and “noise” terms?

Thank you very much! Thomas

R code:

# Number of attempts  
N <- 300
# Input parameters
alpha <- 10
beta <- 1.5
level_height <- 3

# Random noise
e <- 0.1
noise <- e * runif(N) - e / 2
# Simulation of region where level becomes activated
x_val <- alpha + (6 * beta * runif(N) - 3 * beta)

# Active or inactive level
# Only 0's or 1's
y_act <- pnorm(x_val, alpha, beta) > runif(N)

# Add height of level
# Includes level height + noise
y_level <- y_act * level_height + noise

Stan code (works well for y_act):

data {
  int<lower=1> N;
  vector[N] x;
  int<lower=0, upper=1> y[N];
}
parameters {
  real<lower=0> alpha;
  real<lower=0.1, upper=10> beta;
}
model {
  vector[N] eta;
  eta = 1/beta * (x - alpha);
  alpha ~ normal(0, 10);
  beta ~ cauchy(0, 10);
  for (i in 1:N) {
    y[i] ~ bernoulli(Phi(eta[i]));
  }
}

Stan code (starting point for “y_level”):

data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real<lower=0> alpha;
  real<lower=0.1, upper=10> beta;
  real<lower=0.1, upper=10> levelheight;
  real<lower=0.01> noise;
}
model {
  vector[N] eta;
  eta = 1/beta * (x - alpha);

  alpha ~ normal(0, 10);
  beta ~ cauchy(0, 10);
  levelheight ~ normal(0, 10);
  noise ~ cauchy(0, 5);

  for (i in 1:N) {
    y[i] ~ bernoulli(Phi(eta[i]));
  }
}

Hi, sorry, your question fell through the cracks despite being relevant and well written.

Generally, Stan does not support discrete unobserved parameters, but in this case that is not a big limitation. One fruitful way to think about the model is as a mixture of two distributions: one with mean 0 and the other with mean level_height. I would be surprised if uniformly distributed noise, as you have in the simulation, actually captured your data well, but maybe you have a reason to believe this is the case? It could also cause problems: note that with such noise there will be many observations for which you know exactly whether they came from y = 0 or y = 1, and some y values would be impossible.

So I would find normally distributed noise more likely (and easier to implement). In that case, you could have something like:

...

parameters {
   ....
   real<lower=0> level_height;
   real<lower=0> sigma;
}
...


model {
....
for (i in 1:N)
{
  target += log_mix(Phi(eta[i]), 
                       normal_lpdf(y[i] | 0, sigma), 
                       normal_lpdf(y[i] | level_height, sigma)
  );
}
}

See 5 Finite Mixtures | Stan User’s Guide for more theory on mixture models in Stan.

Also note that the lower and upper bounds you use are somewhat suspicious - unless you have a physical reason to believe that, say, beta cannot be lower than 0.1 or larger than 10, it is usually more sensible to enforce soft constraints via a prior. Hard constraints should be reserved for things like ensuring a parameter is positive (in which case a lower bound of 0 is preferable).
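For example, a minimal sketch of the idea (the half-normal(0, 5) prior here is just an illustrative choice, not a recommendation tailored to your data):

parameters {
  real<lower=0> beta;   // keep only the bound that is physically necessary
}
model {
  beta ~ normal(0, 5);  // soft constraint via the prior instead of upper=10
}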

Best of luck with your model!

Dear Martin,

Thank you very much for your help! I think it would also be useful if I added a “generated quantities” block. I previously had some issues with priors where the samples didn't fall within the predefined prior distribution (e.g. beta ~ normal(2, 0.4), where the samples in all chains remained at approximately 4, despite removing hard constraints). Anyway, I will work on an improved version of the model given your useful suggestions!
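Something along these lines is what I have in mind (just a sketch, assuming the data and parameters above; eta is not visible outside the model block, so the expression is recomputed here):

generated quantities {
  vector[N] p_active;  // probit probability that the level is activated at x[i]
  for (i in 1:N) {
    p_active[i] = Phi((x[i] - alpha) / beta);
  }
}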

Thanks again!

Best,
Thomas

This looks like there is a conflict between the prior and the data - this is definitely at least worth some investigation. In any case, putting a hard constraint in place to avoid prior-data conflict is usually not a good strategy: you are basically just saying “whatever the data says, I will prefer my prior information”, in which case you might as well not use the data at all…

Does that make sense?

Dear Martin,

Thank you again for the comments. For clarity, I've added my current R code and Stan code again. The issues with fitting somehow remain. You had a good point regarding “noise”, so I have now defined “noise” differently in the R code.

R code:

# Number of attempts
N <- 300

# Input parameters
alpha <- 10
beta <- 1.5
level <- 3

# Random normal noise
e <- 0.05
noise <- rnorm(N, 0, e)

# Simulation of region where level becomes activated
x_val <- alpha + (6 * beta * runif(N) - 3 * beta)

# Active or inactive level
y_act <- pnorm(x_val, alpha, beta) > runif(N)

# Add height of level
y_level <- y_act * level + noise

Some background information regarding the Stan code below: all parameters have a lower bound of 0. For “beta” I have chosen to set the lower bound at 0.1 because of the definition of “eta”, to prevent the sampler from searching close to 1/0 (infinity); I had the impression that this worked better. Additionally, to prevent “noise” from converging towards “level_height”, which would cause parameter-identifiability issues, I also put a hard upper constraint on “noise” based on the data (via “maxNoise”). The priors are rather informative and close to the simulated values, so I have the impression they are defined correctly.

Where can I improve my code so that Stan fits the parameters adequately?

Thank you!

Stan code (x -> “x_val” & y -> “y_level”):

data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}
transformed data {
  real maxNoise;
  maxNoise = 0.1 * max(y);
}
parameters {
  real<lower=0> alpha;
  real<lower=0.1> beta;
  real<lower=0> level_height;
  real<lower=0, upper=maxNoise> noise;
}
model {
  vector[N] eta;
  eta = 1/beta * (x - alpha);
  alpha ~ normal(12, 2);
  beta ~ normal(2, 0.2);
  level_height ~ normal(4, 2);
  noise ~ normal(maxNoise/2, maxNoise);
  for (i in 1:N) {
    target += log_mix(Phi(eta[i]),
                      normal_lpdf(y[i] | 0, noise),
                      normal_lpdf(y[i] | level_height, noise));
  }
}

Great, you are welcome :-)

Could you be more specific? Is this a problem with compiling the model? Or do you get divergences or other warnings when fitting? Or does fitting proceed without warnings, but the fitted values seem wrong?

I played with the code a bit and it seems the main problem is in initialization: by default, Stan initializes the parameters between -2 and 2 on the unconstrained scale, so for real<lower=0> x this turns out to be between exp(-2) and exp(2) (see 10.2 Lower bounded scalar | Stan Reference Manual for a bit more detail). This can, however, produce values that are in the extreme tails of your priors - especially for alpha, more than half of the inits will be more than 5 * sd from the prior mean. This then causes numerical instabilities from which some chains never recover. When using inits centered on the priors for alpha and beta, e.g. init = function() { list(alpha = 10 + rnorm(1), beta = 2 + rnorm(1, sd = 0.2)) }, all of my chains behave nicely.

This sounds a bit weird… I wouldn’t expect identifiability issues with large noise (but maybe I am wrong). I would first check if everything else in the model is correct.

Note that you can use triple backticks (```) to format code nicely. I took the liberty of editing your post for you.

Best of luck with the model!