Fitting a uniform distribution

Hi there,

I am still exploring Stan and tried to fit a uniform distribution to some synthetic data (see code below). However, Stan complains about divergences and the posteriors look odd to me.

Why is that? Is there a proper way to fit uniform distributions to data?

Cheers!

Pystan Example
import numpy as np
import pystan


y = np.random.uniform(0.8, 1.2, 50)

model = pystan.StanModel(model_code= """
data {
    int<lower=0> N;
    vector[N] y;
}
parameters {
    real<lower=0.5, upper=1> alpha;
    real<lower=1, upper=1.5> beta;
}
model {
    y ~ uniform(alpha, beta);
}
""")

fit = model.sampling(data={'N': len(y), 'y': y}, iter=10000, chains=4)

Probably use a (positive_)ordered type for the parameters, and try without hard constraints (put weakly informative normals on each element of the ordered vector).

Thanks for the suggestions, but they are not changing anything.

why this particular model?
you need to put some priors on your parameters.

as @hhau said, you should model alpha and beta as ordered or positive_ordered.
and as you saw - that’s not good enough.

with 50 draws from uniform[0.8, 1.2], 4 chains of 1000 post-warmup iterations each will recover the bounds:

4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

       mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
alpha  0.81    0.00 0.01  0.78  0.81  0.81  0.82  0.82   163 1.01
beta   1.19    0.00 0.01  1.19  1.19  1.19  1.20  1.21   148 1.01
lp__  43.65    0.15 1.57 39.77 42.87 43.98 44.85 45.60   114 1.02

but roughly 3000 of the 4000 draws result in a divergence - the pairs() plot shows you where the problems occur

[image: pairs() plot]

I suggest reading Betancourt’s tutorial on divergences.
https://betanalpha.github.io/assets/case_studies/divergences_and_bias.html

again, the real question is what is the data-generating process you want to model?

cheers,
Mitzi

I think you are getting divergent transitions because the sampler steps into a part of parameter space that makes the likelihood invalid, which is where the lower bound is greater than the minimum of the data, and/or the upper bound is smaller than the maximum of the data. The following Stan program seems to work fine and reports no divergent transitions.
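
To make that concrete, here is a minimal NumPy sketch of the log-likelihood that y ~ uniform(alpha, beta) implies. The function name uniform_loglik is made up for illustration and is not part of Stan or PyStan; the synthetic data simply mirrors the example from the original question.

```python
import numpy as np

def uniform_loglik(y, alpha, beta):
    """Log-likelihood of i.i.d. Uniform(alpha, beta) data (illustrative helper)."""
    y = np.asarray(y, dtype=float)
    # Any observation outside [alpha, beta] has density zero, so the
    # log-likelihood is -inf for that (alpha, beta) pair.
    if alpha >= beta or y.min() < alpha or y.max() > beta:
        return -np.inf
    # Inside the valid region every point contributes -log(beta - alpha).
    return -y.size * np.log(beta - alpha)

rng = np.random.default_rng(0)
y = rng.uniform(0.8, 1.2, 50)

print(uniform_loglik(y, 0.7, 1.3))             # finite
print(uniform_loglik(y, y.min() + 0.01, 1.3))  # -inf: alpha above min(y)
print(uniform_loglik(y, 0.7, y.max() - 0.01))  # -inf: beta below max(y)
```

Inside the valid region the log-likelihood grows as the interval shrinks towards [min(y), max(y)], and then drops straight to -inf once a bound crosses the data. The sampler is therefore pushed towards a cliff edge, which is consistent with the divergences reported above.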

data {
  int<lower=0> N;
  vector[N] y;
  real y_min;
  real y_max;
}

parameters {
  real <upper = y_min> alpha;
  real <lower = y_max> beta;
}

model {
  alpha ~ normal(0.8, 1) T[, y_min];
  beta ~ normal(1.2, 1) T[y_max, ];
  y ~ uniform(alpha, beta);
}

where y_min and y_max are the minimum and maximum of the observed data, and could be computed in the transformed data block instead of being passed in externally.
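
If the bounds are passed in from Python, the extra data entries can be computed with NumPy first. This is only a sketch: make_stan_data is a hypothetical helper, and the dictionary keys are chosen to match the data block of the Stan program above.

```python
import numpy as np

def make_stan_data(y):
    """Build the data dict for the Stan program above (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    return {
        "N": int(y.size),
        "y": y,
        "y_min": float(y.min()),  # used as the upper bound on alpha
        "y_max": float(y.max()),  # used as the lower bound on beta
    }

rng = np.random.default_rng(1)
stan_data = make_stan_data(rng.uniform(0.8, 1.2, 50))
print(stan_data["y_min"], stan_data["y_max"])
```

The resulting dictionary would then be passed as the data argument to model.sampling, in the same way as in the original PyStan example.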

Sorry, I am just learning, not trying to model a particular process or anything :-)

Indeed, the divergences make sense and good idea with the bounds!

I tried introducing “measurement uncertainty”, but that would not fit properly either:

data {
    int<lower=0> N;
    vector<lower=0>[N] y_obs;
}
parameters {
    real<lower=0> alpha;
    real<lower=0> beta;
    vector<lower=0>[N] y;
    real<lower=0> sigma;
}
model {
    y_obs ~ normal(y, sigma);
    y ~ normal(alpha, beta);
}

I don’t know if there is a sensible way to incorporate measurement error into this kind of model. You want to do something like this:

data {
  int <lower = 0> N;
  vector [N] y_obs;
}

parameters {
  vector <lower = 0> [N] y;
  real <lower = 0> sigma;
  real <upper = min(y)> alpha;
  real <lower = max(y)> beta;
}

model {
  y_obs ~ normal(y, sigma);
  y ~ uniform(alpha, beta);

  sigma ~ normal(0, 1);
  alpha ~ normal(0.8, 1);
  beta ~ normal(1.2, 1);
}

but this model probably needs some kind of complicated Jacobian adjustment to account for the varying support of alpha and beta. Additionally, I suspect the use of min and max introduces non-differentiable points into the posterior, which is problematic. I also think that this model is likely to be unidentifiable because of the strange way sigma, alpha and beta interact. Simulating data from this model and trying to estimate the simulated parameters with Stan demonstrates these problems.
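
For anyone who wants to try that simulation, the data-generating side might look like the sketch below. The function simulate and the parameter values (alpha = 0.8, beta = 1.2, sigma = 0.1, N = 50) are assumptions chosen to resemble the synthetic data earlier in the thread; they are not taken from an actual run.

```python
import numpy as np

def simulate(n, alpha, beta, sigma, seed=0):
    """Draw latent uniform values and noisy observations (illustrative helper)."""
    rng = np.random.default_rng(seed)
    # Latent "true" values: y ~ uniform(alpha, beta)
    y = rng.uniform(alpha, beta, n)
    # Observed values: y_obs ~ normal(y, sigma)
    y_obs = y + rng.normal(0.0, sigma, n)
    return y, y_obs

y, y_obs = simulate(n=50, alpha=0.8, beta=1.2, sigma=0.1)
print("latent range:  ", y.min(), y.max())
print("observed range:", y_obs.min(), y_obs.max())
```

Feeding y_obs to the Stan program above and comparing the posterior for alpha, beta and sigma against the values used in the simulation is one way to check whether the parameters can be recovered.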