Fitting a uniform distribution

germannp · December 14, 2018, 3:49pm

Hi there,

I am still exploring Stan and tried to fit a uniform distribution to some synthetic data (see code below). However, Stan complains about divergences, and the posteriors look odd to me.

Why is that? Is there a proper way to fit uniform distributions to data?

Cheers!

PyStan example

import numpy as np
import pystan  # PyStan 2.x interface (StanModel / sampling)

# 50 synthetic observations from Uniform(0.8, 1.2)
y = np.random.uniform(0.8, 1.2, 50)

model = pystan.StanModel(model_code="""
data {
    int<lower=0> N;
    vector[N] y;
}
parameters {
    // no explicit priors: the hard bounds imply flat priors on these intervals
    real<lower=0.5, upper=1> alpha;
    real<lower=1, upper=1.5> beta;
}
model {
    y ~ uniform(alpha, beta);
}
""")

fit = model.sampling(data={'N': len(y), 'y': y}, iter=10000, chains=4)

hhau · December 14, 2018, 4:38pm

Probably use a (positive_)ordered type for the parameters, and try without hard constraints (put weakly informative normals on each element of the ordered vector).
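For context, Stan's `ordered` type enforces alpha < beta through a change of variables rather than through hard bounds. A minimal NumPy sketch of that transform, following its description in the Stan reference manual (x[0] = u[0], x[k] = x[k-1] + exp(u[k])); the helper names `ordered_transform` and `ordered_log_jacobian` are hypothetical, for illustration only:

```python
import numpy as np


def ordered_transform(u):
    """Map an unconstrained vector u to a strictly increasing vector:
    x[0] = u[0], x[k] = x[k-1] + exp(u[k])."""
    u = np.asarray(u, dtype=float)
    x = np.empty_like(u)
    x[0] = u[0]
    x[1:] = u[0] + np.cumsum(np.exp(u[1:]))
    return x


def ordered_log_jacobian(u):
    """log |det J| of the transform. J is lower triangular with diagonal
    1, exp(u[1]), exp(u[2]), ..., so the log determinant is sum(u[1:])."""
    u = np.asarray(u, dtype=float)
    return float(np.sum(u[1:]))


# Any pair of real numbers maps to a valid (alpha, beta) with alpha < beta.
alpha, beta = ordered_transform([0.8, np.log(0.4)])
print(alpha, beta)  # approximately 0.8 and 1.2
```

Because every unconstrained input yields alpha < beta, the sampler can never propose an inverted interval. The transform does not, however, stop alpha from moving above min(y), which is consistent with ordering alone not removing the divergences.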

germannp · December 14, 2018, 5:41pm

Thanks for the suggestions, but they are not changing anything.

mitzimorris · December 14, 2018, 6:40pm

why this particular model?
you need to put some priors on your parameters.

as @hhau said, you should model alpha and beta as ordered or positive_ordered.
and as you saw - that’s not good enough.

with 50 draws from uniform[0.8, 1.2], 4 chains of 1000 post-warmup iterations each will recover the bounds:

4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

       mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
alpha  0.81    0.00 0.01  0.78  0.81  0.81  0.82  0.82   163 1.01
beta   1.19    0.00 0.01  1.19  1.19  1.19  1.20  1.21   148 1.01
lp__  43.65    0.15 1.57 39.77 42.87 43.98 44.85 45.60   114 1.02

but around 3000 of the 4000 draws end in a divergence; the pairs() plot shows you where the problems occur
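To see where the problems come from, one can evaluate the log posterior of the original model directly. A minimal NumPy sketch, assuming the implicit flat priors implied by the declared bounds; the seed and the `log_post` helper are illustrative assumptions, not from the thread:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.uniform(0.8, 1.2, 50)  # synthetic data as in the original post (seed assumed)


def log_post(alpha, beta, y):
    """Unnormalised log posterior of the original model: flat priors on
    alpha in [0.5, 1] and beta in [1, 1.5] plus the uniform likelihood."""
    if not (0.5 <= alpha <= 1.0 and 1.0 <= beta <= 1.5):
        return -np.inf  # outside the declared parameter bounds
    if alpha > y.min() or beta < y.max():
        return -np.inf  # at least one observation lies outside [alpha, beta]
    return -len(y) * np.log(beta - alpha)  # N * log(1 / (beta - alpha))


eps = 1e-6
print(log_post(y.min(), y.max(), y))         # finite: the posterior mode
print(log_post(y.min() + eps, y.max(), y))   # -inf: a tiny step past the smallest observation
print(log_post(y.min() - 0.01, y.max(), y))  # finite but lower: a wider interval is less likely
```

The density is highest right at alpha = min(y), beta = max(y) and drops to negative infinity immediately beyond that point, so the mode sits against a hard wall that HMC trajectories can run into.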

I suggest reading Betancourt’s tutorial on divergences.
https://betanalpha.github.io/assets/case_studies/divergences_and_bias.html

again, the real question is: what data-generating process do you want to model?

cheers,
Mitzi

hhau · December 14, 2018, 10:32pm

I think you are getting divergent transitions because the sampler steps into regions of parameter space where the likelihood is zero (a log density of negative infinity): wherever the lower bound is greater than the minimum of the data and/or the upper bound is smaller than the maximum of the data. The following Stan program seems to work fine and reports no divergent transitions.

data {
  int<lower=0> N;
  vector[N] y;
  real y_min;
  real y_max;
}

parameters {
  real <upper = y_min> alpha;
  real <lower = y_max> beta;
}

model {
  alpha ~ normal(0.8, 1) T[, y_min];
  beta ~ normal(1.2, 1) T[y_max, ];
  y ~ uniform(alpha, beta);
}

where y_min and y_max are the minimum and maximum of y; they could also be computed in the transformed data block instead of being passed in externally.
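Why this version behaves better can be sketched by writing out what Stan does with the `<upper = y_min>` and `<lower = y_max>` declarations: it samples unconstrained values a and b, maps them through alpha = y_min - exp(a) and beta = y_max + exp(b), and adds the log-Jacobian terms a and b. A minimal NumPy sketch of the resulting unconstrained log density; the helper names are hypothetical, and the truncation normalising constants are dropped because they depend only on data:

```python
import numpy as np


def normal_lpdf(x, mu, sd):
    """Log density of a normal distribution."""
    return -0.5 * np.log(2 * np.pi) - np.log(sd) - 0.5 * ((x - mu) / sd) ** 2


def log_density_unconstrained(a, b, y):
    y_min, y_max = y.min(), y.max()  # what the data block supplies
    alpha = y_min - np.exp(a)        # upper-bounded transform: alpha < y_min for any real a
    beta = y_max + np.exp(b)         # lower-bounded transform: beta > y_max for any real b
    # priors; the T[...] normalising constants depend only on data, so they are omitted
    lp = normal_lpdf(alpha, 0.8, 1.0) + normal_lpdf(beta, 1.2, 1.0)
    lp += -len(y) * np.log(beta - alpha)  # uniform likelihood, always inside its support
    lp += a + b                           # log |d alpha / da| + log |d beta / db|
    return lp


rng = np.random.default_rng(2)
y = rng.uniform(0.8, 1.2, 50)
print(log_density_unconstrained(0.0, 0.0, y))
```

For every real (a, b), beta - alpha is at least y_max - y_min > 0 and all observations lie inside [alpha, beta], so the log density is finite everywhere and there is no zero-likelihood region for the sampler to step into.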

germannp · December 16, 2018, 10:26am

Sorry, I am just learning, not trying to model a particular process or anything :-)

germannp · December 16, 2018, 10:34am

Indeed, the divergences make sense, and good idea with the bounds!

I tried introducing “measurement uncertainty”, but that did not fit properly either:

data {
    int<lower=0> N;
    vector<lower=0>[N] y_obs;
}
parameters {
    real<lower=0> alpha;
    real<lower=0> beta;
    vector<lower=0>[N] y;
    real<lower=0> sigma;
}
model {
    y_obs ~ normal(y, sigma);
    y ~ normal(alpha, beta);
}

hhau · December 19, 2018, 1:30pm

I don’t know if there is a sensible way to incorporate measurement error into this kind of model. You want to do something like this:

data {
  int <lower = 0> N;
  vector [N] y_obs;
}

parameters {
  vector <lower = 0> [N] y;
  real <lower = 0> sigma;
  real <upper = min(y)> alpha;
  real <lower = max(y)> beta;
}

model {
  y_obs ~ normal(y, sigma);
  y ~ uniform(alpha, beta);

  sigma ~ normal(0, 1);
  alpha ~ normal(0.8, 1);
  beta ~ normal(1.2, 1);
}

but this model probably needs some kind of complicated Jacobian adjustment to account for the varying support of alpha and beta. Additionally, I suspect the use of min and max might introduce a non-differentiable point into the posterior, which is problematic. I also think this model is likely to be unidentifiable because of the strange way sigma, alpha and beta interact. Simulating data from this model and trying to estimate the simulated parameters with Stan demonstrates these problems.
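Simulating data from the measurement-error model, as suggested above, could start from a sketch like the following. The values alpha = 0.8, beta = 1.2 and sigma = 0.05 and the `simulate` helper are arbitrary assumptions for illustration:

```python
import numpy as np


def simulate(n, alpha, beta, sigma, rng):
    """Draw latent values y ~ uniform(alpha, beta) and noisy
    measurements y_obs ~ normal(y, sigma)."""
    y_true = rng.uniform(alpha, beta, n)  # latent (true) values
    y_obs = rng.normal(y_true, sigma)     # observed values with measurement noise
    return y_true, y_obs


rng = np.random.default_rng(3)
y_true, y_obs = simulate(50, 0.8, 1.2, 0.05, rng)
# With measurement noise, the observed extremes can fall outside [alpha, beta],
# so min(y_obs) and max(y_obs) are no longer hard limits on alpha and beta.
print(y_obs.min(), y_obs.max())
```

Fitting the Stan program above to `y_obs` and comparing the posteriors with the simulated alpha, beta and sigma is one way to check whether those parameters are recoverable.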
