Ideas for modeling systematically skewed outliers?

arya · May 22, 2017, 8:16pm

Hi All,

I have (log) apartment prices along with apartment sizes (sq meters) for 12 different neighborhoods that I’d like to model using a hierarchical regression. For the most part, the data seems to follow a linear or quadratic trend except for a few outliers which I’d like to model (see image below). When the outliers do occur, they are far below the predicted regression mean, and usually, but not always at one of three specific levels (Mil1, Mil2, and Mil3 below). I think those are tax milestones, e.g. 1 million is the highest you can sell for without incurring any taxes. I have also found some decent predictors (not pictured) that help identify when a sale will be an outlier, e.g. when the economy is bad there are more of these outliers (maybe people getting worried and trying to liquify their investments).

I’m looking for ideas for a sensible generative process to describe this data. My ultimate goal is to get the predictive distribution of new houses. One thing I’ve tried is modeling the data as bi-modal (code below). One mode comes from the regression function, and another is a high-variance normal distribution with a much lower mean e.g. 14. The problem with this model is that the mixture probability is the same for all points given the economy variable (z). This means that the huge outliers affect the regression fit just as much the small outliers that may not even be outliers. I’m thinking I need more of a k-means type assignment probability that separate for each point so that the large outliers can only affect the outlier distribution and not the non-outlier distribution. Does anyone have any better ideas or tips for doing that?

data {
  int<lower=0> N;
  int<lower=0> K;
  int<lower=0> L;
  matrix[N, K] x;               //basic housing predictors
  matrix[N, L] z;               //predictors of outlier
  real y[N];
}
transformed data {
  matrix[N, K] Q_ast;
  matrix[K, K] R_ast;
  matrix[K, K] R_ast_inverse;
  // thin and scale the QR decomposition
  Q_ast = qr_Q(x)[, 1:K] * sqrt(N - 1);
  R_ast = qr_R(x)[1:K, ] / sqrt(N - 1);
  R_ast_inverse = inverse(R_ast);
}
parameters {
  vector[L] gamma;
  vector[K] theta;               //regression coefficients for non-outliers (include intercept)
  real<lower=0,upper=0.5> sigma;          //std. error for non-outliers
  
  real<lower=12, upper=15> mu_outlier;
  real<lower=0, upper=0.5> sigma_outlier;
}
model {
  gamma[1] ~ normal(-7.64, 10.0);
  gamma[2] ~ normal(6.72, 10.0);
  //mu_outlier ~ normal(13.5, 1.0);
  theta[1] ~ normal(15.51, 0.01);
  theta[2] ~ normal(0.19, 0.01);
  theta[3] ~ normal(-0.01, 0.01);
  //sigma ~ normal(0.1, 0.01);
  //sigma_outlier ~ gamma(10, 20);
  
  for (n in 1:N) {
    target += log_mix(inv_logit(z[n]*gamma),
                      normal_lpdf(y[n] | mu_outlier, sigma_outlier),
                      normal_lpdf(y[n] | Q_ast[n]*theta, sigma));
  }
}
generated quantities {
  vector[K] beta;
  beta = R_ast_inverse * theta; // coefficients on x
}

sakrejda · May 22, 2017, 8:48pm

wouldn’t it make sense to do something multiplicative (well, additive on this log scale) where the outliers are selling at a fraction of the value they would otherwise sell at? Outliers might also just be down at the sell now price. Both of these might be outlier-type-specific.

avehtari · May 23, 2017, 6:59am

I think mixture model would be natural here. You could use as many mixtures as you have groups in your plot. Mixture components M1, MI2, MI3 would have informative prior as they don’t have much variation and they are quite well separated from normal observations. For other outliers you could use some skewed distribution, so that you can separate reasonably well it from the normal observations.Check Stan manual for examples of coding a mixture model.

Aki

arya · May 23, 2017, 3:16pm

Thanks Aki. Which skewed distribution would you recommend?

avehtari · May 23, 2017, 8:40pm

It seens that OtherOutliers are not so well defined, and you might be able to model the normals and non-MI’s together with skewed t-distrbution (one mixture component less). Otherwise I recommend plotting some conditional histograms to get ideas what distribution might be useful.

Aki

arya · May 25, 2017, 5:24am

So it doesn’t look like the skew-t distribution is available yet. It’s still an open issue it seems. Has anyone tried using it in a Stan model file as a custom density?

Unfortunately, none of the distributions with support on the real line that are available in Stan are skewed AND have heavy tails, so I think it’d be pretty useful. I’d be willing to take a crack at implementing it myself, if the developers think it’s a newbie-friendly issue.

avehtari · May 25, 2017, 11:44am

Here’s a quick implementation of Skewed generalised t (see, e.g. https://cran.r-project.org/web/packages/sgt/vignettes/sgt.pdf). There’s no argument checking, but in my experiments I used such constraints for parameters that it was valid all the time. The priors below are weakly informative for my specific case. In your case you might want to keep the parameter p fixed (providing skewed t distribution), and constrain lambda (l) to be negative (to constrain the long tail towards smaller values).

functions {
  real sgt_log(vector x, real mu, real s, real l, real p, real q) {
    // Skewed generalised t
    int N;
    real lz1;
    real lz2;
    real v;
    real m;
    real r;
    real out;
    N <- dims(x)[1];
    lz1<-lbeta(1.0/p,q);
    lz2<-lbeta(2.0/p,q-1.0/p);
    v<-q^(-1.0/p)*((3*l^2+1)*exp(lbeta(3.0/p,q-2.0/p)-lz1)-4*l^2*exp(lz2-lz1)^2)^(-0.5);
    m<-2*v*s*l*q^(1.0/p)*exp(lz2-lz1);
    out<-0;
    for (n in 1:N) {
      r<-x[n]-mu+m;
      if (r<0)
      	     out<-out+log(p)-log(2*v*s*q^(1.0/p)*exp(lz1)*(fabs(r)^p /(q*(v*s)^p*(l*(-1)+1)^p)+1)^(1.0/p+q));
      else
      	     out<-out+log(p)-log(2*v*s*q^(1.0/p)*exp(lz1)*(fabs(r)^p /(q*(v*s)^p*(l*(1)+1)^p)+1)^(1.0/p+q));
    }
    return out;
  }
}
data {
  int<lower=0> N;
  vector[N] x;
}
parameters {
  real mu; 
  real<lower=0> sigma; 
  real<lower=-.99,upper=0> l; 
  real<lower=1,upper=10> p; 
  real<lower=3.0/p,upper=p*50> q; 
}
model {
  mu ~ normal(0,9.0/sqrt(N));
  l ~ normal(0,.5);
  p ~ lognormal(log(2),1);
  q ~ gamma(2,0.1);
  x ~ sgt(mu, sigma, l, p, q);
}

arya · May 25, 2017, 5:48pm

Thanks Aki! That distribution worked like a charm! Would be nice to get it as a default distribution in Stan, especially to make it vectorizable.

avehtari · May 25, 2017, 7:00pm

I think implementing an additional distribution as an internal Stan function belongs to the “newbie-friendly” or as Stan repo issue label says “new dev friendly” category.

Aki

Bob_Carpenter · May 26, 2017, 4:25am

You can judge for yourself by looking at the non-skewed t and skewed normals:

github.com

stan-dev/math/blob/develop/stan/math/prim/scal/prob/skew_normal_lpdf.hpp

#ifndef STAN_MATH_PRIM_SCAL_PROB_SKEW_NORMAL_LPDF_HPP
#define STAN_MATH_PRIM_SCAL_PROB_SKEW_NORMAL_LPDF_HPP

#include <stan/math/prim/scal/meta/partials_return_type.hpp>
#include <stan/math/prim/scal/meta/operands_and_partials.hpp>
#include <stan/math/prim/scal/err/check_finite.hpp>
#include <stan/math/prim/scal/err/check_not_nan.hpp>
#include <stan/math/prim/scal/err/check_positive.hpp>
#include <stan/math/prim/scal/err/check_consistent_sizes.hpp>
#include <stan/math/prim/scal/fun/size_zero.hpp>
#include <stan/math/prim/scal/fun/erf.hpp>
#include <stan/math/prim/scal/fun/erfc.hpp>
#include <stan/math/prim/scal/fun/owens_t.hpp>
#include <stan/math/prim/scal/fun/value_of.hpp>
#include <stan/math/prim/scal/meta/is_constant_struct.hpp>
#include <stan/math/prim/scal/meta/include_summand.hpp>
#include <stan/math/prim/scal/fun/constants.hpp>
#include <stan/math/prim/scal/meta/VectorBuilder.hpp>
#include <stan/math/prim/scal/meta/scalar_seq_view.hpp>
#include <boost/random/variate_generator.hpp>

This file has been truncated. show original

and

github.com

stan-dev/math/blob/develop/stan/math/prim/scal/prob/student_t_lpdf.hpp

#ifndef STAN_MATH_PRIM_SCAL_PROB_STUDENT_T_LPDF_HPP
#define STAN_MATH_PRIM_SCAL_PROB_STUDENT_T_LPDF_HPP

#include <stan/math/prim/scal/meta/is_constant_struct.hpp>
#include <stan/math/prim/scal/meta/partials_return_type.hpp>
#include <stan/math/prim/scal/meta/operands_and_partials.hpp>
#include <stan/math/prim/scal/err/check_consistent_sizes.hpp>
#include <stan/math/prim/scal/err/check_finite.hpp>
#include <stan/math/prim/scal/err/check_not_nan.hpp>
#include <stan/math/prim/scal/err/check_positive_finite.hpp>
#include <stan/math/prim/scal/fun/size_zero.hpp>
#include <stan/math/prim/scal/fun/constants.hpp>
#include <stan/math/prim/scal/fun/square.hpp>
#include <stan/math/prim/scal/fun/value_of.hpp>
#include <stan/math/prim/scal/fun/lbeta.hpp>
#include <stan/math/prim/scal/fun/lgamma.hpp>
#include <stan/math/prim/scal/fun/digamma.hpp>
#include <stan/math/prim/scal/meta/length.hpp>
#include <stan/math/prim/scal/fun/grad_reg_inc_beta.hpp>
#include <stan/math/prim/scal/fun/inc_beta.hpp>

This file has been truncated. show original

Both these look like they way over-include; we split them apart at some point, but weren’t careful with includes. We need to clean that up!

IvanUkhov · August 11, 2020, 1:52pm

Thank you for this code snippet! I have two questions.

Is there any reason you are not using a proper prior for sigma?
The number of degrees of freedom is given by 2 q. Then, following your recommendation about gamma(2, 0.1), should the prior for q not be gamma(2, 0.2)?

avehtari · August 11, 2020, 5:01pm

The reason was that it was a quick experiment and had a brief moment to share the code. I recommend to use a proper prior and add argument checking.

For my quick experiment I didn’t think carefully about the priors. If gamma(2, 0.2) better reflects your prior information, I recommend to use that.

I have not used this since the quick experiment. If you are able to do something useful with
Skewed generalised t or learn something useful about the priors, it would nice if you can post it here.

Topic		Replies	Views
Skewed distribution Modeling	25	4840	June 21, 2021
Gaussian vs. skew-normal model selection brms loo	19	7336	August 26, 2019
Skew_normal pp_ckeck looks better than skew_generalized_t Modeling fitting-issues , specification , brms	5	519	April 14, 2023
Multivariate Skewed Model for Simple Dataset Modeling	6	907	April 8, 2019
Posterior predictive checks - kurtosis and skew brms	18	4030	October 9, 2019

Ideas for modeling systematically skewed outliers?

Related topics