# Problem estimating heteroskedastic model

Hello,

I am stumped by a problem that I thought upfront I understood. It is a generic linear modeling problem involving a specific type of heteroskedasticity; the issue isn’t specific to Stan, but I am hoping there is a way of ‘fixing’ estimation via Stan. Consider the following data generating model:
y_i=ax_i+(b_0+b_1x_i)y_i^\star
where
y_i^\star=hG_i+e\epsilon_i.
We assume that y^\star has unit variance via the assumptions that e=\sqrt{1-h^2} and that \epsilon is (unobserved) white noise with unit variance. Expanding, we have
y_i=ax_i + b_0 h G_i + b_1 h x_i G_i + b_0 e \epsilon_i + b_1 e x_i \epsilon_i.

We can replace the products of coefficients and rewrite this as:
y_i=ax_i+\pi_0 G_i+\pi_1 G_i x_i +\lambda_0 \epsilon_i+\lambda_1 x_i \epsilon_i.
Estimation should be fairly straightforward (or so I thought), as
E(y_i)=ax_i+\pi_0 G_i+\pi_1 G_i x_i
since E(\epsilon)=0 and
V(y_i)=V((\lambda_0+\lambda_1 x_i)\epsilon)=(\lambda_0+\lambda_1 x_i)^2
since \epsilon has unit variance.
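As a quick numeric sanity check of these moment formulas (plain R; the values are the generating values used in the simulation further down):

```r
# Check: for fixed x and G, mean(y) should equal a*x + pi0*G + pi1*x*G
# and sd(y) should equal lambda0 + lambda1*x.
set.seed(1)
b0 <- 0.5; b1 <- 0.2; h <- sqrt(0.6); a <- 0.5
e <- sqrt(1 - h^2)
pi0 <- b0 * h; pi1 <- b1 * h
lambda0 <- b0 * e; lambda1 <- b1 * e
x <- 1.3; G <- 0.7
eps <- rnorm(1e6)
y <- a*x + (b0 + b1*x) * (h*G + e*eps)
c(mean(y), a*x + pi0*G + pi1*x*G)   # should agree closely
c(sd(y), lambda0 + lambda1*x)       # should agree closely
```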

We implement this in Stan (via rstan) as:

stanmodelcode <- "
data {
  int<lower=0> N;
  real y[N];
  real g[N];
  real x[N];
}
parameters {
  real a;
  real pi0;
  real pi1;
  real lambda0;
  real lambda1;
}
transformed parameters {
  vector[N] mu;
  vector[N] sigma;
  for (i in 1:N) mu[i] = a*x[i] + pi0*g[i] + pi1*x[i]*g[i];
  for (i in 1:N) sigma[i] = lambda0 + lambda1*x[i];
}
model {
  for (i in 1:N) y[i] ~ normal(mu[i], sigma[i]);
  a ~ normal(0, 10);
  pi0 ~ normal(0, 10);
  pi1 ~ normal(0, 10);
  lambda0 ~ normal(0, 10);
  lambda1 ~ normal(0, 10);
}
"


We take G to be standard normal. The issue has to do with the distribution of x. If x is drawn from a uniform distribution, all is well (focusing on recovery of \pi_1 and \lambda_1 parameters). An illustration:

library(rstan)
sm<-stan_model(model_code=stanmodelcode,model_name='het_weirdness')
simdata <- function(x, b0=.5, b1=.2, h=sqrt(.6), a=.5) {
  N <- length(x)
  e <- sqrt(1 - h^2)
  G <- rnorm(N, 0, 1)
  eps <- rnorm(N, 0, 1)
  ystar <- h*G + e*eps
  y <- a*x + (b0 + b1*x)*ystar
  df <- data.frame(x=x, y=y, g=G)
  list(pi1=b1*h, lambda1=b1*e, df=df)
}

set.seed(8675309)
##
x.unif<-runif(2500,-1.5,1.5)
L<-simdata(x.unif)
pi1 <- L$pi1
lambda1 <- L$lambda1
df <- L$df
dat <- list(N=nrow(df), y=df$y, x=df$x, g=df$g)
fit.u<-sampling(sm,data=dat,pars=c('mu','sigma'),include=FALSE,chains=2,cores=2)
par(mfrow=c(1,2))
hist(extract(fit.u,'pi1')[[1]],xlim=c(0,.2))
abline(v=pi1,col='red',lwd=2)
hist(extract(fit.u,'lambda1')[[1]],xlim=c(0,.2))
abline(v=lambda1,col='red',lwd=2)


So far, so good [note: there are some warnings that I need to resolve and I’m open to suggestions there, but estimation seems to converge to truth]. Now, the trouble: if x is instead sampled from the standard normal, estimation goes awry.

x.norm<-rnorm(2500)
L<-simdata(x.norm)
pi1 <- L$pi1
lambda1 <- L$lambda1
df <- L$df
dat <- list(N=nrow(df), y=df$y, x=df$x, g=df$g)
fit.n<-sampling(sm,data=dat,pars=c('mu','sigma'),include=FALSE,chains=2,cores=2)
par(mfrow=c(1,2))
hist(extract(fit.n,'pi1')[[1]],xlim=c(0,.2))
abline(v=pi1,col='red',lwd=2)
hist(extract(fit.n,'lambda1')[[1]],xlim=c(0,.2))
abline(v=lambda1,col='red',lwd=2)


Note that the estimate of \lambda_1 is especially bad (the posterior is well below the true value).

We’re trying to figure out how to resolve this. It didn’t seem a priori like estimation should be so sensitive to the distribution of x. Yet, here we are. Frequentist MLE has the same problem. A collaborator has experimented with Mplus (wherein \epsilon is treated as a latent variable) and tells us that estimation via that software does not have this problem. Our preference is to not go that route; it seems like this is something we should be able to resolve via some tweak to the above. Any help greatly appreciated!!!

Ben


I looked at this a bit and I do not have a complete explanation but I hope I can make some useful observations. I am a novice with Stan, so treat my input accordingly.

1. The model does not put any limit on the value of sigma; it is simply calculated from other quantities:

sigma[i]=(lambda0+lambda1*x[i]);

and lambda0, lambda1, and x can all be negative. The lack of a constraint on sigma shows up during initialization, when negative values are tried in y[i] ~ normal(mu[i],sigma[i]); and an error is thrown.

2. The uniform distribution you tried does return the correct value of lambda1, but expanding its limits to [-4, 4] leads to a low estimate of lambda1, similar to using a standard normal for x.

x.unif<-runif(2500,-4,4)
#... other code ...
summary(fit.u)$summary[, "mean"]

           a          pi0          pi1      lambda0      lambda1         lp__
5.017469e-01 3.826576e-01 1.496183e-01 3.444115e-01 6.381666e-02 1.722301e+03

3. Using a narrower normal for x than was originally tried similarly leads to a more reasonable estimate of lambda1.

x.norm<-rnorm(2500, 0, 0.3)
#... other code ...
summary(fit.n)$summary[, "mean"]

           a          pi0          pi1      lambda0      lambda1         lp__
   0.5099905    0.3868222    0.1427578    0.3152581    0.1104819 1649.6851628


Stan does seem to have more trouble fitting with x derived from a standard normal than from runif(2500, -4, 4), having hundreds of divergences with adapt_delta = 0.8.

Musings of a novice: I am surprised the model even runs when x can be -4. The true lambda1 is about 0.12 and lambda0 is about 0.3, so sigma would be negative there. I guess rather than throwing an error, the fit process learns to stay away from regions with impossible values and keeps lambda1 small.
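To put numbers on that, using the generating values from simdata() above (b0 = .5, b1 = .2, h^2 = .6):

```r
# Implied sigma at x = -4 under the true parameter values:
e <- sqrt(1 - 0.6)          # e = sqrt(1 - h^2)
lambda0 <- 0.5 * e          # b0 * e, about 0.32
lambda1 <- 0.2 * e          # b1 * e, about 0.13
lambda0 + lambda1 * (-4)    # negative, so normal(mu, sigma) is undefined here
```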

Later addition: lambda1 drops even more if x is drawn from runif(2500, -6, 4)

           a          pi0          pi1      lambda0      lambda1         lp__
0.5016476    0.3818108    0.1506006    0.4067132    0.0289117 1259.5690834


Here’s some simulated data with x drawn from a standard normal distribution. The lines show mean ± sd for the generating process. The noteworthy feature is at x=-\frac{b_0}{b_1}=-2.5, where the variance vanishes and the model prediction is deterministic.
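For reference, with the generating values in simdata() (b0 = .5, b1 = .2), that point is easy to compute:

```r
# The scale factor b0 + b1*x vanishes at x = -b0/b1:
b0 <- 0.5; b1 <- 0.2
x0 <- -b0 / b1              # -2.5
b0 + b1 * x0                # 0: y is deterministic at this x
```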

A typical posterior draw from Stan looks like this. The zero-variance point is always estimated to the left of all the data points; the model cannot deal with it being inside the range of x.

I doubt your real data has any zero-variance region so I think you’ll have to come up with a model that doesn’t predict such nonsense.
The minimal change I’d make is to add a small amount of irreducible noise to all data points. Normal noise adds in quadrature so it’s

parameters {
  ...
  real<lower=0> lambda2;
}
transformed parameters {
  ...
  for (i in 1:N)
    sigma[i] = hypot(lambda0 + lambda1*x[i], lambda2);
}


where hypot(a,b) is just shorthand for sqrt(a^2+b^2). This model is still multimodal because the likelihood is invariant under the change of sign \left(\lambda_0,\lambda_1\right)\leftrightarrow\left(-\lambda_0,-\lambda_1\right). A lower=0 constraint on either lambda0 or lambda1 should fix that.
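A quick check (plain R, with illustrative parameter values) that this keeps sigma floored at lambda2 no matter the sign of lambda0 + lambda1*x:

```r
hypot <- function(a, b) sqrt(a^2 + b^2)           # same shorthand as above
lambda0 <- 0.3; lambda1 <- 0.12; lambda2 <- 0.05  # illustrative values
x <- seq(-6, 6, by = 0.1)
sigma <- hypot(lambda0 + lambda1*x, lambda2)
all(sigma >= lambda2)       # TRUE: the noise floor holds everywhere
min(sigma)                  # attained near x = -lambda0/lambda1
```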


Premise: I did not read the post in full