I’d like to be able to run the error-in-variables regression program below on data with a roughly 500k × 200 design matrix (genetic association studies require humongous samples because individual polymorphisms have tiny, tiny effects; there are also lots of relevant confounds to control for).

Running the model on a subset of 70k observations takes just under a week using 24 cores (Xeon E7-4830 v3 @ 2.10GHz) and a terabyte of RAM, with 24 chains and 2000 iterations per chain. I could quadruple these numbers (hardware-wise) if necessary, but I have a few questions before I get too invested!

#### 1. Is it a terrible idea to try to use Stan at this scale? Is anyone else successfully running models of this size?

#### 2. If it’s not a terrible idea, are there any suggestions for improving performance?

- The first thing that comes to mind is using a QR reparameterization of the covariate matrix rather than the formulation below.
- I’m also curious about the optimal ratio of iterations to chains.

Thanks.

#### Stan program

```
data {
  int<lower=1> J;
  int<lower=0> P;              // number of covariates
  vector[J] y;
  int<lower=1,upper=3> w[J];   // noisy observation of latent x
  simplex[3] pVec;             // marginal distribution of x
  simplex[3] piMat[3];         // misclassification matrix: Pr(w | x = k)
  row_vector[P] z[J];          // covariates
}
parameters {
  real<lower=0> sigma_y;       // residual SD of y (Stan's normal takes an SD, not a variance)
  real alpha;                  // intercept
  real beta;                   // slope on latent x
  vector[P] bCov;              // covariate coefficients
}
model {
  alpha ~ normal(0, 10);
  beta ~ normal(0, 10);
  sigma_y ~ normal(0, 10);     // truncated to [0, infty) by the constraint
  bCov ~ normal(0, 10);        // vectorized prior
  for (j in 1:J) {             // for each observation
    real lp[3];                // will contain marginal log probabilities
    for (k in 1:3) {           // support of latent x
      lp[k] = categorical_lpmf(k | pVec)
            + categorical_lpmf(w[j] | piMat[k])
            + normal_lpdf(y[j] | alpha + beta * k + z[j] * bCov, sigma_y);
    }
    target += log_sum_exp(lp); // marginalize out x
  }
}
```
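For reference, here’s my rough, untested sketch of what the QR version might look like — it assumes a Stan version with `qr_thin_Q`/`qr_thin_R`, stores the covariates as a `matrix` instead of an array of row vectors, and also precomputes the data-only log terms once in `transformed data` rather than inside the likelihood loop. One caveat I’m aware of: the `normal(0, 10)` prior would then apply to the transformed coefficients `theta`, not to `bCov` directly.

```
data {
  int<lower=1> J;
  int<lower=0> P;
  vector[J] y;
  int<lower=1,upper=3> w[J];
  simplex[3] pVec;
  simplex[3] piMat[3];
  matrix[J, P] Z;                    // covariates as a matrix for the QR decomposition
}
transformed data {
  // thin QR decomposition, scaled as recommended in the Stan manual
  matrix[J, P] Q_ast = qr_thin_Q(Z) * sqrt(J - 1);
  matrix[P, P] R_ast = qr_thin_R(Z) / sqrt(J - 1);
  matrix[P, P] R_ast_inverse = inverse(R_ast);
  // data-only log terms, computed once instead of J times per leapfrog step
  vector[3] log_pVec = log(pVec);
  matrix[3, 3] log_piMat;            // log_piMat[k, w] = log Pr(w | x = k)
  for (k in 1:3)
    log_piMat[k] = to_row_vector(log(piMat[k]));
}
parameters {
  real<lower=0> sigma_y;
  real alpha;
  real beta;
  vector[P] theta;                   // coefficients on Q_ast
}
model {
  vector[J] mu = alpha + Q_ast * theta;
  alpha ~ normal(0, 10);
  beta ~ normal(0, 10);
  sigma_y ~ normal(0, 10);
  theta ~ normal(0, 10);             // NB: prior is on theta, not on bCov
  for (j in 1:J) {
    real lp[3];
    for (k in 1:3)
      lp[k] = log_pVec[k]
            + log_piMat[k, w[j]]
            + normal_lpdf(y[j] | mu[j] + beta * k, sigma_y);
    target += log_sum_exp(lp);
  }
}
generated quantities {
  vector[P] bCov = R_ast_inverse * theta;  // recover coefficients on the original Z scale
}
```

Does this look like a sensible direction, or am I missing something about how the QR trick interacts with the marginalized likelihood?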