I have rewritten my question to be more concrete. The goal is to model simple “factor exposures” in a way that accounts for systematic measurement error in both x and y. At a high level, the market return is an average of the returns of its constituents. A common aim in finance is to find the exposure of some factor to the market’s moves: that is, the beta such that factor = beta_factor * mkt + noise. The added complexity is that factor and mkt are known to be mismeasured during certain periods of time, and I want to account for that mismeasurement.
I think I have the right idea for a Bayesian model. Suppose the market is partitioned into instruments of different quality, so that each factor is indexed by a quality q. We observe:
matrix[Q, T] X; // returns across time, grouped by quality. There are Q qualities and T times.
vector[T] mkt; // market returns
vector[T] u_t; // vector encapsulating the scale of the error thought to be present in the market. Near zero
// for most times, around 1 or 2 for the financial crisis and similar periods.
matrix[Q, T] wts; // a matrix of weights allowing one to recover mkt from X.
Define a function new_mkt(X, wts) which computes the market return given the factor returns, across times. The weights contain information about how each factor should be weighted (by size).
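Written out, and matching the Stan function given further down (which takes the mean over qualities of the elementwise product), new_mkt computes the following for each time t. Here w stands for wts, and the note on scaling is an observation about the formula, not something I have checked against my data:

```latex
\mathrm{mkt}_t \;=\; \frac{1}{Q} \sum_{q=1}^{Q} w_{q,t}\, X_{q,t}, \qquad t = 1, \dots, T
```

With this definition the result is a weighted average of the X[q, t] only if the weights at each time sum to Q; if they sum to 1 instead, the output is scaled down by a factor of Q.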
The informal model is:
X_true[q, 1:T] ~ N(0, sigma_1);
X[q, 1:T] ~ N(X_true, .000001 * g(u_t)); // g is some linear transformation of u_t. For now take it to be
// the constant function g = 1. I am unable to make even this
// work yet.
mkt_true = new_mkt(X_true, wts);
X_true[q, 1:T] ~ normal( beta[q] * mkt_true, sigma);
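In more standard notation, for each quality q and time t, the informal model reads as below. I write w for wts and read the second argument of N as a standard deviation, as Stan does:

```latex
\begin{aligned}
X^{\mathrm{true}}_{q,t} &\sim \mathcal{N}(0,\ \sigma_1) \\
X_{q,t} \mid X^{\mathrm{true}}_{q,t} &\sim \mathcal{N}\bigl(X^{\mathrm{true}}_{q,t},\ 10^{-6}\, g(u_t)\bigr) \\
\mathrm{mkt}^{\mathrm{true}}_t &= \mathrm{new\_mkt}(X^{\mathrm{true}}, w)_t \\
X^{\mathrm{true}}_{q,t} &\sim \mathcal{N}\bigl(\beta_q\, \mathrm{mkt}^{\mathrm{true}}_t,\ \sigma\bigr)
\end{aligned}
```

Note that X_true appears on the left-hand side twice, and that mkt_true is a deterministic function of X_true. In Stan, two sampling statements on the same quantity both contribute to the log density, so the first and last lines act jointly on X_true.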
This seems to achieve what I want in that it properly relates the uncertainty in each factor to the uncertainty of the true market return, and ultimately defines the sought-after beta to be the relationship between these uncertain factors. Note that for now I am choosing to have a model for a very, very tiny measurement error. If I can get this model to work I then want to let g(u_t) allow the measurement error to be big only for a few select times.
The formal model I am running is the following (here T is named TT and u_t is named s_t):
functions {
  vector new_mkt(matrix quals, matrix wts) {
    vector[cols(quals)] nmkt;
    matrix[rows(quals), cols(quals)] mod_quals;
    mod_quals = quals .* wts;
    for (i in 1:cols(mod_quals)) {
      nmkt[i] = mean(mod_quals[1:rows(mod_quals), i]);
    }
    return nmkt;
  }
}
data {
  int<lower=1> TT;    // number of stacked returns
  int<lower=1> Q;     // number of qualities
  matrix[Q, TT] X;    // quality returns
  vector[TT] mkt;     // market return
  vector[TT] s_t;     // measure of measurement uncertainty
  matrix[Q, TT] wts;  // defines exact relationship of mkt as weighted average of X
}
parameters {
  vector[Q] beta;
  real<lower=0> sigma;
  matrix[Q, TT] X_true;  // unknown true factor returns
}
transformed parameters {
  vector[TT] mkt_true;  // unknown true market return, a simple function/transformation of X_true
  mkt_true = new_mkt(X_true, wts);
}
model {
  beta ~ normal(1, 1);
  sigma ~ normal(0, .1);
  for (i in 1:Q) {
    X_true[i, 1:TT] ~ normal(0, 1);
  }
  for (i in 1:Q) for (t in 1:TT) {
    X[i, t] ~ normal(X_true[i, t], .0000000001);  // for now, constraining X to be a nearly exact measurement
  }
  for (i in 1:Q) for (t in 1:TT) {
    X[i, t] ~ normal(beta[i] * mkt_true[t], sigma);
  }
}
Diagnostics
This model runs, but the estimates it yields for beta are wrong (judged against simpler models with zero measurement error), and the Rhat values for the beta terms, sigma and sigma2 are extremely high. Something is going wrong. Am I doing something strange, or is this a genuinely difficult model for Stan?