I am trying to fit a model to N data points with 3 covariates and 1 response. This basic model is one Gaussian component in a mixture of two Gaussians, where the second Gaussian captures the outliers. The setup is:
covariates: x, u, v
response: y
Model 1:
Here I fit for the parameters \{a_0, a_1, a_2, b_0, b_1, b_2, \sigma_0, \sigma_1, \sigma_2\}. I have five separate datasets, and the fit looks good for each one. The final model I have in mind adds a level for the variation between datasets to allow better shrinkage.
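Written out, a form consistent with this parameter list would be the following. This is only a sketch: the linear dependence of each coefficient on u and v is inferred from the parameter names, and the identity link for the scatter \sigma is an assumption (a log link would be an equally plausible choice).

```latex
\begin{aligned}
y_i &= a(u_i, v_i)\, x_i + b(u_i, v_i) + \epsilon_i,
  \qquad \epsilon_i \sim \mathcal{N}\!\left(0,\ \sigma(u_i, v_i)^2\right) \\
a(u_i, v_i) &= a_0 + a_1 u_i + a_2 v_i \\
b(u_i, v_i) &= b_0 + b_1 u_i + b_2 v_i \\
\sigma(u_i, v_i) &= \sigma_0 + \sigma_1 u_i + \sigma_2 v_i
\end{aligned}
```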
Model 2:
People have used simple binning: they bin the data in v, then in u, fit the model y = ax + b + \epsilon in each bin, and then look at how a, b, \sigma change as functions of u and v by fitting a linear model to those estimates after the first fit. Here v is time-like, and u is related to a property of the environment (there are k types of environment at each time slice).
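The two-stage binning procedure can be sketched as below. Everything numeric here is hypothetical: the data are simulated, the "true" coefficients, the bin edges, and the use of the bin median as the bin centre are illustrative assumptions, and plain least squares stands in for whatever fitting method was actually used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (hypothetical values, for illustration only).
n = 20000
x = rng.normal(size=n)
u = rng.uniform(0.0, 1.0, size=n)  # environment property
v = rng.uniform(0.0, 1.0, size=n)  # time-like covariate
a_true = 1.0 + 0.5 * u - 0.3 * v
b_true = 0.2 - 0.4 * u + 0.6 * v
y = a_true * x + b_true + rng.normal(scale=0.1, size=n)

# Stage 1: bin in u (4 environments) and v (5 epochs),
# then fit y = a x + b separately in each cell.
u_edges = np.linspace(0.0, 1.0, 5)
v_edges = np.linspace(0.0, 1.0, 6)
u_bin = np.digitize(u, u_edges[1:-1])  # labels 0..3
v_bin = np.digitize(v, v_edges[1:-1])  # labels 0..4

rows = []
for j in range(4):
    for t in range(5):
        m = (u_bin == j) & (v_bin == t)
        a_hat, b_hat = np.polyfit(x[m], y[m], 1)  # slope, intercept
        resid = y[m] - (a_hat * x[m] + b_hat)
        rows.append((np.median(u[m]), np.median(v[m]),
                     a_hat, b_hat, resid.std(ddof=2)))
cells = np.array(rows)  # columns: u centre, v centre, a, b, sigma

# Stage 2: regress the per-cell estimates on the bin centres.
design = np.column_stack([np.ones(len(cells)), cells[:, 0], cells[:, 1]])
a_coef, *_ = np.linalg.lstsq(design, cells[:, 2], rcond=None)  # a_0, a_1, a_2
b_coef, *_ = np.linalg.lstsq(design, cells[:, 3], rcond=None)  # b_0, b_1, b_2
```

Note that the second stage treats every per-cell estimate as equally precise, which is one of the things a multilevel model handles differently.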
To build a comparable model, I am trying to fit the linear model for y(x) within each environment at each time, using a random slope and intercept model. The baseline at each level follows the linear structure in u and v: one level has a linear term in u and another has a linear term in v, within a three-level model (i.e. using binned u and v as group-level covariates).
There are 4 types of environment at a given time, so with 5 epochs I'd have 20 level-2 parameters. The variation of each parameter among groups with the same environment follows the same distribution, and this variation is in addition to the linear baseline in terms of the 4 binned environments. There are also 5 level-3 parameters that allow for variation at the level of time, in addition to the linear baseline in terms of the 5 binned times. This would become a four-level model if I included another level for the different datasets.
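The grouping structure behind these counts can be sketched as index arrays. The bin labels assigned to the six example points are hypothetical, as are the variable names; only the counts (4 environments, 5 epochs, 20 cells) come from the description above.

```python
import numpy as np

n_env, n_epoch = 4, 5  # 4 environments, 5 epochs

# Hypothetical bin labels for a handful of data points.
env = np.array([0, 3, 1, 2, 0, 3])    # environment bin of each point, 0..3
epoch = np.array([0, 0, 2, 4, 4, 1])  # epoch bin of each point, 0..4

# Level 2: one group per (epoch, environment) pair, 20 groups in total.
n_cells = n_env * n_epoch
cell = epoch * n_env + env

# Level 3: the epoch itself, 5 groups in total (index is just `epoch`).

# Groups with the same environment share one scale parameter,
# so the scale index of a level-2 group is its environment label.
tau_index_of_cell = np.arange(n_cells) % n_env
```

Indexing `tau_index_of_cell[cell]` then recovers, for each data point, which of the 4 shared scale parameters applies to it.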
Here \tau_{a,\bar{u}_i} is the same for all groups with the same environment, so there are k=4 such parameters. Also, \bar{u} and \bar{v} are the binned versions of the original u and v, so \bar{u}_i is the median or mean value of the bin that the i-th data point falls into. An equation similar to the one for a(\bar{u}_i, \bar{v}_i) is used for b(\bar{u}_i, \bar{v}_i) and \sigma_{int}(\bar{u}_i, \bar{v}_i).
Question:

Which of these models is correct? If both or neither are correct, what would you recommend? Also, are there any benefits to adding these extra levels instead of using the data directly as in Model 1?

Also, should I expect similar results from both approaches for the slopes in u and v, namely a_1, a_2, b_1, b_2, \sigma_1, \sigma_2, or will they be washed out by the extra random variation terms added at the different levels?