Nested model with uneven group membership



I am trying to fit a simple linear regression where the response has a contribution from both the substance under test and the laboratory it was tested in, e.g. y ~ (substance + lab[substance]) * x
(where x and y are vectors of inputs and outcomes).

Using the different methods from pages 230-232 of the manual I can fit y for:

  1. All substances (but labs ignored)
  2. A single substance (all labs)

When I try to combine these models I run into a problem, because a different subset of labs carried out the tests for each substance (see the attached dataStructure.csv). There are 5 substances and 12 labs, and each lab tested 1-3 substances (5-8 data points per lab/substance combination).

I am trying to get an estimate of the ‘true’ substance contribution as well as the deviation due to lab. The aim is to use the ‘true’ value for a substance as the starting point of estimation for an untested but related substance. The lab component is substance dependent.

It seems like it should be simple, but it doesn’t have the balanced structure seen in the worked Stan examples I could find. Can anyone suggest a suitable approach? I have 1000 substances in total and they are expected to have related behaviour.


I am not really sure what your problem is, but I think it might be easier for you not to work with Stan directly and instead use the brms frontend, which AFAIK lets you express exactly the model you need with a formula syntax, so that you avoid the hassle of writing your own Stan code.

If for some reason you need to work with Stan directly, sparse/long-form data structures should solve the problem. If you are having trouble with those, you should post your model code and explain precisely where you are stuck.
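
To make the long-form idea a bit more concrete, here is a minimal sketch in plain Python of how the index arrays could be built. It assumes one row per observation; the records and the helper name build_indices are made up for illustration and are not taken from dataStructure.csv. The point is that only lab/substance combinations that actually occur get an index, so the design does not need to be balanced.

```python
# Hypothetical long-form records: (substance, lab, x, y).
# The values are invented purely to illustrate the layout.
records = [
    ("S1", "LabA", 1.0, 2.1),
    ("S1", "LabA", 2.0, 4.0),
    ("S1", "LabB", 1.0, 2.4),
    ("S2", "LabB", 1.0, 3.1),
    ("S2", "LabC", 2.0, 5.9),
    ("S3", "LabA", 1.0, 0.9),
]


def build_indices(rows):
    """Return 1-based index arrays for an unbalanced nested design.

    substance_idx[n]     : substance of observation n
    cell_idx[n]          : observed (substance, lab) cell of observation n
    cell_to_substance[c] : substance that cell c+1 belongs to
    """
    substance_ids = {}
    cell_ids = {}
    substance_idx = []
    cell_idx = []
    for substance, lab, _x, _y in rows:
        # setdefault assigns the next free id the first time a key is seen.
        s = substance_ids.setdefault(substance, len(substance_ids) + 1)
        c = cell_ids.setdefault((substance, lab), len(cell_ids) + 1)
        substance_idx.append(s)
        cell_idx.append(c)
    # Dicts keep insertion order, so this list is ordered by cell id.
    cell_to_substance = [substance_ids[s] for (s, _lab) in cell_ids]
    return substance_idx, cell_idx, cell_to_substance


substance_idx, cell_idx, cell_to_substance = build_indices(records)
n_substances = max(substance_idx)
n_cells = len(cell_to_substance)

print("substance_idx    :", substance_idx)
print("cell_idx         :", cell_idx)
print("cell_to_substance:", cell_to_substance)
print("substances:", n_substances, "observed cells:", n_cells)
```

In a Stan program you would then pass these arrays in as integer data and index the parameters per observation, e.g. a substance-level slope looked up via substance_idx[n] plus a lab-within-substance deviation looked up via cell_idx[n]. The parameter layout is my assumption about what you want, not something I can confirm from your description.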