Nested model with uneven group membership

I am trying to fit a simple linear regression in which the response has a contribution from the substance under test and from the laboratory it was tested in, e.g. y ~ (substance + lab[substance]) * x
(where x and y are vectors of inputs and outcomes).

Using the different methods from pages 230-232 of the manual I can fit y for:

  1. All substances (labs ignored).
  2. A single substance, all labs.

When I try to combine these models I have a problem, because a different subset of labs carried out the tests for each substance (see dataStructure.csv). There are 5 substances and 12 labs, and each lab tested 1-3 materials (5-8 data points per lab/substance combination).

I am trying to get an estimate of the ‘true’ material contribution as well as the deviation due to lab. The aim is to use the ‘true’ value for a substance as the starting point of estimation for an untested but related substance. The lab component is material dependent.

It seems like it should be simple, but it doesn’t have the balanced structure seen in the worked Stan examples I could find. Can anyone suggest a suitable approach? I have 1000 substances in total and they are expected to have related behaviour.

I am not really sure what your problem is, but I think it might be easier for you not to work with Stan directly and to use the brms frontend instead. AFAIK it lets you express exactly the model you need with a formula syntax, so you avoid the hassle of writing your own Stan code.

If for some reason you need to work with Stan directly, sparse/long-form data structures should solve the problem. If you are having trouble with those, you should post your model code and explain precisely where you are stuck.
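
To illustrate what I mean by long form, here is a minimal Python sketch of the indexing step. It assumes one row per observation, gives every substance an integer id, and gives every substance/lab combination that actually occurs its own id, so untested combinations never need a parameter. The function name, the dictionary keys and the toy rows are all made up for illustration (they are not taken from dataStructure.csv), and the ids are 1-based only because Stan indexes from 1.

```python
def build_long_form_indices(rows):
    """Build 1-based index arrays for unbalanced (ragged) nested data.

    rows: iterable of (substance_label, lab_label), one entry per observation.
    Only substance/lab combinations that actually occur receive an id.
    """
    substance_ids = {}    # substance label -> substance id
    combo_ids = {}        # (substance, lab) -> combination id
    combo_substance = []  # substance id of each combination, in id order
    obs_substance = []    # substance id of each observation
    obs_combo = []        # combination id of each observation

    for substance, lab in rows:
        # len() is evaluated before insertion, so the first label gets id 1
        s = substance_ids.setdefault(substance, len(substance_ids) + 1)
        key = (substance, lab)
        if key not in combo_ids:
            combo_ids[key] = len(combo_ids) + 1
            combo_substance.append(s)
        obs_substance.append(s)
        obs_combo.append(combo_ids[key])

    return {
        "N": len(obs_substance),             # number of observations
        "S": len(substance_ids),             # number of substances
        "K": len(combo_ids),                 # number of observed combinations
        "substance": obs_substance,          # length N
        "combo": obs_combo,                  # length N
        "combo_substance": combo_substance,  # length K
    }


# Hypothetical toy data: lab2 tested both substances, lab1 and lab3 only one.
rows = [
    ("A", "lab1"), ("A", "lab1"), ("A", "lab2"),
    ("B", "lab2"), ("B", "lab3"), ("B", "lab3"),
]
data = build_long_form_indices(rows)
print(data)
```

In a model you would then index a substance-level parameter by `substance` and a lab-within-substance parameter by `combo`, with `combo_substance` linking each combination back to its parent substance. Because the lab effect is tied to a combination rather than to a lab, the same lab gets a separate effect for each substance it tested, which is what "the lab component is material dependent" seems to call for.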

It’s much easier to help if you provide your model code. As is, it’s hard to tell how you tried to code the model.

At the time I posted I didn’t have a model beyond what was in the manual (I just couldn’t see how to express the problem). In the intervening time I came across:

https://rpubs.com/kaz_yos/stan-multi-2

That page basically described the model I was trying to fit (I didn’t know what this kind of problem was called), and it was straightforward to implement.