Hey Ben,
thanks for the tip. In the meantime I found this thread: Mixed Logit Model. The models discussed there have basically the same structure as mine but use the built-in softmax function, so I adapted the model provided by @James_Savage to my case. So far it doesn’t seem much faster, but hopefully the built-in softmax variant will result in fewer divergence issues. The speed bottleneck is probably just the amount of data, given that there are more than 300k ‘observations’. It seems I just have to find more computing power and/or be patient.