Does anyone have general pointers for regularisation in a fairly abstract sense? Thoughts or references would be really welcome. Given a vector of observables y ~ MVN(f(A, x), g(A, x)), where the mean vector and covariance matrix of y depend in some way on a matrix of parameters A and a vector of latent variables x, what are appropriate ways to induce sparsity in A so that some parameters are not regularised more or less than others because of arbitrary aspects of the system, such as scaling choices? I was originally thinking that each coefficient A_i should incur a penalty proportional to |A_i| divided by the (unpenalised) gradient with respect to A_i, but that doesn’t seem to work. It could be a buggy implementation, but I assume it is more likely flawed thinking.
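To make the scaling problem concrete, here is a minimal sketch. It uses hypothetical choices f(A, x) = A x and g(A, x) = σ²I, neither of which comes from the post. Measuring x in different units (x → c·x, A → A/c) leaves the likelihood unchanged, but a plain L1 penalty on A shrinks by a factor 1/c. The amount of regularisation therefore depends on an arbitrary unit choice.

```python
import numpy as np

def mvn_logpdf(y, mean, cov):
    """Log density of a multivariate normal, written out with NumPy."""
    k = y.shape[0]
    resid = y - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = resid @ np.linalg.solve(cov, resid)
    return -0.5 * (k * np.log(2 * np.pi) + logdet + quad)

def f(A, x):
    # Hypothetical mean function: linear in the latent variables.
    return A @ x

def g(A, x, sigma2=1.0):
    # Hypothetical covariance: independent of A here, so the invariance below is exact.
    return sigma2 * np.eye(A.shape[0])

def penalised_nll(A, x, y, lam):
    # Negative log-likelihood plus a plain L1 penalty on every element of A.
    return -mvn_logpdf(y, f(A, x), g(A, x)) + lam * np.abs(A).sum()

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 2))
x = rng.normal(size=2)
y = f(A, x) + rng.normal(size=4)

# Same model, x expressed in different units: x -> c*x, A -> A/c.
c = 10.0
loglik_orig = mvn_logpdf(y, f(A, x), g(A, x))
loglik_rescaled = mvn_logpdf(y, f(A / c, c * x), g(A / c, c * x))
print(np.isclose(loglik_orig, loglik_rescaled))  # True: the fit is identical
# ...but the penalised objectives differ, because the L1 term is not unit-free.
print(penalised_nll(A, x, y, 1.0), penalised_nll(A / c, c * x, y, 1.0))
```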
Thanks, it looks like a nice overview, but as far as I can see it only covers the linear regression context. Perhaps @sara-vanerp came across wilder things in her travels, though?
Yes, so far I have mainly focused on the linear regression context. I am actually currently working on regularisation in structural equation models. I worked this out to some extent for a multiple-group factor model in chapter 6 of my PhD thesis, using the regularized horseshoe and spike-and-slab priors. In that context, the penalisation is mainly used to find the least restrictive model that is still identified. The main issue I ran into is that this requires a lot of fine-tuning of the priors, so I am still looking for better, more general approaches myself!
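For readers unfamiliar with the regularised horseshoe, here is a sketch of drawing from it in one common parametrisation. The parametrisation is my assumption, and the thesis may use a different one. It combines a half-Cauchy global scale tau, half-Cauchy local scales lambda_j, and an inverse-gamma slab width c² that caps how large the unshrunk coefficients can get. The arguments tau0, slab_scale and slab_df (hypothetical names) are the kind of hyperparameters that need the fine-tuning mentioned above.

```python
import numpy as np

def draw_reg_horseshoe(n_coef, tau0, slab_scale, slab_df, rng):
    """One prior draw of coefficients under a regularised horseshoe (assumed parametrisation)."""
    tau = tau0 * np.abs(rng.standard_cauchy())        # global scale ~ half-Cauchy(0, tau0)
    lam = np.abs(rng.standard_cauchy(size=n_coef))     # local scales ~ half-Cauchy(0, 1)
    # Slab width: c^2 ~ Inv-Gamma(slab_df / 2, slab_df * slab_scale^2 / 2),
    # drawn as the reciprocal of a gamma variate with rate slab_df * slab_scale^2 / 2.
    c2 = 1.0 / rng.gamma(shape=slab_df / 2, scale=2.0 / (slab_df * slab_scale**2))
    lam_tilde2 = c2 * lam**2 / (c2 + tau**2 * lam**2)   # regularised local scales
    sd = tau * np.sqrt(lam_tilde2)                      # prior sd of each coefficient
    beta = rng.normal(0.0, sd)
    return beta, sd, c2

rng = np.random.default_rng(0)
beta, sd, c2 = draw_reg_horseshoe(n_coef=8, tau0=0.1, slab_scale=2.0, slab_df=4.0, rng=rng)
print(np.round(beta, 3))
# However large a local scale gets, the prior sd stays below the slab width sqrt(c2).
print(np.all(sd <= np.sqrt(c2) * (1 + 1e-9)))  # True
```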
Is this similar to what you would like to do?
Yeah, I’m interested in general, tuning-free approaches, I guess. Conceptually: I have two parameters, A and B. For each parameter, the entropy of the multivariate residuals ranges from E0 (the entropy with that parameter fixed to 0) to E1 (the entropy with the parameter free and unpenalised). I’m interested in what it would take to penalise A and B such that they are penalised equally, i.e. both arrive at the same point on my E scale, somewhere between 0 (E0) and 1 (E1). The more it rattles around my brain, the more hopeless it seems, but I’d still love to read any conceptual work that elaborates these kinds of ideas properly in a general context…
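One way to make the E scale concrete is to summarise the residuals by a zero-mean Gaussian with their maximum-likelihood covariance Σ, so the entropy is 0.5·log det(2πe·Σ). You then compute E0, E1 and the entropy at a penalised value, and map the result to [0, 1]. The toy model, the `shrink` factor standing in for a penalised estimate, and the function names are all hypothetical.

```python
import numpy as np

def gaussian_entropy(resid):
    """Entropy (nats) of a zero-mean MVN with the ML covariance of resid (n x k)."""
    n, k = resid.shape
    cov = resid.T @ resid / n
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (k * np.log(2 * np.pi * np.e) + logdet)

def entropy_position(E, E0, E1):
    """Map E onto the E0 -> E1 scale: 0 = as if fixed at 0, 1 = as if unpenalised."""
    return (E - E0) / (E1 - E0)

def position_for_shrinkage(Y, x, shrink):
    """Toy model: outcome 1 = a * x + noise, outcome 2 = noise; only `a` is estimated."""
    def resid(a):
        R = Y.copy()
        R[:, 0] = Y[:, 0] - a * x
        return R
    # Unpenalised ML estimate of a with the residual covariance left free:
    # the coefficient on x when regressing outcome 1 on x and outcome 2.
    a_free = np.linalg.lstsq(np.column_stack([x, Y[:, 1]]), Y[:, 0], rcond=None)[0][0]
    a_pen = shrink * a_free  # hypothetical stand-in for some penalised estimate
    E0, E1, E = (gaussian_entropy(resid(a)) for a in (0.0, a_free, a_pen))
    return entropy_position(E, E0, E1)

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
Y = np.column_stack([0.8 * x + rng.normal(size=n), rng.normal(size=n)])
print(position_for_shrinkage(Y, x, shrink=0.5))          # strictly between 0 and 1
print(position_for_shrinkage(Y * [10.0, 0.1], x, 0.5))   # same value in other outcome units
```

Gaussian entropy depends on Σ only through log det Σ. Rescaling any outcome therefore shifts E0, E1 and E by the same constant, so the position does not depend on the units of y. On its own, though, this does not say how to choose penalties that put A and B at the same position.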
I am not sure I understand exactly what you are trying to do. But if you want to penalise parameters equally, you might want to use something like a group lasso. This can be done with a scale mixture of multivariate normal priors that takes the grouping of the parameters into account; see this paper for the exact prior.
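The linked paper's prior is a Bayesian scale mixture, which is not reproduced here. As a simpler frequentist stand-in, this sketch shows the standard group lasso penalty, with the common √(group size) weights, and its proximal step, block soft-thresholding. Each group is shrunk as a single vector: it either keeps its direction or is set to zero as a whole. The function names are hypothetical.

```python
import numpy as np

def group_lasso_penalty(beta, groups, lam):
    """lam * sum_g sqrt(|g|) * ||beta_g||_2 (group-size weighting is one common convention)."""
    return lam * sum(np.sqrt(len(g)) * np.linalg.norm(beta[g]) for g in groups)

def group_soft_threshold(beta, groups, lam):
    """Proximal step for the penalty above: shrink each group's vector as a unit."""
    out = np.zeros_like(beta)
    for g in groups:
        norm = np.linalg.norm(beta[g])
        t = lam * np.sqrt(len(g))
        if norm > t:
            out[g] = (1.0 - t / norm) * beta[g]  # same direction, shorter length
        # otherwise the whole group stays exactly zero
    return out

beta = np.array([3.0, 4.0, 0.3, -0.4])
groups = [[0, 1], [2, 3]]
print(group_soft_threshold(beta, groups, lam=1.0))  # second group is zeroed as a block
```

Because ||β_g|| adds squared coefficients within a group, the penalty still treats those coefficients as being on a common scale.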
The point is that the lasso only works sensibly under an assumption of equal scaling; I guess I’m interested in how the problem should be formulated without that. But I don’t think I’m doing a great job of explaining :)
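A toy illustration of the equal-scaling point uses a single-predictor lasso, where the exact solution is a closed-form soft-threshold. Re-expressing the same predictor in different units changes the estimate, even after converting back to the original units. Standardising first removes the dependence, but only by declaring one standard deviation to be the common unit, which is itself a scaling choice. The function names and simulated data are illustrative only.

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_1d(x, y, lam):
    """Exact minimiser of 0.5 * ||y - x*b||^2 + lam * |b| for a single predictor."""
    return soft_threshold(x @ y, lam) / (x @ x)

def lasso_1d_standardised(x, y, lam):
    """Fit on x / sd(x), then convert the coefficient back to the units of x."""
    s = x.std()
    return lasso_1d(x / s, y, lam) / s

rng = np.random.default_rng(3)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(size=100)
lam = 50.0

for c in (1.0, 0.1, 10.0):  # the same predictor measured in different units
    raw = lasso_1d(c * x, y, lam) * c               # back in original units: changes with c
    std = lasso_1d_standardised(c * x, y, lam) * c  # back in original units: same for every c
    print(c, raw, std)
```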
That’s true. I’m afraid I don’t know an automatic way around that… Would be curious to hear if you found a way though!