Could exploitation-based adaptive sampling be performed before using Projection Predictive variable selection?

Hello, I have had this question in mind for a long time now and haven't managed to find a clear answer.
I would like to use projection predictive variable selection as a sensitivity analysis tool, since I am interested in identifying the most influential parameters. However, I also want to do this with as few data samples as possible, because generating even a single sample is very expensive, which means I need to sample adaptively. If I use exploitation-based (with some exploration) adaptive sampling, would that introduce an implicit bias into the reference model too early, which might affect its reliability and therefore the variable selection? In other words, does projection predictive variable selection require data that have been simulated sufficiently at random?

My first thought was to use it to help reduce the dimension of the problem while doing the adaptive sampling, but doing this at every iteration could be costly, and I am not sure it makes sense, since, as I understood from this paper, the reference model itself should be good in the first place before it is used for variable selection.

This one’s for @avehtari, but I can help prime the pump with some questions.

Do you mean parameter samples here? If p(\theta \mid y) is your posterior with parameters \theta and data y, we usually treat y as fixed and then sample \theta^{(n)} \sim p(\theta \mid y).

Is there an algorithm you have in mind here?

Presumably you need draws that follow the posterior because this is based on expected log predictive density calculations, which require samples from the posterior.
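For example, the standard Monte Carlo approximation of the log predictive density for a new observation \tilde{y} uses S posterior draws:

\log p(\tilde{y} \mid y) \approx \log \frac{1}{S} \sum_{s=1}^{S} p(\tilde{y} \mid \theta^{(s)}), \qquad \theta^{(s)} \sim p(\theta \mid y).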

Thanks a lot for your reply.

No, I actually mean data samples: adaptive sampling to generate each observation y_i and so construct the dataset itself. I want to use the reference model sequentially to help me decide the next input x_{i+1} to the generative process (say, a physical model) in order to obtain the observation y_{i+1}.

Yes, basically any algorithm that exploits the predictions of the model and selects, from a set of random candidate inputs, the one that maximises an objective function.

One simple possibility is the method applied here,

which selects the next optimal input x_{m+1} from a set of random candidates X based on the following objective function:

x_{m+1} = \underset{x_* \in X}{\arg\max} \, \sum_{i=1}^{m} e_{\mathrm{LOOCV}}(x_i)\, \exp\left(-\alpha \lVert x_i - x_* \rVert\right)
where m is the number of observations collected so far and e_{\mathrm{LOOCV}}(x_i) is the leave-one-out cross-validation error at each data point in the current dataset (the exploitation part). The distance-based exponential factor represents the exploration part, intended to select inputs far from the data collected at the current step.
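In pseudo-code, the selection step I have in mind looks roughly like the sketch below (NumPy, with e_loocv standing in for the LOO errors of whatever surrogate/reference model is fitted; it is just an illustration, not an existing package function):

```python
import numpy as np

def next_input(X_obs, e_loocv, X_cand, alpha=1.0):
    """Pick the next input x_{m+1} from a set of random candidates.

    X_obs   : (m, d) inputs evaluated so far
    e_loocv : (m,)   leave-one-out CV errors at those inputs (exploitation)
    X_cand  : (k, d) random candidate inputs
    alpha   : decay rate of the distance-based weight
    """
    # Pairwise distances ||x_i - x_*|| between observed inputs and candidates
    dists = np.linalg.norm(X_obs[:, None, :] - X_cand[None, :, :], axis=-1)  # (m, k)
    # Objective: sum_i e_loocv(x_i) * exp(-alpha * ||x_i - x_*||)
    scores = (e_loocv[:, None] * np.exp(-alpha * dists)).sum(axis=0)         # (k,)
    return X_cand[np.argmax(scores)]
```

With X_cand drawn at random over the design space, this is one cheap evaluation per iteration; the expensive part remains the single run of the generative model at the chosen input.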

My first idea was to do the variable selection first and then use the reference model to do adaptive sampling on the reduced input space, for a more accurate surrogate model. However, my concern is that I also need to collect data samples to train the reference model, so I think I need to start the adaptive sampling while building the reference model. But adaptive sampling on a high-dimensional input space might take more iterations than on the reduced features, because I may waste expensive runs of the generative model on non-sensitive parameters. It seems like a circular dependency to me: I need adaptive sampling to do the sensitivity analysis, and I need the sensitivity analysis to do the adaptive sampling.

I was thinking of starting the adaptive sampling with pure exploration until I gain some confidence in the reference model: even if it is not yet very accurate in terms of predictions, it might already detect the important features. I am wondering whether there could be some convergence criterion in projpred for the important features, so I know that adding more samples will not change the selected variables (and probably not their order), even though it would result in a more accurate reference model.
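To make it concrete, something like the rough stability check below is what I have in mind; select_variables is just a placeholder for whatever selection I run on the current data, not an actual projpred function:

```python
def selection_stable(history, window=3):
    """Check whether the selected variable set has stopped changing.

    history : list of sets, the variables selected after each sampling iteration
    window  : number of most recent iterations that must agree
    """
    if len(history) < window:
        return False
    recent = history[-window:]
    return all(s == recent[0] for s in recent)

# Hypothetical outer loop: after each batch of expensive runs, refit the
# reference model, rerun the selection, and record the chosen variables.
# history.append(set(select_variables(current_data)))
# if selection_stable(history):
#     ...  # freeze the reduced input space
```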

I’d call that “active learning” given my ML background. There’s no way to do that built into Stan—it requires building scaffolding around it.

This is usually not a problem if you can do the two iteratively so they kind of leapfrog hand over hand up to the solution. This is how, for example, expectation maximization (EM) works. It’s also how our adaptation works—we need a good mass matrix and step size to sample, but we need to sample in order to estimate a good mass matrix and step size.
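Schematically, the leapfrogging could look like the sketch below; every function name is a placeholder for your own surrogate fit, variable selection, acquisition rule, and expensive simulator, not anything built into Stan or projpred:

```python
def leapfrog_design(simulator, init_inputs, n_iters,
                    fit_reference, select_variables, propose_next_input):
    """Alternate adaptive sampling and variable selection.

    simulator          : expensive generative/physical model, x -> y
    init_inputs        : small space-filling (pure exploration) design
    fit_reference      : fits the reference model on the data so far
    select_variables   : runs the variable selection on the fitted model
    propose_next_input : acquisition rule, e.g. the LOOCV-weighted one above
    """
    X = list(init_inputs)
    y = [simulator(x) for x in X]
    active = None  # currently selected (reduced) input variables
    for _ in range(n_iters):
        ref = fit_reference(X, y, active)        # refit on all data so far
        active = select_variables(ref)           # update the reduced space
        x_next = propose_next_input(ref, X, active)
        X.append(x_next)
        y.append(simulator(x_next))              # one expensive run per iteration
    return ref, active, X, y
```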

As far as I know, there's nothing built into Stan to help with this. In these sequential cases where each update is similar to the last, people like to use sequential Monte Carlo (SMC). I don't have any experience with those methods myself.


That's absolutely true. I usually use these terms interchangeably, but I forget that sampling has a different meaning in the statistics world.

I think I will start by trying the nested loop and see whether the two problems can accelerate each other's convergence. Thank you very much for your insights!