Thanks a lot for your reply.
No, I actually mean data samples: adaptive sampling to generate each observation y_i and construct the dataset itself. I want to use the reference model sequentially to help me decide the next input x_{i+1} for the generative process (say, a physical model) in order to obtain the observation y_{i+1}.
Yes, basically any algorithm that exploits the predictions of the model and selects the next input from a set of random inputs by maximising an objective function.
One simple possibility is the method applied here, which selects the next optimal input x_{m+1} from a set of random candidates X based on this objective function:
x_{m+1} = \underset{x_* \in X}{\arg\max} \sum_{i=1}^{m} e_{\mathrm{LOOCV}}(x_i)\, \exp(-\alpha \, \|x_i - x_*\|)
where m is the number of observations collected so far and e_{\mathrm{LOOCV}}(x_i) is the leave-one-out cross-validation error at each data point in the current dataset; this is the exploitation part. The second factor of each term represents an exploration part, based on distance, intended to steer the selection relative to the data collected at the current step.
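To make the selection rule concrete, here is a minimal sketch of that acquisition step in Python/NumPy. It assumes you already have the observed inputs, their LOO-CV errors from the surrogate, and a pool of random candidates; the function name and signature are my own illustration, not part of any package.

```python
import numpy as np

def next_input(X_obs, loocv_err, X_cand, alpha=1.0):
    """Pick the next input from random candidates X_cand.

    X_obs:     (m, d) inputs observed so far
    loocv_err: (m,)   LOO-CV error of the surrogate at each observed input
    X_cand:    (n, d) random candidate inputs
    alpha:     decay rate of the distance kernel
    """
    # Pairwise distances ||x_i - x_*|| between observed and candidate points
    dists = np.linalg.norm(X_obs[:, None, :] - X_cand[None, :, :], axis=-1)
    # Objective as written above: sum_i e_LOOCV(x_i) * exp(-alpha * ||x_i - x_*||)
    scores = (loocv_err[:, None] * np.exp(-alpha * dists)).sum(axis=0)
    return X_cand[np.argmax(scores)]
```

Note that, as written, the exponential kernel is largest for candidates close to high-error observed points, so the rule concentrates new runs near regions where the surrogate currently performs worst; the alpha parameter controls how quickly that influence decays with distance.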
My first idea was to do the variable selection first, then use the reference model to do adaptive sampling on the reduced input space, for a more accurate surrogate model. My concern, however, is that I also need to collect data samples to train the reference model, so I think I need to start the adaptive sampling while creating the reference model. But adaptive sampling on a high-dimensional input space might take more iterations than on the reduced features, because I may waste expensive runs of the generative model on non-sensitive parameters. It seems like an infinite loop to me: I need adaptive sampling to do sensitivity analysis, and I need sensitivity analysis to do adaptive sampling.
I was thinking of starting the adaptive sampling with pure exploration until I gain some confidence in the reference model; even if it is not very accurate yet in terms of predictions, it might already detect the important features. I am wondering whether there could be some convergence criterion in projpred for the important features, so that I know adding more samples will not change the selected variables (and probably not their order), even though it would result in a more accurate reference model.
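I do not know of a built-in criterion like that, but one heuristic I was considering is to track the selected-variable set across sampling iterations and stop once it stabilises. A minimal sketch (the function, its parameters, and the Jaccard-similarity rule are my own assumptions, not projpred functionality):

```python
def selection_converged(history, window=3, threshold=1.0):
    """Heuristic stopping rule for the variable selection.

    history:   list of sets of selected variable names, one per
               adaptive-sampling iteration
    window:    how many recent iterations must agree
    threshold: minimum Jaccard similarity to the latest set
               (1.0 means the sets must be identical)
    """
    if len(history) < window:
        return False
    recent = history[-window:]
    ref = recent[-1]
    for s in recent[:-1]:
        union = len(s | ref)
        jaccard = len(s & ref) / union if union else 1.0
        if jaccard < threshold:
            return False
    return True
```

With threshold=1.0 this declares convergence only when the last few iterations select exactly the same variables; relaxing it to, say, 0.8 would tolerate one variable swapping in or out, though it ignores the ordering of the variables, which may also matter here.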