Thanks for sticking with me. So Row 1 of 18000 has 82,868 values. What do those values represent? I was expecting to get 18,000 predicted y variables. Is that not the case?
Edit:
Is the value in each column a predicted value from that sample? So each of the 18,000 rows is one sample of the variables, and the values I’m seeing are the predicted ‘y’ for each observation in the test data?
Think of it this way. You have some sort of functional relationship, $\mathbf{y} = f(\mathbf{X}, \boldsymbol{\theta})$. $\mathbf{X}$ is your matrix of input points (e.g. `x_test` in your Python script), with row $i$ representing the $i^{\rm th}$ vector of inputs (e.g. `BIlmt_model2_new[i]`, `multi_policy_count_model_new[i]`, etc.). $\boldsymbol{\theta}$ is a vector of model parameters (e.g. `BIlmt_coeff`, `unit_value_model2_coeff`, etc.).
For any single value of $\boldsymbol{\theta}$, inputting the $82868\times16$ matrix $\mathbf{X}$ into $f$ will get you a vector $\mathbf{y}$ of length 82868. But you’ve sampled 18000 random values of $\boldsymbol{\theta}$, so now you have 18000 predicted values of the vector $\mathbf{y}$, one per row: row $j$ holds the 82868 predictions made with the $j^{\rm th}$ sampled $\boldsymbol{\theta}$.
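If it helps to see the shapes, here is a minimal NumPy sketch. It assumes a plain linear f and made-up random data at a much smaller size; `f`, `theta`, `y_all` and the other names are hypothetical and not taken from your script or model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small stand-ins for 18000 draws, 82868 test rows, 16 inputs.
n_draws, n_obs, n_inputs = 200, 50, 16

X = rng.normal(size=(n_obs, n_inputs))        # one row per test observation
theta = rng.normal(size=(n_draws, n_inputs))  # one row per sampled parameter vector


def f(X, theta_one):
    """Hypothetical linear model: one predicted value per row of X."""
    return X @ theta_one


# A single draw of theta gives one prediction vector of length n_obs.
y_one = f(X, theta[0])

# Every draw gives its own prediction vector: shape (n_draws, n_obs).
y_all = np.stack([f(X, t) for t in theta])

# Collapsing over the draws (axis 0) leaves one summary per observation.
y_mean = y_all.mean(axis=0)
y_lo, y_hi = np.percentile(y_all, [5, 95], axis=0)

print(y_one.shape, y_all.shape, y_mean.shape)
```

Taking the mean or percentiles over axis 0 is just one way to reduce the draws to a single number or interval per observation, if that is what you need.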
Got it. Thank you @jjramsey. I’m going to think through how I want to infer from this and start another thread if needed. Thanks so much!