Model inputs as parameters of a distribution

I have a problem where the inputs to the Bayesian model are the parameters of distributions, so they need to be sampled first. I know that this is confusing, so I will explain it with a synthetic regression example.

For a typical regression, where X holds the predictors and y is the predicted variable, we estimate the coefficients w and the bias b as follows:

y ~ normal(X * w + b, sigma)

Now let’s assume that the response variable y is not directly observed, but we know that it follows a distribution with specified parameters. As a simple case, assume that we know that each y_i follows a uniform distribution with parameters a and b.
So, what is required here (as far as I can grasp) is to first sample y from the uniform distribution (in each step of MCMC) and then apply the sampling statement above.

Does anyone know how this is possible with Stan?
Any hint is highly appreciated!

This sounds like a problem that would require RNG functions in the transformed parameters block, which is not possible at this time. I’ve seen this issue pop up a few times in other discussions.

Regardless, it might be helpful to see some code that generates the data you have in mind. Maybe it will reveal some alternative solution that doesn’t rely on rng functions.

Thanks @simonbrauer! I had looked into RNG functions in the transformed parameters block, but did not know they are not supported!

Haven’t been able to find a solution so far!

This is similar to Bayesian measurement error models (see the section in the User’s Guide: 6.1 Bayesian measurement error model | Stan User’s Guide). In this case the error is not Gaussian but uniform, but the strategy is the same: declare y_true as a parameter in the parameters block, and assume that it is distributed such that y_{\text{true}} \sim \text{U}[y_{\text{obs}} - a, y_{\text{obs}} + b] (or whatever your measurement error model happens to be).
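A minimal sketch of what this could look like, with my own hypothetical names (x, y_obs, and b0 for the intercept, so it does not clash with the bound b); note that vector-valued bounds need a reasonably recent Stan version:

data {
  int<lower=1> N;
  vector[N] x;       // predictor
  vector[N] y_obs;   // observed responses
  real<lower=0> a;   // lower offset of the uniform error
  real<lower=0> b;   // upper offset of the uniform error
}

parameters {
  real w;
  real b0;
  real<lower=0> sigma;
  // the declared bounds give y_true an implicit uniform prior
  // over [y_obs - a, y_obs + b]
  vector<lower=y_obs - a, upper=y_obs + b>[N] y_true;
}

model {
  y_true ~ normal(x * w + b0, sigma);
}

(Priors on w, b0, and sigma are omitted for brevity.)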

3 Likes

Just to point out that @hhau’s solution (which is good), will result in the regression itself informing the true values of y (the posterior margins for y_true won’t be uniform). If this is undesirable (if for some reason you really need to uniformly sample y_true and then fit the model), one compute-intensive possibility is to do the resampling of y_true many times, fit the model many times, and concatenate the resulting posterior samples.

2 Likes

Thanks @hhau for the response. That seems like a good solution, but I am not sure it precisely models what I need. I want y_true to follow a uniform distribution with parameters a and b, but the statement y_true ~ U[y_obs - a, y_obs + b] will not do the job (or will it?). I would be grateful if you could clarify further!

@jsocolar Thanks! That is indeed my case, to have the uniform distribution. To do that, do you mean I need to execute the model multiple times?

The key question here is exactly what you mean when you say that you want y_true to follow a uniform distribution with parameters a and b.

Do you mean that you want the posterior margins for y_true to be uniform? Or do you mean that you want the model to start from uniform uncertainty in y_true but then learn something about the true values, so that you get posterior margins for y_true that are not necessarily uniform?

If what you need is the former, then you can, as you say, execute (i.e. fit) the model multiple times, each time with a different dataset of y_true sampled from the desired uniform distributions. You will need to do this many times, but you don’t need to obtain a huge number of posterior iterations from each fit. The main cost will be warmup, which needs to be repeated for each fit (though if the posterior is stable enough as you re-sample the y_true’s, you might be able to re-use previous warmup and avoid most of this cost). An additional cost, depending on how dangerously you want to live, is the need to get sufficient posterior samples out of each fit to reliably evaluate convergence diagnostics.

@jsocolar Thanks a lot for the clarification. It is quite clear now.

Do you also think that, with y_true ~ U[y_obs - a, y_obs + b], y_true will be between a and b?

Do you actually observe anything like y? Or just that it is in some interval [a, b]? Is this interval unique per observation? This might also be interval-censored data (which is similar to the rounding example in the User’s Guide). Any more info on the actual problem would be helpful.
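For reference, if this does turn out to be interval censoring, the latent response can also be integrated out analytically instead of being declared as a parameter. A hypothetical per-observation sketch (a, b, x, w, b0, and sigma are placeholder names):

// inside the model block, for observation i with bounds a[i] and b[i]
real mu = x[i] * w + b0;
target += log_diff_exp(normal_lcdf(b[i] | mu, sigma),
                       normal_lcdf(a[i] | mu, sigma));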

2 Likes

@hhau We only know that it follows a distribution with some specified parameters. The distribution and its parameters are elicited from the users for each observation, so we do not actually have a specific y.

That being said, for the uniform distribution case, we only know that y is in an interval [a, b].

It is different from the rounding example in the User’s Guide, I believe. There, we assume that each observation is imprecise, with some deviation centered at the specified value. But here, we say that the predicted value follows a distribution:

y_i ~ U[a, b]

At the same time, for regression, we must have:

y_i ~ normal(x_i * w + b, sigma)

So, I think it is distinct from both the measurement errors and the rounding example.

Can you provide any more detail about the data-generating process here? I’m a bit confused about how the interval data is actually generated, and how it relates to the standard Y \sim \text{N}(X\beta, \sigma^2) regression part (do you actually have a Y?). What precisely is the process that relates a particular value of the covariates to the interval values a and b?

1 Like

Thanks @hhau for taking so much time on this!

As I said, I simplified my problem so that it could be understood. But to give more perspective: there is indeed no Y; rather, the bounds of the uniform distribution are given. You can imagine that the intervals are given by the users for different items, and that X would include the features of the items, i.e., the predictors.

Is this restatement of what you’ve just said accurate?

We have “data points” that consist of nothing other than a lower and upper bound, and the knowledge that the true value is somewhere between the bounds. And we want to regress the true values on a predictor.

I agree with @hhau that more details about the data and how they were obtained might be helpful, but I think that an answer of sorts is already possible.

What I would do in this situation is to declare a vector parameter y_true with lower and upper bounds given by the vectors a and b. And then write the regression model to perch atop the y_true parameter. This will not give you uniform posterior margins for the elements of y_true; the regression itself will inform where in the interval [a_i, b_i] the value of y_{\text{true}, i} is likely to be. But it seems to capture the problem as stated.
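Concretely, assuming a, b, a predictor x, and the count N are passed as data (hypothetical names again; vector-valued bounds need a reasonably recent Stan version, otherwise the constraint has to be written element-wise), the core of the model might look like:

parameters {
  real w;
  real b0;
  real<lower=0> sigma;
  // the bounds give y_true an implicit uniform prior over [a_i, b_i]
  vector<lower=a, upper=b>[N] y_true;
}

model {
  // the regression informs where in [a_i, b_i] each y_true[i] falls,
  // so the posterior margins will generally not be uniform
  y_true ~ normal(x * w + b0, sigma);
}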

If you need something else (for example if you want the posterior margins for y_true to be uniform), then a different approach is required, but more details about your actual problem, your actual data, and your actual reason for wanting uniform margins might let us come up with more creative suggestions than brute forcing the problem with multiple repeated model fits.

2 Likes

@jsocolar Thanks a lot. I guess I’d better explain the original problem, because otherwise the current one might sound irrational.

Here is the original problem: I ask some users to express their opinions on some items. For the sake of simplicity, let’s assume that they express their opinions as integers, where sometimes a higher number means a higher preference and sometimes the other way around (both ways can be asked of one user for a set of items). The goal is to obtain, for each user, a weight for each item, as well as an aggregated weight representing the preferences of all the users combined. The weights generally need to be non-negative and satisfy a unit-sum constraint.

To do this in Bayesian statistics, I used the multinomial distribution for the users’ inputs and the Dirichlet distribution for the weights. Assume that users give us two vectors of integers: in one, the magnitude of the elements is proportionate to the weight, and in the other, it is proportionate to the inverse weight. So the Stan model is as follows:

data {
  int<lower=2> cNo;                  // number of items
  int<lower=1> usrNo;                // number of users
  int AB[usrNo, cNo];                // values proportionate to the inverse weight
  int AW[usrNo, cNo];                // values proportionate to the weight
  vector<lower=0, upper=1>[cNo] e;   // a vector of ones
}

parameters {
  simplex[cNo] W[usrNo];             // one weight vector per user
  simplex[cNo] wStar;                // aggregated weight
  real<lower=0> kappaStar;           // concentration parameter for the aggregation model
}

model {
  kappaStar ~ gamma(0.01, 0.01);
  wStar ~ dirichlet(0.01 * e);

  for (i in 1:usrNo) {
    vector[cNo] wInv;                // inverse weights, renormalized to a simplex

    W[i] ~ dirichlet(kappaStar * wStar);
    AW[i, :] ~ multinomial(W[i]);

    wInv = e ./ W[i];
    wInv ./= sum(wInv);
    AB[i, :] ~ multinomial(wInv);
  }
}

Now, the problem that I have is that sometimes users will not give concrete (integer) numbers, but rather intervals, discrete probabilities (e.g., 4 with a likelihood of 70% and 5 with 30%), or even other sorts of uncertainty, such as a triangular distribution (note that then not only integers are acceptable as inputs, so the multinomial distribution in the above model would have to be changed to encompass other types of inputs).

Now, I would like to take these sorts of inputs into account, but I don’t know how. It seems simple at first, since it is only required to sample from the given intervals (or other specified inputs) first and then execute the rest of the model’s sampling. However, it is more difficult to implement.

Thanks again @jsocolar and @hhau; I hope the above explanation is clear now.

1 Like

That sounds quite like an ordinal model (see e.g. Paul Bürkner’s tutorial for some background) - wouldn’t that be a more natural approach than the multinomial?

If all the answers were just a single number, it sounds like you have some population mean of true preferences and then a varying intercept per person. For the negative answers, it IMHO seems easiest to just recode them to positive (reverse the ordering) and add an additional model term that handles potential differences between the two forms of answers.
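A rough sketch of that structure, under assumptions of my own (a K-point scale, one answer per row, reversed answers already recoded to the shared ordering, and a rev indicator for the answer format):

data {
  int<lower=1> N;                     // number of answers
  int<lower=2> K;                     // number of response categories
  int<lower=1> usrNo;                 // number of users
  int<lower=1, upper=K> y[N];         // answers (reversed ones recoded)
  int<lower=1, upper=usrNo> user[N];  // which user gave each answer
  int<lower=0, upper=1> rev[N];       // 1 = answer was given in the reversed form
}

parameters {
  ordered[K - 1] cutpoints;
  vector[usrNo] u;                    // varying intercept per user
  real beta_rev;                      // shift for the reversed answer format
  real<lower=0> tau;
}

model {
  tau ~ normal(0, 1);
  u ~ normal(0, tau);
  beta_rev ~ normal(0, 1);
  for (n in 1:N)
    y[n] ~ ordered_logistic(u[user[n]] + beta_rev * rev[n], cutpoints);
}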

I think you are right that this is distinct from both the measurement error and rounding examples. And I kind of struggle to understand what it implies. I think a useful exercise (that will pay off in the development of the model) would be for you to write Python code that simulates data as you expect them to arise. I admit I have trouble thinking of a way to simulate the data you describe coherently, so maybe there is something that needs to be better understood/clarified about the data-generating process you have in mind.

The only way I can currently make sense of it is to treat the provided distributions as likelihoods, i.e. the data provide us with some function y_dist(preference) that represents the (log) likelihood for the true preference of the person.

So you would use some model parameters to construct the person’s preference (e.g. as population mean + varying intercept) and then have target += y_dist_i(preference_i).
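For instance, for the “4 with 70%, 5 with 30%” kind of answer, and assuming an ordered-logistic response model as in the sketch above (preference[i] and cutpoints are hypothetical names), that contribution could be:

// mix the user-supplied probabilities over the ordinal likelihood
target += log_mix(0.7,
                  ordered_logistic_lpmf(4 | preference[i], cutpoints),
                  ordered_logistic_lpmf(5 | preference[i], cutpoints));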

Does that make sense?