Repeated values in the input variables of a regression?

Night_001 · November 19, 2021, 6:11am

Hello everyone,

I have a question about fitting a non-linear regression model in Stan when the input variables contain repeated values.

I have an observed variable y that depends on some predictors; let's call them x1, x2, and x3. The regression model is as follows:

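```stan
data {
  int<lower=0> N;        // number of observations (sizes must be integers)
  int<lower=0> k1;       // number of columns in x1
  int<lower=0> k2;       // number of columns in x2
  vector[N] y;
  vector[N] x3;
  matrix[N, k1] x1;
  matrix[N, k2] x2;
}
parameters {
  vector[k1] beta_1;
  vector[k2] beta_2;
  real<lower=0> sigma;   // residual SD must be positive
}
model {
  // linear predictors for the two material terms
  vector[N] reg_1 = x1 * beta_1;
  vector[N] reg_2 = x2 * beta_2;
  // saturating mean: exp(reg_1) * exp(reg_2) * x3 / (1 + exp(reg_1) * x3)
  vector[N] mu = (exp(reg_1) .* exp(reg_2) .* x3) ./ (1 + exp(reg_1) .* x3);
  // Some priors here
  y ~ normal(mu, sigma);
}
```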
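To check how the mean function behaves, the Stan expression for `mu` can be mirrored for a single observation in a few lines of Python. This is only a sketch: `mean_response` is a hypothetical helper, and the values of `reg_1` and `reg_2` are made up for illustration.

```python
import math

def mean_response(reg_1: float, reg_2: float, x3: float) -> float:
    """Mirror of the Stan expression for one observation:
    mu = exp(reg_1) * exp(reg_2) * x3 / (1 + exp(reg_1) * x3)
    """
    a = math.exp(reg_1)
    return a * math.exp(reg_2) * x3 / (1.0 + a * x3)

# Hypothetical linear-predictor values, for illustration only.
r1, r2 = 0.5, 1.2
for x3 in [0.0, 1.0, 10.0, 1000.0]:
    print(x3, round(mean_response(r1, r2, x3), 4))

# The plateau that mu approaches as x3 grows.
print(round(math.exp(r2), 4))
```

Written this way, mu is 0 at x3 = 0 and rises towards exp(reg_2) as x3 grows. It reaches half of that plateau at x3 = exp(-reg_1), so beta_2 sets the plateau height and beta_1 sets how quickly it is reached.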

However, multiple values of x3 are observed for each combination of x1 and x2. For context, x1 and x2 describe a material, x3 describes the measurement conditions, and together the three predict the outcome y.

What I am worried about is the uncertainty estimates for the parameters. Say I have only 4 distinct values of x1 and x2 (i.e., 4 materials), but I measured each of them under 4 different conditions, so I have 16 values of x3 and y.
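The long-format data layout described here (4 materials × 4 conditions = 16 rows) can be sketched as follows. This is an assumption-laden illustration. The material labels and numbers are placeholders, not real data, and x1 and x2 are scalars here for simplicity, whereas in the Stan model they are rows of the matrices `x1` and `x2`.

```python
from itertools import product

# Hypothetical material descriptors (x1, x2) and measurement conditions (x3);
# the numbers are placeholders, not real data.
materials = {
    "A": (0.1, 1.0),
    "B": (0.4, 0.8),
    "C": (0.7, 1.3),
    "D": (0.9, 0.5),
}
conditions = [1.0, 2.0, 5.0, 10.0]

# One row per (material, condition) pair: x1 and x2 repeat within a material.
rows = [
    {"material": m, "x1": x1, "x2": x2, "x3": c}
    for (m, (x1, x2)), c in product(materials.items(), conditions)
]

print(len(rows))                                 # number of observations
print(len({(r["x1"], r["x2"]) for r in rows}))   # number of distinct (x1, x2) pairs
```

Each material's (x1, x2) values appear once per condition. This layout is what the model's `N`-length vectors and `N`-row matrices expect.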

My question is: will the repeated values of x1 and x2 make the posterior draws of the beta parameters overly confident (too narrow)? If so, how can I rectify that? I was planning to build a hierarchical version of this model next, so I would also like to know whether that would help.

Sorry for the naive question. I ask because my real data have ~3900 observations of y and x3 but only ~70 distinct values of x1 and x2. Stan fit the model without problems, but the estimates show very little uncertainty even though I used only a weakly informative prior. In fact, plotting the posterior draws with mcmc_intervals shows little more than a dot for each parameter. This made me suspicious of the results.

Any help would be very appreciated.

mike-lawrence · December 1, 2021, 2:11pm

Nope! Repeated observations of predictors should, and do, yield appropriately more certainty in the inference on their effects.
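A minimal numeric illustration of this point, using ordinary linear regression with a known noise SD instead of the poster's non-linear model (a deliberate simplification). Under a flat prior the posterior SD of the slope equals its standard error, sigma / sqrt(sum((x - mean(x))**2)). Repeating each of the same four x values more times shrinks it. `slope_se` is a hypothetical helper, and `sigma = 0.5` is an assumed value.

```python
import math

def slope_se(xs, sigma):
    """Standard error of the OLS slope when the noise SD sigma is known:
    se = sigma / sqrt(sum((x - mean(x))**2))."""
    xbar = sum(xs) / len(xs)
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sigma / math.sqrt(sxx)

distinct_x = [1.0, 2.0, 3.0, 4.0]   # four "materials"
sigma = 0.5                          # assumed known noise SD

for reps in (1, 4, 16):
    xs = distinct_x * reps           # each x value repeated `reps` times
    print(reps, round(slope_se(xs, sigma), 4))
```

Each fourfold increase in replicates halves the standard error, because the sum of squared deviations grows linearly with the number of replicates. With roughly 3900 rows spread over about 70 materials (about 56 rows per material), very narrow intervals are what the same logic predicts, provided the model is correct.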

yizhang · December 1, 2021, 4:40pm

One way to think about this is that, for an exponential-family model, the log-likelihood is an affine function of the sum of the individual observations' sufficient statistics. In particular, each additional observation contributes directly to the score function, and so changes the sensitivity of the log-likelihood with respect to the parameters.
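Written out for independent, identically distributed observations in standard exponential-family notation, the argument looks roughly like the sketch below. The symbols $\eta$ (natural parameter), $T$ (sufficient statistic), $A$ (log-partition function), $h$ (base measure), and $J_\eta = \partial\eta/\partial\theta$ are notation introduced here for illustration.

$$
\log p(y_{1:n}\mid\theta) = \eta(\theta)^{\top}\sum_{i=1}^{n} T(y_i) \;-\; n\,A(\theta) \;+\; \sum_{i=1}^{n}\log h(y_i)
$$

$$
\nabla_\theta \log p(y_{1:n}\mid\theta) = J_\eta(\theta)^{\top}\sum_{i=1}^{n} T(y_i) \;-\; n\,\nabla_\theta A(\theta),
\qquad
\mathcal{I}_n(\theta) = n\,\mathcal{I}_1(\theta)
$$

In a regression, $\eta_i = \eta(x_i, \theta)$ varies with the predictors, and the Fisher information becomes $\sum_{i=1}^{n}\mathcal{I}_i(\theta)$. A predictor value repeated $m$ times therefore contributes $m$ identical terms. Under the usual regularity conditions, the posterior SD scales roughly like $1/\sqrt{\text{information}}$, which is why many repeated measurements give narrow intervals.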

Night_001 · December 1, 2021, 4:46pm

Thank you very much for the answers and explanations! I think I understand it a little bit better now.
