Bayesian Priors

Hi Stan experts. I am new to Bayesian statistics and brms. I am trying to fit a Bayesian regression with several predictors. I am working with 10M records at the customer-by-week level, with sales as the dependent variable and a set of other predictors.

I don't have any informative priors, and I cannot run the entire 10M records on the server, so I took Q3 of the 10M records and then a further 10% subset of that (call it rawdb4). I plan to fit a Bayesian regression model to rawdb4 using the brms library in R without providing any priors. As output, I would get an Estimate (beta) and Est. Error for each of the predictors. I plan to use these as priors for the rest of the data. I am facing the challenges below:

  • The Est. Error from the rawdb4 model is zero for all the predictors, so I am not able to define a prior distribution for them. What could be the possible reason?

  • I tried some arbitrary values as priors (e.g. N(x, sd)) and observed that the smaller the standard deviation I pass, the closer my model results are to the priors, and the opposite when it is larger. How should I decide my priors then?

library(brms)

# First fit on the 10% subset, using the brms default priors
bmod1 <- brm(
  Y ~ Samp + C_95 + CR_85 + LE + SPD90 + CPA + FTO,
  data = rawdb4, family = gaussian(),
  warmup = 600, iter = 3000, chains = 4,
  control = list(adapt_delta = 0.98), cores = 16, seed = 150
)

# Trial priors: one normal prior per coefficient, plus a prior on the residual sd
prior1 <- c(
  prior(normal(10, 100), class = Intercept),
  prior(normal(0.07, 100), class = b, coef = Samp),
  prior(normal(0.2, 100), class = b, coef = C_95),
  prior(normal(0.1, 100), class = b, coef = CR_85),
  prior(normal(0.06, 100), class = b, coef = LE),
  prior(normal(0, 100), class = b, coef = SPD90),
  prior(normal(0.47, 100), class = b, coef = CPA),
  prior(normal(0.55, 100), class = b, coef = FTO),
  prior(cauchy(10, 10), class = sigma)
)

# Second fit on rawdb3, now passing prior1
bmod2 <- brm(
  Y ~ Samp + C_95 + CR_85 + LE + SPD90 + CPA + FTO,
  data = rawdb3, family = gaussian(), prior = prior1,
  warmup = 600, iter = 3000, chains = 4,
  control = list(adapt_delta = 0.95), seed = 150, thin = 3,
  cores = parallel::detectCores()
)

Welcome. To start with, I’d fit the simplest model you think is interesting before throwing everything in. Maybe Y ~ Samp, or whichever predictor is most important. Also, maybe use just a few hundred data points, and plot those against your predictor first.

As for priors, I would start with the default priors in brms. You can use get_prior() to see what these are.
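A minimal sketch of inspecting those defaults, assuming brms is installed. The data frame `toy` is a made-up stand-in for a few hundred rows of the real data, and only the `Samp` predictor is used.

```r
# Hypothetical stand-in for a small sample of the real data.
set.seed(1)
toy <- data.frame(Samp = rnorm(300))
toy$Y <- 0.7 + 0.05 * toy$Samp + rnorm(300, sd = 0.5)

# get_prior() lists each parameter class of the model together with the
# default prior brms would use; nothing is compiled or sampled here.
if (requireNamespace("brms", quietly = TRUE)) {
  default_priors <- brms::get_prior(Y ~ Samp, data = toy, family = gaussian())
  print(default_priors)
}
```

The rows with class `b` are the regression coefficients; those are the ones you would later override with `prior(..., class = b, coef = ...)` as in the first post.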

Once you have a handle on all that, you will need to dive into some domain knowledge: what is the expected impact of each predictor on Y, and where does that information come from?
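The reason the source of that information matters is that the prior standard deviation controls how strongly the prior pulls on the estimate. Here is a rough sketch using the textbook conjugate update for a single normal mean with known residual sd. This is a simplification for illustration, not what brms does internally, and the function name and all numbers are made up.

```r
# Conjugate normal update for one mean with known residual sd (illustration only).
posterior_normal <- function(prior_mean, prior_sd, ybar, sigma, n) {
  prior_prec <- 1 / prior_sd^2   # precision contributed by the prior
  data_prec  <- n / sigma^2      # precision contributed by n observations
  post_var   <- 1 / (prior_prec + data_prec)
  post_mean  <- post_var * (prior_prec * prior_mean + data_prec * ybar)
  c(mean = post_mean, sd = sqrt(post_var))
}

# Same data summary (ybar = 0.30 from n = 300 rows), two prior widths.
tight <- posterior_normal(0.47, 0.01, ybar = 0.30, sigma = 1, n = 300)
wide  <- posterior_normal(0.47, 100,  ybar = 0.30, sigma = 1, n = 300)
print(rbind(tight = tight, wide = wide))
```

With the tight prior the posterior mean stays near the prior mean of 0.47; with the wide prior it sits essentially at the data mean of 0.30. So a prior sd should reflect how sure you actually are about the coefficient, not be tuned until the output looks right.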


Thanks for the reply, Ara. I tried running the model with a small sample of just 300 records, and also increased the sample to 50k records; however, I am seeing almost identical results in both cases. How can we explain this? Please let me know if I am missing anything here. Thanks

So you ran this model

[quote="Sham414, post:1, topic:18420"]

bmod1 <- brm(
           Y ~ Samp + C_95 + CR_85 + LE+ SPD90 + CPA + FTO,
          data = rawdb4, family = gaussian(),
          warmup = 600, iter = 3000, chains = 4,
          control = list(adapt_delta = 0.98), cores=16, seed=150)

[/quote]

with 300 records? And it’s still showing zeros?

Can you share the model summary? Also, can you plot histograms of Y and its predictors?
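For the histograms, something along these lines would do, using base R only. The data frame `dat` is a simulated placeholder with a few of the column names from the model; swap in the real subset.

```r
# Hypothetical stand-in for the subset; replace `dat` with the real data frame.
set.seed(2)
dat <- data.frame(Samp = rexp(300), CPA = rnorm(300), FTO = runif(300))
dat$Y <- 0.7 + 0.05 * dat$Samp + 0.3 * dat$CPA + 0.4 * dat$FTO +
  rnorm(300, sd = 0.2)

# One histogram per column, written to a temporary PDF so the code also
# runs in a non-interactive session.
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
par(mfrow = c(2, 2))
for (v in names(dat)) {
  hist(dat[[v]], main = v, xlab = v, breaks = 30)
}
dev.off()
```

For the summary itself, if the fitted object is available, `print(summary(bmod1), digits = 4)` should show more decimals than the default two, which would tell us whether the zeros in Est. Error are just rounding.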

Hi, apologies for the late reply, I had to stop my work due to a personal emergency. For reference, I am attaching three separate results below: the first is for 257 records, the second for 25k records and the third for 250k records:

Result-1:

Param   Estimate      Est.Error     Rhat
Inter   0.7           0.01          1
Samp    2.16264E+11   4.06347E+11   1.76
C_95    -0.14         0.2           1
CR_85   0.04          0.03          1
LE      -0.14         0.13          1
SPD90   0.03          0.01          1
CPA     0.3           0.07          1
FTO     0.47          0.09          1

Result-2:

Param   Estimate   Est.Error   Rhat
Inter   0.7        0           1
Samp    0.05       0           1
C_95    0.29       0.02        1
CR_85   0.12       0           1
LE      0.04       0.01        1
SPD90   0          0           1
CPA     0.31       0.01        1
FTO     0.35       0.01        1

Result-3:

Param   Estimate   Est.Error   Rhat
Inter   0.7        0           1
Samp    0.04       0           1
C_95    0.24       0.01        1
CR_85   0.11       0           1
LE      0.03       0           1
SPD90   0          0           1
CPA     0.3        0           1
FTO     0.36       0           1

I am confused about why the model is not able to capture more variation in the data, in spite of the varying data sizes. I would also like your advice on how I should approach the full data, because eventually I will need to incorporate my complete data (10M records) in my analysis. Shall I split the data into 10 parts of 10% each and run a model on each part, using the results from the previous one as the prior? Thanks in advance for your help :)

My suggestion here would be a few things:

  • Simulate some fake data so you can compare against the recovered parameters.

  • Plot the data.

  • Simplify the model.

  • Run the model with the simulated data and compare the parameters. This will help you check that both the real data and the model behave as you expect.

Then start again with the real data. First with plotting, then a simplified model, and finally onto the full model.
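A rough sketch of the fake-data step, assuming made-up "true" coefficients and only three of the predictors. `lm()` is used here as a quick stand-in check; the brms fit is switched off by default through the hypothetical flag `run_brms`, since it needs brms and a working Stan toolchain.

```r
set.seed(3)
n <- 1000
# Made-up truth, chosen only so there is something known to recover.
true_beta <- c(Intercept = 0.7, Samp = 0.05, CPA = 0.3, FTO = 0.35)

fake <- data.frame(Samp = rnorm(n), CPA = rnorm(n), FTO = rnorm(n))
fake$Y <- true_beta[["Intercept"]] + true_beta[["Samp"]] * fake$Samp +
  true_beta[["CPA"]] * fake$CPA + true_beta[["FTO"]] * fake$FTO +
  rnorm(n, sd = 0.5)

# Quick least-squares check of parameter recovery.
ols <- lm(Y ~ Samp + CPA + FTO, data = fake)
recovered <- cbind(
  truth     = true_beta,
  estimate  = unname(coef(ols)),
  std_error = unname(summary(ols)$coefficients[, "Std. Error"])
)
print(round(recovered, 3))

# Optional Bayesian fit of the same model with the brms defaults.
run_brms <- FALSE  # flip to TRUE once brms and Stan are installed
if (run_brms) {
  fit <- brms::brm(Y ~ Samp + CPA + FTO, data = fake, family = gaussian(),
                   chains = 4, iter = 2000, seed = 150)
  print(brms::fixef(fit))
}
```

If the estimates land within a couple of standard errors of the truth, the model is doing what you think it does, and any odd result on the real data (like the huge Samp estimate in Result-1) points at the data rather than the model code.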

Thanks for the suggestion. In the last line, as the final step, shall I run with the full data? My data is huge: 10 million records.

Just a subset. There are a number of things you can do to speed things up.
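To see why a subset can be enough, here is a small simulation, with `lm()` standing in for the Bayesian fit and entirely made-up data. The point it illustrates is that the standard error shrinks roughly like 1/sqrt(n), so past a certain size the estimates barely move and the error rounds to zero at two decimals.

```r
set.seed(4)
N <- 250000
big <- data.frame(Samp = rnorm(N))
big$Y <- 0.7 + 0.05 * big$Samp + rnorm(N, sd = 0.5)

# Fit the same one-predictor model on random subsets of increasing size.
sizes <- c(250, 25000, 250000)
res <- t(sapply(sizes, function(n) {
  rows <- sample(N, n)
  est <- summary(lm(Y ~ Samp, data = big[rows, ]))$coefficients["Samp", ]
  c(n = n, estimate = est[["Estimate"]], std_error = est[["Std. Error"]])
}))
print(res)

# Rounded to two decimals, as a default model summary would show it.
print(round(res[, "std_error"], 2))
```

This mirrors the three results posted above: the estimates are nearly the same for 25k and 250k rows, and an Est. Error of 0 there most likely means "smaller than 0.005", not exactly zero.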

Hi Ara, sorry, but I still do not understand how I will be able to incorporate the complete data. Shall I run 20 models in series with 5% of the data each, using the results from one as the prior for the next?

I would generate a fake data set first that’s fairly small (hundreds to thousands of rows) so you know the true parameters. Run that to make sure your model is set up correctly.

Then run a small set of your real data (thousands of records) to see what that looks like.

Does that make sense?
