Heteroskedasticity - i.e. variance specific to each group, j, rather than a single variance for each group-varying parameter?

Hi,

How can I estimate a variance for each group j, rather than only one for each group-varying parameter?

Say I have just a single group-level parameter, estimated for each group j = 1, \dots, J, with J = 10. How can I estimate J (i.e. 10) variances, one for each group, rather than a single variance common to all groups, as brms seems to do by default?

Many thanks!

https://cran.rstudio.com/web/packages/brms/vignettes/brms_distreg.html#a-simple-distributional-model

Thanks for this. Is specifying sigma ~ group in the brms formula the same as giving each group its own sigma, \sigma_j?

Yes, but note that ~ group will imply dummy coding for group by default as per R base defaults. To estimate separate sigmas directly, use cell-mean coding via ~ 0 + group. Note that estimates of sigma will be on the log-scale by default.
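
To make the coding difference concrete, here is a minimal base-R sketch (no brms needed) of the design matrices the two formulas produce. The data frame `d` and its group labels are made up for illustration; in brms the same right-hand side would sit in the sigma part of the model formula, e.g. `sigma ~ 0 + group` inside `bf()`.

```r
# Toy data: a hypothetical grouping factor with three levels.
d <- data.frame(group = factor(c("a", "b", "c", "a", "b", "c")))

# Default dummy (treatment) coding: an intercept for the reference
# level "a", plus one indicator column for each remaining level.
X_dummy <- model.matrix(~ group, data = d)
colnames(X_dummy)

# Cell-mean coding: no intercept, one indicator column per level.
X_cell <- model.matrix(~ 0 + group, data = d)
colnames(X_cell)
```

With `~ group`, the intercept is log(sigma) for the reference group and the other coefficients are log-scale differences from it. With `~ 0 + group`, each coefficient is that group's own log(sigma).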

So: I run my nested year:country hierarchical model, but this time adding in:

sigma ~ 0 + year + country

(I could not add year:country here because there are too many groups. It would take 2 weeks to run even on the high-powered external server I am using.)
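
For a sense of scale, here is a rough base-R sketch that counts the sigma coefficients each specification implies. It assumes 24 years (1994 to 2017) and 24 countries, both stored as factors; the country labels are placeholders, not the actual data.

```r
# Assumption: year and country are factors with 24 levels each.
years <- factor(1994:2017)
countries <- factor(paste0("C", 1:24))  # placeholder country labels

# One row per year-country cell.
grid <- expand.grid(year = years, country = countries)

# Additive specification, as in sigma ~ 0 + year + country.
X_add <- model.matrix(~ 0 + year + country, data = grid)

# Fully interacted specification, as in sigma ~ 0 + year:country.
X_int <- model.matrix(~ 0 + year:country, data = grid)

ncol(X_add)  # 24 year indicators + 23 country contrasts
ncol(X_int)  # one indicator per year-country cell
```

Note that in the additive version only year gets cell-mean coding. Country keeps a reference level, so its coefficients are log-scale offsets relative to that reference country.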

The regression results barely change after adding the above, even though I would expect them to, since I would think the partial pooling is now done differently. The random effects change only very slightly (almost imperceptibly), and the fixed effects do not really change, except for one coefficient.

The sigma_year1994 \dots sigma_year2017 coefficients are all negative, while the sigma_countryZAF...sigma_countryUSA coefficients are a mixture of negative and positive. Does it make sense to have a negative coefficient in this instance? \sigma should be positive, but I am still trying to understand how brms formulates this problem.

-*-
A bit more info:

The hierarchical model is a non-nested model with J = 576 groups formed by interacting country:year (we have 24 countries and 24 years). The regression is run on around 285,000 data points:

y ~ 1 + FE1 + FE2 + RE3 + (1 + RE1 + RE2 + RE3 | year:country)

I assume that the above formulation for sigma should lead to partial pooling taking place not just within each of the RE1 and RE2 coefficients, but also within each year and within each country, since each is now given its own \sigma, unless I am mistaken.

Many thanks,

Remember that the estimates of \sigma are on the log scale, so negatives imply a standard deviation smaller than 1. E.g. the standard deviation for USA in 1994 might be something like \sigma_\texttt{USA,1994}=\exp(s_\texttt{USA} + s_\texttt{1994})=\exp(0.2-0.4)=\exp(-0.2)\approx 0.82.
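
The same back-transformation as a small R sketch. The names `s_country` and `s_year` are hypothetical, and apart from the USA and 1994 values from the example above, the numbers are invented for illustration.

```r
# Hypothetical log-scale sigma coefficients (not real estimates).
s_country <- c(USA = 0.2, ZAF = -0.1)
s_year <- c("1994" = -0.4, "2017" = -0.7)

# The linear predictor for log(sigma) is additive in country and year.
log_sigma <- outer(s_country, s_year, "+")

# Undo the log link to get standard deviations on the response scale.
sigma <- exp(log_sigma)

round(sigma["USA", "1994"], 2)  # exp(0.2 - 0.4) = exp(-0.2)
```

Whatever the signs of the log-scale coefficients, the exponentiated values are always positive.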

PS: just out of the blue… Are you working with FDI data?

Thank you for that helpful reminder! In this study we are working with firm-level balance sheet and cash flow statement data for publicly listed firms: capital expenditure, profits, capital stock, market valuation etc. But in general I do work with FDI data (and have some expertise in that field, actually). My two cents on official FDI data is that it's mostly noise on noise mixed with transfer pricing.