R Package dirReg (beta): an attempt to use stan for improving "softmax regression" inference. Please test if you wish for feedback

Hello,

I was approaching the problem of inferring regression for a simplex.

After a week of work I came out with a package that works possibly better than public R pachages such as DirichletReg.

Here what it does better (fully explained on Github):

https://github.com/stemangiola/dirReg

dirReg

This is an R package that infers trend of change in proportions with the condition that they have to sum to 1. The Challenging part is represented by this example:

The problem
There are 4 species of birds, and we count their proportions sampling a picture of the sky.

The totality of the birds at time 0 is:

black brids: 350
red birds: 350
blue birds: 200
green birds: 100

At time 1, the black bird population has increased a lot while all others did not change at all:

black brids: 2850
red birds: 350
blue birds: 200
green birds: 100

Because under some conditions we cannot know the absolute number of total birds, we necessarily have to think in terms of proportions. For example we would get this picture.

We can have the (statistical) impression see that all four specie populations changed in size, and that all changes can be significant; while the reality is that three population are not decreasing independently, but just as result of the normalization to 1.

The following probabilistic model tries to address this challenge. Understanding what is an independent change and what is not.

R code example

if (!require(devtools)) {
  install.packages("devtools")
  library(devtools)
}
options(buildtools.check = function(action) TRUE)
install_github("stemangiola/dirReg", args = "--preclean", build_vignettes = FALSE)
set.seed(123)
library("dirReg")

n_samples = 300
my_design = model.matrix(~cov , data = data.frame(cov=runif(n_samples)))

obj = simulate_proportions(
30 ,                        ## noise of the data
c(0.35, 0.35, 0.2, 0.1 ) ,  ## proportions at the intercept
c(2, 0, 0, 0),              ## rate of change (positive just for population 1)
my_design,                  ## design matrix
n_samples = n_samples       ## number of samples
)

fit = dirReg(obj$prop, my_design)

------------------------------------------------------------------
Beta-Coefficients for variable no. 1:
Estimate Std. Error z value Pr(>|z|) 
(Intercept)          0.36    0   -29.94 0 ***
cov  0.53    0  -29.12  0 ***
------------------------------------------------------------------
Beta-Coefficients for variable no. 2:
Estimate Std. Error z value Pr(>|z|) 
(Intercept)          0.34    0   -26.05 0 ***
cov  0.16    0  -0.6  0.55 
------------------------------------------------------------------
Beta-Coefficients for variable no. 3:
Estimate Std. Error z value Pr(>|z|) 
(Intercept)          0.2    0   -5.55 1.21e-13 ***
cov  0.16    0  -0.01  0.99 
------------------------------------------------------------------
Beta-Coefficients for variable no. 4:
Estimate Std. Error z value Pr(>|z|) 
(Intercept)          0.11    0   7.47 0 ***
cov  0.15    0  0.37  0.71 
------------------------------------------------------------------

Significance codes: 0 β€˜***’ 0.001 β€˜**’ 0.01 β€˜*’ 0.05 β€˜.’ 0.1 β€˜ ’ 1
Number of Observations: 300
Link: linear-softmax

This model has successfully identified which group had trends in proportions that were independend (variable no. 1) and which were dependent from the softmax (variable no. 2, 3, 4).

This is different from the package DirichletReg, where all changes of the four groups are seen to be significant.

if (!require(DirichletReg)) {
	install.packages("DirichletReg")
	library(DirichletReg)
}

dd = data.frame(obj$prop, cov = my_design[,2])
AL <- DR_data(dd[, 1:4])
summary(DirichReg(AL ~ cov, dd))
------------------------------------------------------------------
Beta-Coefficients for variable no. 1: X1
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  2.50592    0.09509  26.353  < 2e-16 ***
cov          0.88813    0.16188   5.486  4.1e-08 ***
------------------------------------------------------------------
Beta-Coefficients for variable no. 2: X2
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  2.50290    0.09643  25.954  < 2e-16 ***
cov         -0.93510    0.16617  -5.627 1.83e-08 ***
------------------------------------------------------------------
Beta-Coefficients for variable no. 3: X3
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.86560    0.09455  19.730  < 2e-16 ***
cov         -0.84977    0.16008  -5.308 1.11e-07 ***
------------------------------------------------------------------
Beta-Coefficients for variable no. 4: X4
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.17471    0.09963  11.791  < 2e-16 ***
cov         -0.59494    0.17108  -3.477 0.000506 ***
------------------------------------------------------------------
Significance codes: 0 β€˜***’ 0.001 β€˜**’ 0.01 β€˜*’ 0.05 β€˜.’ 0.1 β€˜ ’ 1

Log-likelihood: 1425 on 8 df (98 BFGS + 1 NR Iterations)
AIC: -2835, BIC: -2805
Number of Observations: 300
Link: Log
Parametrization: common
1 Like

Before this goes onto CRAN, there is an oversight in the rstan_package.skeleton function and rstanarm build process in that it only works with GNUmake. It is possible to insert

SystemRequirements: GNU make

into DESCRIPTION, but I am trying to come up with a better fix.

I have some problem in understanding:

https://github.com/stemangiola/dirReg/blob/master/src/stan_files/dirReg.stan

Precisely

for(s in 1:S) beta_hat_hat[s] = softmax(to_vector(beta_hat[s])) * phi + 1;

I don’t understand the + 1.

Thanks Ben, however, the model is not mature or fully tested yet, I put it out there mainly for feedback. But I will do if it will put it on CRAN eventually.

The β€œ+ 1” make sure that the prior for dirichlet is bigger than one, with the assuption that the dirichlet distribution is unimodal. This is an assumption of my model, but I think could have general validity.

What do you think?

By the way, if I allow the dirichlet to be bimodal the convergence is more difficult.

I what way does the +1 makes the Dirichlet distribution unimodal?

Say we have a softmax, eg.

p ← c(0.2, 0.15, 0.65)

any know we choose phi to be 4. Then we get:

(p * 4+1)/sum(p * 4+1)
[1] 0.2571429 0.2285714 0.5142857

A functional dependency between phi and p (or phi and the softmax).
I don’t know if Stan can handle this well. Can somebody comment?

Thanks for your comment

Unless I am missing something, the * phi + 1 is outside the soft max.

Given that parameters in soft max can go from 0 to 1, + 1 should make sure that the result is > 1.

For example

> p <- c(0.2, 0.15, 0.65)
> (p/sum(p)) * 4 + 1

[1] 1.8 1.6 3.6

I hope it makes sense

I understand, that the result > 1 and I think the softmax defining the porpotions and
phi the scaling.

c> p<-c(1.8,1.6,3.6)

p/sum( p )
[1] 0.2571429 0.2285714 0.5142857

Thus this is your porportions, different from softmax. And if phi (the 4) changes the porportions gets different again:

p ← c(0.2, 0.15, 0.65)
x<-(p/sum( p )) * 6 + 1
x/sum(x)
[1] 0.2444444 0.2111111 0.5444444

Btw sum( p ) == 1

But please somebody else may comment.

1 Like

Good to know, might be relevant for me. Is there some issue # so that I can follow this up?

It doesn’t affect existing packages, only new ones that are generated with rstan_package.skeleton in rstantools 2.17.x and only for last week and next.

Thanks Guys,

few questions:

  1. so after next week I can eliminate the β€œSystemRequirements: GNU make” requirement?
  2. what would be the most appropriate name for this model?
  • Dirichlet regression
  • Simplex regression (I would prefer)
  • Softmax regression

Thanks a lot.

Ideally, we can make the build process not depend on GNU make. But we still haven’t figured it out completely and you will probably have to run rstan_package.skeleton again.