# Decision trees in Stan

Dear Staners,

I am wondering how a simple data set of 5 Bernoulli random variables could be modelled by fitting a “mixed decision tree” model in Stan.

I am also wondering whether the traditional decision tree method is “just” a maximum likelihood approximation to some simple Stan model, something that can be described in Stan in a “not too complicated way”.

What needs to be described is a distribution over tree structures, so it is a mixture model of decision trees. It is not clear to me how easy this is to do in practice with Stan, partly due to the “mixture model chain convergence problem”, because the random variable that indexes the tree structure is discrete. But this is just a vague intuition.

Any thoughts on this topic?

Cheers,

Jozsef

This may be of interest. If you want to pursue it, try implementing the log likelihood of a mixture as a Stan function with some generated data and post it; I’d be happy to muddle through trying to make it work in Stan with you :-)
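As a rough starting point for that suggestion, here is a minimal Python sketch of a marginalized mixture log likelihood on generated data. It is only an illustration under stated assumptions: each mixture component is a product of 5 independent Bernoullis standing in for a tree, and the names `mixture_log_lik`, `simulate`, `weights` and `probs` are made up for this example. The discrete component index is summed out with a log-sum-exp, which is the same quantity a Stan function would accumulate with `log_sum_exp`.

```python
import math
import random

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x))) over a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def bernoulli_lpmf(y, p):
    """Log probability of a single 0/1 outcome y with success probability p."""
    return math.log(p) if y == 1 else math.log(1.0 - p)

def mixture_log_lik(data, weights, probs):
    """Log likelihood with the discrete component index marginalized out.

    data:    list of rows, each a list of 0/1 values
    weights: mixture weights, one per component (assumed to sum to 1)
    probs:   probs[k][j] = P(y_j = 1 | component k), an assumed stand-in
             for whatever a single tree would predict
    """
    total = 0.0
    for row in data:
        terms = [
            math.log(w) + sum(bernoulli_lpmf(y, p) for y, p in zip(row, comp))
            for w, comp in zip(weights, probs)
        ]
        total += log_sum_exp(terms)
    return total

def simulate(n, weights, probs, rng):
    """Draw n rows: pick a component, then draw each Bernoulli independently."""
    data = []
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        data.append([1 if rng.random() < p else 0 for p in probs[k]])
    return data

# Hypothetical parameter values, chosen only to generate example data.
rng = random.Random(1)
weights = [0.3, 0.7]
probs = [[0.9, 0.8, 0.1, 0.2, 0.5], [0.1, 0.3, 0.7, 0.9, 0.5]]
data = simulate(200, weights, probs, rng)
print(mixture_log_lik(data, weights, probs))
```

Summing out the index this way avoids sampling a discrete variable at all, though it does not by itself address multimodality or label switching between components.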

Yes, I think we had that as a homework assignment in 2010 or so, given by J. Hollmen: https://people.aalto.fi/jaakko.hollmen#publications . At least the EM approximation.

I was wondering what it would mean to do “decision tree” modelling on 3 binary random variables: A, B, C.

Then the “phase space” (the set of states in which the system to be modelled can be) is:

\begin{array}{c|ccc}
 & A & B & C \\ \hline
P_1 & 0 & 0 & 0 \\
P_2 & 1 & 0 & 0 \\
P_3 & 0 & 1 & 0 \\
P_4 & 1 & 1 & 0 \\
P_5 & 0 & 0 & 1 \\
P_6 & 1 & 0 & 1 \\
P_7 & 0 & 1 & 1 \\
P_8 & 1 & 1 & 1 \\
\end{array}

In a decision tree, the result is “deterministic”, hence the probabilities (or, better said, the conditional probabilities) are either zero or one.

So if C is to be “predicted” by a decision tree then, for example, P(C=1|A=0,B=0) can be either 0 or 1 but nothing else. This is because decision trees are “deterministic”, at least the “non-Bayesian” ones.
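To make this concrete, here is a small Python sketch, under the assumption that a deterministic tree predicting C from (A, B) can be represented simply as a lookup table from the four input combinations to 0 or 1. There are 2^4 = 16 such tables. The names `trees`, `cond_prob_c1` and `mixture_cond_prob_c1` are hypothetical. The sketch also shows that a mixture over these tables, with weights that do not depend on the inputs, can give conditional probabilities strictly between 0 and 1.

```python
from itertools import product

# All combinations of the two predictors (A, B).
inputs = list(product([0, 1], repeat=2))

# Each deterministic "tree" is represented here as a table (A, B) -> C.
# This is an assumed simplification: the tree shape itself is ignored,
# only the function it computes is kept.
trees = [dict(zip(inputs, outputs)) for outputs in product([0, 1], repeat=4)]

def cond_prob_c1(tree, a, b):
    """P(C=1 | A=a, B=b) under a single deterministic tree: always 0.0 or 1.0."""
    return float(tree[(a, b)])

def mixture_cond_prob_c1(trees, weights, a, b):
    """P(C=1 | A=a, B=b) under a mixture of trees with the given weights."""
    return sum(w * cond_prob_c1(t, a, b) for t, w in zip(trees, weights))

# Uniform weights over the 16 tables, chosen only for illustration.
uniform = [1.0 / len(trees)] * len(trees)

for a, b in inputs:
    single = cond_prob_c1(trees[5], a, b)
    mixed = mixture_cond_prob_c1(trees, uniform, a, b)
    print(f"A={a} B={b}  one tree: {single}  uniform mixture: {mixed}")
```

With non-uniform weights the mixture can reach any conditional probability in [0, 1] for each input combination, which is one way to read a “Bayesian” decision tree as a distribution over deterministic ones.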