Think of the hidden layers as forming something like a mixed-membership mixture or factor model that’s spread across the units. With this many degrees of freedom, you get lots of different ways to carve up the coefficients, even before considering the label-switching problem (relabeling the hidden units without changing the fit).
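
To make that concrete, here is a minimal sketch, assuming a one-hidden-layer network with ReLU activations (the comment above doesn't specify an architecture, so that choice and every name in the code are illustrative). It shows two coefficient changes that leave the predictions untouched: rescaling one unit's incoming weights by c > 0 while dividing its outgoing weight by c, and plain label switching by permuting the units.

```python
import random

def relu(z):
    return max(0.0, z)

def predict(x, W1, b1, w2, b2):
    """One-hidden-layer ReLU network (assumed architecture): scalar output."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sum(v * h for v, h in zip(w2, hidden)) + b2

def rescale_unit(W1, b1, w2, k, c):
    """Scale unit k's incoming weights and bias by c > 0, outgoing weight by 1/c.

    Because relu(c * z) == c * relu(z) for c > 0, the output is unchanged.
    """
    W1s = [[c * w for w in row] if i == k else list(row)
           for i, row in enumerate(W1)]
    b1s = [c * b if i == k else b for i, b in enumerate(b1)]
    w2s = [v / c if i == k else v for i, v in enumerate(w2)]
    return W1s, b1s, w2s

def permute_units(W1, b1, w2, perm):
    """Label switching: reorder the hidden units consistently."""
    return [W1[i] for i in perm], [b1[i] for i in perm], [w2[i] for i in perm]

rng = random.Random(0)
n_in, n_hidden = 3, 4
W1 = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [rng.gauss(0, 1) for _ in range(n_hidden)]
w2 = [rng.gauss(0, 1) for _ in range(n_hidden)]
b2 = rng.gauss(0, 1)
xs = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(5)]

W1s, b1s, w2s = rescale_unit(W1, b1, w2, k=0, c=7.5)
W1p, b1p, w2p = permute_units(W1, b1, w2, perm=[2, 0, 3, 1])

# Largest difference in predictions between original and transformed weights.
max_gap_rescale = max(abs(predict(x, W1, b1, w2, b2)
                          - predict(x, W1s, b1s, w2s, b2)) for x in xs)
max_gap_permute = max(abs(predict(x, W1, b1, w2, b2)
                          - predict(x, W1p, b1p, w2p, b2)) for x in xs)
print(max_gap_rescale, max_gap_permute)
```

Both gaps should come out at floating-point rounding level even though the weights differ. The rescaling works for any c > 0 and any unit, so it is a continuous family of equivalent coefficient settings, on top of the n_hidden! discrete relabelings.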