Instead of restricting the first weight to be positive, it seems to work better to leave the weight unrestricted in the parameters block and then “fix” the signs in generated quantities. See here:
Or, if you fix the first weight to 1, that weight sets the scale, so the SD of the latent distribution would then typically be left free.
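
To make the sign “fix” from the first suggestion concrete, here is a rough sketch of the idea applied to posterior draws after sampling. It is written in plain Python rather than as a Stan generated quantities block, and the function name `fix_signs`, the variable names, and the example numbers are all made up for illustration. The assumption is that each draw has a vector of weights and a vector of latent scores, and that flipping both together leaves the likelihood unchanged.

```python
def fix_signs(weights, latent):
    """Flip one posterior draw so that its first weight is non-negative.

    weights: list of weights (loadings) for this draw
    latent:  list of latent scores for this draw
    Flipping both together leaves every weight * score product unchanged.
    """
    sign = -1.0 if weights[0] < 0 else 1.0
    return [sign * w for w in weights], [sign * z for z in latent]


# Two hypothetical draws, one from each mirror-image mode.
draws = [
    ([0.8, -0.5, 0.3], [1.2, -0.4]),
    ([-0.8, 0.5, -0.3], [-1.2, 0.4]),
]

# After the flip, both draws land in the same mode.
fixed = [fix_signs(w, z) for w, z in draws]
for w, z in fixed:
    print(w, z)
```

The sampler still visits both modes, but the summaries are computed on the flipped draws, so the reported weights and latent scores no longer average out across the two mirror images.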