How do I best (counter)balance an experiment with, for instance, 200x4 image/distortion combinations?

I am trying to create a (roughly) counterbalanced experimental design for a few hundred images, each of which can be run through various distortions. Let's say there are 4 types of distortion: a, b, c and d.

I want to test a group of observers so that each observer sees every image exactly once, with exactly one distortion applied to it. I want to counterbalance both images and distortions so there is no risk of observers learning a pattern in the distortions. Crucially, I want every image/distortion combination (number + letter) to be presented equally often across observers.

I tried to solve a toy example like this:

```r
image <- sample(1:12)      # must be divisible by 4; will be 100+ in my use case
div <- length(image) / 4   # how many images get each distortion
distortion <- sample(rep(c("a", "b", "c", "d"), each = div))
trials <- paste0(image, distortion)
```

When I run it, the output looks like this:

```
"1b"  "8d"  "4c"  "11d" "9a"  "6b"  "3a"  "7c"  "2b"  "10d" "5c"  "12d"
```
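Since Python solutions are welcome, here is a minimal Python sketch of the same single-observer assignment. The function name `one_observer` and its parameters are hypothetical; like the R code, it assumes the number of images is a multiple of the number of distortions.

```python
import random

def one_observer(n_images, distortions=("a", "b", "c", "d"), rng=None):
    """Pair every image with one distortion, using each distortion equally often."""
    rng = rng or random.Random()
    if n_images % len(distortions) != 0:
        raise ValueError("number of images must be divisible by the number of distortions")
    per_distortion = n_images // len(distortions)   # plays the role of `div` in the R code
    images = list(range(1, n_images + 1))
    rng.shuffle(images)                              # random presentation order
    labels = [d for d in distortions for _ in range(per_distortion)]
    rng.shuffle(labels)                              # random image/distortion pairing
    return [f"{img}{lab}" for img, lab in zip(images, labels)]

print(one_observer(12, rng=random.Random(1)))
```

Each call balances distortions within one observer only; nothing ties one observer's pairing to the next.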

A design like this would be fine for a single observer. The problem I can't see a solution to arises when testing multiple observers. Imagine that we want to test 16 people in this toy example. We want every image/distortion combination to occur about equally often across observers. For instance, we want 2b (and likewise 2a, 2c and 2d) to be shown to roughly 25% of the observers.

When I run the following code, which measures the proportion of observers who see 2b, the result varies a lot from run to run:

```r
n_observers <- 16
shuff <- numeric(n_observers)   # 1 if this observer saw "2b", else 0
for (i in seq_len(n_observers)) {
  image <- sample(1:12)
  distortion <- sample(rep(c("a", "b", "c", "d"), each = div))
  trials <- paste0(image, distortion)
  shuff[i] <- as.numeric("2b" %in% trials)
}
mean(shuff)   # proportion of observers who saw "2b"
```
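This spread is what independent randomization predicts. Every observer in the loop is shuffled independently, and for one observer the chance that image 2 is paired with b is div/12 = 1/4, so the number of observers who see 2b follows a Binomial(16, 1/4) distribution. A short standard-library Python sketch of that distribution (the variable names are illustrative):

```python
from math import comb

n_observers = 16
p = 1 / 4  # P(image 2 gets distortion b) for one independently shuffled observer

# Probability that exactly k of the 16 observers see "2b", for k = 0..16
pmf = [comb(n_observers, k) * p**k * (1 - p) ** (n_observers - k)
       for k in range(n_observers + 1)]

expected = n_observers * p                 # 4 observers, i.e. 25%
sd = (n_observers * p * (1 - p)) ** 0.5    # about 1.73 observers
# Chance that one run gives <= 1/16 (about 6%) or >= 7/16 (over 40%)
extreme = sum(pmf[k] for k in range(n_observers + 1) if k <= 1 or k >= 7)
print(f"expected {expected:.0f} observers, sd {sd:.2f}, P(extreme run) = {extreme:.3f}")
```

With an expected count of 4 and a standard deviation of about 1.7 observers, runs in which only 1, or 7 or more, of the 16 observers see 2b are not unusual.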

Depending on the run, 2b may be shown to as few as about 6% of the observers or to more than 40% of them. Is there an obvious way to get a more balanced design?
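One common way to make the balance exact rather than approximate, offered here as a sketch of a possible approach and not as a confirmed answer, is Latin-square-style rotation: split the observers into 4 groups and shift every image's distortion by one step from group to group, so that within each block of 4 observers every image is paired with every distortion exactly once. The function `rotated_design` and its parameters are hypothetical; it assumes both the number of images and the number of observers are multiples of the number of distortions, and it randomizes presentation order separately for each observer.

```python
import random
from collections import Counter

def rotated_design(n_images, n_observers, distortions=("a", "b", "c", "d"), seed=None):
    """Observer j gives the image at rank r distortion (r + j) mod K (Latin-square rotation)."""
    k = len(distortions)
    if n_images % k or n_observers % k:
        raise ValueError("images and observers must both be multiples of the number of distortions")
    rng = random.Random(seed)
    order = list(range(1, n_images + 1))
    rng.shuffle(order)  # random relabelling, so the rotation is not tied to image numbers
    design = []
    for j in range(n_observers):
        # Within one observer, ranks 0..n-1 cover every residue mod k equally often,
        # so each distortion is used n_images / k times.
        trials = [f"{img}{distortions[(r + j) % k]}" for r, img in enumerate(order)]
        rng.shuffle(trials)  # fresh presentation order for every observer
        design.append(trials)
    return design

design = rotated_design(12, 16, seed=1)
counts = Counter(t for trials in design for t in trials)
print(counts["2b"] / len(design))  # 0.25
```

In this sketch, observers in the same group (same position modulo 4) see identical image/distortion pairings, only in a different order. Drawing a fresh random relabelling for each block of 4 observers would vary that while keeping every block balanced.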

Thank you for reading my question. I illustrate the problem with R code, but Python solutions are also more than welcome.