# Farmer wants to know how many apples the squirrels are stealing!

Hi all,

I posted a question recently here, but I think that it included a little too much detail, so I am trying to rephrase the question.

Suppose there is a farmer who has an apple orchard and he wants to know how many apples are being stolen by the squirrels. He places a box under each tree (pretend the box covers all the ground underneath the apple tree) that the squirrels can’t get into. He visits N apple trees for i days and counts (1) the number of apples left on the tree and (2) the number of apples in the box. However, there would be error on his counts of the apples in the trees, but we assume that he accurately counts the apples in the box.

I am trying to get started on modeling this, but not sure how to do it. It seems like a state space model with a poisson distribution on the apples. Any help on getting started would be appreciated.

The data would look something like this:
tree day apples(Tree) box
tree_1 1 97 21
tree_1 2 76 24
tree_1 3 75 15
tree_1 4 57 14
tree_1 5 48 12
tree_2 1 93 21

tree_10 4 57 10
tree_10 5 46 7

Here is the code I used the generate the data:

``````numTrees = 10
apples_start = rpois(numTrees,100) # number of apples at day 1
trees = paste(rep("tree",numTrees),as.character(1:numTrees),sep="_")
days = 1:5
i = 1 # counter

data = data.frame(Tree = character(),day = numeric(),applesTrue = numeric(),applesObs = numeric(),applesFallen=numeric())

for (apples in apples_start){
applesTrue = apples # start at the same place
tree = trees[i]
print(apples)
Fallen = c()
totalApplesTrue = apples
totalApples = rpois(1,apples)

for (day in days) {
tmpFallen= rbinom(1,apples[length(apples)],.2)
if (day == 1){
totalApplesTrue = append(totalApplesTrue,applesTrue - tmpFallen)
totalApples = append(totalApples,rpois(1,(apples-tmpFallen)))}
if (day != 1 && day != days[length(days)]) {
totalApples = append(totalApples,rpois(1,(apples-tmpFallen)) )## add observer error
totalApplesTrue = append(totalApplesTrue,applesTrue - tmpFallen)
}
Fallen = append(Fallen,tmpFallen)
apples = apples - tmpFallen
applesTrue = apples
}
tmpdat = data.frame(rep(tree,max(days)),days,totalApplesTrue,totalApples,Fallen)
data = rbind(data, tmpdat)
i = i+1
}
``````

The data set looks like this:

tree days totalApplesTrue totalApples Fallen
tree_1 1 105 93 22
tree_1 2 83 86 19
tree_1 3 64 58 12
tree_1 4 52 50 12
tree_1 5 40 38 6
tree_2 1 98 76 12

So here, the farmer puts down the box at day 0 and checks at day 1. 22 apples have fallen out of a ‘true’ number of 105. However, he miscounts and gets 93. The next day, he counts 19 apples in the box and 86 on the tree. The true number is 83 (105 - 22).

1 Like

What’s the data model for the observed # of apples on the tree? i.e. if there are actually n apples on the tree, and we observe n^{obs}, what’s the distribution of n^{obs}?

I am thinking that it nobs is Y∼Poisson(λ).

That is how I modeled it in the data generation.

Thanks - I see now where the observed counts are generated.

Are the squirrels represented in the simulation? I see that you’ve got an object `tmpTaken` but it’s not defined.

I have edited the data generation code. `tmpTaken` was a mistake and should have been `tmpFallen`. The squirrels are not observed or included in the model. Only (a) the number of apples counted on the tree and (b) the number of apples that fell into the box. I have added a totalApplesTrue to keep track of the ‘true’ number of apples.

Based on the first post it sounds like the goal is to estimate the number of apples being removed by squirrels. If that’s right, I’d assume that you’d be simulating the removal of apples by squirrels (even if this process is latent).

Let me see if I can follow the model for the true number of apples: for a tree on day t, if there are n_t apples, then the next day based on your simulation the model for the “state” is:

n_{t+1} = n_t - f_t,

where f_t is observed without error (number of apples in the trap).

If squirrels are removing apples such that apples removed by squirrels don’t end up in the trap, then I’d think of a state model that includes (1) apples removed by squirrels and (2) apples that fall into the trap:

n_{t+1} = n_t - f_t - s_t,

where s_t is the number of apples removed by squirrels and f_t is the number of apples that fell in the trap between time t and t+1. Here s_t is latent, and f_t is still observed. Is that consistent with how you’re thinking about how the true number of apples changes over time?

In some ways this reminds me of something like an open population removal model: https://projecteuclid.org/euclid.aoas/1475069619 Matechou, Eleni, et al. “Open models for removal data.” The Annals of Applied Statistics 10.3 (2016): 1572-1589.

1 Like

@ mbjoseph, you are exactly right and the data generation code that I provided is not quite correct. I am going to edit the code to reflect the removal of apples by squirrels and repost. I will also read the paper. Thanks for your help in this.

I have edited the code generation. Here it is now.

``````numTrees = 10
apples_start = rpois(numTrees,100) # number of apples at day 1
trees = paste(rep("tree",numTrees),as.character(1:numTrees),sep="_")
days = 1:5
i = 1 # counter
ApplesRem_prob = .05
ApplesFall_prob = .2

data = data.frame(Tree = character(),day = numeric(),applesTrue = numeric(),applesObs = numeric(),applesRemoved = numeric(), applesFallen=numeric())

for (apples in apples_start){
applesTrue = apples # start at the same place
tree = trees[i]
print(apples)
Fallen = c()
Removed_sq = c()
totalApplesTrue = apples
totalApples = rpois(1,apples)

for (day in days) {
tmpFallen= rbinom(1,apples[length(apples)],ApplesFall_prob)
tmpRemove = rbinom(1,apples[length(apples)],ApplesRem_prob)
if (day == 1){
totalApplesTrue = append(totalApplesTrue,applesTrue - tmpFallen - tmpRemove)
totalApples = append(totalApples,rpois(1,(apples-tmpFallen - tmpRemove)))}
if (day != 1 && day != days[length(days)]) {
totalApples = append(totalApples,rpois(1,(apples-tmpFallen - tmpRemove)) )## add observer error
totalApplesTrue = append(totalApplesTrue,applesTrue - tmpFallen - tmpRemove)
}
Fallen = append(Fallen,tmpFallen)
Removed_sq = append(Removed_sq,tmpRemove)
apples = apples - tmpFallen - tmpRemove
applesTrue = apples
}
tmpdat = data.frame(rep(tree,max(days)),days,totalApplesTrue,totalApples,Removed_sq,Fallen)
data = rbind(data, tmpdat)
i = i+1
}
``````

The data set looks like this:

Tree days totalApplesTrue totalApples Removed_sq Fallen
tree_1 1 92 74 2 23
tree_1 2 67 47 1 13
tree_1 3 53 51 1 10
tree_1 4 42 41 4 8
tree_1 5 30 30 2 3

So here, the farmer puts down the box at day 0 and checks at day 1. 23 apples have fallen out of a ‘true’ number of 92, and those darn squirrels took 2. However, he miscounts and gets 74 apples on the tree. The next day, he counts 13 apples in the box and 47 on the tree. The true number is 67 (92 - 23 - 2).