Universidad Nacional Abierta Matemáticas El Tigre

Mostrando entradas con la etiqueta R puzzles. Mostrar todas las entradas

miércoles, 5 de octubre de 2016

R Puzzle No. 2 - Where's the zebra?

For the second R puzzle, we will tackle the famous Zebra Puzzle, sometimes known as Einstein's Puzzle or Einstein's Riddle. There are various implementations of solutions to the puzzle in different programming languages¹, but here we will be using R and an R package called lpSolve, which is intended to solve linear programming problems. I had a lot of fun preparing this entry and learned quite a bit on how to implement linear programming models using R - I hope you will find this post entertaining, interesting and useful.

The Puzzle

The Zebra Eye, by Singapore artist Jarrell Goh aka fuzzyzebra (2007).

The Zebra puzzle, or Einstein's riddle as it is often called, was popularized in a 1962 number of Life International magazine, with a follow-up in a March 1963 issue of Life containing the solution. It is said to have been invented by a young Einstein, although it is also credited to Lewis Carroll. The truth is, there is no clear evidence of either's authorship of the riddle. I should also point out that various versions of the puzzle exist, with different nationalities, brands of cigarettes, or even replacing sports teams with cigarettes, etc.

The version of the puzzle we will be using here is as follows. There are five houses along a street whose five occupants are all of different nationalities (English, German, Norwegian, Swedish, and Danish). Each occupant also smokes a different brand of cigarette (five brands in all), has a different favorite drink, has a different pet and owns a house of a different color than his neighbors. The following clues are given:

The Englishman lives in the red house.
The Swede has a dog.
The Dane drinks tea.
The man in the green house drinks coffee.
The white house on the left of the green house.
The man in the yellow house smokes Dunhill.
The man who smokes Pall Mall has a bird.
The German smokes Prince.
The man who drinks beer smokes Bluemaster.
The man that smokes Blends lives next to the man with a cat.
The man with a horse lives next to the one that smokes Dunhill.
The man who smokes Blends lives next to the man that drinks water.
The Norwegian lives next to blue house.
The man in the middle house (third house) drinks milk.
The Norwegian lives in first house.

If you have never tried to solve this puzzle, you should. It'll give you some moments of good entertainment. I suggest you make a table with 5 columns and 5 rows in which the columns represent the five houses, numbered 1-5 from left to right and the rows represent the nationalities, cigarette brands, pets, favorite drinks and house colors. Start with the most direct clues and incrementally reach the solution by using one clue at a time. Once you reach the solution, reflect on how you did it. What algorithm did you use? How did you represent the data? And finally, how would you program a computer (using R specifically) to solve this puzzle? This is when the real fun begins...

A mathematical approach

Like I said, there are various code implementations in various programming languages that you can find on the Internet. On the rosetta.org site (listed in the References section), there's even an R solution using an R-package called "combinat", which I confess I'm not familiar with but I assume it's similar to the Python "constraints" module. In this post I'll take a different approach (using linear programming), but first, let's lay down the basic strategy for solving this puzzle computationally.

If you tried to solve the puzzle by hand using a 5x5 table like the one I suggested, you may have probably considered the blank table cells as the problem variables, to be filled by a specific drink, nationality or other characteristic. If we are to solve this puzzle by coding it, we should take another approach: we should consider instead the 25 characteristics (brands of cigarettes, nationalities, house colors and so on) as the variables, each variable assuming an integer value between 1 and 5 representing the number of the house where that specific item is located. Doing so brings our puzzle within the realm of mathematics.

Without any restrictions, we simply assign to each of the 25 variables a number between 1 and 5. There are $5^{25}\approx 2.98\cdot 10^{17}$ possible ways of doing so. If we next assume that there are no two houses with occupants of the same nationality, pets, brands of cigarettes, etc., then we narrow the solution space down to $(5!)^5=24,883,200,000$ possible cases, which is still a considerable number of cases. Each of the puzzle's clues is really an additional constraint and together, these constraints reduce the possibilities to a single definitive solution.

So once we have defined our 25 variables identifying them according to the specific item they represent, such as Horse, Norwegian, Tea, Red, and so on, how do we mathematically represent the constraints? Consider the first clue for example: the Englishman lives in the red house. This can be represented mathematically as $English=Red$. Clues like "the occupant of the middle house (third house) drinks milk" can be expressed as $Milk=3$. If the white house is to the right of the green house, we could simply write this as $White=Green + 1$. If the Norwegian lives in the first house, we represent this by $Norwegian=1$. The "x is next to y"-type clues are a bit trickier though.

What does "x is next to y" or "x and y are neighbors" mean? It simply means that their house numbers differ by one. So, for example, "the Norwegian lives next to the blue house" could be expressed mathematically as $|Norwegian-Blue|=1$, which in turn means that either $Norwegian-Blue=1$ or $Blue-Norwegian=1$. Finally, since we require all variables to be bounded between 1 and 5, we could impose these constraints as inequalities: $Red\geq 1$ and $Red\leq 5$, for all 25 variables. All in all, what I mean to show you is that all constraints can be expressed as a system of equations and inequalities. More specifically, these equations and inequalities are all linear.

This is where linear programming comes in to the picture. In linear programming, we attempt to maximize or minimize an objective (linear) function, usually representing a cost as a function of the decision variables, when we impose constraints as a system of linear inequalities like $Ax\leq b$ or $Ax\geq b$. In linear programming, $A_{n\times m}$ is a matrix of coefficients whose m columns represent the m decision variables and the n rows represent the n constraints as inequalities. On the right-hand side, b is a vector whose n components represent the right-hand side of each constraint inequality.

To give an idea about what a linear programming problem looks like, I'll use the following example from Taha (1982): A paint factory that produces interior and exterior paints uses two types of raw materials (A and B). The maximum availability of A is 6 tons per day, whereas for B it is 8 tons per day. To produce 1 ton of exterior paint, they require 1 ton of raw material A and 2 tons of raw material B, whereas for 1 ton of interior paint, they require 2 tons of A and 1 ton of B. Market surveys indicate that the daily demand for interior paint cannot exceed that of exterior paint by more than 1 ton. The survey also indicates that the maximum demand for interior paint is limited to 2 tons daily. Furthermore, the wholesale price per ton of interior paint is $2000 and for exterior paint it is $3000. How much interior and exterior paint should the factory produce to maximize gross income?

In the example problem given, it is clear that our two decision variables are $X_I$ and $X_E$ representing the daily production (in tons) of interior and exterior paint respectively. Then we could express the usage restrictions on the usage of raw materials and maximum raw material availability as follows:

\[\begin{align*} X_E + 2X_I &\leq 6 \quad \text{(raw material A)}\\ 2X_E + X_I &\leq 8 \quad \text{(raw material B)} \end{align*}\]

The demand restrictions (according to the market surveys), could be expressed as

\[\begin{align*} -X_E + X_I &\leq 1 \quad & \text{(the demand for interior paint cannot exceed}\\ {} & {}\qquad & \text{that of exterior paint by more than 1)}\\ X_I &\leq 2 \quad & \text{(the daily demand for interior paint is less than 2)} \end{align*}\]

And then of course there are the standard non-negativity restrictions on the decision variables that are tacitly assumed in any linear program: $X_E, X_I \geq 0$. We want to maximize the gross income, and this income is given by $z=3000X_E+2000X_I$, which is a linear function on the decision variables. All in all, our linear program would be:

Maximize $z(X)=Z\cdot X$, subject to $AX\leq B$, where $X=(X_E,X_I)$, $Z=(3000,2000)^T$, $B=(6,8,1,2)^T$, and $A$ is the following matrix of coefficients:

\[A=\left(\begin{array}{cc} 1 & 2\\ 2 & 1\\ -1 & 1\\ 0 & 1\\ \end{array}\right)\]

With $X_E, X_I\geq 0$. Alternatively, in non-matrix form, our problem would be to maximize $z=3000X_E+2000X_i$ subject to the following restrictions:

\[\begin{align*} 1X_E + 2X_I &\leq 6\\ 2X_E + 1X_I &\leq 8\\ -1X_E + 1X_I &\leq 1\\ 0X_E + 1X_I &\leq 2 \end{align*}\]
Again, subject to the non-negativity restrictions on $X_E$ and $X_I$.

The lpSolve package

It is not my intention in this post to give a thorough explanation of how the simplex algorithm used in linear programming works, nor delve into the principle of duality in linear programs. My intention is a more pragmatic and modest one- to show how to use one of the R packages dedicated to linear programming to solve such problems. Of course, knowing the inner workings of the mathematical tools could be useful in some cases, but in most real-life scenarios, it suffices to be able to identify the tool needed, to set up the problem in terms of the inputs that the tool requires, and to be able to assess and interpret the output. Specifically, we will make use of the "lpSolve" R package, which is an R wrapper to the lp_solve version 5.5 functions freely available under the LGPL 2 software license². So, to continue with the example problem we defined above, the R code for solving it would be as follows:

library(lpSolve)
z <- c(3000,2000)
B <- c(6,8,1,2)
A <- matrix(c(1,2,2,1,-1,1,0,1),ncol=2,byrow=TRUE)
colnames(A) <- c("X_E","X_I")
dir <- rep("<=",4)
(result <- lp("max", z, A, dir, B))
result$solution

Success: the objective function is 12666.67

[1] 3.333333 1.333333

In the code above, we obtained a maximum gross income of $12666.67 with 3.33 tons of exterior paint ($X_E$) and 1.33 tons of interior paint ($X_I$). In line 4 we define the constraints matrix A. In the next line, we give names to the columns that correspond to the decision variables. This is not necessary, but the user must be aware that the solution vector will be given in terms of these variables in the order that they appear on the constraints matrix. Notice that the lp function call in line 7 requires we specify whether we want a maximum or minimum for the objective function, in this case given by the vector z defined in line 2. A neat feature of the lp function is that for each constraint row of the constraints matrix, we can specify a direction for the inequality as "<=", "=" or ">=", so not all restriction inequalities have to be of the same type.

Granted, our zebra puzzle is quite more complicated than this example problem. It involves at least 25 decision variables (the 5 nationalities, 5 pets, 5 colors, 5 cigarette brands and 5 drinks). But having seen how we mathematically expressed some of the restriction clues, it is clear that they all involve inequalities or equalities, which suits us fine because the lp function in lpSolve can handle any of those three types of restrictions for each constraint separately.

An important point to consider though, is that we are only interested in integer solutions for the decision variables. Although classic linear programming is intended to solve for real solutions on the decision variables, there are variants of the linear programming method, called integer linear programming, which restrict the solutions to integer values. Fortunately, lp can handle these cases and even the more restricted cases where the decision variable is binary (either 0 or 1). As we shall see, it is very easy to indicate which variables we want to be real, integer, or binary. So on that aspect of our zebra problem requiring that our variables be restricted to integer values (1,2,3,4 or 5) indicating the house where each item is, we have nothing to worry about.

Not so fast, cowboy

It would seem like just by invoking the lpSolve library, our zebra problem has been solved. But it isn't. For one thing, how can we efficiently and most simply represent the 25 decision variables and build the constraints matrix? Do we have to build this matrix manually, perhaps using a spreadsheet program and reading it in as a csv file from R? Instead of doing that, I decided to use a neat feature of R which allows me to index elements of a vector by using character-string names instead of integer numbers. Such vectors are called named vectors. So what I will do is first define a character vector with all the variable names. For each row in the restriction matrix, I will start with a numeric vector of 25 elements all set to 0 and name that vector with the variable names I defined previously. If I want to set a particular element of that constraint row to a certain value, I just simply index it by the variable name instead of using a number. To keep things simple as I show you how to do this, we will pretend that we only have four decision variables: "W","X","Y" and "Z".

To begin with, how would we define the constraint "X is in house 3"? As we have previously seen, that is tantamount to $X=3$, so our constraint matrix row would be defined in R as follows:

variable_names <- c("W","X","Y","Z")
row1 <- rep(0,4)
names(row1) <- variable_names
row1["X"] <- 1
row1

 W X Y Z 
 0 1 0 0

The named vector row1 contains the coefficients for the constraint. The direction of that constraint would be "=", and the corresponding element of the right-hand side vector of the (in this case) equality would be 3. Now say we want to indicate that "W and Z are in the same house". That would be equivalent to stating that $W=Z$ or $W-Z=0$. In that case we would add a second row to are constraint matrix as follows:

row2 <- rep(0,4)
names(row2) <- variable_names
row2["W"] <- 1
row2["Z"] <- -1
row2

  W  X  Y  Z 
  1  0  0 -1

As in the previous case of row1, the direction (sign) of that constraint would again be "=" and the corresponding right-hand side value would be 0, since we want $W-X=0$. You may be thinking that perhaps this is a complicated way to do this, but consider that with 25 variables per linear constraint where most of the coefficients are zero, it makes a lot of sense to set the coefficients only of one or two variables we are dealing with. In effect, we are seeing that the constraints matrix will be a sparse matrix (mostly filled with zeros). Constraints like "W is to the right of Y" are mathematically expressed as $W-Y=1$ and it wouldn't be too hard to see how that constraint row would be defined in R.

Now when we come to a constraint like "X and Y are next to each other", that means that they differ by 1: $|X-Y|\leq 1$. This absolute-value inequality results in two simultaneous inequalities: $X-Y \leq 1$ and $Y-X\leq 1$. These two inequalities can be easily represented as constraint rows in R as we have seen. The catch is that $-1\leq X-Y \leq 1$ does not exclude the possibility that $X-Y=0$, that is, that X and Y be equal. However if "X and Y are next to each other, they cannot occupy the same house, so to the $|X-Y|\leq 1$ constraint we need to add $X-Y\neq 0$. The problem is, how do we add a strict inequality constraint to a linear program when our only "direction" operators are "<=", "=" and ">="?

We want X and Y to be different (not equal), and since X and Y are limited to integer values, this is equivalent to stating that $|X-Y|\geq 1$, which in turn means that either $X-Y\geq 1$ or $Y-X\leq -1$. However, we cannot have an "either/or" disjunction condition in a linear program, since all the conditions must be met simultaneously (through the "and" conjunction). We could use de Morgan's laws and state that transform these statements as I show below, , but we would just be going around in circles.

\[\begin{align*} X-Y\geq 1\;&\vee\;X-Y\leq -1 \equiv\\ \neg (X-Y\lt 1\;&\wedge\;X-Y\gt -1) \equiv\\ X-Y&\neq 0 \end{align*}\]

Fortunately, there's a "trick" that I discovered on the lp_solve reference guide web-page³ that one can use to incorporate constraints like $|X-Y|\geq 1$ into a linear program. To generalize as much as possible in what follows, we will assume that $X=a_1X_1 + \ldots + a_mX_m$ is a linear expression on the decision variables. We want to be able to express $|X|\geq min$ as a valid linear programming constraint. The trick is to introduce an additional binary decision variable $B$ that can only assume two values: 0 or 1.

\[\begin{align*} X+M\cdot B &\geq min\qquad (1) \\ -X+M(1-B) &\geq min \end{align*}\]

In the above, $M$ is simply a big, positive constant. How big does it have to be is something we will discover shortly. For now, bear with me and let's see what happens if $B=0$. In that case, the inequalities in (1) become:

\[\begin{align*} X&\geq min \qquad (1a)\\ -X+M &\geq min \end{align*}\]

In this case ($B=0$), $X\geq min$ and since we are assuming that M is a large enough constant, the second condition will always be fulfilled. If $B=1$, then (1) is changed to the following:

\[\begin{align*} X+M&\geq min \qquad (1b)\\ -X&\geq min \end{align*}\]

In that case ($B=1$), the condition $-X \geq min$ must hold, which can only mean that $-X$ is positive (X was negative). The first condition in (1b) automatically holds because we have been assuming that M is large enough anyways. So how large is "large enough"? Well, from either (1a) or (1b), we can see that $M\geq min+|X|$. While we could in theory use a really large value for M like $1\cdot 10^{20}$, it is not advisable to do so because that would create numerical instabilities with our linear program solver.

If we can somehow put an upper bound to $|X|$ in our problem, then we can find a feasible value for M. In our zebra puzzle, say we have two variables like "Norwegian" and "Blue" and we want to express that these have different values $|Norwegian-Blue|\geq 1$. By looking at the expression within the absolute value brackets $Norwegian-Blue$, we would have to see how large the difference can become. Since all decision variables in our problem are within the 1-5 range, we can see that this difference, in absolute value, cannot be larger than 4. So that means that any value of M larger than $4+1=5$ will suffice. In my R solution, I'll be using a value of 20 for M. At any rate, a "x not equal to y constraint" implies using one additional binary decision variable and two additional constraint rows, as shown in (1), in the linear program. Now we are ready to tackle the zebra puzzle in R, so without further ado, here's the script.

And finally, the R solution

#define the variables, each assuming integer values between 1 and 5
nationalities <- c("Dane","English","German","Norwegian","Sweede")
drinks <- c("Beer","Coffee","Milk","Tea","Water")
smokes <- c("Blend","Bluemaster","Dunhill","Pallmall","Prince")
pets <- c("Bird","Cat","Dog","Horse","Zebra")
house_color <- c("Blue","Green","Red","White","Yellow")
variable_names <- c(nationalities,drinks,smokes,pets,house_color)
#add binary variables for defining inequalities:
#4 for the problem constraints
#50 for requiring that the variables within each group
#(colors, nationalities, pets, etc.) have to be different
variable_names <- c(variable_names,sprintf("B%02d",1:54))
#define maximum bound for inequalities
M <- 20
#define Binary variable counter
b_counter <- 1
#define blank constraints, dir and rhs
constraints <- NULL
dir <- character(0)
rhs <- numeric(0)
#functions for defining constraints
x_is_y <- function(x,y) {
  #x and y are in the same house
  new_row <- numeric(length(variable_names))
  names(new_row) <- variable_names
  new_row[x] <- 1
  new_row[y] <- -1
  constraints <<- rbind(constraints,new_row)
  row.names(constraints) <<- NULL
  dir <<- c(dir,"=")
  rhs <<- c(rhs,0)
}
x_is_in_n <- function(x,n) {
  #x is in house number n
  new_row <- numeric(length(variable_names))
  names(new_row) <- variable_names
  new_row[x] <- 1
  constraints <<- rbind(constraints,new_row)
  row.names(constraints) <<- NULL
  dir <<- c(dir,"=")
  rhs <<- c(rhs,n)  
}
x_to_left_of_y <- function(x,y) {
  #x is on the house to the right of y
  new_row <- numeric(length(variable_names))
  names(new_row) <- variable_names
  new_row[x] <- 1
  new_row[y] <- -1
  constraints <<- rbind(constraints,new_row)
  row.names(constraints) <<- NULL
  dir <<- c(dir,"=")
  rhs <<- c(rhs,-1)  
}
x_is_not_y <- function(x,y) {
  #function that defines x and y as not equal
  #1st inequality constraint
  new_row <- numeric(length(variable_names))
  names(new_row) <- variable_names
  B_variable <- sprintf("B%02d",b_counter)
  new_row[x] <- 1
  new_row[y] <- -1
  new_row[B_variable] <- M
  constraints <<- rbind(constraints,new_row)
  dir <<- c(dir,">=")
  rhs <<- c(rhs,1)
  #2nd inequality constraint
  new_row <- numeric(length(variable_names))
  names(new_row) <- variable_names
  new_row[x] <- -1
  new_row[y] <- 1
  new_row[B_variable] <- -M
  constraints <<- rbind(constraints,new_row)
  dir <<- c(dir,">=")
  rhs <<- c(rhs,1-M)
  row.names(constraints) <<- NULL
  b_counter <<- b_counter+1  
}
x_is_next_to_y <- function(x,y) {
  #x is next to y (x and y are neighbors)
  new_row <- numeric(length(variable_names))
  names(new_row) <- variable_names
  new_row[x] <- 1
  new_row[y] <- -1
  constraints <<- rbind(constraints,new_row)
  dir <<- c(dir,"<=")
  rhs <<- c(rhs,1)
  #2nd neighboring constraint row
  new_row <- numeric(length(variable_names))
  names(new_row) <- variable_names
  new_row[x] <- -1
  new_row[y] <- 1
  constraints <<- rbind(constraints,new_row)
  dir <<- c(dir,"<=")
  rhs <<- c(rhs,1)
  #now define inequality constraints
  x_is_not_y(x,y)
}
inequality_within_group <- function(var_group) {
  #function that defines all variables within a group
  #to be different from each other.
  #first obtain all possible pair of group variable combinations
  group_combinations <- t(combn(var_group,2))
  #now loop through this table, adding constraints
  for (i in 1:nrow(group_combinations)) {
    x_is_not_y (group_combinations[i,1],group_combinations[i,2])
  }
}
variable_bounds <- function(x) {
  #function that creates constraints for bounds of 
  #problem variable (must be in 1-5 range)
  new_row <- numeric(length(variable_names))
  names(new_row) <- variable_names
  new_row[x] <- 1
  constraints <<- rbind(constraints,new_row)
  dir <<- c(dir,">=")
  rhs <<- c(rhs,1)
  new_row <- numeric(length(variable_names))
  names(new_row) <- variable_names
  new_row[x] <- 1
  constraints <<- rbind(constraints,new_row)
  dir <<- c(dir,"<=")
  rhs <<- c(rhs,5)
  row.names(constraints) <<- NULL
}
#Now define constraints in Prolog fashion
x_is_y("English","Red")
x_is_y("Sweede","Dog")
x_is_y("Dane","Tea")
x_is_y("Coffee","Green")
x_is_y("Dunhill","Yellow")
x_is_y("Pallmall","Bird")
x_is_y("German","Prince")
x_is_y("Bluemaster","Beer")
x_to_left_of_y("White","Green")
x_is_next_to_y("Blend","Cat")
x_is_next_to_y("Dunhill","Horse")
x_is_next_to_y("Blend","Water")
x_is_next_to_y("Norwegian","Blue")
x_is_in_n("Milk",3)
x_is_in_n("Norwegian",1)
inequality_within_group(nationalities)
inequality_within_group(drinks)
inequality_within_group(smokes)
inequality_within_group(pets)
inequality_within_group(house_color)
for (v in variable_names[1:25]) variable_bounds(v)
#Finally invoke the lp solve to solve the linear program
#and output results
library(lpSolve)
obj <- rep(0,79)
result <- lp("max", obj, constraints, dir, rhs, int.vec=1:25,binary.vec=26:79)
result <- result$solution[1:25]
result_table <- NULL
for (i in 1:5) {
  result_table <- cbind(result_table,
    variable_names[which(result==i)])
}
colnames(result_table) <- paste("House",1:5)
row.names(result_table) <- c("Nationality","Drinks","Smokes","Pet","House color")
result_table

             House 1     House 2 House 3    House 4      House 5 
 Nationality "Norwegian" "Dane"  "English"  "Sweede"     "German"
 Drinks      "Water"     "Tea"   "Milk"     "Beer"       "Coffee"
 Smokes      "Dunhill"   "Blend" "Pallmall" "Bluemaster" "Prince"
 Pet         "Cat"       "Horse" "Bird"     "Dog"        "Zebra" 
 House color "Yellow"    "Blue"  "Red"      "White"      "Green"

So the zebra is in house number 5 (green colored), where the Coffee drinking, Prince smoking German guy lives.

Notes

There is a thorough list of coding implementations (with source code in various languages) of the Zebra Puzzle at the rosettacode.org site (see WILLNESS et al).
See BERKELAAR (2015).
See BERKELAAR (2010).

Bibliography

BERKELAAR, M. et al. (2010). lp_solve 5.5 reference guide: Absolute values. http://lpsolve.sourceforge.net/5.5/asolute.htm.
BERKELAAR, M. et al. (2015). Interface to 'Lp_solve' v. 5.5 to Solve Linear/Integer Programs. R package "lpSolve". http://cran.r-project.org/.
GOH, J. (2007) The Zebra eye [Digital visualization]. Retrieved from http://fav.me/d11cbfy.
R DEVELOPMENT CORE TEAM (2009). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org.
TAHA, H. (1982). Operations Research - An Introduction. 3rd Edition. New York: MacMillan Publishing Co.
WILLNESS et al. (2011). Zebra Puzzle. https://rosettacode.org/wiki/Zebra_puzzle.
.
Zebra Puzzle. (2016, August 18). In Wikipedia, The Free Encyclopedia. Retrieved 17:01, October 1, 2016, from https://en.wikipedia.org/w/index.php?title=Zebra_Puzzle&oldid=735058353

If you found this post interesting or useful, please share it on Google+, Facebook or Twitter so that others may find it too.

martes, 6 de septiembre de 2016

R Puzzle No. 1 - Five Sailors, a Monkey and some Coconuts

In a brief diversion from the simecol dynamic system models, which I promise to resume in the future, I'll start another series of entertaining posts about solving mathematical/logic puzzles using R, of which this post you're reading is the first. This time around, we'll take a crack at the Problem of the Five Sailors, One Monkey and a Pile of Coconuts. Read on to find out what this is all about.

The Puzzle

This problem first appeared in the Saturday Evening Post in the 1920's and was later popularized by Martin Gardner. It goes as follows:

Five sailors arrive at a deserted island that has only coconuts and one monkey. After laboriously collecting all the coconuts into one big pile, the sailors all agree to divide up the coconuts into equal shares the next morning. However, in the middle of the night a sailor wakes up and, not trusting the others, decides to give the monkey one coconut (to buy his silence?) and takes away his 1/5 share of the coconuts, leaving the rest in the pile. The other 4 sailors wake up at different time intervals and each does the same thing: give one coconut to the monkey and stash away his 1/5 share of the coconuts that remain in the pile. In the morning they divide what is left of the pile into equal shares and there is still one coconut left for the monkey. How many coconuts were in the pile the day before?

Readers in the 1920's did not have the benefit of Google or the Internet so after racking their brains for a while, many gave up in frustration and soon, the premises of the newspaper were flooded by an angry mob bearing pitchforks and torches, demanding to know the answer¹. Now, before you fire up the Google search engine for the answer to this conundrum, let's first try a...

Brute-force solution

By brute-force, I really do mean brute as opposed to smart. It would seem that most people think mathematicians arrive at sophisticated answers directly in one step. Not me - I'm a dumb one so I usually try to explore the problem situation and try a little brute-force searching first. I don't actually have to do the computations myself, that's what R is for. Only after I understand the problem more thoroughly do I attempt to tackle it using more formal mathematical artifices.²

Our brute-force strategy will be to try out successive integer values for the size of the coconut pile to search for one where we will be able to deal out the coconut pile to the 5 sailors and the monkey in the conditions described by the problem. For each candidate integer value, we will simulate each of the five sailors taking their shares and giving one to the monkey, and in the end, dividing the pile into five equal shares plus one for the monkey. The key here is that the divisions in each step must be exact, since we can't deal with fractions of coconuts, otherwise the problem would be too easy, wouldn't it? Here is my brute-force solution script:

#------------------------------------------------------------------
# Five Sailors, a monkey and a pile of coconuts
# José Romero - 05/09/2016
#------------------------------------------------------------------

n_sailors <- 5
n_iterations <- 6
max_bound <- 1000000

verify <- function(pile, verbose=FALSE) {
  solution_DF <- data.frame()
  for (i in 1:n_iterations) {
    sailors_share <- pile%/%n_sailors
    for_monkey <- pile%%n_sailors
    if (for_monkey == 1) { 
      solution_DF <- rbind(solution_DF,
                           data.frame("Pile"=pile,
                             "Sailor.s.share"=sailors_share,
                             "For.monkey"=for_monkey))
      pile <- pile - (sailors_share+for_monkey)
    } 
  }
  return(solution_DF)
}

solve_monkey_coconuts <- function(max_bound) {
  for (i in 1:max_bound) {
    if (nrow(verify(i))==6) return(i)
  }
  return(0)
}

solution <- solve_monkey_coconuts(max_bound)
if (solution>0) {
  cat("Number of coconuts in pile: ",solution,"\n")
  print(verify(solution))
} else sprintf("No solution in [1,%d]",max_bound)

     Number of coconuts in pile:  15621 
        Pile Sailor.s.share For.monkey
     1 15621           3124          1
     2 12496           2499          1
     3  9996           1999          1
     4  7996           1599          1
     5  6396           1279          1
     6  5116           1023          1

There it is - the pile originally had 15621 coconuts and there were 1023 coconuts for each sailor (and one for the monkey) at daybreak. The one thing that strikes us when we run the script above is how agonizingly slow it is. Since we initially have no idea how large the solution is, we brute-search in the first one million positive integers using the for loop in lines 27-29³.

To check if an integer candidate is a solution to the problem, we simulate 6 partitionings (one for each of the five sailors and the final division of the coconuts done in the morning). Each partitioning consists of dividing the coconut pile into five equal parts and verifying that the remainder (integer modulo division) is one, which is the coconut for the monkey. If at some point in the process we do not get a remainder of 1, we will be exiting the verfication process, knowing that we could not perform all 6 iterations of it. This is what the code in lines 10-24 does.

Finally, note the output below the R script, indicating how many coconuts remain in the pile, the share for each sailor and the share for the monkey at each iteration. Could we have somehow guessed this would be the answer? Well, we could have ventured that since we were going to divide the pile by 5 six times, the answer would need to be divisible by $5^6=15626$, this being our approximate answer. Of course, we would have known this could not be the answer to the puzzle because we had not taken into consideration the part about giving one coconut to the monkey at each step, which complicates things a great deal.

Diophantine equations to the rescue

Now that we know the answer to the problem, let's see if we can deduce it mathematically. Let $C_0$ be the number of coconuts the five sailors originally collected. When the first sailor comes along in the middle of the night, he divides this pile into five equal parts and gives the remaining coconut to the monkey. So if $S_1$ is his one fifth share of the coconuts, then the following equation holds: $C_0=5S_1+1$. So when the next sailor comes along, there will be $C_1=4S_1$ coconuts in the pile, the previous sailor having taken his share, plus one for the monkey. But if he also divides that pile into 5 equal parts and throws the remaining coconut to the monkey, then $C_1=5S_2+1$ also holds, and so on. Writing down all six of these equations describing the division of coconuts at each stage:

\[\begin{align*} C_0&=1+5S_1\\ C_1=4S_1&=5S_2+1\\ C_2=4S_2&=5S_3+1\qquad\qquad(1)\\ C_3=4S_3&=5S_4+1\\ C_4=4S_4&=5S_5+1\\ C_5=4S_5&=5S_6+1 \end{align*} \]

Oh great, six equations and twelve unknows?! Please bear with me a little longer. With the equations in (1), we can try to express five of the $S_i$'s in function of their sucessors $S_{i+1}$:

\[\begin{align*} S_1&=\frac{1+5S_2}{4}\\ S_2&=\frac{1+5S_3}{4}\\ S_3&=\frac{1+5S_4}{4}\qquad\qquad(2)\\ S_4&=\frac{1+5S_5}{4}\\ S_5&=\frac{1+5S_6}{4} \end{align*}\]

Some backward substitution, starting with the last two equations in (2) and working on up to the top equation there allows us to express $S_1$ in terms of $S_6$ and substituting this in the first equation of (1), we get:

\[ C_0=\frac{5}{4}\left(\frac{5}{4}\left(\frac{5}{4}\left(\frac{5}{4}\left(\frac{5}{4}\left(5S_6+1\right)+1\right)+1\right)+1\right)+1\right)+1\qquad(3) \]

Working with this last equation:

\[ C_0=\frac{5^6}{4^5}S_6+\sum_{i=0}^5 \left(\frac{5}{4}\right)^i\qquad(4) \]

In the right hand side of (4) we have a geometric series so we apply the pertinent formula:

\[ C_0=\frac{5^6}{4^5}S_6+\frac{\left(\tfrac{5}{4}\right)^6-1}{\tfrac{5}{4}-1}\qquad(5) \]

This means that:

\[\begin{align*} C_0&=\frac{5^6}{4^5}-4+\frac{5^6}{4^5}S_6\quad\rightarrow\\ 4^5C_0&=5^6-4^6+5^6S_6\quad\rightarrow\quad&(5a)\\ 1024C_0&-15625S_6=11529\qquad\quad&(5b) \end{align*}\]

Equation (5b), which looks just like an innocent linear equation on two variables, is called a linear diophantine equation. If $C_0$ and $S_6$ were allowed to assume real values, we simply would have infinite solutions, but there's a catch here: we're looking for integer values of $C_0$ and $S_6$ as solutions to (5b). This is typical of diophantine problems, which usually involve less equations than unknowns but in which integer solutions are required.

What comes next is a little sleight of hand. Due to the symmetry in equation (5a), we can see that if $C_0$ and $S_6$ are one solution to the equation, so are $C_0'=C_0+5^6$ and $S_6'=S_6+4^5$. If we substite $C_0'$ for $C_0$ and $S_6'$ for $S_6$ in equation (5a) and notice that the $4^55^6$ terms cancel out on both sides of the equality, the resulting equation is essentially the same:

\[ 4^5(C_0+5^6)=5^6(S_6+4^5)+5^6-4^6 \]

What this all amounts to is that we can find other solutions by adding some constants to each of the variables in a known solution pair. But, how do we find one solution pair? The trick is to add $4^6$ to both sides of (5a) and to factor out both sides:

\[ 4^5(C_0+4)=5^6(S_6+1)\qquad(6) \]

Now this last equation is easy to solve. We just need to make both sides of (6) equal to zero and obtain $C_0=-4$ and $S_6=-1$. But the problem is, how can we have negative coconuts?⁴ This is not really a problem because if we add $5^6$ to $C_0$ and $4^5$ to $S_6$, we know we can get another solution pair just as mathematically valid, but more sensible, given the problem. We finally get $C_0=15625-4=15621$ and $S_6=1024-1=1023$, which concur with the solutions we obtained by brute-force.

Taking a crack at diophantine equations with R

We're not about to give up on trying to find a more efficient and possibly vectorized way to solve this problem using R. I looked for an R package to solve diophantine equations and found that the "elliptic" package has a function called "congruence" which deals with linear diophantine equations of the type $mx+by=l$.

To be honest, I could never get this function to give me the solution to equation (5b). Poring through the package's vignette, I encountered some frightening number theoretic stuff, so given my ignorance and aversion to number theory⁵, I gave up on using this package but before doing so, I inspected the R code for the congruence function:

function (a, l = 1) 
{
    l <- as.integer(l)
    m <- as.integer(a[1])
    n <- as.integer(a[2])
    zero <- as.integer(0)
    one <- as.integer(1)
    if (m == zero & n == one) {
        return(NULL)
    }
    if (m == one & n == zero) {
        return(c(NA, 1))
    }
    if (m == 1) {
        return(rbind(c(a), c(1, n + l), c(0, l)))
    }
    if (n == 1) {
        return(rbind(c(a), c(m - l, 1), c(l, 0)))
    }
    q1 <- which((+l + (1:m) * n)%%m == zero)
    if (!any(q1)) {
        q1 <- NA
    }
    q2 <- which((-l + (1:n) * m)%%n == zero)
    if (!any(q2)) {
        q2 <- NA
    }
    out <- rbind(a, cbind(q1, q2))
    rownames(out) <- NULL
    colnames(out) <- NULL
    return(out)
}

Hmmm, the code above doesn't look so inscrutable. I mean, it uses the expected integer modulo operator in R (the %%'s in the code). Maybe if I knew a little bit more about diophantine equations I could reimplement this function from scratch and get it to work. So I looked up the Wikipedia article on diophantine equations and found the following theorem:

The linear diophantine equation $ax+by=c$, where $a$,$b$,$c$,$x$ and $y$ are integers, has a solution if and only if $c$ is a multiple of the greatest common divisor of $a$ and $b$. Moreover, if $(x, y)$ is a solution, then the other solutions have the form $(x + kv,y − ku)$, where $k$ is an arbitrary integer, and $u$ and $v$ are the quotients of $a$ and $b$ (respectively) by the greatest common divisor of $a$ and $b$.

This theorem is really handy for two reasons: (1) it gives us the conditions for the existence of solutions to the diophantine equation and (2) it gives us a mechanism for generating infinitely more solutions if we are able to find one solution. This last point, finding one solution, is what I would have to tackle in my implementation of the code.

I decided that I would look for a first solution by separating each variable to one side of the $ax+by=c$ equation and getting $ax=c-by$ or $by=c-ax$. Then, taking $ax=c-by$ for example, I would have my code try out a range of values for $y$ to verify when the expression $c-by$ is divisible by $a$. Intuitively, I realized that the range of values to try for $y$ would be all integers between 0 and $a$ (when using $by=c-ax$, I would have to try all integers between 0 and $b$ for $x$), so it's not like I would have to search through an infinitely large solution space.

Having found values of $x$ or $y$, which are guaranteed to exist if the solution existence conditions in the Wikipedia theorem are met, I would substitute this value in the equation to obtain the other value. Finally, I would have to implement a function for finding the greatest common divisor. All in all, I came up with the following code:

gcd <- function(m,n) {
  #greatest common divisor function
  m <- as.integer(abs(m))
  n <- as.integer(abs(n))
  divm <- (1:m)[which(m%%(1:m)==0)]
  divn <- (1:n)[which(n%%(1:n)==0)]
  return(max(divm[which(divm%in%divn)]))
} #gcd

diophantine <- function (m,n,l) {
  #solves the linear diophantine equation: mx+ny=l
  l <- as.integer(l)
  m <- as.integer(m)
  n <- as.integer(n)
  gcd <- gcd(m,n)
  if (l%%gcd!=0) return(NA) #no solution exists
  #otherwise obtain one solution (solx,soly)
  if (abs(m)>abs(n)) {
    #use multiples of the lower magnitude n
    mult <- (0:n)[which((l-m*(0:n))%%n==0)]
    solx <- mult[1]; soly <- (l-m*solx)%/%n
  } else {
    #use multiples of the lower magnitude m
    mult <- (0:m)[which((l-n*(0:m))%%m==0)]
    soly <- mult[1]; solx <- (l-n*soly)%/%m
  }
  return(rbind(c(solx,soly),c(n%/%gcd,-m%/%gcd)))
} #diophantine

diophantine(1024,-15625,11529)

##        [,1]  [,2]
## [1,]  15621  1023
## [2,] -15625 -1024

As you can see, it works like a charm. This is a code I can be proud of (look ma, no for/while loops!). My function not only gives me the solution in the first row of the results matrix, but gives me the coefficients whose multiples I would have to add to $x$ and $y$ to get other solution pairs. I haven't really proven that it works for all possible values of m, n and l, but I tried a few example problems which I solved by hand and it seems to be ok. Needless to say, I invite my readers to put this code under close scrutiny (you know what they say about many pairs of eyes seeing more than one). Perhaps some of you will find errors in the code, or a test-case which "breaks" the code. Also, I invite my readers to come up with an R function for solving general 3 or more variable linear diophantine equations. For further discussion or questions of this, I invite you to participate in the comments for this post.

Notes

I guess I got carried away with that pitchfork and torch bearing angry mob bit, but I do wish we Venezuelans would organize into such a mob to evict Maduro from power. Ironically, it seems like in the days when there were no social networks or smartphones, people somehow organized into action more effectively - for example during the 1958 ousting of the Perez-Jiménez dictatorship. Now, we somehow expect the Internet to "do things for us", while we stay at home.
When teaching math, it is never a good idea to just go ahead and show students the "clean" mathematical solution. Teaching math is all about modeling the problem-solving process, which in real life, is usually a messy, trial-and-error roundabout affair. When we have become experts in making mistakes and following dead-end leads, we will have gained sufficient familiarity with the problem and only then can we hope to come up with a clean solution.
R is not particularly fast when running for/while loops. What R programmers always strive to do is to vectorize their code. Vectorization consists of avoiding for/while loops all together and working with vectors and functions of vectors. In the next script, we will be using vectorized code. So you could say that in R, using for or while loops in brute-search is the most brutish way to go about it.
I wonder what negative coconuts taste like...
My apologies to number-theorists (they would cringe at a blog like this anyways), but number theory seems to me like the most useless branch of mathematics. I mean, of what practical use is Fermat's Last Theorem or Goldbach's Conjecture? Sure, probably the most difficult and even unresolved or unproven math problems are buried deep within number theory. Sure, solving these problems would be an instant tiket to mathematical fame and fortune - the stuff of legends. As for me, I prefer to steer clear of things for which I just don't have the talent, so that I won't suffer the same fate as Uncle Petros. Uncle Petros, by the way, is the main character in an extremely entertaining novel about mathematics by Apostolos Doxiadis, called Uncle Petros and Goldbach's Conjecture

Bibliography

Diophantine equation. (2016, July 26). In Wikipedia, The Free Encyclopedia. Retrieved September 4, 2016, from https://en.wikipedia.org/wiki/Diophantine_equation
HANKIN, R. K. S. (February 2006). Introducing Elliptic, an R package for Elliptic and Modular Functions. Journal of Statistical Software, Volume 15, Issue 7.
PADILLA, T. and McPARTLAN, P. [Numberphile] (2015, April 29). Monkeys and Coconuts - Numberphile [Video file]. Retrieved September 4, 2016, from https://www.youtube.com/watch?v=U9qU20VmvaU
.
R DEVELOPMENT CORE TEAM (2009). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org.