Depth First Search for Hard Sudoku
Badri Toppur, TAPMI, Manipal
badri.toppur@gmail.com
This paper demonstrates an implementation of depth
first search (DFS) to fill all the empty cells in the sudoku grid correctly.
There are easy versions with dense matrices, hard versions
that are sparse matrices, and in the worst case is the blank matrix. We present a scheme that
is based up on keeping track of ones guesses . This is complemented by a finite
set of principles for testing whether a row, column or subsquare is complete
without duplication of a digit. Whenever one encounters infeasibility due to
cell assignments following a
guess, or a sequence of consecutive guesses, one takes back all the temporary
cell assignments up to the previous
guess and trys an alternative spot for the guess in the same subsquare or try a
new digit in a new subsquare.
The completed sudoku solution reveals an immediate
application as a set covering problem in which there are n2 job
categories with n2 persons in each category and 3n2
districts that need to share these human resources. The sudoku solution is the
assignment with the best utilization, such that no two workers are assigned the
same job and no two jobs are assigned to the same worker.
1. Introduction
Sudoku a game based upon a grid, appeared first as
2. Strategy for Easy Sudoku
I |
II |
III |
IV |
V |
VI |
VII |
VIII |
IX |
Figure 1 Numbering the sub-squares Figure 2 The puzzle
A simple strategy exists for solving the easy version of this game. Suppose the nine subsquares are numbered from left to right from the top as in Figure 1. Focus on three subsquares in a row or in a column. These will be made up of three rows or three columns depending on which manner we pick the three 3 ΄ 3 subsquares. Suppose we pick the three subsquares on the top of the grid. Scanning the three topmost rows in Figure 2 one can see there is the digit 4 in each of the three rows. They are in different rows and not confronting each other. There is the digit 9 in the second subsquare and a 9 in the third subsquare, but there is no 9 in the first subsquare. So we can conclude that a 9 is required in the first subsquare. Furthermore it cannot be in the second or third row, but must be in the first row given the object of the game. The first row in subsquare I has two empty cells and an 8 in the third cell. Into which cell will we place the 9? Looking down the first column one can see a 9 in the first cell in subsquare IV, so this rules out the first square. Therefore the 9 must go into the second cell of subsquare I. Now we know where the 9 is in the first subsquare.
Looking down the first three columns, that is subsquares I, IV, and VII; there is a 9 missing in the seventh subsquare. Where does the 9 go in this subsquare? One can get a clue from the 9 that we have just entered in subsquare 1, and the 9 in the eight subsquare. To avoid the 9 in seventh row, and the two 9s in the first two columns, we have to put the 9 in the eight row to the right of the 6 in subsquare VII.
If one is unable to make headway with the 9 then try another number. If all the remaining numbers fail to pin any new entries, move to a new row or column of subsquares. After a few permanent assignments one can return to 9 or whichever number one began with.
In more complex versions of the game, there are usually too few numbers provided for one to obtain the complete solution using just the above strategy. One needs to take guess at certain points to get ahead at all. Typically there one to three choices for a particular digit in a subsquare. After making one of these choices, one keeps assigning temporary labels from this point onwards. If no contradiction is reached, and one reaches the limits of deduction based upon that guess, we take another guess, and continue the process. If a contradiction to the rules is observed one has to undo all the temporary labels and make another choice for that particular digit.
Thus we need to keep track of the sequence of guesses, and also the sequence of temporary cell assignments that make up each guess, so that we can backtrack to the position in the decision tree, where we may have chosen unwisely. To each digit entered whose status is temporary, one needs a subscript indicating its position in the sequence of temporary cell assignments made in the current drive. What if a second guess is required. One needs a superscript to keep track of the guess number. If we realize that we have made a mistake in the cell assignments, we can backtrack with the aid of this subscript and superscript. Permanent cell assignments are distinguished by the absence of these subscripts or superscripts. Book-keeping also has to include the coordinates of the cell for the guesses, to avoid repetition after back-tracking. All this information is maintained separately in appropriate data structures.
3. Depth First Search for Hard Sudoku
We have presented a labelling scheme with temporary and permanent cell assignments with descent down a search tree, with branching nodes denoting multiple conditional guesses about the cell assignment of a particular digit. This descent is followed by backtracking back up to the root node, whenever one encounters infeasibility, i.e. a contradiction between the temporary cell assignment and some previous temporary or permanent cell assignment. We present an example in this section and dozens more in the appendix. In terms of computational time and other performance measures these are to be used to empirically determine heuristic rules for deciding which digit to branch upon and in which subsquare to place the digit.
Example 1
2 |
|
|
|
5 |
|
|
8 |
|
|
|
|
8 |
|
4 |
|
|
|
4 |
|
|
|
|
7 |
5 |
|
|
|
|
3 |
6 |
|
|
8 |
|
|
8 |
|
|
7 |
|
2 |
|
|
9 |
|
|
6 |
|
|
5 |
4 |
|
|
|
|
5 |
9 |
|
|
|
|
7 |
|
|
|
4 |
|
1 |
|
|
|
|
4 |
|
|
6 |
|
|
|
1 |
2 |
|
|
|
5 |
6 |
|
8 |
4 |
5 |
|
|
8 |
|
4 |
|
|
|
4 |
|
|
2 |
|
7 |
5 |
|
|
|
|
3 |
6 |
4 |
9 |
8 |
|
5 |
8 |
5 |
4 |
7 |
|
2 |
|
|
9 |
|
|
6 |
|
8 |
5 |
4 |
|
|
|
|
5 |
9 |
2 |
|
|
4 |
7 |
|
|
|
4 |
7 |
1 |
|
5 |
8 |
|
4 |
|
5 |
6 |
|
|
|
1 |
Figure 3 HARD 9΄ 9 problem Figure 4 Without Guesswork
Consider the puzzle in Figure 3. One can without much look ahead and without any guesswork get to the position indicated in Figure 4. An automated procedure, that in the absence of robotic vision, necessarily proceeds in the dark, will require a labelling scheme from the very first square that is filled regardless of how obvious things are by inspection. At this stage, one must take a guess. One can take a guess about any digit missing in some subsquare. A guess that helps clear some of the congestion of blank cells is preferable, to one that is appears ornamental. In an automated search procedure, the multiplicity of guesses corresponding to such entries that do not quickly help one locate other digits is not appear to be a critical factor.
2 |
6212 |
|
3210 |
5 |
6 |
|
8 |
4 |
5 |
3211 |
|
8 |
|
4 |
|
|
|
4 |
811 |
|
2 |
|
7 |
5 |
|
|
126 |
725 |
3 |
6 |
4 |
9 |
8 |
223 |
5 |
8 |
5 |
4 |
7 |
|
2 |
|
|
9 |
922 |
221 |
6 |
128 |
8 |
5 |
4 |
724 |
329 |
|
127 |
5 |
9 |
2 |
813 |
|
4 |
7 |
|
|
|
4 |
7 |
1 |
|
5 |
8 |
715 |
4 |
812 |
5 |
6 |
314 |
|
|
1 |
Figure 5
The first guess in this example is in the third row and second column of subsquare I as in Figure 5: The digit 8 is definitely required in the third row. The two choices are cell (3, 2) and (3,3), either one of them will be denoted 811 to indicate that this is the first entry in the sequence corresponding to the first guess. Conditional to this guess, one can deduce another four digits in sequence, upto the cell in the ninth row and first column: 715. This is the end of the run of deductions based upon this guess. One must guess again, and the second guess is about the location of the digit 2 in subsquare IV. There are again two choices for this 2. It must go into cell (4,2) or (6,2). Let us go with the guess that the ideal position for the 2 designated as 221 is in the cell (6, 2). In the run of logical deductions that follow, one can go up to eleven more guesses before placing a temporary 6 in the same row as a permanently placed 6. These twelve temporary entries are also indicated in Figure 5. We next roll back the last twelve temporary labels constituting the second guess, and take another branch at the first position. Recall that our alternate choice for the position of the digit 2 is in the cell (4,2). We get up to 128 in cell (7,2) with no contradiction. This leads to a third guess starting arbitrarily in the seventh row and first column about the digit 3. This chain of deductions leads to two sixes in the seventh column as apparent in Figure 6, contradicting the rules of the puzzle.
2 |
335 |
|
138 |
5 |
6 |
|
8 |
4 |
5 |
635 |
|
8 |
|
4 |
|
|
|
4 |
811 |
|
2 |
|
7 |
5 |
|
|
124 |
221 |
3 |
6 |
4 |
9 |
8 |
723 |
5 |
8 |
5 |
4 |
7 |
137 |
2 |
6310 |
139 |
9 |
926 |
725 |
6 |
336 |
8 |
5 |
4 |
|
|
331 |
128 |
5 |
9 |
2 |
813 |
634 |
4 |
7 |
633 |
927 |
222 |
4 |
7 |
1 |
332 |
5 |
8 |
715 |
4 |
812 |
5 |
6 |
314 |
|
|
1 |
Figure 6
We therefore roll back all the moves from the third guess and try the only other possible location for the three in subsquare VII, that is in cell (8,1). This leads to the situation depicted in Figure 7.
2 |
|
|
|
5 |
6 |
|
8 |
4 |
5 |
|
|
8 |
|
4 |
|
|
|
4 |
811 |
|
2 |
|
7 |
5 |
|
|
124 |
221 |
3 |
6 |
4 |
9 |
8 |
723 |
5 |
8 |
5 |
4 |
7 |
|
2 |
|
|
9 |
926 |
725 |
6 |
|
8 |
5 |
4 |
|
|
632 |
128 |
5 |
9 |
2 |
813 |
334 |
4 |
7 |
331 |
927 |
222 |
4 |
7 |
1 |
633 |
5 |
8 |
715 |
4 |
812 |
5 |
6 |
314 |
|
|
1 |
Figure 7
We require a fourth guess here. If we try 341 in cell (6, 4) in subsquare V, we soon reach a contradiction in column seven with the digit 6.
2 |
|
|
|
5 |
6 |
|
8 |
4 |
5 |
|
|
8 |
|
4 |
|
|
|
4 |
811 |
|
2 |
|
7 |
5 |
|
|
124 |
221 |
3 |
6 |
4 |
9 |
8 |
723 |
5 |
8 |
5 |
4 |
7 |
143 |
2 |
644 |
342 |
9 |
926 |
725 |
6 |
341 |
8 |
5 |
4 |
|
|
632 |
128 |
5 |
9 |
2 |
813 |
334 |
4 |
7 |
331 |
927 |
222 |
4 |
7 |
1 |
633 |
5 |
8 |
715 |
4 |
812 |
5 |
6 |
314 |
|
|
1 |
Figure
8
We can try putting this 3 of the fourth guess in cell (5, 5).
2 |
645 |
|
343 |
5 |
6 |
|
8 |
4 |
5 |
344 |
|
8 |
|
4 |
|
|
|
4 |
811 |
|
2 |
|
7 |
5 |
|
|
124 |
221 |
3 |
6 |
4 |
9 |
8 |
723 |
5 |
8 |
5 |
4 |
7 |
341 |
2 |
|
|
9 |
926 |
725 |
6 |
142 |
8 |
5 |
4 |
|
|
632 |
128 |
5 |
9 |
2 |
813 |
334 |
4 |
7 |
331 |
927 |
222 |
4 |
7 |
1 |
633 |
5 |
8 |
715 |
4 |
812 |
5 |
6 |
314 |
|
|
1 |
Figure 9
This also leads to a contradiction. We have explored every node on this branch of the tree. Thus our very first placement of 8 in cell (3,2) of subsquare I (Figure 5) is improper. The only alternative in this case is to place the very first 8 in cell (3,3) in subsquare I as shown in Figure 11. We therefore take the other branch right at the start of the search and place the digit 8 in cell (3,3), leaving us with the digits 1, 2, 3, 6, 7 and 9 that need to be filled in. Let the second guess be about the position of digit 1 being in cell (6,2) in subsquare IV, also shown in Figure 10.
2 |
|
|
125 |
5 |
6 |
|
8 |
4 |
5 |
|
126 |
8 |
|
4 |
|
|
|
4 |
|
811 |
2 |
|
7 |
5 |
|
127 |
|
|
3 |
6 |
4 |
9 |
8 |
123 |
5 |
8 |
5 |
4 |
7 |
122 |
2 |
|
617 |
9 |
|
121 |
6 |
324 |
8 |
5 |
4 |
|
|
115 |
812 |
5 |
9 |
2 |
314 |
616 |
4 |
7 |
|
|
|
4 |
7 |
1 |
|
5 |
8 |
|
4 |
|
5 |
6 |
813 |
|
|
1 |
Figure 10
The two 1s in the last column are a contradiction. So we take back the second guess about the position about the 1 in subsquare IV, instead putting the 121 in cell (4,2). This leads to the completion of the puzzle as seen in Figure 11. The depth first search tree is displayed in Figure 12.
2 |
7223 |
1228 |
3211 |
5 |
6 |
9225 |
8 |
4 |
5 |
6222 |
9227 |
8 |
1230 |
4 |
7224 |
3232 |
2220 |
4 |
3226 |
811 |
2 |
9229 |
7 |
5 |
1231 |
6221 |
724 |
121 |
3 |
6 |
4 |
9 |
8 |
223 |
5 |
8 |
5 |
4 |
7 |
329 |
2 |
127 |
617 |
9 |
925 |
222 |
6 |
1210 |
8 |
5 |
4 |
726 |
328 |
115 |
812 |
5 |
9 |
2 |
314 |
616 |
4 |
7 |
6213 |
9216 |
2215 |
4 |
7 |
1 |
3217 |
5 |
8 |
3214 |
4 |
7212 |
5 |
6 |
813 |
2218 |
9219 |
1 |
Figure 11
Figure 12
4.
Inductive Argument for Higher Dimensional Sudoku
The same technique must work for 4΄4, 9΄9, 16΄16, 25΄25 and in general n2΄n2 sudoku instances. Figures 13 and 14 show an easy instance of the 4΄4 case and its solution.
1 |
|
|
4 |
4 |
|
|
|
|
4 |
|
2 |
|
|
|
|
1 |
2 |
3 |
4 |
4 |
3 |
2 |
1 |
3 |
4 |
1 |
2 |
2 |
1 |
4 |
3 |
Figure 13 Figure 14
5. A
Possible Application
In Figure 15 there is an assignment of workers to jobs indicated by the numeral 1. Worker A is assigned to job 1, worker B is assigned to job 6, worker C is assigned to job 3 and so forth. This assigment is not unique, but if there is some cost implicit with assigning a particular worker to a particular job, there will be a unique or a handful of solutions.
Returning to Su Do Ku - suppose we have 9 categories of jobs with 9 workers available in each category. Say these workers are electrician (1), plumber (2), carpenter (3), mechanic (4), locksmith (5), doctor (6), cook (7), postman (8), and tailor (9). Furthermore suppose we want to assign these 81 workers to 27 villages in the district so that each village gets the services of each type of worker. These villages may be labelled A, .,J, 1, ,9 and I, IX. In Figure 16, each row, each column and furthermore each of the nine sub-squares represents one village. Although it is a 2d grid one is thinking of three coordinates in this scheme. One can see that a number of assignments have already been made. For example village 1 has a designated plumber, a tailor, a cook and a postman. Village I has a designated plumber, a mechanic and a postman. One must simply fill in the remaining squares heeding the above mentioned restrictions on multiple assigments and absence of any assigments whatsoever.
6. References
[1] Will
Shortz, 2005, Sudoku Easy to Hard, Rupa and Co. Released originally by St.
Martins Press,
[2] R. E.
Tarjan. 1983, Data Structures and Network Algorithms,