Alan Cheng
4/1/2012
Table of Contents
I. Abstract 1. Introduction 2. Review of Literature 2.1 Logic Synthesis
2.1.1 Boolean Algebra and Switching Algebra 2.1.2 Truth Tables 2.1.3 Karnaugh Maps 2.1.4 Commonly Used Gates 2.1.4.1 AND Gate 2.1.4.2 OR Gate 2.1.4.3 NOT Gate 2.1.4.4 XOR Gate 2.1.4.5 Multiplexer 2.1.5 Arithmetic Gates 2.1.5.1 Adder 2.1.5.2 Multiplier 2.1.6 Clock 2.1.7 Flip-Flops and Latches 2.1.7.1 D-Latch 2.1.7.2 D-Flip Flop
2.3 Cryptography
2.3.1 An Introduction to Cryptology 2.3.2 Cryptography in History 2.3.3 Modern Cryptography
3. Goal, Question, and Hypothesis 3.1 Goal 3.3 Hypothesis 4. Design and Setup of Hardware 4.1Constraints of Hardware 4.2 Terasic DE2-115 Development Board 4.3 ModelSim 4.4 Quartus II 5. TWO+TWO=FOUR Oracle 5.1 Overview of the Oracle 5.2 TWO+TWO=FOUR Problem 5.3 The Selector 5.4 The Permuter 5.5 Arithmetic Checker
5.6 The RAM Module 5.7 The RAM Counter 5.8 The LCD Display 5.9 Verification for the TWO+TWO=FOUR Oracle 5.10 TWO+TWO=FOUR Problem Experiment 5.11 Conclusion for the TWO+TWO=FOUR Problem 6. Oracle for Arbitrary Cryptarithmetic Problems 6.1 Overview of the Oracle 6.2 The Input of the Oracle 6.3 The Counter 6.4 The Input Validity Checker 6.5 The Arithmetic Checker 6.6 Software Variation of the Oracle for Arbitrary Cryptarithmetic Problems 6.7 Verification of the Oracle for Arbitrary Cryptarithmetic Problems 6.8 Oracle for Arbitrary Cryptarithmetic Problems Experiment
6.8.1 TWO+TWO=FOUR 6.8.2 ONE+ONE=TWO 6.8.3 TOO+TOO=LONG 6.8.4 OOP+LLP=PRGM
6.9 Conclusion for the Oracle for Arbitrary Cryptarithmetic Problems 6.10 Future Extensions for the Oracle for Arbitrary Cryptarithmetic Problems 7. Oracle for Single Solutions of Arbitrary Cryptarithmetic Problems 7.1 Overview of the Problem 7.2 Efficiency of Current Oracle Design
7.4 Discussion of Different Designs 7.5 Random-Start Algorithm 7.6 Calculation Time Analysis of the Current Oracle Design and the Random-Start Algorithm
7.6.1 TWO+TWO=FOUR 7.6.2 ONE+ONE=TWO 7.6.3 TOO+TOO=LONG 7.6.4 OOP+LLP=PRGM
7.7 Conclusion of Oracle for Single Solutions of Arbitrary Cryptarithmetic Problems 7.8 Future Extensions 8. Conclusion 9. References Appendix
I. Abstract
Cryptography is a major area of government research. Recently, however, secret codes have become harder to decode with standard software, and therefore a different method is needed. The purpose of this project is to use Field-Programmable Gate Arrays (FPGAs) for fast prototyping of oracles that solve constraint satisfaction problems, a class that includes problems in cryptography. Cryptarithmetic problems are a subset of constraint satisfaction problems, which are commonly used to test hardware. A cryptarithmetic problem consists of letters, each of which can be replaced by any digit 0-9. The goal is to find an encoding of letters to digits that solves the problem. The hardware used is an FPGA, a very flexible development platform that can be reconfigured for new circuits by programming in a hardware description language. The hardware was tested on two problems: the specific problem TWO+TWO=FOUR, and arbitrary puzzles. The hardware performed over three million times faster than the software on the TWO+TWO=FOUR problem, and 50 times faster on the arbitrary cryptarithmetic problems. In conclusion, hardware technologies are a faster approach to solving cryptarithmetic problems (and constraint satisfaction problems), and the difference was drastic in both experiments.
1. Introduction
Cryptography has been an important area of research for the government, the military, and large corporations. During World War II, cryptography was vital in winning the war. The Allies were able to decode German secret codes (using methods of cryptography), which revealed the plans of the German military. Had this not been done, the war would have lasted much longer. Today, the possibility of war remains. If a war occurs, the lives of many civilians could potentially be saved by finding information about pre-planned attacks. However, current software technologies are slow and cannot decode strongly secured messages in a practical amount of time. As a result, other technologies are required to decode such messages in a reasonable amount of time. Field-Programmable Gate Arrays and other hardware technologies are one alternative to these limitations of software.
2. Review of Literature
2.1 Logic Synthesis
2.1.1 Boolean Algebra and Switching Algebra
Boolean algebra is a two-valued algebraic system that was introduced by George Boole in the book An Investigation of the Laws of Thought. It is a variant of algebra in which the only values operated on are 0 (for false) and 1 (for true). It is used in mathematical logic, computer programming, and digital logic. In the 1930s, Claude Shannon observed that Boolean algebra can be applied to logic circuits and gates. Thus, he introduced switching algebra, a variant of Boolean algebra that can be used for analyzing logic circuits. In logic synthesis, both Boolean algebra and switching algebra are commonly used.
B  C  X
0  0  0
0  1  1
1  0  0
1  1  0
0  0  0
0  1  0
1  0  1
1  1  0
POS) with size 2^n. (more description about groups and Hamming Distance) An example of finding a POS function is shown in figure 2.1.3.2.
[Figure: annotated Karnaugh map layout showing the input values for ab and cd in Gray-code encoding, the output X, and the output value for each corresponding input combination]
        cd
ab      00  01  11  10
00       1   1   1   0
01       1   1   1   0
11       0   0   0   0
10       1   0   0   1
Groups marked on the map: ac, bc, bcd
F(x) = ac + bc + bcd
Figure 2.1.3.2: Example Karnaugh Map with minterms labeled and function at bottom.
Notice that in Figure 2.1.3.2, the group of six is not grouped as one minterm. This is because in binary logic there are 2^n combinations for n bits, and 6 is not a power of 2. Notice also that a group can wrap around the edge of the Karnaugh Map. This is allowed because 00 and 10 are at Hamming distance one.
2.1.4.1 AND Gate
The AND gate is one of the most basic gates. It is commonly used in deriving larger and more complex gates and circuits. The output of the AND gate is 1 when both inputs are 1, and 0 otherwise. In the equation representation of the AND operator, the multiplication sign is used (often written as simple juxtaposition).
A  B  X
0  0  0
0  1  0
1  0  0
1  1  1
X = AB
Figure 2.1.4.1: (Left) The graphical symbol for the AND gate, (Center) the truth table for the AND gate, (Right) the equation representation of the AND gate.
2.1.4.2 OR Gate
The OR gate is another basic gate. Like the AND gate, it is commonly used in deriving larger and more complex gates and circuits. The output of the OR gate is 1 when at least one input is 1, and 0 otherwise. In the equation representation of the OR operator, the plus sign + is used.
A  B  X
0  0  0
0  1  1
1  0  1
1  1  1
X = A + B
Figure 2.1.4.2: (Left) The graphical symbol for the OR gate, (Center) the truth table for the OR gate, (Right) the equation representation of the OR gate.
2.1.4.3 NOT Gate
The NOT gate (or inverter) is another basic gate. Unlike the AND and OR gates, the NOT gate has only one input. The function of the NOT gate is to invert the input. If the input is 0, the output is 1, and vice versa. There are several different notations for the NOT operation, which are shown in Figure 2.1.4.3.1.
A  X
0  1
1  0
X = A'    X = ~A    X = Ā
Figure 2.1.4.3.1: (Left) The graphical symbol for the NOT gate, (Center) the truth table for the NOT gate, (Right) the equation representations of the NOT gate.
Sometimes, when the NOT gate is appended to the input or output of a logic gate, it is denoted with an empty circle.
Figure 2.1.4.3.2: Different notations of a NOT gate appended to an AND gate.
2.1.4.4 XOR Gate
Although the XOR gate (also called the EXOR gate or Exclusive OR) is a combination of AND, OR, and NOT gates (shown in figure 2.1.4.4.1), it is fundamental to several logic synthesis methods such as ESOP minimization. The XOR gate is also the most basic gate in quantum technologies. In the XOR gate, the output is 1 when exactly one of the inputs has value 1, and 0 otherwise. The notation of the XOR operator is ⊕.
[Figure: XOR gate with inputs A and B and output X, built from AND, OR, and NOT gates]
Figure 2.1.4.4.1: Decomposition of XOR gate to primitive AND, OR, and NOT gates.
A  B  X
0  0  0
0  1  1
1  0  1
1  1  0
X = A ⊕ B
Figure 2.1.4.4.2: (Left) The graphical symbol for the XOR gate, (Center) the truth table for the XOR gate, (Right) the equation representation of the XOR gate.
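The truth tables for the basic gates can be checked with a short Python sketch. The decomposition used here, (A AND NOT B) OR (NOT A AND B), is one common way to build XOR from primitive gates and is an assumption; it may differ from the exact circuit drawn in figure 2.1.4.4.1. The function names are illustrative only.

```python
# Primitive gates on single bits (0 or 1).
def and_gate(a, b):
    return a & b

def or_gate(a, b):
    return a | b

def not_gate(a):
    return 1 - a

def xor_gate(a, b):
    # Assumed decomposition: (A AND NOT B) OR (NOT A AND B).
    return or_gate(and_gate(a, not_gate(b)), and_gate(not_gate(a), b))

# Print the XOR truth table in the same row order as the text.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_gate(a, b))
```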
2.1.4.5 Multiplexer
The multiplexer is another gate produced by combining the primary gates (AND, OR, NOT). The multiplexer acts as a selector: there are two data inputs, and a third input wire (the control) determines which of the two data inputs is passed to the output. Figure 2.1.4.5.1 shows the decomposition of the multiplexer into primary gates.
[Figure: multiplexer with data inputs A and B and a Control input]
Figure 2.1.4.5.2: (Left) The symbol for the multiplexer, (Right) the corresponding truth table for the multiplexer.
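As an illustration of the selector behavior, a two-input multiplexer can be sketched in Python from AND, OR, and NOT operations. The convention that Control = 0 selects A and Control = 1 selects B is an assumption, since the truth table of the figure is not reproduced here.

```python
def mux(a, b, control):
    # Assumed convention: control = 0 passes A, control = 1 passes B.
    # Gate form: (A AND NOT control) OR (B AND control).
    return (a & (1 - control)) | (b & control)

print(mux(1, 0, 0))  # selects A
print(mux(1, 0, 1))  # selects B
```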
[Figure: half adder with inputs X and Y and outputs Half Sum (HS) and Carry-Out (CO)]
A  B  HS  CO
0  0  0   0
0  1  1   0
1  0  1   0
1  1  0   1
Figure 2.1.5.1.1: Half Adder with truth table
A full adder also accepts a carry input, so that for each bit addition a carry from the previous bit can be added as well. When full adders are chained together, they create a ripple adder, which can be used to add numbers of multiple bits.
[Figure: full adder with inputs X, Y, and Carry-In and outputs Sum and Carry-Out]
X  Y  CIN  S  COUT
0  0  0    0  0
0  0  1    1  0
0  1  0    1  0
0  1  1    0  1
1  0  0    1  0
1  0  1    0  1
1  1  0    0  1
1  1  1    1  1
[Figure: ripple adder built from four full adders, with carries C0 through C4 and sum outputs S0 through S3]
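The half adder, full adder, and ripple adder described above can be modeled in a few lines of Python. This is a behavioral sketch, not the circuit from the figures; the function names and the least-significant-bit-first list format are assumptions made for the example.

```python
def half_adder(x, y):
    # Returns (half sum, carry-out).
    return x ^ y, x & y

def full_adder(x, y, cin):
    # Two half adders plus an OR gate for the carry.
    s1, c1 = half_adder(x, y)
    s, c2 = half_adder(s1, cin)
    return s, c1 | c2

def ripple_add(xs, ys, cin=0):
    """Add two equal-length bit lists, least significant bit first."""
    sums = []
    carry = cin
    for x, y in zip(xs, ys):
        s, carry = full_adder(x, y, carry)
        sums.append(s)
    return sums, carry

# 5 + 3 with four-bit inputs, least significant bit first.
print(ripple_add([1, 0, 1, 0], [1, 1, 0, 0]))
```

Each full adder passes its carry-out to the carry-in of the next stage, which is the chaining that gives the ripple adder its name.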
2.1.5.2 Multiplier
The function of the multiplier is to perform the multiplication operation. The circuit is constructed based on multiplication using addition. Thus, the circuit shown in figure 2.1.5.2.1 is constructed using half adders. Note that the multiplication is done on binary inputs rather than decimal integers.
[Figure: multiplier for the two-bit inputs X2 X1 and Y2 Y1, built from half adders, with outputs Z4 Z3 Z2 Z1]
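A two-bit multiplier of this kind can be sketched as follows. The wiring shown (AND gates for the partial products, two half adders for the sums) is a standard construction and is assumed here; it may not match figure 2.1.5.2.1 gate for gate.

```python
def half_adder(x, y):
    # Returns (sum, carry).
    return x ^ y, x & y

def multiply_2bit(x2, x1, y2, y1):
    """Multiply X2X1 by Y2Y1 and return the bits (Z4, Z3, Z2, Z1)."""
    z1 = x1 & y1                            # lowest partial product
    z2, c1 = half_adder(x2 & y1, x1 & y2)   # middle partial products
    z3, z4 = half_adder(x2 & y2, c1)        # highest partial product plus carry
    return z4, z3, z2, z1

print(multiply_2bit(1, 1, 1, 1))  # 3 x 3
```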
2.1.6 Clock
A clock is a waveform generator. It is an input to the circuit which continuously switches between the values 0 and 1. Several logic gates used in state machines require the use of a clock pulse, where the gate is activated by a rising edge (from 0 to 1) or a falling edge (from 1 to 0). Figure 2.1.7.1 describes the clock waveform.
[Figure: clock waveform, with the period labeled tper]
2.1.7.1 D-Latch
The D-latch is a latch which simply stores a bit of information. The circuit is derived from the SR latch, and is shown in figure 2.1.7.1.1. It is used in asynchronous circuits and can also be used to build D Flip-Flops.
[Figure: D-latch with inputs D and Enable and outputs Q and Q']
D  En  Q       Q'
0  0   Last Q  Last Q'
1  0   Last Q  Last Q'
0  1   0       1
1  1   1       0
2.1.7.2 D-Flip Flop
The D Flip-Flop is simply a D-latch in synchronous logic. The input is stored on each clock edge instead of whenever the input changes. This flip-flop is derived from D-latches, with the only addition being a detector for the clock edge (positive edge or negative edge). The notch in figure 2.1.7.2.1 represents a clock input.
D  Clk     Q       Q'
0  X       Last Q  Last Q'
1  X       Last Q  Last Q'
0  Rising  0       1
1  Rising  1       0
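The difference between the level-sensitive latch and the edge-triggered flip-flop can be seen in a small behavioral model. This Python sketch is an illustration only; the class names are hypothetical, and the flip-flop is assumed to be rising-edge triggered.

```python
class DLatch:
    """Level-sensitive: Q follows D while enable is 1."""
    def __init__(self):
        self.q = 0

    def update(self, d, en):
        if en:
            self.q = d
        return self.q

class DFlipFlop:
    """Edge-triggered: Q takes D only on a rising clock edge (assumed)."""
    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def update(self, d, clk):
        if clk == 1 and self._prev_clk == 0:
            self.q = d
        self._prev_clk = clk
        return self.q

ff = DFlipFlop()
# D changes while the clock stays high, so Q keeps the value stored at the edge.
print([ff.update(d, clk) for d, clk in [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]])
```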
2.2.1.1 Satisfiability
Satisfiability (SAT) is the problem of determining whether there is an assignment of values to the variables of a Boolean function that makes the output true (1). Satisfiability is an NP-complete problem, meaning that no algorithm is known which can solve it in polynomial time. Figure 2.2.1.1 shows an example of the problem.
2.2.1.2 Graph Coloring
Graph coloring is a problem consisting of the assignment of colors to nodes. Given is a graph (as shown in figure 2.2.1.3) where an arbitrary color can be assigned to any node. However, no two adjacent nodes can be assigned the same color. The goal is to use the fewest number of colors. Similar to satisfiability, graph coloring is NP-complete.
2.2.1.3 Cryptarithmetic Problems
Cryptarithmetic problems are mathematical problems consisting of an equation with unknown numbers. These numbers are displayed in the form of letters. The goal of the problem is to find a successful encoding for each letter. Note that the same letter always has the same value, while different letters must have different values.
    S E N D
+   M O R E
-----------
  M O N E Y
[Figure: oracle block diagram in which the inputs feed a global AND that produces the output]
Figure 2.2.2.1: General oracle for a cryptarithmetic problem.
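The two rules of a cryptarithmetic encoding (same letter, same digit; different letters, different digits) together with the arithmetic itself can be expressed as a small checker. The following Python sketch is an illustration; the function names are hypothetical and the dictionary format for the encoding is an assumption.

```python
def word_value(word, encoding):
    # Replace each letter by its digit and read the result as a number.
    return int("".join(str(encoding[ch]) for ch in word))

def is_solution(addends, total, encoding):
    digits = list(encoding.values())
    # Different letters must map to different digits in the range 0-9.
    if len(set(digits)) != len(digits):
        return False
    if any(not 0 <= d <= 9 for d in digits):
        return False
    return sum(word_value(w, encoding) for w in addends) == word_value(total, encoding)

# Well-known solution of SEND + MORE = MONEY: 9567 + 1085 = 10652.
encoding = {"S": 9, "E": 5, "N": 6, "D": 7, "M": 1, "O": 0, "R": 8, "Y": 2}
print(is_solution(["SEND", "MORE"], "MONEY", encoding))
```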
2.3 Cryptography
2.3.1 An Introduction to Cryptography
Cryptography is the study and practice of secure communication. It draws on mathematics, electrical engineering, and computer science. Typically, cryptography relies on a key, which acts as a translator between the original text and the encoded text. Currently, cryptography is used in many settings, such as banking, ATM cards, and computer passwords.
Cryptography has long been important to governments, the military, and now the internet. Although handwritten cryptography was practiced by the Greeks and in the Middle Ages, it was not until World War I that cryptography became critically important. The first formal methods of encryption (making the secret codes) and decryption (decoding the secret codes) were not introduced until the 19th century. Later, in World War I, the United Kingdom military was able to decrypt German naval codes. However, the most notable case of cryptography in this period was the decryption of the Zimmermann Telegram, which prompted the United States' entry into World War I. World War II is the most prominent example of the use of cryptography. The Germans had started to use an electrical rotor machine known as Enigma. The Allies, however, were able to decrypt Enigma, and advanced the technology of decryption through the work of many cryptographers, including Alan Turing (a founder of modern computing). America was also able to decrypt Japanese naval codes, which led to the famous victory at the Battle of Midway.
2.4.2 Hardware description languages
Hardware description languages are used to design electrical circuits. They are often used in programming FPGAs. A program in these languages can describe circuit operation, design, and organization. The most common hardware description languages are Verilog-HDL and VHDL.
2.4.3 Verilog-HDL
Verilog-HDL is a hardware description language used to describe electronic systems. Its syntax is based on the popular language C. Verilog was created by Phil Moorby and Prabhu Goel in 1984. Since then, Verilog has become recognized as an IEEE standard, with the most recent extension in 2005.
3.2 Hypothesis
I hypothesize that making an efficient oracle for arbitrary cryptarithmetic problems is possible. The oracle on a Field-Programmable Gate Array would also be faster than the computer program because the FPGA implements the oracle directly in hardware, whereas a program is software that runs on top of general-purpose hardware. Working directly on the circuit would make it more efficient than software, which only accesses the hardware indirectly. Also, with an FPGA, calculation can be done in parallel. Programs, however, execute sequentially, and even if multi-threading were implemented, there still would not be as much parallelism as in a logic circuit.
Constraints:
Cost: Hardware cannot be expensive (high-performance FPGAs cost over $1000).
Computer: HP Pavilion dm4 with Core i5 processor (2.4 GHz).
Features on FPGA: Must be able to display the solutions of the given cryptarithmetic problem.
114,480 logic elements (LEs)
3,888 Kbits of embedded memory
266 embedded 18 x 18 multipliers
4 general-purpose PLLs
528 user I/Os
128MB (32Mx32bit) SDRAM
2MB (1Mx16) SRAM
8MB (4Mx16) Flash with 8-bit mode
32Kbit EEPROM
18 switches and 4 push-buttons
18 red and 9 green LEDs
Eight 7-segment displays
16x2 LCD module
4.3 ModelSim
Mentor Graphics ModelSim is an integrated development environment (IDE) as well as a simulator for hardware description languages. It is useful for writing and debugging modules, because the simulator provides waveforms that one can analyze to find errors in a circuit early. However, even when a module works in simulation, it may not work on the actual hardware.
4.4 Quartus II
Altera's Quartus II is a compiler for Altera FPGAs. The software compiles a Verilog-HDL design and downloads it onto the FPGA. The Quartus II software is used in this project for running Verilog programs (from ModelSim) on the Terasic DE2-115 development board.
5. TWO+TWO=FOUR Oracle
5.1 Overview of the Oracle
The purpose of the TWO+TWO=FOUR oracle is to find all possible solutions specifically for the TWO+TWO=FOUR problem (presented in section 5.2). The answers for the problem are displayed on an LCD display. There are three major parts of the oracle: the selector, the permuter, and the arithmetic checker. The oracle also contains smaller circuits that store and display the combinations. These are the RAM (memory) module, the RAM counter, and the LCD display module.
    T W O
+   T W O
---------
  F O U R
Figure 5.2.1: TWO+TWO=FOUR Problem
The purpose of the selector is to select values for each different letter. In numerical terms, this means that five out of nine possible numbers (0-9 excluding 1, explained in section 5.5) are selected. To do this operation, the selector is split into two sections: the counter and the multiplexers.
[Figure: selector circuit, consisting of a counter that drives a set of multiplexers]
Figure 5.3.1: Diagram for the selector circuit.
The counter portion of the selector simply counts up in the given modulo. This portion can be broken into five smaller counters. In the problem of TWO+TWO=FOUR, since there are five unique letters, the modulo input is 4 (selection states 000, 001, 010, 011, and 100). Figure 5.3.2 shows a sample count sequence for the counter.
Time (t)   Output Combination
T = 0      00000
T = 1      10000
T = 2      11000
T = 3      11100
T = 4      11110
T = 5      11111
T = 6      20000
Figure 5.3.2: Sample count sequence for selector counter.
In the actual circuit, each output signal is 3 bits wide, so a value of 2 (at T = 6) is actually 010. The outputs of each counter are directly connected to the controls of the multiplexers. Each multiplexer has only five inputs, and the inputs to each multiplexer were adapted to the counter.
Figure 5.3.3 shows the multiplexer, and figure 5.3.4 shows the resulting output of the full selector.
Time (t)   Output Combination
T = 0      56789
T = 1      46789
T = 2      45789
T = 3      45689
T = 4      45679
T = 5      45678
T = 6      36789
Time (t)   Output Combination
T = 0      12345
T = 1      12354
T = 2      12435
T = 3      12453
T = 4      12534
T = 5      12543
T = 6      13245
Figure 5.5.1: Equations derived from the TWO+TWO=FOUR problem.
The equations shown in figure 5.5.1 are the decomposition of each column of the TWO+TWO=FOUR problem. In this problem, it is assumed that F has to equal 1, not 0, because if the value were 0 there would be no need for the letter. Carry positions were also added, as in the handwritten method of addition. The arithmetic checker strictly follows the equations, replacing the +, *, and = operators with their respective gates.
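The column-by-column decomposition can be sketched in Python. Since the equations of figure 5.5.1 are not reproduced here, the carry names c1, c2, and c3 and the exact form of each column equation are assumptions based on the handwritten method of addition described above.

```python
def check_two_two_four(t, w, o, f, u, r):
    """Check TWO + TWO = FOUR one column at a time, right to left."""
    c1, r_digit = divmod(o + o, 10)        # ones column:     O + O      = R + 10*c1
    c2, u_digit = divmod(w + w + c1, 10)   # tens column:     W + W + c1 = U + 10*c2
    c3, o_digit = divmod(t + t + c2, 10)   # hundreds column: T + T + c2 = O + 10*c3
    # The final carry is the leading digit F.
    return r == r_digit and u == u_digit and o == o_digit and f == c3

print(check_two_two_four(t=7, w=3, o=4, f=1, u=6, r=8))  # 734 + 734 = 1468
```

In hardware, each divmod corresponds to an adder whose result is compared against the expected digit, and the final return corresponds to the AND of all comparator outputs.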
The circuit used for the RAM of the FPGA is based on the concept of a vector. In a vector, information can be inserted, as well as removed, according to a reference number (an address). The circuit first stores the successful combinations (combinations with an arithmetic checker output of 1) into the vector. In the post-processing stage, the combinations in the vector can be displayed to the user by a counter, which is explained in section 5.7.
[Figure: vector with pop-in and pop-out operations and removal by address]
Figure 5.6.1: Concept of a vector used for memory in the TWO+TWO=FOUR oracle.
program is very accurate, and always finds all possible solutions using exhaustive search (refer to section 6.9). If the oracle works, then the solutions produced should be identical. Notice that all solutions considered here are only when F=1.
Since the selector, permuter, and arithmetic checker operate within one clock pulse, in theory (without glitches in the circuit) it takes n clock pulses to go through n combinations. The circuit has to go through 9C5 combinations, and for each combination it has to go through 5P5 permutations. Multiplying these gives the number of clock pulses needed to complete the search: 9C5 x 5P5 = 126 x 120 = 15,120.
Figure 5.10.1: Derivation of number of clock pulses required to solve the problem
The number of clock pulses is then converted into time. 1 MHz is equivalent to 1000 kHz. Since the clock on the DE2-115 is 50 MHz, it is equivalent to 50(1000) = 50,000 kHz. The period in milliseconds is 1/(frequency in kHz), so each cycle takes 1/50,000 milliseconds. Multiplying the number of cycles, 15,120, by this factor gives 0.3024, so the computational time is 0.3024 milliseconds.
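The cycle count and the conversion to time can be reproduced with a short Python sketch. The enumeration mirrors the selector (choose five of the nine digits) followed by the permuter (order the five chosen digits); the variable names are illustrative, and the one-combination-per-clock assumption is taken from the text above.

```python
from itertools import combinations, permutations
from math import comb, perm

DIGITS = [0, 2, 3, 4, 5, 6, 7, 8, 9]  # 0-9 excluding 1, which is reserved for F
CLOCK_HZ = 50_000_000                 # 50 MHz clock of the DE2-115

# Closed-form count: 9C5 selections times 5P5 orderings.
cycles = comb(9, 5) * perm(5, 5)

# The same count obtained by enumerating selector and permuter outputs.
enumerated = sum(1 for chosen in combinations(DIGITS, 5)
                 for _ in permutations(chosen))

# One combination per clock pulse, converted to milliseconds.
time_ms = cycles / CLOCK_HZ * 1000

print(cycles, enumerated, time_ms)
```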
Graph 5.10.1: Bar graph comparing computational time of a hardware and software implementation of an oracle for the TWO+TWO=FOUR Problem.
Graph 5.10.2: Bar graph comparing the average computational time of the hardware and software implementations for the TWO+TWO=FOUR Problem.
From the data shown in Table 5.10.1 and the graphs, it can be seen that the software computational time was significantly slower than both hardware times. Also, the predicted and actual hardware computational times were not identical, but differed only slightly. In the average computational time comparison, the software took 1126368 milliseconds to calculate, while the hardware took only 0.2999 milliseconds. Thus, from the data, the hardware achieves a speed-up of around 3724762 times.
Figure 6.1.1: Flowchart for the oracle for arbitrary cryptarithmetic problems.
        X1   X2   X3
+       X4   X5   X6
--------------------
  X10   X9   X8   X7
Figure 6.2.1: Layout of the arbitrary oracle inputs.
Notice that in the last row of the problem, Xn increments from right to left (instead of left to right). This is because X10 can take only two values, 0 and 1, and thus it is logical to make it the last Xn value (for the counter described in section 6.3). In the input, each Xn can be replaced with any letter A through Z or with 0, where 0 represents an empty space. An empty space can be placed anywhere in the first two rows, but not in the last row (because the sum of any two numbers is never empty). In the oracle, this empty space is treated the same as a constant zero. The given input is then encoded to binary from its ASCII value. This allows the values to be compared with each other, which produces the following control values: 00 meaning inequality, 01 meaning equality, and 10 meaning that the letter is a 0 (empty space). The control values are used in the Input Validity Checker circuit.
Time (t)   Output Combination
T = 0      0000000000
T = 1      0000000001
T = 9      0000000009
T = 10     0000000010
T = 11     0000000011
Figure 6.3.2: Sample count sequence of the counter.
Notice that when the first counter increments past its maximum of 9, it increments the counter of the next higher digit (which is on the left) by 1. This is done in the circuit by connecting the enable of each counter (the input which allows the counter to count) to the overflow of the previous counter (the counter to its right). Thus, when a counter overflows, it signals a 1 to the enable of the next counter, allowing it to increment. However, an AND gate is required for the enable of the 3rd through 10th counters from the right. This is because the overflow signal of the previous counter (the 2nd from the right, for instance) stays at 1 until the next time it increments, which is the next time the counter before it (the 1st counter from the right) overflows. Since a counter increments at each clock pulse while it is enabled, it would keep incrementing for as long as the enable is on. The sequence of the faulty counter (with a direct overflow-to-enable connection) is shown below in figure 6.3.3.
Time (t)   Output Combination
T = 90     0000000090
T = 91     0000000191
T = 99     0000000999
T = 100    0000001000
T = 101    0000001001
Figure 6.3.3: Incorrect counter sequence.
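The effect of the missing AND gate can be reproduced with a small simulation. This Python sketch models ten decade counters; treating the overflow signal as "the digit currently equals 9" is an assumption about the circuit, chosen because it reproduces the faulty sequence of figure 6.3.3.

```python
def step(digits, faulty=False):
    """Advance the counter chain by one clock pulse.

    digits[0] is the rightmost counter. The overflow signals are sampled
    before the clock edge, as in a synchronous circuit.
    """
    overflow = [d == 9 for d in digits]
    new = list(digits)
    enable = True  # the rightmost counter is always enabled
    for i in range(len(digits)):
        if i > 0:
            if faulty:
                # Direct connection: enable = overflow of the previous counter.
                enable = overflow[i - 1]
            else:
                # Corrected: AND the previous overflow with the previous enable.
                enable = enable and overflow[i - 1]
        if enable:
            new[i] = (digits[i] + 1) % 10
    return new

def run(pulses, faulty=False):
    digits = [0] * 10
    for _ in range(pulses):
        digits = step(digits, faulty)
    return "".join(str(d) for d in reversed(digits))

print(run(91, faulty=True))  # third digit increments too early
print(run(91))               # corrected counter
```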
Letter   Input
T        X1
W        X2
O        X3
T        X4
W        X5
O        X6
R        X7
U        X8
O        X9
F        X10
Figure 6.4.1: Inequality and equality comparison cases for the TWO+TWO=FOUR problem
[Figure: pairwise comparators, for example on (X1, X2), (X1, X3), (X1, X4), (X8, X9), (X8, X10), and (X9, X10), feeding a global AND whose output tells if the inputs are valid]
Figure 6.4.2: Subset of circuit for the scenario described in figure 6.4.1.
It can be seen that for every pair of identical letters there is an equality checker, for every pair of different letters there is an inequality checker, and for all other cases the output is automatically 1 (not applicable for the TWO+TWO=FOUR problem). The cases where the output is automatically 1 are those where a letter value is being compared with a null value.
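The behavior of the Input Validity Checker can be sketched in Python. The control values 00, 01, and 10 follow the text of section 6.2; the function names and the list ordering X1 through X10 are assumptions made for this illustration.

```python
def control_values(symbols):
    """Compare every pair of input positions; the symbol "0" marks an empty space."""
    controls = {}
    for i in range(len(symbols)):
        for j in range(i + 1, len(symbols)):
            if symbols[i] == "0" or symbols[j] == "0":
                controls[(i, j)] = "10"   # comparison with an empty space
            elif symbols[i] == symbols[j]:
                controls[(i, j)] = "01"   # same letter: equality checker
            else:
                controls[(i, j)] = "00"   # different letters: inequality checker
    return controls

def inputs_valid(controls, values):
    """Global AND over all pairwise checkers."""
    for (i, j), control in controls.items():
        if control == "01" and values[i] != values[j]:
            return False
        if control == "00" and values[i] == values[j]:
            return False
        # control "10": output is automatically 1
    return True

# TWO+TWO=FOUR in the order X1..X10: T W O T W O R U O F
symbols = list("TWOTWORUOF")
controls = control_values(symbols)
print(inputs_valid(controls, [7, 3, 4, 7, 3, 4, 8, 6, 4, 1]))
```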
The circuit can be derived by strictly following the equations. There are adders for each + of the equations, multipliers for each term such as 10*C3, and comparators for each =.
exhaustive search on all combinations is performed, and each module is verified extensively (using different methods.) See Appendix D for the program code.
Figure 6.8.1: Steps for calculating the calculation time of the oracle implemented in hardware.
For this experiment, four different problems are tested: TWO+TWO=FOUR, ONE+ONE=TWO, TOO+TOO=LONG, and OOP+LLP=PRGM.
6.8.1 TWO+TWO=FOUR
Graph 6.8.1.1: Bar graph comparing computational time of a hardware and software implementation of an oracle for the TWO+TWO=FOUR Problem.
Graph 6.8.1.2: Bar graph comparing the average computational time of the hardware and software implementations for the TWO+TWO=FOUR Problem.
From the data shown above, the hardware calculation times were much faster than the software counterpart. Also, the predicted and the actual times for the hardware oracle were slightly different, indicating that the real hardware oracle has a slight delay in calculation. In the average comparison, the software took 225306.7 milliseconds (around 4 minutes) to calculate, while the FPGA took only 40247.67 milliseconds (around 40 seconds).
6.8.2 ONE+ONE=TWO
Graph 6.8.2.1: Bar graph comparing computational time of a hardware and software implementation of an oracle for the ONE+ONE=TWO Problem.
Graph 6.8.2.2: Bar graph comparing the average computational time of the hardware and software implementations for the ONE+ONE=TWO Problem.
From the data shown above, the hardware calculation times were drastically faster than the software counterpart. Again, the predicted and the actual times for the hardware oracle were slightly different, with the actual time being slightly higher. In the average comparison, the software took 245262.3 milliseconds to calculate, while the FPGA took only 40232.67 milliseconds.
6.8.3 TOO+TOO=LONG
Graph 6.8.3.1: Bar graph comparing computational time of a hardware and software implementation of an oracle for the TOO+TOO=LONG Problem.
Graph 6.8.3.2: Bar graph comparing the average computational time of the hardware and software implementations for the TOO+TOO=LONG Problem.
From the data shown in the graphs and table, the hardware implementation again performed much faster than the software implementation. Again, the predicted and the actual calculation times for the hardware implementation of the oracle were very close, but never identical. Comparing the averages of both methods, the hardware implementation took only 40241.67 milliseconds, while the software implementation took 254722.7 milliseconds.
6.8.4 OOP+LLP=PRGM
*Note, OOP and LLP are types of programs, and PRGM is an abbreviation for program
Graph 6.8.4.1: Bar graph comparing computational time of a hardware and software implementation of an oracle for the OOP+LLP=PRGM Problem.
Graph 6.8.4.2: Bar graph comparing the average computational time of the hardware and software implementations for the OOP+LLP=PRGM Problem.
From the data shown above, the hardware calculation times again were faster than the software counterpart. As usual, the actual calculation time for the hardware oracle was slightly higher than the predicted time. In the comparison of the average time, the software took 247121.7 milliseconds to calculate the solutions, while the FPGA took only 40200.33 milliseconds.
methods are different. If the FPGA had a clock as fast as the computer's, it would perform much faster.
6.10 Future Extensions for the Oracle for Arbitrary Cryptarithmetic Problems
Since the exhaustive search performed slowly, a different method is needed in order to solve larger problems. The current method already goes through all the combinations in the least amount of time (1 combination per clock pulse), which leaves artificial intelligence search methods as the remaining option. By using artificial intelligence, the search space becomes drastically smaller. One artificial intelligence algorithm which could be applied to solve cryptarithmetic problems is the breadth-first search. In this search, the algorithm goes through all cases in the current layer before going deeper into a new layer. Each node of the search contains a letter-to-number mapping, and each node is expanded to all possible number mappings for the next letter. After a node is reached, local minimization is done in order to simplify the problem. If local minimization shows the current node to be invalid, the rest of the node's branches are cut, saving the time needed to search those combinations.
[Figure: search tree with root TWO+TWO=FOUR and first-layer branches T=2 through T=9, giving nodes such as 1WO+1WO=FOUR, 2WO+2WO=FOUR, and 9WO+9WO=FOUR]
Figure 6.10.2: The first layer of the breadth-first search for TWO+TWO=FOUR. The tree shows several nodes which are already determined incorrect by local minimization.
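A breadth-first search with pruning can be sketched in Python as follows. This is not the proposed hardware design: the pruning rule used here (checking every column whose letters are all assigned, from right to left) stands in for the local minimization step and is an assumption, as are the function names, the letter ordering, and the rule that leading letters are nonzero.

```python
from collections import deque

def columns_consistent(a, b, total, mapping):
    """Check columns right to left until an unassigned letter is met."""
    carry = 0
    for i in range(1, len(total) + 1):
        addend_letters = [w[-i] for w in (a, b) if i <= len(w)]
        if any(ch not in mapping for ch in addend_letters + [total[-i]]):
            return True  # cannot be decided yet, keep the node
        s = sum(mapping[ch] for ch in addend_letters) + carry
        if s % 10 != mapping[total[-i]]:
            return False  # column is wrong: cut this branch
        carry = s // 10
    return carry == 0

def letter_order(a, b, total):
    """Order letters by first appearance, scanning columns right to left."""
    order = []
    for i in range(1, len(total) + 1):
        for w in (a, b, total):
            if i <= len(w) and w[-i] not in order:
                order.append(w[-i])
    return order

def solve_bfs(a, b, total):
    order = letter_order(a, b, total)
    leading = {a[0], b[0], total[0]}  # assumed: leading letters are nonzero
    queue = deque([{}])
    solutions = []
    while queue:
        mapping = queue.popleft()
        if len(mapping) == len(order):
            solutions.append(mapping)
            continue
        letter = order[len(mapping)]
        for digit in range(10):
            if digit in mapping.values():
                continue
            if digit == 0 and letter in leading:
                continue
            child = {**mapping, letter: digit}
            if columns_consistent(a, b, total, child):
                queue.append(child)
    return solutions

for s in solve_bfs("TWO", "TWO", "FOUR"):
    print(s)
```

Because each layer assigns one more letter, a mapping that fails a fully assigned column is discarded together with every combination that would have extended it.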
The random-start algorithm first generates a random initial value for the counter (generated by a linear feedback shift register). The counter then increments from that value for a predefined number of clock pulses, after which a new value is generated for the counter. This is somewhat similar to mutation, where after a period of time, a mutation occurs. Figure 7.5.1 shows an example search space of the Random-Start Algorithm. The blue peaks represent the solutions (0 for no, and 1 for yes), and the horizontal arrows show an example search sequence for the algorithm. The orange vertical peak is the selected solution, and the numbers over the arrows show the order of the search.
[Figure: oracle output (0 or 1) plotted over the search space from 000000000 to 199999999, with numbered arrows 1 and 2 marking the search sequence]
Figure 7.5.1: Example search space of the Random-Start Algorithm
Note that the Random-Start algorithm does not have a general calculation time. This is because the calculation time relies heavily on the pseudo-random number generator. If the random number generator generates the right sequence, then the problem is solved quickly. However, if the generated starting points are not close to a solution, then the algorithm takes longer to solve the problem.
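The random-start idea can be sketched in Python. The 16-bit linear feedback shift register below uses a commonly published maximal-length tap set (16, 14, 13, 11); the LFSR width, the taps, the seed, and the way a state is mapped to a starting point are all assumptions, since the actual register used in the oracle is not described here.

```python
def lfsr_step(state):
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one step."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def random_start_search(is_solution, space_size, burst, seed=0xACE1, max_restarts=10_000):
    """Count upward for `burst` steps from pseudo-random starting points.

    Returns (solution, number of restarts used), or None if nothing is found.
    """
    state = seed
    for restart in range(max_restarts):
        state = lfsr_step(state)
        start = state % space_size          # new starting value for the counter
        for offset in range(burst):
            candidate = (start + offset) % space_size
            if is_solution(candidate):
                return candidate, restart + 1
    return None

# Toy search space of 1000 combinations with two solutions.
solutions = {123, 777}
print(random_start_search(lambda c: c in solutions, space_size=1000, burst=200))
```

The number of restarts needed depends entirely on where the generated starting points fall relative to the solutions, which is why the calculation time varies from run to run.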
7.6 Calculation Time Analysis of the Current Oracle Design and the Random-Start Algorithm
7.6.1 TWO+TWO=FOUR
Graph 7.6.1.1: Bar graph comparing computation time of the original oracle and random-start algorithm methods for TWO+TWO=FOUR.
Graph 7.6.1.2: Bar graph comparing the average computation time of the original oracle and random-start algorithm methods for TWO+TWO=FOUR.
From the data shown above, the random-start algorithm has a significantly faster calculation time. The average of the original oracle is 5546.3 milliseconds, while the average of the random-start algorithm is 942.3 milliseconds, yielding a 5.9 times speed-up. However, the calculation times of the random-start algorithm are not stable, ranging from 207 to 1334 milliseconds.
7.6.2 ONE+ONE=TWO
Graph 7.6.2.1: Bar graph comparing computation time of the original oracle and random-start algorithm methods for ONE+ONE=TWO.
Graph 7.6.2.2: Bar graph comparing the average computation time of the original oracle and random-start algorithm methods for ONE+ONE=TWO.
From the data shown above, the random-start algorithm has a significantly faster calculation time. The average of the original oracle is 2799 milliseconds, while the average of the random-start algorithm is 572 milliseconds, which is a 4.9 times speed-up for the random-start algorithm. Again, the calculation times of the random-start algorithm are not stable, ranging from just 40 to 1317 milliseconds.
7.6.3 TOO+TOO=LONG
Graph 7.6.3.1: Bar graph comparing computation time of the original oracle and random-start algorithm methods for TOO+TOO=LONG.
Graph 7.6.3.2: Bar graph comparing the average computation time of the original oracle and random-start algorithm methods for TOO+TOO=LONG.
From the data shown above, the random-start algorithm actually has a slower calculation time than the original oracle. The average of the original oracle is 15276.67 milliseconds, while the average of the random-start algorithm is 25000.67 milliseconds. This is a 0.6 times speed-up rate for the random-start algorithm (equivalently, a 1.6 times speed-up rate for the original oracle). The random-start algorithm also has unpredictable results: it outperformed the original oracle only in trial 2. In addition, there were numerous trials for this problem in which the random-start algorithm could not find a solution; these trials are omitted.
7.6.4 OOP+LLP=PRGM
*Note: OOP and LLP are types of programs, and PRGM is an abbreviation for program.
Graph 7.6.4.1: Bar graph comparing computation time of the original oracle and random-start algorithm methods for OOP+LLP=PRGM.
Graph 7.6.4.2: Bar graph comparing the average computation time of the original oracle and random-start algorithm methods for OOP+LLP=PRGM.
From the data shown above, the random-start algorithm again has a significantly faster calculation time. The average of the original oracle is 28634 milliseconds, while the average of the random-start algorithm is 6607 milliseconds, which is a 4.3 times speed-up rate for the random-start algorithm. Again, the calculation times of the random-start algorithm are not stable, ranging from 1438 to 13314 milliseconds.
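The speed-up rates quoted in Sections 7.6.1 through 7.6.4 follow from dividing the average time of the original oracle by the average time of the random-start algorithm. The short Python sketch below recomputes them from the averages reported in the text; the names `average_ms` and `speed_up` are chosen only for this illustration.

```python
# Average calculation times in milliseconds, as reported in Sections 7.6.1-7.6.4:
# (original oracle, random-start algorithm)
average_ms = {
    "TWO+TWO=FOUR": (5546.3, 942.3),
    "ONE+ONE=TWO": (2799.0, 572.0),
    "TOO+TOO=LONG": (15276.67, 25000.67),
    "OOP+LLP=PRGM": (28634.0, 6607.0),
}


def speed_up(original_ms, random_start_ms):
    """Speed-up rate of the random-start algorithm over the original oracle."""
    return original_ms / random_start_ms


for problem, (original, random_start) in average_ms.items():
    print(f"{problem}: {speed_up(original, random_start):.1f} times")
# TWO+TWO=FOUR: 5.9 times
# ONE+ONE=TWO: 4.9 times
# TOO+TOO=LONG: 0.6 times
# OOP+LLP=PRGM: 4.3 times
```

A rate below 1, as for TOO+TOO=LONG, means the random-start algorithm was slower on average than the original oracle.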
solutions. However, TOO+TOO=LONG has only two solutions, which explains the slower speed of the random-start algorithm on that problem.
Figure 7.8.1: Flowchart of genetic algorithm implementation for solving cryptarithmetic problems.
A genetic algorithm might perform better than the previous methods because it follows a logical path to the solution. The solution has the highest cost function value, and with each generation the cost function values of the candidates move closer to it. Compared to counting from randomly chosen starting points, the genetic algorithm is more reliable. However, as with many other artificial intelligence algorithms, there is a risk of a loss of data.
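To make the idea of a cost function concrete, the following Python sketch shows one possible scoring rule for a candidate digit assignment. It is an assumption made for illustration, not the cost function of Figure 7.8.1: the score is the negated arithmetic error, so an exact solution scores 0 (the highest value), and assignments that repeat a digit or give a word a leading zero are rejected outright. The names `word_value` and `cost` are hypothetical.

```python
def word_value(word, digits):
    """Read a word as a decimal number under a letter-to-digit assignment."""
    value = 0
    for letter in word:
        value = value * 10 + digits[letter]
    return value


def cost(addends, total, digits):
    """Score a candidate assignment; higher is better, 0 is an exact solution.

    Assumed rules for this sketch: every letter needs a distinct digit and
    no word may start with 0; such candidates get the worst possible score.
    """
    if len(set(digits.values())) != len(digits):
        return float("-inf")               # two letters share a digit
    if any(digits[word[0]] == 0 for word in (*addends, total)):
        return float("-inf")               # leading zero
    error = abs(sum(word_value(w, digits) for w in addends)
                - word_value(total, digits))
    return -error


if __name__ == "__main__":
    candidate = {"T": 7, "W": 3, "O": 4, "F": 1, "U": 6, "R": 8}
    print(cost(["TWO", "TWO"], "FOUR", candidate))   # 734 + 734 = 1468, prints 0
```

In a genetic algorithm, a score of this kind would be used to rank the candidates in each generation before selection and mutation.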
8. Conclusion
The data obtained from the experiments show that the goal was achievable and support the hypothesis. The goal, which was to create an oracle able to solve arbitrary cryptarithmetic problems, was met, and the oracle was also able to give all possible solutions to a given problem. Calculation time results from the project also supported the expectation that hardware is faster than software.
For both oracles, the TWO+TWO=FOUR oracle and the arbitrary cryptarithmetic problem oracle, all answers were reported. For the TWO+TWO=FOUR oracle, the software and the hardware output the same solutions, but in a different order, because the two implementations use different methods. For the oracle for arbitrary cryptarithmetic problems, however, the same answers were reported in the same order, because similar methods were used to construct both the software and the hardware.
In the scenario of a specific oracle for the TWO+TWO=FOUR problem, the hardware implementation of the oracle was much faster than the software implementation. The oracle in hardware was in fact over three million times faster than the oracle in software. The calculation time of the hardware oracle was also constant; it always took around 0.3024 milliseconds to calculate all answers. In theory, the circuit for the oracle is always the same and operates directly on clock pulses, and because the clock pulses are steady, the calculation time never changes. In practice, slight errors in the physical model always produce a slight difference from the predicted time. The software oracle, on the other hand, varied in calculation time. Since the speed of the computer's processor regularly varies depending on the operation being calculated, the calculation time of the software oracle is not constant.
In the final scenario of an oracle for solving arbitrary cryptarithmetic problems, the hardware implementation of the oracle was still much faster than the software implementation. Although the oracle in hardware was much slower than in the TWO+TWO=FOUR case because it processes more combinations, it still proved to be 6.5 times faster than its software counterpart. Again, the calculation time of the hardware oracle was constant at 40000 milliseconds, since the circuit again runs directly on clock pulses. The software oracle had varied times, all of them between 23000 and 28000 milliseconds.
Several major improvements could make the experiment more accurate. First, in both experiments, the speed-up ratio could be measured more accurately by using hardware and software running at the same clock speed. Second, in the TWO+TWO=FOUR experiment, the comparison should be made between two optimized implementations, instead of an optimized circuit in hardware and a non-optimized implementation in software.
The problem of solving an arbitrary cryptarithmetic problem for a single solution was then presented, which introduced new constraints and factors. Two hardware realizations of an oracle were compared in terms of speed-up. The first was an adaptation of the oracle for solving arbitrary cryptarithmetic problems (discussed above). The second was the random-start algorithm, which uses a random starting point to reduce search time. In this experiment, the random-start algorithm performed better overall. However, for problems with few solutions (TOO+TOO=LONG), the random-start algorithm performs poorly. Therefore, the idea of a genetic algorithm is brought into discussion as an extension of the problem.
The next step of the project is to improve the circuit realized in hardware. Currently, the oracle for arbitrary cryptarithmetic problems processes 2000000000 combinations. However, in the largest problem of the accepted size, only 30240 of those combinations are possible solutions. As a result, only about 0.001512% of the total combinations are useful. By using a selector and permuter for arbitrary cryptarithmetic problems, the oracle would be able to calculate answers to much larger problems. Another step to decrease the calculation time of the hardware implementation is to apply artificial intelligence to hardware. One way is to use a depth-first search algorithm in hardware, which could drastically reduce the search size of the problems.
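The proportion of useful combinations can be checked with a few lines of Python. The ratio 30240 / 2000000000 is 0.00001512 as a fraction, which is 0.001512 when expressed as a percentage. The sketch also notes that 30240 equals the number of ordered selections of 5 distinct digits out of 10; reading the figure this way is an assumption made here, since the text does not state how the number was derived. The function name `useful_percent` is hypothetical.

```python
import math

TOTAL_COMBINATIONS = 2_000_000_000   # combinations processed by the oracle
USEFUL_COMBINATIONS = 30_240         # possible solutions, as stated in the text


def useful_percent(useful, total):
    """Share of the processed combinations that are useful, in percent."""
    return 100.0 * useful / total


# 10 * 9 * 8 * 7 * 6: ordered choices of 5 distinct digits out of 10.
print(math.perm(10, 5))                                                      # 30240
print(f"{useful_percent(USEFUL_COMBINATIONS, TOTAL_COMBINATIONS):.6f}%")     # 0.001512%
```

A selector and permuter that generates only assignments with distinct digits would skip the remaining combinations instead of counting through them.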
9. References
Chu, P. P. (2008). FPGA Prototyping by Verilog Examples. Hoboken, NJ: John Wiley & Sons.
Haskell, R. E. (2010). Digital Design Using Digilent FPGA Boards (3rd ed.). Rochester Hills, MI: LBE Books LLC.
Ishaque, B., Haider, B., Wasid, M., Alaul, S., Hassan, K., Ahsan, T., ... Alam, M. (2004). An Evolutionary Algorithm to Solve Cryptarithmetic Problem. Transactions on Engineering, Computing and Technology, VI, 305-313.
Prata, S. (2004). C++ Primer Plus (5th ed.). Indianapolis, IN: Sams Publishing.
Reza, A. (2009). Solving Cryptarithmetic Problems Using Parallel Genetic Algorithm. Second International Conference on Computer and Electrical Engineering,
Vahid, F., & Lysecky, R. (2007). Verilog for Digital Design. Hoboken, NJ: Wiley & Sons.
Wakerly, J. F. (2006). Digital Design (4th ed.). Upper Saddle River, NJ: Pearson Prentice Hall.
*Title page image from: http://www.presseagentur.com/media/2577/INK_DE2115_Base_Board.JPG