1 Introduction
Lightweight cryptography is an emerging area focused on the research and development of cryptographic algorithms suitable for resource-constrained devices such as RFID tags and wireless sensors. These devices have a very small hardware footprint and operate with minimal power, so the usual cryptographic algorithms such as AES and RSA are not suitable for protecting the data they process. Lightweight cryptographic algorithms must therefore satisfy two conditions: they must provide a sufficient level of security, and they must operate efficiently with minimal hardware/software resources. Some well-known lightweight block ciphers are PRESENT [3], PRINCE [5] and CLEFIA [18]. Similarly, Grain v1 [10], MICKEY [2] and Trivium [6] are examples of lightweight stream ciphers.
MDS matrices are a popular choice for building the diffusion layers of block ciphers and hash functions, because they have maximal differential and linear branch numbers; this provides perfect diffusion and helps secure the cipher against linear and differential attacks. AES, Twofish and SHARK are some well-known block ciphers that use MDS matrices for diffusion. A square matrix is MDS if and only if every square submatrix of it is nonsingular. This requirement makes it challenging to find MDS matrices that have efficient hardware and software implementations.
The XOR count, a metric that measures the implementation cost of a diffusion matrix, was introduced in [11]; with it one can search for diffusion matrices that can be implemented efficiently. Several subsequent works [16, ?, ?, ?] studied different types of MDS matrices with low XOR counts. In [17] the authors computed the minimum possible XOR count of a $4 \times 4$ MDS matrix over $\mathbb{F}_{2^4}$ and $\mathbb{F}_{2^8}$ and presented constructions of MDS matrices that attain these minimum values.
2 Preliminaries
Let $\mathbb{F}_{2^m}$ denote a finite field of size $2^m$. An $[N, K, D]$ code over $\mathbb{F}_{2^m}$ is MDS if the minimum distance of the code attains the Singleton bound, i.e., if $D = N - K + 1$. An $n \times n$ matrix $M$ is said to be MDS if the linear code with generator matrix $[I_n \mid M]$ is an MDS code of length $2n$ and dimension $n$. An equivalent characterization is the following: an $n \times n$ matrix $M$ over $\mathbb{F}_{2^m}$ is MDS if and only if every square submatrix of $M$ is nonsingular [12].
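The submatrix characterization above translates directly into a brute-force test. The Python sketch below is only an illustration under stated assumptions: it works in $\mathbb{F}_{2^4}$ defined by $X^4 + X + 1$ (the field used later in the paper), encodes an element as a 4-bit integer whose bit $i$ is the coefficient of $\alpha^i$, and uses our own helper names (`gf_mul`, `gf_inv`, `is_singular`, `is_mds`). Singularity is decided by Gaussian elimination, where subtraction is XOR because the field has characteristic 2.

```python
from itertools import combinations

M_BITS = 4
MOD_POLY = 0b10011  # X^4 + X + 1; an element is a 4-bit int, bit i = coefficient of alpha^i


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in F_{2^4}, reducing by X^4 + X + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M_BITS):
            a ^= MOD_POLY
    return result


def gf_inv(a: int) -> int:
    """Inverse of a nonzero element via a^(2^4 - 2) = a^14."""
    result = 1
    for _ in range(14):
        result = gf_mul(result, a)
    return result


def is_singular(rows) -> bool:
    """Gaussian elimination over F_{2^4}; subtraction is XOR in characteristic 2."""
    m = [list(r) for r in rows]
    n = len(m)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col]), None)
        if pivot is None:
            return True  # no pivot in this column, so the determinant is 0
        m[col], m[pivot] = m[pivot], m[col]
        inv = gf_inv(m[col][col])
        for r in range(col + 1, n):
            factor = gf_mul(m[r][col], inv)
            m[r] = [x ^ gf_mul(factor, y) for x, y in zip(m[r], m[col])]
    return False


def is_mds(matrix) -> bool:
    """An n x n matrix is MDS iff every square submatrix is nonsingular."""
    n = len(matrix)
    for k in range(1, n + 1):
        for rs in combinations(range(n), k):
            for cs in combinations(range(n), k):
                sub = [[matrix[r][c] for c in cs] for r in rs]
                if is_singular(sub):
                    return False
    return True


print(is_mds([[1, 1], [1, 2]]))  # alpha = 2; determinant alpha + 1 is nonzero
print(is_mds([[1, 0], [0, 1]]))  # zero entries are singular 1 x 1 submatrices
```

Enumerating all square submatrices is exponential in $n$, but a $4 \times 4$ matrix has only $16 + 36 + 16 + 1 = 69$ of them, so brute force is adequate at this size.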
The finite field $\mathbb{F}_{2^m}$ is also an $m$-dimensional vector space over $\mathbb{F}_2$. This vector space has several bases, but we use only the polynomial basis $\{1, \alpha, \alpha^2, \ldots, \alpha^{m-1}\}$, where $\alpha$ is a root of the irreducible polynomial that defines $\mathbb{F}_{2^m}$ over $\mathbb{F}_2$. The notion of XOR count defined below was introduced in [11] to measure the cost of field multiplication in $\mathbb{F}_{2^m}$.
Definition 1. Let $B$ be a vector space basis of $\mathbb{F}_{2^m}$ over $\mathbb{F}_2$. For any $a \in \mathbb{F}_{2^m}$, the XOR count of $a$ with respect to $B$ is denoted by $\mathrm{XOR}(a)$ and is defined as the number of XORs needed to implement the field multiplication of $a$ with an arbitrary element $b \in \mathbb{F}_{2^m}$.
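Below we present the XOR counts of all the elements of $\mathbb{F}_{2^4}$, defined by the irreducible polynomial $X^4 + X + 1$, with respect to the polynomial basis.

Element of $\mathbb{F}_{2^4}$             XOR count
$0$                                       0
$1$                                       0
$\alpha$                                  1
$\alpha^2$                                2
$\alpha^3$                                3
$\alpha + 1$                              5
$\alpha^2 + \alpha$                       5
$\alpha^3 + \alpha^2$                     5
$\alpha^3 + \alpha + 1$                   6
$\alpha^2 + 1$                            6
$\alpha^3 + \alpha$                       8
$\alpha^2 + \alpha + 1$                   9
$\alpha^3 + \alpha^2 + \alpha$            8
$\alpha^3 + \alpha^2 + \alpha + 1$        6
$\alpha^3 + \alpha^2 + 1$                 3
$\alpha^3 + 1$                            1

Table 1. Elements of $\mathbb{F}_{2^4}$ and their corresponding XOR counts, where $\alpha$ is a root of the primitive polynomial $X^4 + X + 1$.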
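To see where the counts in Table 1 come from, the sketch below builds the $4 \times 4$ binary matrix of the map $b \mapsto a \cdot b$ in the polynomial basis and charges $w - 1$ two-input XORs for each output bit whose row contains $w$ ones. This counting convention is our assumption about how Definition 1 is applied; with it, the function reproduces every entry of Table 1. The integer encoding (bit $i$ is the coefficient of $\alpha^i$) and the name `xor_count` are illustrative.

```python
M_BITS = 4
MOD_POLY = 0b10011  # X^4 + X + 1


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in F_{2^4}."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M_BITS):
            a ^= MOD_POLY
    return result


def xor_count(a: int) -> int:
    """XOR count of multiplication by a in the basis {1, alpha, alpha^2, alpha^3}.

    Column j of the binary matrix of x -> a*x is the bit vector of a * alpha^j.
    Output bit i is the XOR of the input bits selected by row i, so a row
    with w ones costs w - 1 two-input XORs.
    """
    columns = [gf_mul(a, 1 << j) for j in range(M_BITS)]
    total = 0
    for i in range(M_BITS):
        weight = sum((col >> i) & 1 for col in columns)
        total += max(weight - 1, 0)
    return total


for a in range(1 << M_BITS):
    print(f"{a:04b} -> {xor_count(a)}")
```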
The diffusion layers of block ciphers are commonly defined by MDS matrices. The notion of XOR count of an element can be extended to $n \times n$ matrices over $\mathbb{F}_{2^m}$ as follows [17]:
$$\sum_{i=0}^{n-1} \sum_{j=0}^{n-1} \gamma_{ij} + \sum_{i=0}^{n-1} (\ell_i - 1) \cdot m = C(M) + \sum_{i=0}^{n-1} (\ell_i - 1) \cdot m, \qquad (1)$$
where $\gamma_{ij}$ is the XOR count of the $j$-th entry of the $i$-th row of the matrix, and $\ell_i$ is the number of nonzero entries in that row. The term $C(M)$ is the sum of the XOR counts of all the entries of $M$. For an $n \times n$ MDS matrix over $\mathbb{F}_{2^m}$, $\ell_i = n$, so (1) becomes $C(M) + n(n-1) \cdot m$, and $C(M)$ is the only part that varies between matrices. In this paper we focus mainly on serial matrices, in which every row except the last has only 0 or 1 as entries (see (3)). In this case the XOR count of the matrix simplifies to
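$$C(S) + (n-1) \cdot m = \sum_{i=0}^{n-1} \mathrm{XOR}(a_i) + (n-1) \cdot m, \qquad (2)$$
where $S = \mathrm{Serial}(a_0, \ldots, a_{n-1})$ with all $a_i \neq 0$: each of the first $n-1$ rows has a single entry equal to 1 ($\ell_i = 1$, XOR count 0), while the last row has $\ell_{n-1} = n$.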
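Equations (1) and (2) can be checked numerically. The Python sketch below assumes the same setting as Table 1 ($\mathbb{F}_{2^4}$ defined by $X^4 + X + 1$, bit $i$ of an integer is the coefficient of $\alpha^i$, and a row of weight $w$ in the binary multiplication matrix costs $w - 1$ XORs). It implements (1) for an arbitrary matrix and applies it to $\mathrm{Serial}(\alpha^2, 1, \alpha, \alpha)$, the serial matrix used in LED [7], and to $\mathrm{Serial}(1, 1, \alpha, 1)$. The helper names `serial` and `matrix_xor_count` are hypothetical.

```python
M_BITS = 4
MOD_POLY = 0b10011  # X^4 + X + 1
ALPHA = 0b0010      # alpha, a root of X^4 + X + 1


def gf_mul(a: int, b: int) -> int:
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M_BITS):
            a ^= MOD_POLY
    return result


def xor_count(a: int) -> int:
    """Per-element XOR count (row weight minus one, summed over output bits)."""
    columns = [gf_mul(a, 1 << j) for j in range(M_BITS)]
    return sum(max(sum((c >> i) & 1 for c in columns) - 1, 0) for i in range(M_BITS))


def matrix_xor_count(matrix) -> int:
    """Equation (1): entry XOR counts plus (l_i - 1) * m for every row."""
    total = 0
    for row in matrix:
        nonzero = [x for x in row if x]
        total += sum(xor_count(x) for x in nonzero)
        total += max(len(nonzero) - 1, 0) * M_BITS
    return total


def serial(last_row):
    """Serial(a_0, ..., a_{n-1}): shifted identity rows, then the given last row."""
    n = len(last_row)
    rows = [[1 if c == r + 1 else 0 for c in range(n)] for r in range(n - 1)]
    return rows + [list(last_row)]


ALPHA2 = gf_mul(ALPHA, ALPHA)
led = serial([ALPHA2, 1, ALPHA, ALPHA])
ours = serial([1, 1, ALPHA, 1])
print(matrix_xor_count(led), matrix_xor_count(ours))
```

It prints `16 13`, the XOR counts quoted for these two matrices elsewhere in the paper; for serial matrices only the last row contributes, which is exactly what (2) expresses.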
3.2 Lightweight Recursive MDS Matrices
If $S = \mathrm{Serial}(a_0, \ldots, a_{n-1})$ is an $n \times n$ serial matrix over $\mathbb{F}_{2^m}$, then it is easy to see (Theorem 1) that the least possible value of $i$ for which $S^i$ is MDS is $i = n$. Consequently, when constructing such MDS matrices the focus is on finding $n \times n$ serial matrices $S$ for which $S^n$ is MDS. This is done mainly to keep the latency under control: if $S^i$ is MDS, then $i$ iterations are needed to compute the output $y = S^i \cdot x$ (see (5)), so the lower the value of $i$, the better. However, this criterion may exclude some lightweight recursive MDS matrices. For instance, the authors of [9] show that if $S = \mathrm{Serial}(a_0, a_1, a_2, a_3)$ has $a_i = 1$ for more than two values of $i$, then $S^4$ is never MDS over $\mathbb{F}_{2^m}$. While it is true that $S^i$ is never MDS when all four $a_i$ equal 1, there do exist lightweight MDS matrices of the form $S^i = \mathrm{Serial}(a_0, a_1, a_2, a_3)^i$ with $a_i = 1$ for three values of $i$. As we will see, such matrices are extremely lightweight and can be used in applications where resource constraints matter but latency does not. In this section we present some new results concerning such recursive MDS matrices. We first briefly recall some terminology from [1] that will be useful in presenting these results. Suppose $S = \mathrm{Serial}(a_0, a_1, a_2, a_3)$, which is the companion matrix of the polynomial $f(X) = a_0 + a_1 X + a_2 X^2 + a_3 X^3 + X^4$.
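We can interpret the matrix $S$ as
$$S = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ a_0 & a_1 & a_2 & a_3 \end{pmatrix} = \underbrace{\begin{pmatrix} X \\ X^2 \\ X^3 \\ X^4 \bmod f(X) \end{pmatrix}}_{(\ast)}, \qquad (6)$$
where each row of the matrix $(\ast)$ consists of the coefficients, in increasing order of degree, of the polynomial given in that row. With this notation it is easy to see that for any $i \geq 1$ the matrix $S^i$ is given by
$$S^i = \begin{pmatrix} X^i \bmod f(X) \\ X^{i+1} \bmod f(X) \\ X^{i+2} \bmod f(X) \\ X^{i+3} \bmod f(X) \end{pmatrix}. \qquad (7)$$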
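Equation (7) says that row $k$ of $S^i$ is the coefficient vector of $X^{i+k} \bmod f(X)$. The Python sketch below checks this against plain repeated matrix multiplication for $S = \mathrm{Serial}(1, 1, \alpha, 1)$ over $\mathbb{F}_{2^4}$, again assuming $X^4 + X + 1$ and the 4-bit integer encoding of elements; all function names are illustrative. Multiplying by $X$ modulo $f$ is a shift followed by replacing $X^4$ with $a_0 + a_1 X + a_2 X^2 + a_3 X^3$, since $-1 = 1$ in characteristic 2.

```python
M_BITS = 4
MOD_POLY = 0b10011  # X^4 + X + 1
ALPHA = 0b0010      # alpha


def gf_mul(a: int, b: int) -> int:
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M_BITS):
            a ^= MOD_POLY
    return result


def serial(coeffs):
    n = len(coeffs)
    rows = [[1 if c == r + 1 else 0 for c in range(n)] for r in range(n - 1)]
    return rows + [list(coeffs)]


def mat_mul(A, B):
    n = len(A)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0
            for k in range(n):
                acc ^= gf_mul(A[i][k], B[k][j])  # addition in F_{2^4} is XOR
            out[i][j] = acc
    return out


def mat_pow(A, e):
    n = len(A)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(e):
        result = mat_mul(result, A)
    return result


def times_x_mod_f(v, coeffs):
    """X * v(X) mod f(X), where v lists coefficients from lowest degree up."""
    top = v[-1]
    shifted = [0] + v[:-1]
    return [s ^ gf_mul(top, c) for s, c in zip(shifted, coeffs)]


def power_by_polynomials(coeffs, i):
    """Rows of S^i as X^(i+k) mod f(X) for k = 0, ..., n-1, following (7)."""
    n = len(coeffs)
    rows = []
    for k in range(n):
        v = [1 if j == k else 0 for j in range(n)]  # X^k
        for _ in range(i):
            v = times_x_mod_f(v, coeffs)
        rows.append(v)
    return rows


COEFFS = [1, 1, ALPHA, 1]  # f(X) = 1 + X + alpha X^2 + X^3 + X^4
for i in range(1, 9):
    print(i, power_by_polynomials(COEFFS, i) == mat_pow(serial(COEFFS), i))
```

Each call of `times_x_mod_f` is one multiplication of a row vector by $S$, which is why a single serial matrix applied $i$ times yields $S^i$.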
With these we are ready to present our results. We begin with the following fact
which can be verified easily.
$S_0$, $S_1$, $S_3$, then $S^i$ is not MDS for $1 \leq i \leq 8$:
where $a_0, a_1, a_3 \neq 1$.
Proof. Let $S = \mathrm{Serial}(a_0, a_1, a_2, a_3)$ be a serial matrix over $\mathbb{F}_{2^m}$. From (7) it follows directly that for $i = 1, 2, 3$ the matrix $S^i$ has zero entries and hence cannot be MDS. It remains to show that $S^i$ is not MDS for $4 \leq i \leq 8$ whenever $S$ has any of the forms $S_0$, $S_1$, $S_3$ given in (8). We do this by considering each form separately. For clarity, for each $i$ we denote the polynomial associated with the serial matrix $S^i$ by $f_i(X)$.
Note that this polynomial has zero coefficients, and these coefficients form a row of the matrices $S_0^i$ for $4 \leq i \leq 7$, as can be seen from (7). Consequently, none of these matrices can be MDS. By the same argument, the matrix $S_0^8$ cannot be MDS because the coefficients of
Theorem 2. Let $S = \mathrm{Serial}(1, 1, a, 1)$ be defined over $\mathbb{F}_{2^m}$. Then $S^i$ is not MDS for $1 \leq i \leq 7$. Further, $S^8$ is MDS precisely in the following two cases:
(1) the minimal polynomial of $a$ over $\mathbb{F}_2$ is $X^4 + X + 1$;
(2) the minimal polynomial of $a$ over $\mathbb{F}_2$ has degree 5 and is not in the set $\{X^5 + X^4 + X^2 + X + 1,\ X^5 + X^3 + X^2 + X + 1,\ X^5 + X^4 + X^3 + X^2 + 1\}$.
Power of matrix        Rows of singular    Columns of singular
(S of the form S_3)    submatrix           submatrix
$S^4$                  (0, 1)              (1, 2)
$S^5$                  (0, 1)              (1, 3)
$S^6$                  (0, 1)              (0, 2)
$S^7$                  (0, 1)              (0, 2)
$S^8$                  (2, 3)              (1, 2)
Now, if we consider the matrix $S(a)$ for some $a \in \mathbb{F}_{2^m}$, then $S(a)^8$ is MDS if and only if $(a) \neq 0$ for every $(X)$ . One can check that this happens precisely in the two cases (1) and (2) given in the statement of the theorem. □
Corollary 1. Let $S = \mathrm{Serial}(1, 1, \alpha, 1)$ be defined over $\mathbb{F}_{2^4}$, where $\alpha$ is a root of the irreducible polynomial $X^4 + X + 1 \in \mathbb{F}_2[X]$. Then the matrix $S^8$ is MDS and the total XOR count of the matrix $S$ is 13.
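Proof. That the matrix $S^8$ is MDS follows from Theorem 2. To compute $\mathrm{XOR}(S)$, first note that $\mathrm{XOR}(\alpha) = 1$, which follows, for instance, from [17, Proposition 1] (see also Table 1). Using this and observing that for any $(x_0, x_1, x_2, x_3) \in (\mathbb{F}_{2^4})^4$,
$$S \cdot (x_0, x_1, x_2, x_3)^T = (x_1, x_2, x_3, x_0 \oplus x_1 \oplus \alpha x_2 \oplus x_3)^T,$$
we have $\mathrm{XOR}(S) = 1 + 3 \cdot 4 = 13$: one XOR for the multiplication by $\alpha$ and three 4-bit XORs to combine the four terms. □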
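Corollary 1 can also be confirmed by exhaustive computation. The Python sketch below combines the brute-force MDS test (every square submatrix nonsingular) with the row description (7) of $S^i$; it assumes $\mathbb{F}_{2^4}$ defined by $X^4 + X + 1$, encodes $\alpha$ as the integer 2, and uses hypothetical helper names. It is a sanity check of the stated result for one field, not a substitute for the proof of Theorem 2.

```python
from itertools import combinations

M_BITS = 4
MOD_POLY = 0b10011  # X^4 + X + 1
ALPHA = 0b0010      # alpha, encoded as the integer 2


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in F_{2^4}."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M_BITS):
            a ^= MOD_POLY
    return result


def gf_inv(a: int) -> int:
    """a^14 is the inverse of a nonzero a in F_{2^4}."""
    result = 1
    for _ in range(14):
        result = gf_mul(result, a)
    return result


def is_singular(rows) -> bool:
    """Gaussian elimination over F_{2^4} (row subtraction is XOR)."""
    m = [list(r) for r in rows]
    n = len(m)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col]), None)
        if pivot is None:
            return True
        m[col], m[pivot] = m[pivot], m[col]
        inv = gf_inv(m[col][col])
        for r in range(col + 1, n):
            factor = gf_mul(m[r][col], inv)
            m[r] = [x ^ gf_mul(factor, y) for x, y in zip(m[r], m[col])]
    return False


def is_mds(matrix) -> bool:
    """True iff every square submatrix is nonsingular."""
    n = len(matrix)
    return not any(
        is_singular([[matrix[r][c] for c in cs] for r in rs])
        for k in range(1, n + 1)
        for rs in combinations(range(n), k)
        for cs in combinations(range(n), k)
    )


def serial_power(coeffs, i):
    """Rows of Serial(coeffs)^i as X^(i+k) mod f(X), following (7)."""
    n = len(coeffs)
    rows = []
    for k in range(n):
        v = [1 if j == k else 0 for j in range(n)]  # X^k
        for _ in range(i):
            top, v = v[-1], [0] + v[:-1]  # multiply by X
            v = [x ^ gf_mul(top, c) for x, c in zip(v, coeffs)]  # reduce X^n
        rows.append(v)
    return rows


S_COEFFS = [1, 1, ALPHA, 1]  # S = Serial(1, 1, alpha, 1)
for i in range(1, 9):
    print(i, is_mds(serial_power(S_COEFFS, i)))
```

For $i \leq 7$ some row $X^{i+k} \bmod f(X)$ has a zero coefficient, so the test already fails on a $1 \times 1$ submatrix; for $i = 8$ all 69 square submatrices are nonsingular.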
4 Hardware Implementation and Comparison
In this section we describe the hardware implementation of the datapath of a typical AES-like block cipher, comprising a linear layer evaluated using our proposed serial matrix, a non-linear layer using our proposed S-box, and the ShiftRow operation. We show that our proposed serial matrix reduces the hardware overhead compared with the lightest serial MDS matrix reported in the literature, namely the one used in the LED block cipher [7]. Further, our proposed method reduces the hardware overhead of the non-linear layer without compromising the nonlinearity, differential uniformity and algebraic degree of a typical $4 \times 4$ S-box such as those used in the PRESENT [4] and LED [7] block ciphers. Finally, using an ASIC design flow, we implement a datapath made up of our proposed components (linear layer and S-box) and the ShiftRow operation, and compare its hardware performance with that of an LED datapath built from the same kinds of components. We used Synopsys Design Compiler (version vI-2013.12-SP5-4) for synthesis and Synopsys VCS (version I-2014.03-SP1-1) for simulation. Synthesis uses the standard cell library TSL18FS120 in 180 nm technology from Tower Semiconductor Ltd., characterized with SiliconSmart (version 2008.02-SP1p1) under the Fast-Fast process corner (P), 1.98 V supply voltage (V) and -40 °C temperature (T).
Serial Matrix Multiplication with Column Vector Implementation and Comparison: The serial matrix $S_1$ used in the LED block cipher,
$$S_1 = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \alpha^2 & 1 & \alpha & \alpha \end{pmatrix}, \qquad (9)$$
is reported to be the lightest in the literature, with XOR count 16; the MDS matrix is obtained as $S_1^4$. We implemented $S_1 X$, where $X = [x_1, x_2, x_3, x_4]^T$ is a column vector whose elements $x_i$ are nibbles, and obtained a hardware footprint of 387.5 µm², comprising six 3-input XOR gates and two 2-input XOR gates. The serial matrix $S_2$ we propose,
$$S_2 = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 1 & \alpha & 1 \end{pmatrix}, \qquad (10)$$
is lighter than $S_1$, with XOR count 13; the MDS matrix is obtained as $S_2^8$. Similarly, we implemented $S_2 X$; the synthesis tool reports a hardware footprint of 365.4 µm², comprising five 3-input XOR gates and three 2-input XOR gates.
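The XOR counts 16 and 13 of $S_1$ and $S_2$ can be reproduced with a bit-level model of $S_1 X$ and $S_2 X$. The Python sketch below is a hypothetical software model, not the authors' Verilog: it counts two-input XORs on single bits, implements multiplication by $\alpha$ (with $\alpha^4 = \alpha + 1$) with one XOR and $\alpha^2$ as two successive $\alpha$ multiplications, and ignores the 3-input XOR cells chosen by the synthesis tool, so it says nothing about the reported areas.

```python
class XorCounter:
    """Counts two-input XOR gates applied to single bits."""

    def __init__(self):
        self.count = 0

    def xor(self, p: int, q: int) -> int:
        self.count += 1
        return p ^ q


def bits(x: int):
    return [(x >> i) & 1 for i in range(4)]  # bit i = coefficient of alpha^i


def pack(b) -> int:
    return sum(bit << i for i, bit in enumerate(b))


def mul_alpha(ctr: XorCounter, x: int) -> int:
    """alpha * x = b3 + (b0 + b3) alpha + b1 alpha^2 + b2 alpha^3: one XOR."""
    b0, b1, b2, b3 = bits(x)
    return pack([b3, ctr.xor(b0, b3), b1, b2])


def xor_nibbles(ctr: XorCounter, x: int, y: int) -> int:
    """Nibble-wise XOR: four single-bit XORs."""
    return pack([ctr.xor(p, q) for p, q in zip(bits(x), bits(y))])


def s1_apply(ctr, x0, x1, x2, x3):
    """LED's S1: last output alpha^2*x0 + x1 + alpha*x2 + alpha*x3."""
    t = xor_nibbles(ctr, mul_alpha(ctr, mul_alpha(ctr, x0)), x1)
    t = xor_nibbles(ctr, t, mul_alpha(ctr, x2))
    t = xor_nibbles(ctr, t, mul_alpha(ctr, x3))
    return (x1, x2, x3, t)


def s2_apply(ctr, x0, x1, x2, x3):
    """Proposed S2: last output x0 + x1 + alpha*x2 + x3."""
    t = xor_nibbles(ctr, x0, x1)
    t = xor_nibbles(ctr, t, mul_alpha(ctr, x2))
    t = xor_nibbles(ctr, t, x3)
    return (x1, x2, x3, t)


for name, fn in [("S1", s1_apply), ("S2", s2_apply)]:
    ctr = XorCounter()
    fn(ctr, 0x1, 0x2, 0x3, 0x4)
    print(name, ctr.count)
```

It prints `S1 16` and `S2 13`; the counts do not depend on the input values because the gate structure is fixed.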
$$\cdots + (\alpha^3+1)x^{10} + (\alpha^3+1)x^9 + (\alpha^2+\alpha+1)x^8 + \alpha^2 x^7 + (\alpha^3+\alpha^2)x^6 + (\alpha^3+\alpha)x^5 + (\alpha^3+\alpha^2+\alpha)x^4 + (\alpha^2+\alpha+1)x^3 + (\alpha^2+\alpha+1)x^2 + \alpha^3 + \alpha^2,$$
where $\alpha$ is a root of the primitive polynomial $X^4 + X + 1$. In contrast, a monomial S-box would have a low hardware footprint. The monomials $X \mapsto X^i$ for $i = 7, 11, 13, 14$ are such that the associated $4 \times 4$ S-box has nonlinearity 4, differential uniformity 4 and algebraic degree 3. We consider the monomial with the least such $i$, i.e., $X \mapsto X^7$. Note that this S-box has fixed points: $0 \mapsto 0$, $1 \mapsto 1$, etc. However, the S-box associated with $X \mapsto X^7 + 1$ has no fixed points, while its nonlinearity, differential uniformity and algebraic degree remain unchanged. We also checked that the implementation cost of this S-box is lower than that of the LED S-box. In Verilog, we implemented this S-box function as $X^4 \cdot X^2 \cdot X$, which requires two nibble-wise multiplications, one squaring and one fourth-power computation, followed by a one-bit XOR. We did not choose $X^4 \cdot X^3$, as computing $X^3$ is heavier than two multiplication operations. Our proposed $4 \times 4$ S-box occupies 359 µm², while the LED S-box, implemented in Verilog as a case-based look-up table, occupies 391 µm², about 9% more than the proposed one. The proposed S-box values are shown in Table 4. We evaluated the properties of both our proposed S-box and the LED S-box using the S-box Evaluation Tool (SET) [14]; the comparison results are shown in Table 4.
x       0 1 2 3 4 5 6 7 8 9 A B C D E F
S(x)    1 0 A C 8 F 7 6 D 4 9 2 E 3 5 B
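The table can be regenerated from the decomposition $X^4 \cdot X^2 \cdot X$ followed by an XOR with 1, and the three cryptographic properties can be checked by brute force. The Python sketch below assumes $\mathbb{F}_{2^4}$ defined by $X^4 + X + 1$, with a nibble $b_3 b_2 b_1 b_0$ read as $b_0 + b_1\alpha + b_2\alpha^2 + b_3\alpha^3$; the property functions are our own straightforward implementations (difference distribution table, Walsh spectrum, Möbius transform), not the SET tool [14].

```python
M_BITS = 4
MOD_POLY = 0b10011  # X^4 + X + 1


def gf_mul(a: int, b: int) -> int:
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M_BITS):
            a ^= MOD_POLY
    return result


def sbox(x: int) -> int:
    """x -> x^4 * x^2 * x + 1 = x^7 + 1; adding 1 flips the lowest bit."""
    x2 = gf_mul(x, x)
    x4 = gf_mul(x2, x2)
    return gf_mul(gf_mul(x4, x2), x) ^ 1


SBOX = [sbox(x) for x in range(16)]


def differential_uniformity(s):
    """Largest difference-distribution-table entry over nonzero input differences."""
    return max(
        sum(1 for x in range(16) if s[x] ^ s[x ^ dx] == dy)
        for dx in range(1, 16)
        for dy in range(16)
    )


def nonlinearity(s):
    """2^(n-1) - max|W(a, b)| / 2 over input masks a and nonzero output masks b."""
    max_walsh = 0
    for a in range(16):
        for b in range(1, 16):
            w = sum(
                -1 if bin((a & x) ^ (b & s[x])).count("1") % 2 else 1
                for x in range(16)
            )
            max_walsh = max(max_walsh, abs(w))
    return 8 - max_walsh // 2


def algebraic_degree(s):
    """Maximum ANF degree over the four coordinate functions (Moebius transform)."""
    degree = 0
    for bit in range(4):
        anf = [(s[x] >> bit) & 1 for x in range(16)]
        for i in range(4):
            for x in range(16):
                if x & (1 << i):
                    anf[x] ^= anf[x ^ (1 << i)]
        degree = max([degree] + [bin(u).count("1") for u in range(16) if anf[u]])
    return degree


print(" ".join(f"{v:X}" for v in SBOX))
print(differential_uniformity(SBOX), nonlinearity(SBOX), algebraic_degree(SBOX))
```

It prints the 16 S-box values in the order of the table above, followed by `4 4 3` (differential uniformity, nonlinearity, algebraic degree).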
[Fig. 1: datapath block diagram with plaintext input, clock divider (20 ns supply clock divided to 40 ns), A * STATE multiplication over 8 cycles, SBOX + SHIFTROW, and STATE MATRIX.]
the fact that our proposed MDS matrix is obtained as $S_2^8$, which requires 8 clock cycles, whereas the MDS matrix obtained from $S_1$ requires 4 clock cycles. Our main goal is to reduce the area of the datapath without increasing the computation time, since the computation time also affects the Area-Time (AT) product [19] of the datapath. In the original design we supply a clock with a period of 40 ns, so one round takes 4 clock cycles ($4 \times 40 = 160$ ns) to complete the MDS layer multiplication, followed by byte substitution and ShiftRow. In our proposed design we use the clock-switching technique of [13]: a clock divider circuit converts the supplied 20 ns clock into a clock with twice the period (40 ns), as shown in Fig. 1. The MDS layer multiplication with the state matrix runs on the 20 ns clock, while the rest of the datapath runs on the 40 ns clock. Since the MDS layer multiplication in the new design requires 8 clock cycles ($8 \times 20 = 160$ ns), the total time required is the same as in the original design. The new design also shortens the critical path, which results in a 73% reduction in the AT product, as shown in Table 4.
5 Conclusions
References
1. D. Augot and M. Finiasz. Direct construction of recursive MDS diffusion layers using shortened BCH codes. In International Workshop on Fast Software Encryption, pages 3-17. Springer, 2014.
2. S. Babbage and M. Dodd. The stream cipher MICKEY 2.0, 2006. http://www.ecrypt.eu.org/stream/mickeypf.html.
3. A. Bogdanov, L. R. Knudsen, G. Leander, C. Paar, A. Poschmann, M. J. B. Robshaw, Y. Seurin, and C. Vikkelsoe. PRESENT: An Ultra-Lightweight Block Cipher. In P. Paillier and I. Verbauwhede, editors, Cryptographic Hardware and Embedded Systems - CHES 2007, volume 4727 of LNCS, pages 450-466. Springer, 2007.
4. A. Bogdanov, L. R. Knudsen, G. Leander, C. Paar, A. Poschmann, M. J. B. Robshaw, Y. Seurin, and C. Vikkelsoe. PRESENT: An ultra-lightweight block cipher. In P. Paillier and I. Verbauwhede, editors, CHES, volume 4727 of Lecture Notes in Computer Science, pages 450-466. Springer, 2007.
5. J. Borghoff, A. Canteaut, T. Güneysu, E. B. Kavun, M. Knezevic, L. R. Knudsen, G. Leander, V. Nikov, C. Paar, C. Rechberger, P. Rombouts, S. S. Thomsen, and T. Yalçın. PRINCE - A Low-latency Block Cipher for Pervasive Computing Applications (Full version). Cryptology ePrint Archive, Report 2012/529, 2012. http://eprint.iacr.org/.
6. C. De Cannière and B. Preneel. Trivium. New Stream Cipher Designs, pages 244-266, 2008.
7. J. Guo, T. Peyrin, A. Poschmann, and M. Robshaw. The LED Block Cipher, pages 326-341. Springer Berlin Heidelberg, Berlin, Heidelberg, 2011.
8. K. C. Gupta, S. K. Pandey, and A. Venkateswarlu. On the direct construction of recursive MDS matrices. Designs, Codes and Cryptography, 82(1-2):77-94, 2017.
9. K. C. Gupta and I. G. Ray. On constructions of MDS matrices from companion matrices for lightweight cryptography. In International Conference on Availability, Reliability, and Security, pages 29-43. Springer, 2013.
10. M. Hell, T. Johansson, and W. Meier. Grain: a Stream Cipher for Constrained Environments. Int. J. Wire. Mob. Comput., 2(1):86-93, May 2007.
11. K. Khoo, T. Peyrin, A. Y. Poschmann, and H. Yap. FOAM: Searching for Hardware-Optimal SPN Structures and Components with a Fair Comparison. In L. Batina and M. Robshaw, editors, Cryptographic Hardware and Embedded Systems - CHES 2014, volume 8731 of Lecture Notes in Computer Science, pages 433-450. Springer Berlin Heidelberg, 2014.
12. F. J. MacWilliams and N. J. A. Sloane. The Theory of Error-Correcting Codes (North-Holland Mathematical Library). North Holland, January 1983.
13. S. Patranabis, D. B. Roy, Y. Shrivastava, D. Mukhopadhyay, and S. Ghosh. Parsimonious design strategy for linear layers with high diffusion in block ciphers. In 2016 IEEE International Symposium on Hardware Oriented Security and Trust, HOST 2016, McLean, VA, USA, May 3-5, 2016, pages 31-36, 2016.
14. S. Picek, L. Batina, D. Jakobovic, B. Ege, and M. Golub. S-box, SET, Match: A Toolbox for S-box Analysis. In D. Naccache and D. Sauveron, editors, 8th IFIP International Workshop on Information Security Theory and Practice (WISTP), volume LNCS-8501 of Information Security Theory and Practice. Securing the Internet of Things, pages 140-149, Heraklion, Crete, Greece, June 2014. Springer. Part 5: Short Papers.
15. M. Sajadieh, M. Dakhilalian, H. Mala, and P. Sepehrdad. Recursive diffusion layers for block ciphers and hash functions. In Fast Software Encryption, pages 385-401. Springer, 2012.
16. S. Sarkar and S. M. Sim. A deeper understanding of the XOR count distribution in the context of lightweight cryptography. In D. Pointcheval, A. Nitaj, and T. Rachidi, editors, Progress in Cryptology - AFRICACRYPT 2016 - 8th International Conference on Cryptology in Africa, Fes, Morocco, April 13-15, 2016, Proceedings, volume 9646 of Lecture Notes in Computer Science, pages 167-182. Springer, 2016.
17. S. Sarkar and H. Syed. Lightweight diffusion layer: Importance of Toeplitz matrices. IACR Trans. Symmetric Cryptol., 2016(1):95-113, 2016.
18. T. Shirai, K. Shibutani, T. Akishita, S. Moriai, and T. Iwata. The 128-Bit Blockcipher CLEFIA (Extended Abstract). In A. Biryukov, editor, Fast Software Encryption, 14th International Workshop, FSE 2007, Luxembourg, Luxembourg, March 26-28, 2007, Revised Selected Papers, volume 4593 of Lecture Notes in Computer Science, pages 181-195. Springer, 2007.
19. C. D. Thompson. Area-time complexity for VLSI. In Proceedings of the Eleventh Annual ACM Symposium on Theory of Computing, STOC '79, pages 81-88, New York, NY, USA, 1979. ACM.