PROBLEM PACK 7
(MODULAR MULTIPLICATION)

I. INTRODUCTION
The embedded system is implemented as a System-on-Chip, which is referred to as an SoC design or SoC embedded system. In this lab, the Altera SOPC (System-on-Programmable-Chip) Builder, Quartus II, and Nios II IDE software are used to develop a Nios II-based embedded system.
This lab aims to design the software and hardware partitions of an embedded system. It also performs design-space exploration between the hardware and software partitions for a specific computation. The performance metrics are logic cost and computation cycle count.
In this experiment, a 32-bit modular multiplication (mod_mul) function is developed. The program generates 25 sets of 32-bit random input operands and performs the 32-bit modular multiplication (mod_mul) on each set consecutively. The modular multiplication algorithm is given. The performance of the hardware and software partitions is then compared.

II. LITERATURE STUDY
An embedded system is the essence of every modern electronic device, from toys to traffic lights to power plant controllers. Embedded systems cover all aspects of modern life and there are many examples of their use. Watches, microwaves, and phones all make use of embedded systems. An embedded system is a specialized computer system that contains hardware and software customized to perform one or a few particular tasks under real-time constraints. It is generally part of a larger system or machine (e.g., industrial controllers) and is housed on a single microprocessor board with the programs stored in ROM. Embedded systems are controlled by one or more main processing cores, which are normally either microcontrollers or digital signal processors (DSP).
Since an embedded system is dedicated to specific tasks, the design engineer can optimize it to reduce the size and cost of the product while increasing its reliability and performance. Nevertheless, the complexity of embedded systems is continually increasing. The number of states in the software is very large, and the complex description of a system makes system analysis very hard.
System on chip (SoC) refers to integrating all components of a computer or other electronic system into a single integrated circuit (chip). It may include digital, analogue, mixed-signal, and often radio-frequency functions on one single chip. A typical application of SoC is embedded systems. The major advantage of an SoC device is that it greatly reduces the size, cost, and power consumption of a system. SoC devices used in handheld digital products have replaced bulkier, more power-hungry digital systems built as a board with several chips. A system on chip may include a configurable logic unit. The configurable logic unit includes a processor, an interface, and programmable logic on the same device. As technology advances, the integration of the various units included in an SoC design becomes increasingly complicated. An SoC device integrated into a single chip might contain many of the components of a complex electronic system, such as a wireless receiver. This complexity calls for a good hardware/software co-design process.
The common description of hardware/software (HW/SW) co-design is the meeting of system-level objectives by taking advantage of the trade-offs between HW and SW in a system through their parallel design. HW and SW are developed at the same time, on parallel paths and with interaction between them, to produce a design that meets the performance and functional specifications. The two key concepts involved in co-design are concurrent development of HW and SW, and integrated design. Integrated design allows interaction between the design of HW and SW. Co-design techniques using these two key concepts take advantage of design flexibility to create systems that can meet strict performance requirements with a shorter design cycle. The main reason motivating the need for HW/SW co-design is the fact that most systems today include both dedicated hardware and software units on microcontrollers or general-purpose processors. Meanwhile, the increasing use of programmable processors, the availability of cheap microcontrollers, and the increased efficiency of higher level
l   C   C++ 
il
  iti iit l
lti i i ill il t  t l

   
 
  i  l ti
  it  l
lti
 
 
i l  t i t it i 
t  t  i ! t i t   t  


  t
   t      t itt  tt t    
 i
lt
l t 

t    i
l
t ti     t
i t   t
   
 i 
t   
 jti 
t  
i '()   ')#          
t t
i i  it i 
t i
t t  t    
 *  
    


ti l  t
   l    i
l
t ti  r           
 
  
t lll  ti    ti  t i   t  
         

 t      i ti   t  i
l
t ti  i +,    
        -
 
 ll  l  t   t
   i
l
t ti  t   
  
     
 
   t  i  i  t   i ti   
./
0

 M ! L G" !. R%#/L 0 $!#C/##!"



i jt i ii t t t 
 #
i
l
t ti   ll t  i
l
t ti     $

l
t    ll     l  it


i
l
t ti  i i l  t
  

    t i t t t 
  t

 
t  i  itt i C 

i l   i B l 
# 
C++  t    i i i  ! %$& it 

i
'( ii i
lti 
lt   
 

    t    i t   ; r  @ 
 t' i   B
i  l''( ) t* it 
l l'( i 
ti  i
 iltti

lti( li+ti ti  i 
t i t t*
l it* i ,@   ti   
lt t   2 %1 # 2 #34  2% 
t i t  t t   t 
ti t #  # 
l  t t   l t il t 
t
t t  
  
it t #t ) 1'2 ) 1 '2 )' 3423
 t  t t  t it  t ti 5 5

l tt lt ill  t t t it  ttit 5/#t  ' 6 ' 6 3 63 2 5
ll i   it 
l i ll   
@  t i  it
t l@ ti 5
 
l i ll L-. /t 5/#t  '361' '36 1'  6443 5
 0
@ i  it l@ ti 5
 t it 
l t  i  t 
l

ltiliti ti  r     i i 
t 5/#t i  it
t l@ it t t
   
   
  t   titi i !"# !! R
lt ti

   
   
  
 !"#$  i it 
 t li t i i   t
%& +
ti ti il  li t l i  i t
  
l; L-. /t ill tl t   t  t +
ti ti i t
l  t t  t i tt ill  itt i C 
lt i ti
l
 
i !"# !! !$% t !t  t i
t  5  i t #t  )' 3423 5@ 

t
t tt ill it it t    t l 5/#t i   6443 5@ 
l  
B
 it  
@ t ll t t  t tt +
ti ti  5/#t i  i 23
 t l i  ll t  t t l 
t ti t t 
 t i  5/#t 
 t  lt  L-. !t  t i   l ti 
ti 
 t  i
t l ilt &'()* it itt& '()* i  iill  tt ti
l l it 

lt& '()*  & '()*  i
t i l !t l  t  iit
+& '()* & '()* ,& '()*  t& '()*  
t
t l ( L i t  i
 iltti
Bt 
l  t i t t it 
i 
l ll l
-l t   l 
 2 % # 2 2#3 
it tt ill it t it t !"# !! lt #t  )
 5/#t i  3' 1
 i  i tt  t t  it   tl  tt t l i t 
ti 
  i ill t ')) t  it  5/#t i  i ' '1 ti  t
 
  ill   t t 
 t  t t i 
By increasing the logic cost by 1.17 times, the execution time is shortened by 33.64 times. Therefore, this trade-off is acceptable.

Running frequency: 100000000 Hz
Time taken (Software): 503716 ticks
Time taken (Hardware): 14971 ticks
1932757444*1812851329 mod 4294967295=142185736
786279284*481968937 mod 4160749567=890607585
1699140529*368211428 mod 3187671039=2547598544
1619701602*850790193 mod 4294967295=740490066
1642638206*810407642 mod 4294443007=4023211626
199959575*1030935169 mod 4294967295=1966239155
237895031*590249699 mod 4294967295=131936119
1997930855*53546138 mod 2147483647=670880292
668396581*1839177873 mod 4294967295=1633499898
638216208*1670124393 mod 4294967295=2273883489
332713197*1035074448 mod 4294836191=2738433170
1985900155*725370578 mod 4294967295=2373844565
1302741285*134650351 mod 4278190079=3480589006
638160306*565314516 mod 4294967295=425204706
1889387775*1286302809 mod 4294967295=570708015
598279984*371398014 mod 4294967295=2305680216
1765157118*627747827 mod 4294967231=2726428706
2093973672*1774683490 mod 4294967295=3010521705
1234867862*900890657 mod 4294963199=2671521229
282741182*1140457973 mod 4294967295=3174557551
426936334*993029438 mod 4294967295=2955747092
714291992*22021730 mod 4294967295=1333203325
375146338*397206613 mod 4294967295=1646288869
2088659870*125417311 mod 4294967295=3307767680
217963178*957649779 mod 4261412863=751583278

Result Verification:
1932757444 x 1812851329
= 3503801900990043076
3503801900990043076 ÷ 4294967295
= 815792452.03310519643898708662926
Multiply the fractional part of the quotient by the divisor to get the remainder:
0.03310519643898708662926 x 4294967295
= 142185736

V. CONCLUSION
In conclusion, the algorithm and the functionality of the hardware have been verified, and the timing of the software and hardware executions shows that the full software implementation is slower than the hardware implementation.

ACKNOWLEDGMENT
The authors would like to express their gratitude to Universiti Teknologi Malaysia (UTM) for supporting this laboratory.

REFERENCES
[1] Mohamed Khalil-Hani, PhD, Irwansyah, A. & Hau, Y. W. (2006). veCAD Technical Report: Nios II Tutorial: Avalon Memory-Mapped (Avalon-MM) Bus Interface Design of User-Designed Logic in Slave Transfer (VHDL Version).
[2] ECAD PBL Laboratory, STUDENT PACK, HW/SW Co-design of a Nios II-based Embedded System (2009).
[3] ECAD PBL Laboratory, PROBLEM PACK 7, HW/SW Co-design of a Nios II-based Embedded System (2009).
[4] Mohamed Khalil-Hani, PhD. (2008). Digital System VHDL & Verilog Design (2nd Edition).
[5] http://www.cplusplus.com/reference/clibrary/cstdlib/srand/
[6] http://soc.eurecom.fr/EDC/des_cop2/
[7] http://www.seas.upenn.edu/~ese201/vhdl/vhdl_primer.html#_Toc526061364
[8] http://www.velocityreviews.com/forums/t23789-while-loop.html
[9] http://www.freeinfosociety.com/site.php?postnum=485
[10] http://en.wikipedia.org/wiki/Embedded_system
[11] http://en.wikipedia.org/wiki/System-on-a-chip
[12] http://www.npd-solutions.com/swcodesign.html

Running frequency: 100000000 Hz
Time taken: 12359 ticks
1932757444*1812851329 mod 4294705143=344120785
481968937*1699140529 mod 4255121407=2627761612
1619701602*850790193 mod 4294967295=3119638564
810407642*199959575 mod 4294967295=740490066
237895031*590249699 mod 4293918719=3818040340
53546138*668396581 mod 4294967291=3387772582
638216208*1670124393 mod 4026531839=2934738067
1035074448*1985900155 mod 4294967295=679815706
1302741285*134650351 mod 4227857407=4194470430
565314516*1889387775 mod 4294967295=2228820368
598279984*371398014 mod 4294967287=51824115
627747827*2093973672 mod 4294967295=2719559960
1234867862*900890657 mod 4294967295=3015005904
1140457973*426936334 mod 2013265772=2582953174
714291992*22021730 mod 4293916671=574899538
397206613*2088659870 mod 3976200191=1802861821
217963178*957649779 mod 4294705023=2926194162
1714510321*1894596297 mod 4294959103=1911865590
762000468*989767529 mod 2147483643=2616249845
1299598477*1277489890 mod 4294967295=1813901745
1953573275*608438773 mod 4260364287=3987417940
1600570223*1112997517 mod 4289724415=2312138027
1880702872*471094094 mod 4294967279=3171707931
1806439037*1060615205 mod 4294705151=3224758328
1236269356*1695691943 mod 4294967231=636010679

Result obtained for hardware/software co-design without delay
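
A quick sanity check on any printed line in a listing like this one is that a remainder can never be greater than or equal to its modulus. The following is a minimal sketch of such a check, assuming standard `<stdint.h>` types on a host PC; `line_is_consistent` and `demo_line_check` are hypothetical helper names that are not part of the lab code.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when r is the true value of (x * y) mod p, 0 otherwise. */
static int line_is_consistent(uint32_t x, uint32_t y, uint32_t p, uint32_t r)
{
    if (p == 0 || r >= p) {
        return 0;   /* a remainder is always strictly smaller than the modulus */
    }
    /* 64-bit product, so two 32-bit operands cannot overflow. */
    return (uint32_t)(((uint64_t)x * y) % p) == r;
}

/* Two lines taken from the listings in this paper. */
static void demo_line_check(void)
{
    /* Co-design run without delay: the reported result exceeds the modulus. */
    assert(line_is_consistent(1140457973u, 426936334u, 2013265772u, 2582953174u) == 0);
    /* Run with the delay, first line: agrees with the hand verification. */
    assert(line_is_consistent(1932757444u, 1812851329u, 4294967295u, 142185736u) == 1);
}
```

Applied to the line `1140457973*426936334 mod 2013265772=2582953174` from the listing without delay, the check fails at its first test because the reported result is larger than the modulus.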
#include <stdio.h>
#include <unistd.h>
#include "altera_avalon_pio_regs.h"
#include "system.h"
#include <stdlib.h>
#include "sys/alt_timestamp.h"
#include "alt_types.h"

/* Shift-and-add modular multiplication: returns (V * A) mod P, assuming V < P. */
alt_u32 mod_mul(alt_u64 V, alt_u64 A, alt_u32 P) {
    alt_u64 U = 0;
    while (A != 0) {
        if (A & 0x01) { U = U + V; }   /* add V when the current bit of A is set */
        if (U >= P) { U = U - P; }     /* keep U below P */
        A >>= 1;
        V <<= 1;
        if (V >= P) { V = V - P; }     /* keep V below P */
    }
    return U;
}

int seed = 10;

/* Fill p[] with moduli greater than 1, and x[], y[] with non-zero operands below p[i]. */
void generate_random(alt_u32 p[], alt_u32 x[], alt_u32 y[]) {
    int i;
    srand(seed);
    for (i = 0; i < 25; i++) {
        /* 0xFFFFFFFF is 2^32 - 1; in C the ^ operator is XOR, so (2^32) - 1 would be 33. */
        while (p[i] <= 1) p[i] = (rand()) % 0xFFFFFFFFu;
        do {
            x[i] = rand() % (p[i]);
        } while (x[i] == 0);
        do {
            y[i] = rand() % (p[i]);
        } while (y[i] == 0);
        seed = i;
    }
}

int main() {
    unsigned int start, end;
    alt_u32 p[25] = {0};   /* zeroed so that generate_random() draws every modulus */
    alt_u32 x[25];
    alt_u32 y[25];
    alt_u32 result[25];
    int i;
    generate_random(p, x, y);

    if (alt_timestamp_start() < 0) printf("No timestamp device available \n");
    else {
        printf("Running frequency: %u Hz \n", (unsigned int)(alt_timestamp_freq()));
        start = alt_timestamp();
        for (i = 0; i < 25; i++) {
            result[i] = mod_mul(x[i], y[i], p[i]);
            //result[i] = (x[i]*y[i])%p[i];
        }
        end = alt_timestamp();
        printf("Time taken: %u ticks \n", (unsigned int)(end - start));
    }
    for (i = 0; i < 25; i++) {
        printf("%lu*%lu mod %lu=%lu \n", x[i], y[i], p[i], result[i]);
    }
    while (1);
    return 0;
}

Source Code for Software Design
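
The shift-and-add routine can be checked on a host PC against the compiler's own 64-bit `%` operator. The following is a minimal sketch, assuming standard `<stdint.h>` types in place of the Altera `alt_u32` and `alt_u64` typedefs; `mod_mul_host`, `mod_mul_ref` and `demo_mod_mul` are hypothetical names that do not appear in the lab code. The operands are the first line of the software output reported in this paper.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-add modular multiplication, following the same steps as mod_mul.
 * Assumes v < p on entry, which holds when v was drawn as rand() % p. */
static uint32_t mod_mul_host(uint64_t v, uint64_t a, uint32_t p)
{
    uint64_t u = 0;
    while (a != 0) {
        if (a & 1u) { u += v; }   /* add the current multiple of v */
        if (u >= p) { u -= p; }   /* keep the accumulator below p */
        a >>= 1;                  /* move to the next multiplier bit */
        v <<= 1;                  /* double v for that bit */
        if (v >= p) { v -= p; }   /* keep v below p */
    }
    return (uint32_t)u;
}

/* Reference: one 64-bit product followed by one remainder. */
static uint32_t mod_mul_ref(uint32_t x, uint32_t y, uint32_t p)
{
    return (uint32_t)(((uint64_t)x * y) % p);
}

/* Compare both routines on the first reported sample. */
static void demo_mod_mul(void)
{
    const uint32_t x = 1932757444u, y = 1812851329u, p = 4294967295u;
    assert(mod_mul_host(x, y, p) == mod_mul_ref(x, y, p));
    assert(mod_mul_host(x, y, p) == 142185736u);   /* value from the result verification */
}
```

The reference version needs only one multiplication and one division, but it relies on a 64-bit multiply and divide, whereas the shift-and-add version uses only additions, subtractions and shifts, which is what the hardware partition implements.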


#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <time.h>
#include <pthread.h>
#include "altera_avalon_pio_regs.h"
#include "system.h"
#include "sys/alt_timestamp.h"
#include "alt_types.h"

/* Register offsets of the ALU Avalon slave */
#define dataX 0x00
#define dataY 0x01
#define dataZ 0x02
#define start_pin 0x03
#define resultData 0x05
#define done_pin 0x04

/* Hardware partition: the multiplication runs in the ALU peripheral. */
alt_u32 mod_mul(alt_u32 V, alt_u32 A, alt_u32 P) {
    alt_u8 done = 0;
    alt_u32 U = 0;
    IOWR(ALU_AVALON_0_BASE, dataX, V);          // write dataX=V;
    IOWR(ALU_AVALON_0_BASE, dataY, A);          // write dataY=A;
    IOWR(ALU_AVALON_0_BASE, dataZ, P);          // write dataZ=P;
    IOWR(ALU_AVALON_0_BASE, start_pin, 0x01);   // assert start
    do {
        done = IORD(ALU_AVALON_0_BASE, done_pin);   // loop & wait for done signal
    } while (!done);
    /* Delay added for the correct result to be ready */
    int d = 0;
    while (d < 1)
    { d++; }
    U = IORD(ALU_AVALON_0_BASE, resultData);    // store result into U
    return U;
}

int seed = 10;

/* Fill p[] with moduli greater than 1, and x[], y[] with non-zero operands below p[i]. */
void generate_random(alt_u32 p[], alt_u32 x[], alt_u32 y[]) {
    int i;
    srand(seed);
    for (i = 0; i < 25; i++) {
        /* 0xFFFFFFFF is 2^32 - 1; in C the ^ operator is XOR, so (2^32) - 1 would be 33. */
        while (p[i] <= 1) p[i] = (rand()) % 0xFFFFFFFFu;
        do {
            x[i] = rand() % (p[i]);
        } while (x[i] == 0);
        do {
            y[i] = rand() % (p[i]);
        } while (y[i] == 0);
        seed = rand();
    }
}

int main() {
    unsigned int start, end;
    alt_u32 p[25] = {0};   /* zeroed so that generate_random() draws every modulus */
    alt_u32 x[25];
    alt_u32 y[25];
    alt_u32 result[25];
    int i;
    generate_random(p, x, y);
    if (alt_timestamp_start() < 0) printf("No timestamp device available \n");
    else {
        printf("Running frequency: %u Hz\n", (unsigned int)(alt_timestamp_freq()));
        start = alt_timestamp();
        for (i = 0; i < 25; i++) {
            result[i] = mod_mul(x[i], y[i], p[i]);
        }
        end = alt_timestamp();
        printf("Time taken: %u ticks \n", (unsigned int)(end - start));
    }
    for (i = 0; i < 25; i++) {
        printf("%lu*%lu mod %lu=%lu \n", x[i], y[i], p[i], result[i]);
    }
    while (1);
    return 0;
}

Source Code for Hardware/Software Co-Design
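
The call sequence used by the co-design `mod_mul` (load the three operands, raise start, poll done, read the result) can be exercised on a host PC without the board. The following is a minimal sketch that replaces the Avalon peripheral with plain variables. `mock_write`, `mock_read`, `mod_mul_mock` and the `reg_*` variables are hypothetical stand-ins, not Altera HAL calls, and the mock computes its answer instantly, so it says nothing about hardware timing.

```c
#include <assert.h>
#include <stdint.h>

/* Word offsets, matching the #define list in the co-design source. */
enum { DATA_X = 0, DATA_Y = 1, DATA_Z = 2, START_PIN = 3, DONE_PIN = 4, RESULT_DATA = 5 };

/* Hypothetical stand-in for the peripheral registers (not real hardware). */
static uint32_t reg_x, reg_y, reg_p, reg_result, reg_done;

static void mock_write(unsigned offset, uint32_t value)
{
    switch (offset) {
    case DATA_X: reg_x = value; break;
    case DATA_Y: reg_y = value; break;
    case DATA_Z: reg_p = value; break;
    case START_PIN:
        if ((value & 1u) && reg_p != 0) {
            /* The mock finishes at once; the real block needs many clock cycles. */
            reg_result = (uint32_t)(((uint64_t)reg_x * reg_y) % reg_p);
            reg_done = 1;
        }
        break;
    default: break;
    }
}

static uint32_t mock_read(unsigned offset)
{
    if (offset == DONE_PIN) return reg_done;
    if (offset == RESULT_DATA) return reg_result;
    return 0;
}

/* Same order of operations as the co-design mod_mul: load, start, poll, read.
 * The modulus p must be non-zero, otherwise the mock never reports done. */
static uint32_t mod_mul_mock(uint32_t v, uint32_t a, uint32_t p)
{
    reg_done = 0;
    mock_write(DATA_X, v);
    mock_write(DATA_Y, a);
    mock_write(DATA_Z, p);
    mock_write(START_PIN, 1);
    while (!mock_read(DONE_PIN)) { /* wait for the done flag */ }
    return mock_read(RESULT_DATA);
}
```

Calling `mod_mul_mock(1932757444u, 1812851329u, 4294967295u)` returns 142185736, the value obtained in the result verification, which makes the mock usable as a reference when debugging the driver logic separately from the hardware.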


%#-! !(#6
&#,&3%'#$37789, %%6 
&#,&3%'#$3&#', %%6 
&ccc,&3%'#$3 !#+, %%6 

#(&"#& 
!2:(;   <#&3%'#$3)$!2/7.=46 
 !&%   <&3%'#$3)$!2/7.=46 
 $%*& !   <#&3%'#$6 
    <&3%'#$46 
&"6

!$+#$!-+)&"#& 
 (& (#&2&=&7&0&/&946 
 &#' %& :3& <& (<>&=6 
 &#' %?  <&3%'#$3)$!28/.=46 
-'#
 -$<!$&&2& 4-'# 
  $ && #&
    .+&=>@
     #2A>B================================B4+
      :3& C>&76
     %&
      :3& C>&96
     #6
    .+&7>@
     :3& C>&06
    .+&0>@
     :3& C>&/6
    .+&/>@
     :3& C>&=6
    .+&9>@
     :3& C>&96  
  $ &6
 !$&&611 -$
 <!$&&-'#
  . ##%2$%*D) $%*>D7D46 11. #!!#&#''$%* 
  #& !>D7D+ 
   & C>&=6
   ?2/7.=4C>:62/7.=4C>(6 2/7.=4C>;6 
   C>D=D6
  C>B================================================================B6
  %&
   & C>:3& 6 
  $ && #&
    .+&=>@
    .+&7>@
     #22=4>D7D4+
      C>E?6
     #6
    .+&0>@
     C>D=DF28/.746
     ?C>?280.=4FD=D6
     #2@> 4+
      C>1 6
     #6
    .+&/>@
     #2?@> 4+
      ?C>?1 6
     #6 
    .+&9>@
     !&%C>2/7.=46
     C>D7D6   
 
  $ &6 
 #6
 !$&&611 
-+)6

VHDL code for FSM


%#-! !(ccc6
&ccc,&3%'#$37789, %%6
&ccc,&3%'#$3 !#+, %%6 

c
 %3 ) %
 2 !&<3  6
   $%*<3  6
   $+#&%$<3  6 
   !&&<3  3?c 20=46
   .!#<3  6
   .!#  <3  3?c 2/7=46
   !   <3  3?c 2/7=446
c %3 ) %6

 c c !$+ %3 ) %
&#' %%#5<&3%'#$3)$!2/7.=46 
&#' %%#
<&3%'#$3)$!2/7.=46 
&#' %%#<&3%'#$3)$!2/7.=46 
&#' %!&%3&"<&3%'#$3?$!2/7 .=46
&#' %& !3&"<&3%'#$6 
&#' %3&"<&3%'#$6 

$" 3#! $
2 !&  <3  6
  $%* <3  6
  $+#&%$<3  6
  !&& <3  3?c 20=46
  .!# <3  6
  .!#  <3  3?c 2/7=46
  !    <3  3?c 2/7=46
  
  :(;  <&3%'#$3)$!2/7.=46 
  & !  <-!&3%'#$6 
  !&%  <#&3%'#$3)$!2/7.=46 
    <#&3%'#$46 
c$"6
$"&"#& 
!2 :(; <#&3%'#$3)$!2/7.=46 
 !&% <&3%'#$3)$!2/7.=46 
 $%*& ! <#&3%'#$6 
  <&3%'#$46 
$"6
c
 3 !#+#<&"
  !" 2  
     :>@%#5
     (>@%#

     ;>@%#
     !&%>@!&%3&" 
     $%*>@$%* 
     & !>@& !3&" 
     >@3&"46
     
 3! $3%< 3#! $
  !" 2$%*>@$%* 
     !&>@!& 
     $+#&%$>@$+#&%$ 
     !&&>@ !&& 
     .!#>@.!# 
     .!#  >@.!#   
     !   >@!   
     :>@%#5
     (>@%#

     ;>@%#
     !&%>@!&%3&" 
     & !>@& !3&" 
     >@3&"
     46
c !$+6

VHDL code for alu_avalon


%#-! !(#6
&#,&3%'#$37789, %%6 
&ccc,&3%'#$3 !#+, %%6 

c
 3#! $
2 !&  <3  6
  $%* <3  6
  $+#&%$<3  6
  !&& <3  3?c 20=46
  .!# <3  6
  .!#  <3  3?c 2/7=46
  !    <3  3?c 2/7=46
  
  :(;  <&3%'#$3)$!2/7.=46 
  & !  <-!&3%'#$6 
  !&%  <#&3%'#$3)$!2/7.=46 
    <#&3%'#$46
c 3#! $6

 c c !$+ 3#! $
c
 !$&&2!&$%*4 
 -'#
  #!&>D7D+ 
   !   C>2+!&>@D=D46
  %&#$%*D) $%*>D7D+ 
   #$+#&%$>D7D+ 
    #2& !>D7D4+& !C>D= D6
    #6
    $ & !&&#&
     .+B===B>@
      :C>.!#  2/7.=46 
     .+B==7B>@
      (C>.!#  2/7.=46 
     .+B=7=B>@
      ;C>.!#  2/7.=46
     .+B=77B>@
      & !C>.!#  2=46 
     .+B7==B>@
      !   2=4C>6
     .++!&>@
      !   2/7.=4C>!&%6 
      
    $ &6
   #6
  #6
 !$&&6
c !$+6

VHDL code for ALU_interface
