System Generator

An introduction to Xilinx System Generator
Miroslav Knežević (miroslav.knezevic@esat.kuleuven.be), ESAT/SCD-COSIC, Room 01.62
Reviewed by: Pieter Nuyts, Tom Redant and Nele Reynders
1 Before We Start
Before we start with the introductory session, download the following files from Toledo, under Cursus Informatie, section Introductiesessies / Fase 3 - System Generator, and place them on your Desktop:

- sysgen intro.pdf - This file.
- Files Sysgen Impact USB.zip - Contains the files necessary for testing the FPGA board.
- Dragon FPGA programming and testing.pdf - A document describing the FPGA testing.

To have the correct versions of Matlab and Impact running, you need to perform the initial setup described below. These steps need to be done only ONCE; the next time you log into an ESAT machine you can skip them.

(i) Open a terminal and first back up your old .bashrc file by typing

cp .bashrc .bashrc_old

(ii) Then, for 64-bit machines, type

printf '\nsource ~micasusr/design/scripts/xilinx_ise_12.2_64bit.rc\n' >> .bashrc

and for 32-bit machines type

printf '\nsource ~micasusr/design/scripts/xilinx_ise_12.2_32bit.rc\n' >> .bashrc

(iii) Finally, type

source .bashrc

You are now ready to launch the System Generator and Xilinx Impact tools by typing sysgen and impact in your terminal, respectively.
2 Introduction
Xilinx System Generator is a high-level tool for designing high-performance Digital Signal Processing (DSP) systems using FPGAs. Its close integration with the MATLAB/Simulink software makes the implementation of complex hardware designs considerably easier. Previous experience with Xilinx FPGAs or RTL design methodologies is not required when using System Generator. Designs are captured in the DSP-friendly Simulink modeling environment using a Xilinx-specific blockset. All of the downstream FPGA implementation steps, including synthesis and place and route, are performed automatically to generate an FPGA programming file.

The purpose of this exercise session is to get students acquainted with the Xilinx System Generator tool, which will be used further in the Dragon project (H01Q6a). After reading the following sections, students should understand the basics of System Generator and be able to complete the exercises given in Section 4. During the preparation of this document, the author found useful information in [1] and [2].
3 Design Creation Basics
Over 90 DSP building blocks are provided in the Xilinx DSP blockset for Simulink. These include common DSP building blocks such as adders, multipliers, multiplexers, registers, and others, as shown in Figure 1. Also included is a set of complex DSP building blocks such as forward error correction blocks, FFTs, filters and memories. These blocks leverage the Xilinx IP core generators to deliver optimized results for the selected device.
Figure 1: The Xilinx DSP Block Set.

The Xilinx DSP blockset is accessed via the Simulink Library browser, which can be launched from the standard MATLAB toolbar (Figure 2). The blocks are separated into sub-categories for easier searching. One sub-category, Index, includes all the blocks and is often the quickest way to access a block you are already familiar with.
Figure 2: Launching Simulink.
3.1 Defining the FPGA boundary
System Generator works with standard Simulink models. Two blocks named Gateway In and Gateway Out define the boundary between the FPGA and the Simulink simulation model. Double-clicking one of these blocks opens the properties editor, where the block properties can be fully specified (Figure 3).
Figure 3: Gateway In and Gateway Out blocks.

The Gateway In block converts inputs of type Simulink integer, double and fixed-point to a Xilinx fixed-point number. The Xilinx fixed-point types are:

- Boolean.
- Signed (two's complement).
- Unsigned.

If the chosen type is Signed or Unsigned, the Number of bits and the Binary point need to be specified. Number of bits represents the input width, while the Binary point parameter indicates the number of bits to the right of the binary point (i.e. the size of the fraction). The Binary point position must be between zero and the specified Number of bits.

While converting a Simulink type to a System Generator fixed-point type, the Gateway In block uses the selected quantization and overflow options. For quantization, the options are:

- Round - round to the nearest representable value (or to the value furthest from zero if there are two equidistant nearest representable values).
- Truncate - discard bits to the right of the least significant representable bit.

For overflow, the options are:

- Wrap - discard bits to the left of the most significant representable bit.
- Saturate - saturate to the largest positive/smallest negative value.
- Flag as error - flag an overflow as a Simulink error during simulation.

It is important to realize that overflow and quantization for the Gateway In blocks do not take place in hardware. They take place in the block software itself, before entering the hardware phase. In hardware, the Gateway In blocks become top-level input ports.

The Gateway Out block converts Xilinx fixed-point inputs into outputs of type Simulink integer, double or fixed-point. In hardware, these blocks become top-level output ports or are discarded, depending on how they are configured.
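The quantization and overflow options described above can be sketched in a few lines of Python. This is only an illustrative software model written for this text, not Xilinx code: the function name `to_fixed` and its parameters are assumptions, and it returns the real value represented by the resulting fixed-point word.

```python
import math

def to_fixed(value, signed, n_bits, bin_pt, quantization="truncate", overflow="wrap"):
    """Illustrative model of a fixed-point conversion (hypothetical helper).

    Returns the real value represented after quantization and overflow handling.
    """
    scaled = value * 2 ** bin_pt  # value expressed in units of the LSB
    if quantization == "round":
        # nearest representable value; ties go away from zero
        raw = int(math.copysign(math.floor(abs(scaled) + 0.5), scaled))
    elif quantization == "truncate":
        # dropping bits right of the LSB in two's complement is a floor
        raw = math.floor(scaled)
    else:
        raise ValueError("unknown quantization option")

    lo = -(2 ** (n_bits - 1)) if signed else 0
    hi = 2 ** (n_bits - 1) - 1 if signed else 2 ** n_bits - 1
    if not lo <= raw <= hi:
        if overflow == "saturate":
            raw = min(max(raw, lo), hi)
        elif overflow == "wrap":
            raw %= 2 ** n_bits          # discard bits left of the MSB
            if signed and raw > hi:
                raw -= 2 ** n_bits      # reinterpret the top bit as the sign
        elif overflow == "flag":
            raise OverflowError("value does not fit the fixed-point type")
        else:
            raise ValueError("unknown overflow option")
    return raw / 2 ** bin_pt

# Signed, 8 bits, binary point at 4: representable range is [-8, 7.9375]
print(to_fixed(1.3, True, 8, 4, "round"))                 # 1.3125
print(to_fixed(1.3, True, 8, 4, "truncate"))              # 1.25
print(to_fixed(9.0, True, 8, 4, overflow="wrap"))         # -7.0
print(to_fixed(9.0, True, 8, 4, overflow="saturate"))     # 7.9375
```

Modelling truncation as a floor means that negative values move away from zero (for example -1.3 becomes -1.3125 with 4 fractional bits), which is what discarding low-order two's complement bits does.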
3.2 Adding the System Generator Token
Every System Generator diagram requires that at least one System Generator token is placed on the diagram. This block is not connected to anything but serves to drive the FPGA implementation process. The property editor for this block allows specification of the target netlist, device, performance targets and system period (Figure 4). System Generator will issue an error if this block is absent.
Figure 4: System Generator Token.
Some of the parameters specific to the System Generator block are listed below (it is best to leave them unchanged):

- Compilation - Specifies the type of compilation result that should be produced when the code generator is invoked. Default setting: Bitstream.
- Part - Defines the FPGA part to be used. Default setting: Spartan3E xc3s250e-4tq144.
- Target directory - Defines where System Generator stores the compilation results. The target directory must be on the PC's local hard drive and not on the network, since compiling to a network drive makes the compiler run very slowly. Once the file is compiled, you can safely move it to your home directory on the network drive. Default setting: /tmp/netlist.
- Synthesis tool - Specifies the tool to be used to synthesize the design. The possibilities are Synplicity's Synplify Pro and Synplify, and Xilinx's XST. Default setting: XST.
- Hardware Description Language - Specifies the HDL to be used for compilation of the design. The possibilities are VHDL and Verilog. Default setting: VHDL.
- FPGA clock period (ns) - Defines the period in nanoseconds of the hardware clock. The period is passed to the Xilinx implementation tools through a constraints file, where it is used as the global PERIOD constraint. Default setting: 50.
- Clock pin location - Specifies the location of the clock pin on the FPGA. Default setting: P125.
- Block icon display - Specifies the type of information to be displayed on the block icon. The block icon is updated with the selected display option after the design has been compiled. Default setting: Default.
- Simulink system period - Important: the system period is NOT given in seconds but in units of the FPGA clock period! If both the FPGA and the Simulink frequency need to be 20 MHz, then FPGA clock period (ns) should be set to 50 and Simulink system period to 1. If, for example, the Simulink frequency needs to be only 10 MHz, then Simulink system period should be set to 2. Default setting: 1.
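The relation between the two clock-related parameters is plain arithmetic, sketched below in Python. The helper name `clock_settings` is an assumption made for this illustration; it simply converts the two frequencies into the values to enter in the dialog.

```python
def clock_settings(fpga_hz, simulink_hz):
    """Return (FPGA clock period in ns, Simulink system period in FPGA clock periods).

    Hypothetical helper: it only restates the arithmetic from the parameter list.
    """
    fpga_period_ns = 1e9 / fpga_hz
    simulink_period = fpga_hz / simulink_hz  # expressed in FPGA clock periods
    return fpga_period_ns, simulink_period

print(clock_settings(20_000_000, 20_000_000))  # (50.0, 1.0)
print(clock_settings(20_000_000, 10_000_000))  # (50.0, 2.0)
```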
3.3 Creating the DSP Design
Once the FPGA boundaries have been established using the Gateway blocks, the DSP design can be constructed using blocks from the Xilinx DSP blockset. Standard Simulink blocks are not supported between the Gateway In and Gateway Out blocks. You will find a rich set of filters, FFTs, FEC cores, memories, and arithmetic, logical and bitwise blocks available for constructing DSP designs. Each of these blocks is cycle and bit accurate.

Once the design is completed, the hardware implementation files can be generated using the Generate button in the System Generator token properties editor. One option is to select HDL Netlist, which allows the FPGA implementation steps of RTL synthesis and place and route to be performed interactively using tool-specific user interfaces. Alternatively, you can select Bitstream as the Compilation target, and System Generator will automatically perform all implementation steps.
3.4 Creating Input Vectors using MATLAB
Simulink is built on top of MATLAB, allowing the use of the full MATLAB language for input signal generation and output analysis. You can use the From Workspace and To Workspace blocks from the Simulink Sources and Sinks libraries. Input values must be specified as an n-row, 2-column matrix, where the first column holds the simulation time and the second column holds the input values. This is a very popular way of generating input vectors for System Generator designs (Figure 5).
Figure 5: Creating Input Vectors using MATLAB.
3.5 MCode Block
One of the blocks that deserves a special introduction is the MCode block. It is a container for executing a user-supplied MATLAB function within Simulink. The block executes the M-code to calculate the block outputs during a Simulink simulation. The block's Simulink interface is derived from the MATLAB function signature and from the block mask parameters. There is one input port for each parameter of the function and one output port for each value the function returns. Port names and ordering correspond to the names and ordering of the parameters and return values.

The MCode block supports a limited subset of the MATLAB language that is useful for implementing arithmetic functions, finite state machines and control logic. The following three primary coding guidelines must be followed:

- All block inputs and outputs must be of Xilinx fixed-point type.
- The block must have at least one output port.
- The code for the block must exist on the MATLAB path or in the same directory as the model file that uses the block.
To illustrate the functionality of the MCode block, we show a simple example that computes z = max(x, y). The file xlmax.m contains the function xlmax, given as:

function z = xlmax(x, y)
  % return the larger of the two inputs
  if x > y
    z = x;
  else
    z = y;
  end

An MCode block based on the function xlmax will have input ports x and y and output port z. Figure 6 shows how to set up an MCode block to use the function xlmax.
Figure 6: MCode Block Properties.
Some of the MATLAB language constructs that the MCode block supports are:

- Assignment statements.
- Simple and compound if/else/elseif statements.
- switch statements.
- Arithmetic expressions involving only addition, subtraction, multiplication and division by a power of two.
- Relational operators.
- Logical operators.

For the other MATLAB constructs and functions that can be used in an MCode file, please refer to the Xilinx System Generator help documentation.

To further illustrate the functionality of the MCode block, we give an example of a simple Finite State Machine (FSM) that detects the pattern 1101 in an input stream of bits. Figure 7 shows the behavior of the FSM.

Figure 7: FSM for detecting 1101 pattern. (States: seen_none, seen_1, seen_11 and seen_110; each transition is labeled input/output.)

The M-function that is used by the MCode block contains a transition function, which computes the next state based on the current state and the current input. The M-function in this example defines a persistent state variable to store the state of the finite state machine in the MCode block. The following M-code, which defines the function detect_1101, is contained in the file fsm.m:
% This FSM detects the 1101 sequence
% Bits are loaded in a serial manner
function matched = detect_1101(d_in)
  seen_none = 0;
  seen_1    = 1;
  seen_11   = 2;
  seen_110  = 3;

  % the state is a 2-bit register
  persistent state, state = xl_state(seen_none, {xlUnsigned, 2, 0});
  matched = 0;
  switch state
    case seen_none
      if d_in == 1
        state = seen_1;
      else
        state = seen_none;
      end
    case seen_1
      if d_in == 1
        state = seen_11;
      else
        state = seen_none;
      end
    case seen_11
      if d_in == 1
        state = seen_11;
      else
        state = seen_110;
      end
    case seen_110
      if d_in == 1
        state = seen_1;
        matched = 1;
      else
        state = seen_none;
      end
    otherwise
      state = seen_none;
  end
end
The previous M-code has an internal state variable that holds its value from one simulation step to the next. A state variable is declared with the MATLAB keyword persistent, and its first assignment must be the result of an xl_state function call. The xl_state function takes two arguments. The first is the initial value and must be a constant. The second is the precision of the state variable. In our example, the line persistent state, state = xl_state(seen_none, {xlUnsigned, 2, 0}) defines the variable state as a 2-bit unsigned variable (register) with the binary point at position 0 and initializes it with the value seen_none. Figure 8 shows the complete solution to the previous example.

Note that, since we have 4 states, 2 bits are sufficient for the state variable. There is no need to allocate more bits. However, if we had 5 states, we would have to allocate 3 bits, giving 8 possible states. In other words, we would have 3 unused states. Even though they are unused, it is very important to include them in the switch-case block. Suppose you only define states 0 to 4 and, after power-up, the FPGA starts in state 6. The FSM will remain there forever, since no transition out of state 6 is defined. Therefore, you must also define states 5 to 7. A logical implementation is to switch to a well-defined state, for example the starting state.

Instead of adding a separate case block for every unused state, you can also use the otherwise block. This ensures that none of the unused states is forgotten. In the above code example, the otherwise block has been added even though, with four states, there are no unused states. It is good practice to always include one, since it avoids errors when adding or removing states later on.
A more detailed explanation of the MCode block is given in the Xilinx System Generator help documentation.
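To try the transition table on a bit stream without opening Simulink, the same state machine can be modelled in plain software. The Python sketch below mirrors the M-code above; the names `step` and `detect_1101` are chosen for this illustration, and the final `else` branch plays the role of the otherwise block.

```python
SEEN_NONE, SEEN_1, SEEN_11, SEEN_110 = range(4)

def step(state, d_in):
    """One clock cycle: return (next_state, matched) for the current state and input bit."""
    matched = 0
    if state == SEEN_NONE:
        nxt = SEEN_1 if d_in == 1 else SEEN_NONE
    elif state == SEEN_1:
        nxt = SEEN_11 if d_in == 1 else SEEN_NONE
    elif state == SEEN_11:
        nxt = SEEN_11 if d_in == 1 else SEEN_110
    elif state == SEEN_110:
        if d_in == 1:
            nxt, matched = SEEN_1, 1   # pattern 1101 completed
        else:
            nxt = SEEN_NONE
    else:
        nxt = SEEN_NONE                # undefined state: fall back to the start state
    return nxt, matched

def detect_1101(bits, state=SEEN_NONE):
    """Feed a list of bits through the FSM and collect the matched output per cycle."""
    outputs = []
    for bit in bits:
        state, matched = step(state, bit)
        outputs.append(matched)
    return outputs

print(detect_1101([1, 1, 0, 1, 1, 0, 1]))  # [0, 0, 0, 1, 0, 0, 1]
print(step(6, 0))                          # (0, 0): an undefined state recovers
```

The second call illustrates the point about unused states: a state value that has no explicit case still leads back to seen_none on the next cycle.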
3.6 Reinterpret, Convert, Concat, Slice and BitBasher blocks
Besides the MCode block, we would like to take a closer look at a few more blocks that will be useful during the Dragon project. Since some of their features are not obvious at first glance, we outline their properties and try to avoid any ambiguity.

Figure 8: Finite State Machine for detecting 1101 pattern.
Figure 9: Reinterpret, Convert, Concat, Slice and BitBasher blocks.
3.6.1 Reinterpret block
As its name states, the Reinterpret block forces the bits of an input signal to a new type without any regard for the numerical value or the location of the binary point. It basically reinterprets the data type of the input signal. The block allows unsigned data to be reinterpreted as signed data and vice versa. It also allows the data's scaling to be reinterpreted by repositioning the binary point within the data. It is important to note that this block does not change the number of bits (the number of bits at the output is always the same as the number of bits at the input). In hardware, this block does not consume any resources.

An example of this block's use is as follows: if the input type is 6 bits wide and signed, with 2 fractional bits, and the output type is forced to be unsigned with 0 fractional bits, then an input of -1.5 (1110.10 in binary, two's complement) is translated into an output of 58 (111010 in binary).

The block parameters are:

- Force Arithmetic Type - When checked, the output type is forced to the arithmetic type chosen in the Output Arithmetic Type parameter. When unchecked, the arithmetic type of the output is unchanged from the arithmetic type of the input.
- Force Binary Point - When checked, the binary point position of the output is forced to the position supplied in the Output Binary Point parameter. When unchecked, the binary point position of the output is unchanged from the binary point position of the input.

3.6.2 Convert block
The Convert block converts each input signal to a value of a desired arithmetic type. For example, a number can be converted to a signed (two's complement) or unsigned value. In contrast to the Reinterpret block, this block may change the number of bits and can convert any input type to any output type. In short, the block tries to preserve the input value if possible.

An example of this block's use is as follows: if the input type is 6 bits wide and signed, with 2 fractional bits, and the output type is forced to be signed, 6 bits wide with 4 fractional bits, then an input 1110.10 (-1.5 in decimal) is translated into an output 10.1000 (again -1.5 in decimal).

We provide another example to illustrate that one has to be careful when using the Convert block: if the input type is again 6 bits wide and signed, with 2 fractional bits, and the output type is forced to be unsigned, 6 bits wide with 0 fractional bits, then the same input 1110.10 (-1.5 in decimal) is translated into an output of 111110 (62 in decimal). The input is first quantized (truncated) to 1110, as the output is specified to have 0 fractional bits. Then, sign extension is applied to expand the value to 6 bits, and the final result is 111110.

The block parameters are:

- Output Precision - Determines the arithmetic type of the output signal. You can choose the Boolean, Unsigned or Signed (two's complement) type. If the Unsigned or Signed format is chosen, you can further specify the total Number of bits and the Binary point.
- Quantization - Quantization errors occur when the number of fractional bits is insufficient to represent the fractional portion of a value. The options are to Round to the nearest representable value (or to the value furthest from zero if there are two equidistant nearest representable values), or to Truncate (i.e. to discard bits to the right of the least significant representable bit).
- Overflow - Overflow errors occur when a value lies outside the representable range. The options are to Saturate to the largest positive/smallest negative value, to Wrap (i.e. to discard bits to the left of the most significant representable bit), or to Flag as error (i.e. to report an overflow as a Simulink error during simulation). Flag as error is a simulation-only feature; the hardware generated is the same as when Wrap is selected.

In hardware, rounding and saturating require resources; truncating and wrapping do not.

3.6.3 Concat block
The Concat block concatenates n input ports ($2 \le n \le 1024$) into a single output port at the bit level. The first and the last input ports are labeled hi and lo, respectively. Input ports between these two are not labeled. The input to the hi port occupies the most significant bits of the output, and the input to the lo port occupies the least significant bits of the output. All inputs need to be of Unsigned type with the binary point at position zero. There is only one block parameter, Number of inputs, which specifies the number of input ports n. The Reinterpret block provides signed-to-unsigned conversion capabilities that can extend the functionality of the Concat block.

3.6.4 Slice block
The Slice block allows you to slice off a sequence of bits from your input data and create a new data value. This value is presented as the output of the block. The output data type is unsigned with its binary point at zero. The block provides several mechanisms by which the sequence of bits can be specified. Parameters specific to the block are as follows:

- Width of slice (Number of bits) - Specifies the number of bits to extract.
- Boolean output - Tells whether single-bit slices should be of type Boolean.
- Specify range as - Allows you to specify either the bit locations of both end-points of the slice (Two bit locations) or one end-point along with the number of bits to be taken in the slice (Upper bit location + width or Lower bit location + width).
- Offset of top bit - Specifies the offset of the top bit position of the slice from the LSB, MSB or binary point.
- Relative to - Specifies whether the position of the top or the bottom of the slice is given relative to the MSB, LSB or Binary point.
- Offset of bottom bit - Specifies the offset of the bottom bit position of the slice from the LSB, MSB or binary point.

An example of this block's use is as follows: if the input signal is 16 bits wide and signed with 13 fractional bits, then each of the following settings slices off the same sequence of bits, shown in Figure 10:

- Specify range as: Two bit locations; Offset of top bit: 11, Relative to: LSB of input; Offset of bottom bit: 7, Relative to: LSB of input.
- Specify range as: Two bit locations; Offset of top bit: -2, Relative to: Binary point of input; Offset of bottom bit: -6, Relative to: Binary point of input.
- Specify range as: Two bit locations; Offset of top bit: -4, Relative to: MSB of input; Offset of bottom bit: -8, Relative to: MSB of input.
- Width of slice (number of bits): 5; Specify range as: Upper bit location + width; Offset of top bit: 11, Relative to: LSB of input.
- Width of slice (number of bits): 5; Specify range as: Upper bit location + width; Offset of top bit: -2, Relative to: Binary point of input.
- Width of slice (number of bits): 5; Specify range as: Upper bit location + width; Offset of top bit: -4, Relative to: MSB of input.
- Width of slice (number of bits): 5; Specify range as: Lower bit location + width; Offset of bottom bit: 7, Relative to: LSB of input.
- Width of slice (number of bits): 5; Specify range as: Lower bit location + width; Offset of bottom bit: -6, Relative to: Binary point of input.
- Width of slice (number of bits): 5; Specify range as: Lower bit location + width; Offset of bottom bit: -8, Relative to: MSB of input.

Figure 10: An example for the Slice block. (A 16-bit word with bits 15 down to 0 labeled Sp, p1, p0, p-1, ..., p-13; the sliced bits are bits 11 down to 7.)
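That the nine settings above all select bits 11 down to 7 can be checked with a short Python sketch. The helpers `resolve` and `slice_bits` are hypothetical names written for this text, and the offset convention (MSB at bit 15, binary point at bit 13, LSB at bit 0) is an assumption read off the example, not taken from the Xilinx documentation.

```python
def resolve(offset, relative_to, n_bits, bin_pt):
    """Turn an offset relative to 'lsb', 'msb' or 'binpt' into a bit index (0 = LSB)."""
    base = {"lsb": 0, "msb": n_bits - 1, "binpt": bin_pt}[relative_to]
    return base + offset

def slice_bits(raw, top, bottom):
    """Extract bits top..bottom (inclusive) of a bit pattern as an unsigned integer."""
    width = top - bottom + 1
    return (raw >> bottom) & ((1 << width) - 1)

N_BITS, BIN_PT, WIDTH = 16, 13, 5

# Two bit locations, given relative to LSB, binary point and MSB
tops = [resolve(11, "lsb", N_BITS, BIN_PT),
        resolve(-2, "binpt", N_BITS, BIN_PT),
        resolve(-4, "msb", N_BITS, BIN_PT)]
bottoms = [resolve(7, "lsb", N_BITS, BIN_PT),
           resolve(-6, "binpt", N_BITS, BIN_PT),
           resolve(-8, "msb", N_BITS, BIN_PT)]
print(tops, bottoms)                      # [11, 11, 11] [7, 7, 7]

# "Upper bit location + width" and "Lower bit location + width" give the same range
print(tops[0] - WIDTH + 1, bottoms[0] + WIDTH - 1)   # 7 11

# Example bit pattern 0xABCD = 1010 1011 1100 1101; bits 11..7 are 10111
print(bin(slice_bits(0xABCD, 11, 7)))     # 0b10111
```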
3.6.5 BitBasher block
The BitBasher block performs slicing, concatenation and augmentation of the inputs attached to the block. The operation to be performed is described using Verilog syntax, and the block may have up to four output ports. The number of output ports is equal to the number of expressions. The block does not cost anything in hardware.

The block parameters dialog box can be invoked by double-clicking the icon in your Simulink model. The parameter specific to the Basic tab is BitBasher Expression, which specifies a bitwise manipulation expression based on Verilog syntax. Multiple expressions (limited to a maximum of 4) can be specified, using a new line as the separator between expressions. Parameters specific to the Output tab are as follows:

- Output - Refers to the port on which the data type is specified.
- Output type - Arithmetic type to be forced onto the corresponding output.
- Binary Point - Binary point location to be forced onto the corresponding output.

For further help on the BitBasher block and Verilog syntax, please refer to the Xilinx System Generator help documentation.
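As a recap of Section 3.6, the numerical examples given for the Reinterpret and Convert blocks, and the bit ordering of the Concat block, can be reproduced with a small Python sketch. All function names here are hypothetical, and `convert` models only the Truncate and Wrap options; it is a simplified model for checking the numbers, not the blocks' actual implementation.

```python
import math

def to_raw(value, n_bits, bin_pt):
    """Bit pattern (non-negative int) of a value that is exactly representable."""
    return int(round(value * 2 ** bin_pt)) % 2 ** n_bits

def value_of(raw, signed, n_bits, bin_pt):
    """Interpret a bit pattern under a given type. Reinterpret = same raw, new type."""
    if signed and raw >= 2 ** (n_bits - 1):
        raw -= 2 ** n_bits
    return raw / 2 ** bin_pt

def convert(value, n_bits, bin_pt):
    """Convert a value to a new format using Truncate and Wrap; returns the bit pattern."""
    return math.floor(value * 2 ** bin_pt) % 2 ** n_bits

def concat(parts):
    """Concatenate (raw, width) pairs; the first pair ends up in the most significant bits."""
    out = 0
    for raw, width in parts:
        out = (out << width) | (raw & ((1 << width) - 1))
    return out

raw = to_raw(-1.5, 6, 2)
print(format(raw, "06b"))                     # 111010
print(value_of(raw, False, 6, 0))             # Reinterpret as unsigned, 0 fraction bits: 58.0

print(format(convert(-1.5, 6, 4), "06b"))     # 101000, i.e. 10.1000 = -1.5
print(format(convert(-1.5, 6, 0), "06b"))     # 111110, i.e. 62 when read as unsigned

print(format(concat([(0b11, 2), (0b0101, 4)]), "06b"))  # 110101 (hi = 11, lo = 0101)
```

The contrast between the second and fourth lines is the point of the two sections: Reinterpret keeps the bits and changes the value, while Convert tries to keep the value and changes the bits.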
4 Exercises
The first three exercises of this section are intended to help students further understand fixed-point arithmetic, finite state machines and multi-rate systems. The last exercise will be used to test the FPGA board. All four exercises need to be completed using the Xilinx System Generator tool.
4.1 Fixed-Point Arithmetic
The most convenient way to represent a real number r is the floating-point representation, as shown in the following equation:

$$ r = m \times b^{e} . $$

Here, m is known as the mantissa, b is the base and e is the exponent. However, many commercially available embedded processors have no hardware support for floating-point arithmetic. This is due to the cost of the extra silicon needed for the Floating Point Unit (FPU). By implementing algorithms using fixed-point (integer) arithmetic, a significant improvement in execution speed can be observed, as most processors have efficient Arithmetic Logic Units (ALUs) that support fixed-point arithmetic.

In the real world, most physical signals are represented using the real-number (floating-point) representation. On the other hand, an efficient algorithm for processing these signals implies the use of efficient fixed-point arithmetic. The question that naturally arises is: how do we represent a floating-point (real) number by means of fixed-point arithmetic? We need to make sure that the signals are represented accurately and that we do not lose any significant information. The problem can be solved trivially by using large amounts of memory to store the required information with enough accuracy. This, of course, is not a good approach, as the increased need for storage elements slows down the whole computation and increases the hardware size. Hence, it is of crucial importance to estimate precisely the accuracy we really need when performing fixed-point arithmetic.

Figure 11 depicts a fixed-point representation of a real number r. To be more specific, this is a signed two's complement fixed-point representation. The most significant bit S is the sign bit and is equal to 0 for positive and 1 for negative numbers.
The whole number representation has w bits, where I bits (including the sign bit) are used for the integer part and F bits represent the fractional part of the number. The value of r can be evaluated as:

$$ r = -2^{I-1} S + \sum_{i=-F}^{I-2} 2^{i} r_i . $$

Figure 11: Two's complement fixed-point representation. (A word of w = I + F bits: the sign bit S with weight $2^{I-1}$, the integer bits $r_{I-2}, \ldots, r_0$, the binary point, and the fraction bits $r_{-1}, \ldots, r_{-F}$.)
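As a quick check of this formula, consider the 6-bit word 1110.10 with I = 4 and F = 2 (a value picked here purely for illustration, so S = 1, $r_2 = 1$, $r_1 = 1$, $r_0 = 0$, $r_{-1} = 1$, $r_{-2} = 0$). Substituting the bits term by term gives:

```latex
\begin{align*}
r &= -2^{3}\cdot 1 + 2^{2}\cdot 1 + 2^{1}\cdot 1 + 2^{0}\cdot 0 + 2^{-1}\cdot 1 + 2^{-2}\cdot 0 \\
  &= -8 + 4 + 2 + 0.5 \\
  &= -1.5 .
\end{align*}
```

The sign bit contributes the only negative weight; all other bits add positive powers of two.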
Next, we will learn how to evaluate the number of bits necessary to represent a given real number r, $|r| \le R$, with the desired accuracy. First, we consider the two's complement fixed-point representation shown in Figure 11. Let us assume that I bits (including the sign bit) are used to represent the integer part of the number and F bits are used to represent the fractional part. To ensure that the integer part is large enough to store the value of r, the following inequality needs to be satisfied:

$$ -2^{I-1} \le \mathrm{Int}(r) \le 2^{I-1} - 1 , $$

where Int(r) represents the integer part of r. Solving this inequality gives the size of the integer part:

$$ I = \lfloor \log_2 R \rfloor + 2 . \qquad (1) $$
Second, if the real number r is greater than or equal to 0 ($0 \le r \le R$), we can use an unsigned number representation, as shown in Figure 12.

Figure 12: Unsigned fixed-point representation. (A word of w = I + F bits: the integer bits $r_{I-1}, \ldots, r_0$, the binary point, and the fraction bits $r_{-1}, \ldots, r_{-F}$.)

The value of r can be evaluated as:

$$ r = \sum_{i=-F}^{I-1} 2^{i} r_i . $$
Again, to ensure that the integer part is large enough, the following inequality needs to be satisfied:

$$ 0 \le \mathrm{Int}(r) \le 2^{I} - 1 . $$

Since in this case we do not need an additional bit for the sign information, the solution of the previous inequality is given as:

$$ I = \lfloor \log_2 R \rfloor + 1 . \qquad (2) $$

To evaluate the number of bits for the fractional part of the real number r, we need to define a resolution for a fixed-point variable. The resolution (denoted by $\varepsilon$) is governed by the following equation:

$$ \varepsilon = \frac{1}{2^{F}} , $$

where F is the number of bits required for a particular resolution. Therefore, the minimal number of bits for the fractional part is given by:

$$ F = \left\lceil \log_2 \frac{1}{\varepsilon} \right\rceil . \qquad (3) $$
Equation 3 is used for both the two's complement and the unsigned fixed-point representation.

Example 1 - Fixed-Point Arithmetic

Let us consider a simple example with two 8-bit variables $-1 \le x < 1$ and $-2 \le y < 2$, each with as much resolution as possible. The first variable x ranges from -1 to 0.9921875 and the second variable y ranges from -2 to 1.984375. Therefore, for x we can write $I_x = 1$, $F_x = 7$, while for y it holds that $I_y = 2$, $F_y = 6$, where $I_x$ ($I_y$) is the number of integer bits and $F_x$ ($F_y$) is the number of fractional bits of variable x (y). Figure 13 further illustrates the properties of variables x and y.

Figure 13: Two's complement fixed-point representation of x and y.

Next, we will learn how to add and multiply two signed fixed-point variables. An addition is a pure integer operation, but care must be taken to align the binary points, and attention must be paid to handling overflow of the addition. To calculate z = x + y, we first need to estimate the number of bits for the integer and the fractional part of z ($I_z$ and $F_z$). Given the bounds on x and y, we find that the variable z ranges from -3 to 2.9765625 ($-3 \le z < 3$). Following Equations 1 and 3, we can estimate $I_z$ and $F_z$ as:

$$ I_z = \lfloor \log_2 3 \rfloor + 2 = 3 , \qquad F_z = \left\lceil \log_2 \frac{1}{\varepsilon} \right\rceil = 7 . $$
We choose $\varepsilon$ such that we have as much resolution as possible, i.e. $\varepsilon = 2^{-7}$. This basically means that $F_z$ is evaluated as $F_z = \max(F_x, F_y)$. To avoid errors due to overflow of the addition, we need to apply a sign extension to both variables x and y, and to align the binary points by shifting the variables to the right. Figure 14 shows the variables x and y after these transformations.
Figure 14: Fixed-point addition z = x + y. (x and y are sign-extended to the weight $2^{2}$ and aligned at the binary point; the sum z has bits $S_z, z_1, z_0, z_{-1}, \ldots, z_{-7}$.)

As shown in Figure 14, the result z is represented using 10 bits, which is more than the 8 bits used for representing the variables x and y. Therefore, we need to reduce the resolution of z by truncating its 2 least significant bits ($z_{-7}$ and $z_{-6}$). This truncation lowers the accuracy of the result, and hence z will range from -3 to 2.96875.

A fixed-point multiplication is simpler than a fixed-point addition. When performing a fixed-point multiplication $p = x \times y$, the product p is 2w bits long if the multiplier x and the multiplicand y are w bits long. The integer and fractional bits of the product are calculated in a very simple manner:

$$ I_p = I_x + I_y = 1 + 2 = 3 , \qquad F_p = F_x + F_y = 7 + 6 = 13 . $$

The fixed-point multiplication is again a pure integer operation and is depicted in Figure 15. Again, if only 8 bits were available for representing the result, we would need to truncate the product by discarding its 8 least significant bits ($p_{-6}, \ldots, p_{-13}$).
Figure 15: Fixed-point multiplication $p = x \times y$. (The 8-bit operands x and y yield a 16-bit product with bits $S_p, p_1, p_0, p_{-1}, \ldots, p_{-13}$.)
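The numbers in Example 1 can be reproduced with integer arithmetic on the raw bit patterns. The Python sketch below is an illustration written for this text: the function names are assumptions, raw values are ordinary signed integers, and the number of fractional bits is tracked alongside each value.

```python
import math

def int_bits_signed(r_max):
    """Equation (1): integer bits (including sign) for |r| <= r_max."""
    return math.floor(math.log2(r_max)) + 2

def int_bits_unsigned(r_max):
    """Equation (2): integer bits for 0 <= r <= r_max."""
    return math.floor(math.log2(r_max)) + 1

def frac_bits(eps):
    """Equation (3): fractional bits needed for resolution eps."""
    return math.ceil(math.log2(1 / eps))

def fixed_add(x_raw, fx, y_raw, fy):
    """Align the binary points, then add as integers. Returns (raw, fraction bits)."""
    f = max(fx, fy)
    return (x_raw << (f - fx)) + (y_raw << (f - fy)), f

def fixed_mul(x_raw, fx, y_raw, fy):
    """Multiply as integers; the fraction bits add up. Returns (raw, fraction bits)."""
    return x_raw * y_raw, fx + fy

print(int_bits_signed(3), frac_bits(2 ** -7))   # 3 7

# Largest x (F = 7) plus largest y (F = 6)
z_raw, fz = fixed_add(127, 7, 127, 6)
print(z_raw / 2 ** fz)                          # 2.9765625
# Truncate the 2 least significant bits to return to 8 bits
print((z_raw >> 2) / 2 ** (fz - 2))             # 2.96875

# Most negative x times most negative y: (-1) * (-2)
p_raw, fp = fixed_mul(-128, 7, -128, 6)
print(fp, p_raw / 2 ** fp)                      # 13 2.0
# Discard the 8 least significant bits of the 16-bit product
print((p_raw >> 8) / 2 ** (fp - 8))             # 2.0
```

Python's right shift on negative integers is an arithmetic shift, so truncation here behaves like discarding low-order two's complement bits.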
4.1.1 Exercise 1 - Fixed-Point Arithmetic
Using Xilinx System Generator, implement an Arithmetic Unit (AU) that performs the following arithmetic operation:

$$ A = B \times C + D \qquad (4) $$

where the range limits of the inputs are given as $|B| \le 1.8$, $|C| \le 1$ and $|D| \le 2.8$. All inputs are represented as 8-bit signed two's complement numbers. The word size of the AU is 8 bits. Each operand needs to be represented with as much resolution as possible.
4.2 Finite State Machine
As a theoretical definition, a finite state machine (FSM) is a model of behavior composed of a finite number of states, where the next state is computed based on the current state and the current input. Finite state machines are widely used for controlling complex systems.

4.2.1 Exercise 2 - Finite State Machine
The goal of this exercise is to make a simple FSM that controls traffic lights. The FSM has four states (RED, GREEN, ORANGE and IDLE) and the behavioral description given in Figure 16. The default state of the traffic lights is IDLE, and it changes to RED when the start signal arrives. If the reset signal is generated, the state of the traffic lights has to change to IDLE. Both the start and reset signals are provided as impulse signals and are active on 1. The other transitions between states are determined by the value of the counter. The RED state lasts for 10 cycles, GREEN for 20 cycles and ORANGE for 2 cycles. The FSM should be implemented as an MCode block with three inputs (rst, start, cntr) and two outputs (light, rst_cntr). The counter should be implemented as a separate block.
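[Figure 16 state diagram. States: IDLE, RED, GREEN and ORANGE. Transition labels: start=0 and start=1 at IDLE; rst=1 from RED, GREEN and ORANGE; cntr<T_RED and cntr=T_RED at RED; cntr<T_GREEN and cntr=T_GREEN at GREEN; cntr<T_ORANGE and cntr=T_ORANGE at ORANGE.]
Figure 16: The FSM of the traffic lights.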
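Before writing the MCode block, it can help to check the intended behavior with a software model. The following Python sketch is only a behavioral reference, not the MCode implementation. It makes several assumptions that are not stated in the text: the state order RED, GREEN, ORANGE, RED; a numeric state encoding that doubles as the light output; reset taking priority over start; and a counter that restarts at 1 whenever rst_cntr is asserted.

```python
# Hypothetical state encoding; the handout does not prescribe one.
IDLE, RED, GREEN, ORANGE = range(4)

# State durations in clock cycles, as given in the exercise text.
DURATION = {RED: 10, GREEN: 20, ORANGE: 2}

# Assumed cycle order (the usual traffic-light sequence).
NEXT_STATE = {RED: GREEN, GREEN: ORANGE, ORANGE: RED}


def fsm_step(state, rst, start, cntr):
    """Return (next_state, rst_cntr) for one clock cycle."""
    if rst == 1:                     # reset has priority over everything else
        return IDLE, 1
    if state == IDLE:                # wait for the start pulse, hold the counter
        return (RED if start == 1 else IDLE), 1
    if cntr == DURATION[state]:      # time is up: move on, restart the counter
        return NEXT_STATE[state], 1
    return state, 0                  # stay and let the counter run


def simulate(n_cycles, start_at=0, rst_at=None):
    """Run the FSM with an external counter; return the light in each cycle."""
    state, cntr, trace = IDLE, 1, []
    for t in range(n_cycles):
        trace.append(state)          # the light output equals the current state
        rst = 1 if t == rst_at else 0
        start = 1 if t == start_at else 0
        state, rst_cntr = fsm_step(state, rst, start, cntr)
        cntr = 1 if rst_cntr else cntr + 1   # separate counter, restarts at 1
    return trace


trace = simulate(34)
print(trace.count(RED), trace.count(GREEN), trace.count(ORANGE))
```

Keeping the counter outside `fsm_step` mirrors the requirement that the counter is a separate block driven by the rst_cntr output.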
4.3 Multi-Rate Systems
A typical example of a multi-rate system is a base-station receiver in a mobile phone network, as shown in Figure 17. The tower has multiple antennas to provide full coverage of the area. The diagram shows that this results in two receiver channels. In each of these channels there is some form of complex mixing, resulting in real and imaginary channels. DSP systems such as this one often need to down sample the input signals prior to the digital filtering steps performed during equalization and demodulation. Doing so can simplify the filter design and the hardware significantly.
Figure 17: Base-station receiver in a mobile phone network.
4.3.1 Exercise 3: Multi-Rate Systems
After completing this exercise, you will be able to change the sample rates in a DSP system, convert a serial stream of data into a parallel word and convert a parallel word into a serial stream of data. Open a new Simulink diagram and create the simple diagram shown in Figure 18. Use the Counter Limited block from the Simulink Sources library and set the upper limit of the counter to 10. Set the quantization of the Gateway In block to Fix_8_0. Simulate the counter for 10 simulation cycles and observe the results.
Figure 18: Initial set-up.
As shown in Figure 19, add a Down Sample block from the Xilinx Blockset Index library between the Gateway In and Gateway Out blocks, then re-simulate the design. What do you observe?
Figure 19: Down sampling.
Replace the Down Sample block with an Up Sample block and re-simulate the design. The System Generator token will generate an error indicating that your sample rate is incorrect. Why? Double-click on the System Generator token and change the Simulink System Period to 1/2. Re-simulate the design. Add Sample Time probes from the Xilinx Blockset Index library before and after the Up Sample block and connect the outputs of the probes to the Simulink Sinks as shown in Figure 20. These probes do not add any hardware to the design, but offer a powerful debugging tool for complex multi-rate systems. Re-simulate the design to observe the sample rate in the Display sinks.
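The effect of the two rate-changing blocks on the data and on the sample period can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the behavior of the Xilinx blocks in every configuration: `down_sample` keeps the first value of each frame, `up_sample` repeats each sample, and `system_period_ok` encodes the assumption that every sample period in the design must be an integer multiple of the Simulink System Period. All three function names are hypothetical.

```python
from fractions import Fraction


def down_sample(samples, period, n):
    """Keep one sample out of every n; the sample period grows by n."""
    return samples[::n], Fraction(period) * n


def up_sample(samples, period, n):
    """Repeat every sample n times; the sample period shrinks by n."""
    return [s for s in samples for _ in range(n)], Fraction(period) / n


def system_period_ok(system_period, sample_periods):
    """Check that each sample period is an integer multiple of the system period."""
    return all(
        (Fraction(p) / Fraction(system_period)).denominator == 1
        for p in sample_periods
    )


counter = list(range(11))                    # Counter Limited with upper limit 10
down, down_period = down_sample(counter, 1, 2)
up, up_period = up_sample(counter, 1, 2)

print(down, down_period)
print(up_period, system_period_ok(1, [1, up_period]))
print(system_period_ok(Fraction(1, 2), [1, up_period]))
```

Under this model, up sampling by 2 introduces a sample period of 1/2, which a system period of 1 cannot accommodate, while a system period of 1/2 covers both rates.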
Figure 20: Up sampling.

In the next two steps, you will explore the rate-changing effects of the Serial to Parallel and Parallel to Serial blocks from the Xilinx Blockset. Open a new blank model, then create the design shown in Figure 21. Set the limit on the Counter Limited block to 1. This will simply generate the sequence 01010101010101... Set the output of the Serial to Parallel block to UFix_8_0. Explain the diagrams you get on the scope. What happens if you change the input of Serial to Parallel to be 1-bit (by changing the Gateway In parameters)?

Now, replace the Serial to Parallel block with the Parallel to Serial block. Leave the output quantization at the default UFix_1_0 and change the sample rate in the System Generator token from 1 to 1/8. Re-simulate the design and record the input and output sample rates. Explain the diagrams on the scope.
Figure 21: Serial-to-Parallel conversion.
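The two conversions can be modeled as packing and unpacking of bit fields, which also makes the rate change visible: the output list is shorter or longer than the input list by the ratio of the word widths. The Python sketch below is a behavioral illustration only. The function names and the `lsw_first` parameter are hypothetical, and the default ordering (first input word lands in the least significant position) is an assumption that should be checked against the block parameters.

```python
def serial_to_parallel(words, in_bits, out_bits, lsw_first=True):
    """Pack groups of in_bits-wide words into out_bits-wide words."""
    if out_bits % in_bits != 0:
        raise ValueError("out_bits must be a multiple of in_bits")
    ratio = out_bits // in_bits
    mask = (1 << in_bits) - 1
    out = []
    for i in range(0, len(words) - ratio + 1, ratio):
        group = words[i:i + ratio]
        if not lsw_first:
            group = group[::-1]
        value = 0
        for k, w in enumerate(group):
            value |= (w & mask) << (k * in_bits)   # place word k in its field
        out.append(value)
    return out


def parallel_to_serial(words, in_bits, out_bits, lsw_first=True):
    """Split in_bits-wide words into a stream of out_bits-wide words."""
    if in_bits % out_bits != 0:
        raise ValueError("in_bits must be a multiple of out_bits")
    ratio = in_bits // out_bits
    mask = (1 << out_bits) - 1
    out = []
    for w in words:
        chunks = [(w >> (k * out_bits)) & mask for k in range(ratio)]
        out.extend(chunks if lsw_first else chunks[::-1])
    return out


bits = [0, 1] * 8                           # the 0101... counter sequence
packed = serial_to_parallel(bits, 1, 8)     # 16 one-bit inputs, 2 eight-bit outputs
print(packed, parallel_to_serial(packed, 8, 1) == bits)
```

When the input and output widths are equal, the ratio is 1 and the stream passes through unchanged, which is a useful reference point when comparing the scope traces for different Gateway In settings.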
4.4 FPGA Board Testing
The goal of this exercise is to learn the basics of the FPGA board that will be used later, during the Dragon project. For that purpose we will use additional documents that describe the FPGA board. The material can be found on Toledo, under the Cursus Informatie, section Introductiesessies / Fase 3 - System Generator. Two simple exercises will be demonstrated to you. First, we will check the functionality of the on-board LEDs, and second, the USB connection will be tested.
Practical suggestions
When creating System Generator files for the Dragon project, start from the file template.mdl, which can be found on Toledo. This file already defines all the correct settings for the System Generator token and makes some pin connections to avoid problems.
References
[1] Xilinx, Xilinx System Generator Help Documentation, http://www.xilinx.com/
[2] Erick L. Oberstar, Fixed-Point Representation and Fractional Math, Oberstar Consulting, 2007.