
Vivado HLS

Update

Copyright 2013 Xilinx



Vivado High-Level Synthesis:


Accelerated IP Generation and Integration
C based IP Creation

User Preferred System Integration Environment

C, C++ or SystemC

System Generator for DSP

C Libraries
Floating point

math.h
Fixed point
OpenCV

VHDL or Verilog

Vivado IP Integrator
Vivado IP Catalog

Vivado RTL Integration


Vivado HLS Video Libraries

C Video Libraries
Available within Vivado HLS header files
hls_video.h library
hls_opencv.h library

Enable Migration of OpenCV Designs into Xilinx FPGA


Libraries target real-time Full HD video processing
Libraries support standard AXI4 Interfaces for easy system integration


Video Library: 12 New Functions


Video Data Modeling
Linebuffer class
Window class

AXI4-Stream IO Functions
AXIvideo2Mat
Mat2AXIvideo

OpenCV Interface Functions


cvMat2AXIvideo
IplImage2AXIvideo

AXIvideo2cvMat
AXIvideo2IplImage

cvMat2hlsMat
IplImage2hlsMat

hlsMat2cvMat
hlsMat2IplImage

CvMat2AXIvideo

AXIvideo2CvMat

CvMat2hlsMat

hlsMat2CvMat

Video Functions
AbsDiff
AddS
AddWeighted
And
Avg
AvgSdv
Cmp
CmpS
CornerHarris
CvtColor
Dilate
Duplicate
EqualizeHist
Erode
FASTX
Filter2D
GaussianBlur
Harris
HoughLines2
Integral
InitUndistortRectifyMap
Max
MaxS
Mean
Merge
Min
MinMaxLoc
MinS
Mul
Not
PaintMask
Range
Reduce
Remap
Resize
Scale
Set
Sobel
Split
SubRS
SubS
Sum
Threshold
Zero

C Test Bench: Interface Library


Interface libraries convert between OpenCV image types and HLS types
HLS Mat format: synthesizable, with AXI4 Stream support
Standard OpenCV files, formats & types

#include "hls_opencv.h"

// Top Level C Function
int main (int argc, char** argv) {
    IplImage* src = cvLoadImage(INPUT_IMAGE);
    IplImage* dst = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);

    // HLS Video Libraries: convert to Xilinx AXI4 Video Stream
    AXI_STREAM src_axi, dst_axi;
    IplImage2AXIvideo(src, src_axi);

    // Function to Synthesize
    image_filter(src_axi, dst_axi, src->height, src->width);

    // Convert Xilinx AXI4 Video Stream back to OpenCV types
    AXIvideo2IplImage(dst_axi, dst);

    cvSaveImage(OUTPUT_IMAGE, dst);
    return 0;
}

C Function to Synthesize
HLS Video Library Functions
Drop-in replacements for OpenCV functions that provide high QoR

// HLS Video & AXI Struct Libraries
#include "hls_video.h"
#include "ap_axi_sdata.h"

// Top Level C Function for Synthesis
void image_filter(AXI_STREAM& inter_pix, AXI_STREAM& out_pix, int rows, int cols) {
    // Create AXI streaming interfaces for the core
    RGB_IMAGE img_0(rows, cols);
    ..etc..
    RGB_IMAGE img_5(rows, cols);
    RGB_PIXEL pix(50, 50, 50);
#pragma HLS dataflow
    // Convert Xilinx AXI4 Video Stream to HLS Mat data type
    hls::AXIvideo2Mat(inter_pix, img_0);

    // HLS Video functions are drop-in replacements for OpenCV functions
    // and provide high QoR
    hls::Sobel(img_0, img_1, 1, 0);
    hls::SubS(img_1, pix, img_2);
    hls::Scale(img_2, img_3, 2, 0);
    hls::Erode(img_3, img_4);
    hls::Dilate(img_4, img_5);

    // Convert HLS Mat type to Xilinx AXI4 Video Stream
    hls::Mat2AXIvideo(img_5, out_pix);
}


Application Note XAPP1167


Accelerating OpenCV Applications with Zynq using
Vivado HLS Video Libraries
Video Processing data types
Compares Video Architectures
Advantages of Video Streaming
Review Video Interfaces
Reference Design with source files
and project directories

Download XAPP1167 from Xilinx.com


QuickTake: Leveraging OpenCV and High-Level
Synthesis with Vivado

Accelerator AXI Interconnect

IP Control from ARM
AXI4-Lite & GP Port
Diagram: Zynq PS (GP Port) <-> AXI4 Lite <-> HLS Accelerator

High Throughput Access to Memory
AXI4-Stream using AXI-DMA
Diagram: Zynq PS (HP Port, ACP Port) <-> AXI DMA <-> AXI4 Stream <-> HLS Accelerator
AXI4-Master
The Accelerator is the master
External Memory Access: HP
L2 Cache Access: ACP
Diagram: Zynq PS (HP Port, ACP Port) <-> AXI4 Master <-> HLS Accelerator

Data transfer between HLS IP blocks
AXI4-Stream

IP Integrator Supported

IP Integrator requires an Early Access License in 2013.1

Vivado HLS IP can be exported to IP Integrator


Export to the Vivado IP Catalog (was previously called IP-XACT format)
Data types supported: IPI can propagate

Add to IP Catalog
Vivado HLS IP

Vivado IP Integrator (IPI)

Export to Vivado IP Catalog

Add IP block & connect up

Supported with Two New Tutorials



HLS IP Integration
IP Integrator (IPI) Public Release 2013.2
HLS Output Fully Supported in IPI
Three Tutorials on using HLS IP inside IPI
Two connect HLS IP to the Zynq PS; One connects HLS IP with Xilinx IP

HLS IP Blocks are identified in IPI

HLS and System Generator IP shown inside IPI


Improved Software Driver Support


Software Drivers are Created for AXI4-Lite interfaces
Now includes support for Linux Systems
Drivers are also now created for Vivado IP Catalog format

Add all files to the software project: ifdef statements ensure automatic configuration

Files are in the Drivers sub-directory


Enhanced Report File


Easier to find hot-spots
The term "throughput" has been changed to "Interval" or "Initiation Interval"
This applies to all reports and documentation

Top-Level function
Latency and Interval
Latency and Interval for
all instances at this
level of hierarchy
All loops and sub-loops
at this level of hierarchy


Analysis Perspective
A New Perspective for Design Analysis
Allows Interactive Analysis

Module Hierarchy
Hierarchical Summary
and Navigation

Performance View
Scheduled operations.

Loops: shown in yellow, are expandable and collapsible
Modules: shown in green, open the view on sub-blocks

Performance Profile
Latency and Interval
summary for this block


Performance View
Hierarchical Navigation

Loop Hierarchy

Operations, loops and functions

Select operations and right-click to cross-reference with the C source and HDL

Scheduled States


Resource Analysis

Resource View

Scheduled operations
associated with resource:
anything on the same row
shares the same resource

Resource Profile

Resource summary for this block


Analysis Perspective Tutorials


Fully Supported by Two New Tutorials
Design Analysis
Design Optimization


Assertion Support
Assertions are supported for Synthesis
Can be used to define bit-widths for synthesis
Replaces the need for a Tripcount directive
Without Assertions

SUM_X:for (i=0;i<=xlimit; i++) {
    X_accum += A[i];
    X[i] = X_accum;
}
SUM_Y:for (i=0;i<=ylimit; i++) {
    Y_accum += B[i];
    Y[i] = Y_accum;
}

* Loop Latency:
+----------+-----------+----------+
|Target II |Trip Count |Pipelined |
+----------+-----------+----------+
|- SUM_X   |1 ~ 256    |no        |
|- SUM_Y   |1 ~ 256    |no        |
+----------+-----------+----------+

With Assertions

assert(xlimit<32);
SUM_X:for (i=0;i<=xlimit; i++) {
    X_accum += A[i];
    X[i] = X_accum;
}
assert(ylimit<16);
SUM_Y:for (i=0;i<=ylimit; i++) {
    Y_accum += B[i];
    Y[i] = Y_accum;
}

* Loop Latency:
+----------+-----------+----------+
|Target II |Trip Count |Pipelined |
+----------+-----------+----------+
|- SUM_X   |1 ~ 32     |no        |
|- SUM_Y   |1 ~ 16     |no        |
+----------+-----------+----------+

Index counter hardware is accurately sized
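
As a rough illustration of the assertion slide, the sketch below wraps the SUM_X loop in a complete function. The function name sum_x, the argument types and the array size of 32 are assumptions made for this example; only the assert and the loop body come from the slide. In a plain C++ compile the assert is an ordinary run-time check; the slide states that Vivado HLS additionally uses it to bound the trip count and size the index counter.

```cpp
#include <cassert>

// Hypothetical wrapper around the slide's SUM_X loop.
// A, X and xlimit follow the slide's naming; the sizes are assumed.
void sum_x(const int A[32], int X[32], int xlimit) {
    // States that xlimit never reaches 32, so the loop below runs
    // at most 32 times.
    assert(xlimit < 32);

    int X_accum = 0;
    SUM_X: for (int i = 0; i <= xlimit; i++) {
        X_accum += A[i];   // running sum of the inputs
        X[i] = X_accum;    // store the prefix sum
    }
}
```

Calling sum_x with xlimit = 3 fills X[0] to X[3] with the running sums of A[0] to A[3].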

Improved Tutorials
Vivado HLS is now provided with 10 Tutorials
22 Labs which cover all aspects of Vivado HLS
Tutorial: Summary

Introduction: Basic walkthrough of GUI operations (Csim, Synth, RTL Sim, IP package)
C Validation: C simulation and using the debugger
Interface Synthesis: Explain design, port and AXI interface synthesis (simple HLS design to allow analysis of IO)
Arbitrary Precision: Review of a floating point and fixed windowing algorithm
Design Analysis: Using the Analysis Perspective to optimize performance of a multi-hierarchy, multi-loop design
Design Optimization with Pipelining: Improving performance using pipelining at loop and function level, and impact of IO
RTL Verification: Verify and view trace files using Vivado Xsim and Modelsim (incl. floating point simulation)
Creating IP for an IP Integrator Design: Connecting to an IP core using IPI
Creating IP for a Zynq Design: Connecting to Zynq with IPI and integrating driver files into an SDK design (interrupt handling etc.)
Creating IP for a System Generator Design: Packaging a design for Sys Gen and verifying IO in Sys Gen (connecting interfaces etc.)

Designs (in the order extracted): FIR, DCT, Filter Window, Sorter Design, Hamming Window, Matrix Multiplier, DUC, Windower, FFT IP Core, Sorter, Accelerator, YUV

Improved AXI4 & SystemC Support


SystemC
AXI4 Master, Streams and Lite protocols now supported
Lite: Use the RESOURCE directive to assign ports (as C/C++)
Stream: Use the RESOURCE directive on sc_fifo_in and sc_fifo_out ports
Master: Use the AXI4M_bus_port class

AXI4M_bus_port<sc_fixed<32, 8> > bus_if;


Difference between SystemC and Vivado AP types fully documented
SystemC designs no longer need to be explicitly specified
The add_files -type option is retired (as is the C/C++ or SystemC check-box in the GUI)

AXI4 Master Interface


Now supported on Array ports
Array ports can be synthesized with ap_bus IO protocol


RTL cosimulation of Floating Point Designs


Floating Point Designs
The IEEE operators are now in the RTL simulation model
This requires that the Xilinx IEEE library be used when RTL co-simulation is performed

Auto Support provided: No Action Required

SystemC RTL
Verilog and VHDL using the Xilinx Vivado (Xsim) simulator
Verilog and VHDL using the Mentor Graphics ModelSim simulator
Verilog and VHDL using the Xilinx Isim simulator.

All other 3rd party HDL simulators


The libraries must be pre-compiled before simulating floating point designs
Open Vivado and refer to : compile_simlib help
Note: this is Vivado, not Vivado HLS


DSP48 Adder Resource

Adders supported for implementation in DSP48


Adders in the C code can be targeted to an AddSub_DSP RESOURCE
Ensures the adder or subtractor is implemented in a DSP48
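
A minimal sketch of how such a resource assignment might look in the C++ source, assuming the pragma form of the RESOURCE directive. The function name adders and the variable names in1, in2 and sum are chosen for this example and are not taken from the slide. A standard C++ compiler ignores the pragma, so the function still simulates as a plain addition.

```cpp
// Hypothetical top-level function; all names are illustrative only.
int adders(int in1, int in2) {
    int sum;
    // Request that the operation producing 'sum' be implemented with the
    // AddSub_DSP core. Outside Vivado HLS this pragma has no effect.
#pragma HLS RESOURCE variable=sum core=AddSub_DSP
    sum = in1 + in2;
    return sum;
}
```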

Resource Specification
Targets the adder or subtractor to a DSP48 Resource
(* USE_DSP48 = "YES" *)
module adders_add_32ns_32ns_32_1_AddSub_DSP_0 (a, b, s);
endmodule

module adders_add_32ns_32ns_32_1( )
adders_add_32ns_32ns_32_1_AddSub_DSP_0 U1 (
    .a( din0 ),
    .b( din1 ),
    .s( dout ));
endmodule


DSP48 Adder Implementation

Adders /Subtractors Targeted to a DSP48


Solution 1

Solution 2


FFT and FIR IP in HLS

The Xilinx FFT and FIR IP are available in Vivado HLS


C simulates with a bit-accurate model
Fully configurable within the C++ source code
Pre-defined C++ structs allow the IP to be configured & accessed

Supported only for C++


Implemented with templates

High-Quality Implementation
Same hardware as implemented by RTL versions of this IP
Functionality fully described in Xilinx Documentation
LogiCORE IP Fast Fourier Transform v9.0 (document PG109)
LogiCORE IP FIR Compiler v7.1 (document PG149)


IP Examples
Examples Included in Vivado HLS Release
Access from the Welcome Screen
Or from C:\Xilinx\Vivado_HLS\2013.3\examples\design
Assuming the standard PC install path

Examples IP Designs
1024-point FFT and Inverse FFT (fixed point)
Single FFT 1024-point (fixed point)
FIR with 2 interleaved channels
3 FIRs connected in series (HB, HB, SRRC)
Updating coefficients using FIR CONFIG channel
SRRC (Square Root Raised Cosine) FIR filter


FFT Function
Using the FFT
#include "hls_fft.h"

hls::fft<STATIC_PARAM> (              // Static Parameterization Struct
    INPUT_DATA_ARRAY,                 // Input data fixed or float
    OUTPUT_DATA_ARRAY,                // Output data fixed or float
    OUTPUT_STATUS,                    // Output Status
    INPUT_RUN_TIME_CONFIGURATION);    // Input Run Time Configuration

Include the hls_fft.h library in the code


This defines the FFT and supporting structs and types
Allows hls::fft to be instantiated in your code

Use the STATIC_PARAM template parameter to parameterize the FFT


The STATIC_PARAM template parameter defines all static configuration values
The Library provides a pre-defined struct hls::ip_fft::params_t to perform this

Optionally modify the default parameters by creating a new user-defined STATIC_PARAM struct based on the default
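
The "struct based on the default" pattern can be sketched without the Vivado HLS headers. Everything in the sketch namespace below is a stand-in written for this example and is not the contents of hls_fft.h; the member names and default values are assumptions. It only shows the mechanism: a user struct inherits the defaults and redeclares the members it wants to change, and a template reads its static configuration from that struct.

```cpp
#include <cassert>

namespace sketch {   // stand-in for illustration, NOT hls_fft.h

// Plays the role of hls::ip_fft::params_t; values are assumed defaults.
struct params_t {
    static const unsigned input_width  = 16;
    static const unsigned output_width = 16;
    static const unsigned max_nfft     = 10;   // log2 of the maximum size
};

// A template that takes its static configuration from PARAM,
// in the same way hls::fft<STATIC_PARAM> is parameterized.
template <typename PARAM>
unsigned max_points() {
    return 1u << PARAM::max_nfft;
}

}  // namespace sketch

// User-defined STATIC_PARAM: inherit every default, override one value.
struct my_fft_params : sketch::params_t {
    static const unsigned max_nfft = 12;   // hides the inherited default
};
```

With these definitions, sketch::max_points<sketch::params_t>() uses the default while sketch::max_points<my_fft_params>() picks up the overridden value; untouched members such as input_width are still inherited.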


FIR Function
Using the FIR
#include "hls_fir.h"

// Create an instance of the FIR
static hls::FIR<STATIC_PARAM> fir1;       // Static parameterization

// Execute the FIR instance fir1
fir1.run(INPUT_DATA_ARRAY,                // Input Data
         OUTPUT_DATA_ARRAY);              // Output Data

Include the hls_fir.h library in the code


This defines the FIR and supporting structs and types
Allows hls::FIR to be instantiated in your code
Unlike the FFT, the FIR is instantiated as a class and executed with the run method

Create the STATIC_PARAM template parameter to configure the FIR


The STATIC_PARAM template parameter defines all static configuration values
The library provides a pre-defined struct hls::ip_fir::params_t to perform this

There are no default values for the coefficients

You must always create a user-defined struct based on hls::ip_fir::params_t
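
To make the class-plus-run usage concrete, here is a self-contained stand-in that mimics the shape of the API described above. It is not hls::FIR and does not use hls_fir.h: the sketch::Fir class, the my_fir_params struct and all of its members (length, num_coeffs, coeff) are hypothetical names invented for this example. The point it illustrates is that the coefficients must come from the user's parameter struct because nothing sensible can be defaulted.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {   // stand-in for illustration, NOT hls_fir.h

template <typename PARAM>
class Fir {
public:
    // Filter a whole input array into an output array,
    // mirroring the shape of fir1.run(in, out).
    void run(const int in[PARAM::length], int out[PARAM::length]) {
        for (std::size_t n = 0; n < PARAM::length; ++n) {
            int acc = 0;
            // Direct-form convolution; samples before index 0 are zero.
            for (std::size_t k = 0; k < PARAM::num_coeffs && k <= n; ++k)
                acc += PARAM::coeff(k) * in[n - k];
            out[n] = acc;
        }
    }
};

}  // namespace sketch

// User-defined static parameters (all names hypothetical).
struct my_fir_params {
    static const std::size_t length     = 8;   // samples per call
    static const std::size_t num_coeffs = 3;
    static int coeff(std::size_t k) {
        static const int c[num_coeffs] = {1, 2, 1};
        return c[k];
    }
};
```

Feeding a unit impulse through a sketch::Fir<my_fir_params> instance returns the coefficients themselves, which is a quick sanity check for any FIR model.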


Using the FFT and FIR IP


FFT and FIR support pipelined implementations
The functions themselves cannot be pipelined
They should be parameterized for pipelined operation

The data arguments are always arrays


These will be implemented as AXI4 Streams in the RTL
By default, arrays are implemented as BRAM interfaces

Recommendation
Use these IP in regions where dataflow optimization is used
This will auto-convert the input and output arrays into streaming arrays

Alternatively, as a requirement:
The input and output arrays must be marked as streaming using the set_directive_stream command (pragma STREAM)
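
A small sketch of the pragma form, assuming a two-stage dataflow region. The names N, stage1, stage2, top and mid are invented for this example, and the stages are simple arithmetic rather than the FFT or FIR IP. A standard C++ compiler ignores both pragmas, so the function still runs as ordinary sequential code.

```cpp
// Hypothetical two-stage region; all names are illustrative only.
const int N = 16;

static void stage1(const int in[N], int mid[N]) {
    for (int i = 0; i < N; i++) mid[i] = in[i] * 2;
}

static void stage2(const int mid[N], int out[N]) {
    for (int i = 0; i < N; i++) out[i] = mid[i] + 1;
}

void top(const int in[N], int out[N]) {
#pragma HLS dataflow
    int mid[N];
    // Explicitly mark the intermediate array as streaming instead of
    // relying on the dataflow optimization to convert it.
#pragma HLS STREAM variable=mid
    stage1(in, mid);
    stage2(mid, out);
}
```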


Fixed Point Math Functions


Further support for math functions
The hls_math.h library
Now includes fixed-point functions for sin, cos and sqrt

Function | Type | Accuracy (ULP) | Implementation Style
cos | ap_fixed<32,I> | 16 | Synthesized
sin | ap_fixed<32,I> | 16 | Synthesized
sqrt | ap_fixed<W,I>, ap_ufixed<W,I> | | Synthesized

The sin and cos functions are all 32-bit ap_fixed<32,Int_Bit>


Where Int_Bit specifies the number of integer bits

The sqrt function accepts any width but the type must have a decimal point
Cannot be all integers or all bits

The accuracy above is quoted with respect to the equivalent floating point version
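
To give the ULP figure a concrete size, the sketch below converts an accuracy quoted in ULPs into an absolute error bound. It assumes that one ULP means the weight of the least significant bit of the fixed-point format, that is 2^-(W-I) for W total bits and I integer bits; the slide does not spell this out. The helper names ulp and max_error are invented for this example.

```cpp
#include <cmath>

// Weight of one ULP for an ap_fixed<W,I>-style format:
// W total bits and I integer bits leave W - I fractional bits.
// Assumes "ULP" refers to the LSB weight of the fixed-point type.
double ulp(int W, int I) {
    return std::ldexp(1.0, -(W - I));   // 2^-(W - I)
}

// Largest absolute error implied by an accuracy figure given in ULPs.
double max_error(int W, int I, int ulps) {
    return ulps * ulp(W, I);
}
```

Under that assumption, a 32-bit type with 2 integer bits has an LSB weight of 2^-30, so an accuracy of 16 ULP corresponds to an absolute error of at most 2^-26, roughly 1.5e-8.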

AXI4 Stream Interface: Ease of Use

Native Support for AXI4 Stream Interfaces


Native = An AXI4 Stream can be specified with set_directive_interface
No longer required to set the interface then add a resource
This AXI4 Stream interface is part of the HDL after synthesis
This AXI4 Stream interface is simulated by RTL co-simulation
Interface Type axis is AXI4 Stream

set_directive_interface -mode axis foo portA


Or
#pragma HLS interface axis port=portA


Pre-2013.3 Approach to AXI Streams


Existing Functionality Deprecated
BUT NOT REMOVED!!
We don't want to break existing designs

#if 1
// Use New Method
#pragma HLS interface axis port=portA
#else
// Or use old Method
#pragma HLS interface ap_fifo port=portA
#pragma HLS resource core=AXI4Stream variable=portA \
metadata="-bus_bundle Agroup"
#endif

Warning:
If you use the pre-2013.3 method for adding AXI4 Streams
(this is where you set the interface as a FIFO and then add an AXI resource):
You will get a FIFO interface in the RTL
And the AXI4 Stream adapter is added during export_design

Recommendation
Change existing AXI4 Stream directives to use the INTERFACE directive


AXI4 Master Interface: Pipeline Support


Transactions involving an AXI4 Master Interface are now pipelined
Prior to 2013.3 this interface would not pipeline
Each transfer was an atomic process
The for-loop/memcpy waits until a transfer completes before starting next transfer
This was the limiting factor in the pipeline interval

Improved performance in 2013.3


Accesses to an AXI master interface can now be pipelined
The performance will be much better than before

Further improvements in 2014.1


Existing limitations: cannot configure the base address or infer bursts; reads and writes cannot be performed simultaneously (sequential only)
We expect to get more performance in 2014.1
At that time we'll publish statistics and make more noise about this feature


Enhanced Support for Exporting IP


Sys Gen and AXI Stream Interfaces
Designs with AXI Stream interfaces can now be exported to System Generator
The AXI Interfaces will be present and
can be connected
Previously, AXI interfaces were not
supported in Sys Gen

AXI Lite Drivers


Software drivers are now included in
the IP package
When creating a local repository in
SDK simply point to the IP package
No need to manually copy files
Further EoU enhancements coming


New Clang Front-end


Vivado HLS has upgraded its front-end parser
Now using clang instead of gcc
Provides 64-bit support on windows
In addition this enables continued growth of features and functionality
More optimizations possible, messages can reference line and column etc.

Clang Side-effect: Different command options


The new front-end does not support all gcc flags
For example, -fpermissive is now ignored as this is not supported by clang
If an option is not supported but provided, it will be ignored
Clang Options: http://clang.llvm.org/docs/UsersManual.html#command-line-options

Clang Side-Effect: More strict Syntax Checking


Some existing working designs may fail
Not expected to occur often, but is possible
Example -fpermissive workaround: for memcpy(dest, src), if src is a volatile pointer, cast it to a constant pointer to pass syntax checking
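
The workaround can be sketched as follows. The helper name copy_block and its signature are assumptions for this example; the slide only names the memcpy call and the cast. In C++ a volatile-qualified pointer does not convert implicitly to the const void* that memcpy expects, which is the kind of error a stricter front end reports.

```cpp
#include <cstring>

// Hypothetical helper: copy n ints from a volatile-qualified source pointer.
void copy_block(int* dest, volatile int* src, std::size_t n) {
    // std::memcpy(dest, src, ...) would not compile here: volatile int*
    // has no implicit conversion to const void*.
    // Casting to a pointer-to-const drops the volatile qualifier.
    // This is only safe when the underlying object is not itself volatile.
    std::memcpy(dest, const_cast<const int*>(src), n * sizeof(int));
}
```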


Design Hubs: Easier Access to Documentation


DocNav Design Hubs
Improved Ease-of-Use
Find things faster
Open Docs at the exact page

Standard Introduction Docs and Videos

High Level Synthesis

Getting Started Videos


Tutorials
Key Concepts
FAQs
These and the solution center will
be updated in the coming weeks
Others such as Designing with
Video etc will be added
Ideas for topics are welcome

App Notes and Videos all grouped


Thank You
