
Deep Learning (CNN) on FPGA
Group members:
(NC) Ali Ahmad Qureshi
(NC) Ahmed
(NC) Syed Muhammad Saqib
(PC) Muhammad Zeeshan Jilani
Supervisors: Dr. Sajid Gul Khawaja, Lec Aamir Javed
INTRODUCTION

In the modern age, AI applications are growing rapidly, and with them the use of CNNs. When it comes to real-time processing, however, the results are not satisfying, particularly in terms of inference (testing) time.
PROBLEM STATEMENT

Design a hardware accelerator for implementing a CNN (Tiny YOLO as an example case), targeting practical applications such as self-driving cars.
OBJECTIVES

Current objectives are:


 Development of our own custom architecture.
 Interfacing of a USB camera with the ZedBoard.
 Implementation of a complete CNN on hardware.
 Testing, which will be performed on the ZedBoard.
STATE OF ART
 Intel has launched a Xeon processor coupled with an FPGA.
 Microsoft launched Project Brainwave, an acceleration framework on
Azure that can run TensorFlow models, built on Catapult (its FPGA
cloud).
 Xilinx bought DeePhi, a startup deploying DNN models to FPGAs
using techniques such as Deep Compression and Network Pruning
(DNNDK).
 Xilinx invested in TeraDeep, which provides RTL acceleration
code for SoCs.
 Even NVIDIA has open-sourced its hardware blocks, implemented
in Verilog under the codename NVDLA, ready to be deployed.
SYSTEM LEVEL DESIGN
SYSTEM LEVEL DESIGN(INTERNAL WORKING)
PROGRESS
We started our design in C++ and implemented all the components of neural
nets from scratch.
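For illustration, two of those building blocks can be sketched as below. The single-channel layout, function names, and the 0.1 leak factor are assumptions for this sketch, not our exact code; Tiny YOLO's layers use leaky-ReLU activations followed by 2×2 max-pooling.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Leaky ReLU: pass positives through, scale negatives by a small slope
// (0.1 is the usual Darknet value; treat it as an assumption here).
float leaky_relu(float x) { return x > 0.0f ? x : 0.1f * x; }

// 2x2 max-pool with stride 2 over a single H x W channel (H, W even),
// stored row-major in a flat vector.
std::vector<float> maxpool2x2(const std::vector<float>& in,
                              std::size_t H, std::size_t W) {
    std::vector<float> out((H / 2) * (W / 2));
    for (std::size_t r = 0; r < H / 2; ++r)
        for (std::size_t c = 0; c < W / 2; ++c) {
            float m = in[2 * r * W + 2 * c];
            m = std::max(m, in[2 * r * W + 2 * c + 1]);
            m = std::max(m, in[(2 * r + 1) * W + 2 * c]);
            m = std::max(m, in[(2 * r + 1) * W + 2 * c + 1]);
            out[r * (W / 2) + c] = m;
        }
    return out;
}
```

Writing these in plain C++ first let us validate results on the CPU before touching the hardware flow.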
PROGRESS
A single layer of Tiny YOLO was successfully implemented using Vivado HLS, which
converts the C code into RTL logic that can then be imported into Vivado as an
IP core. Shown below is the block design of the first layer of Tiny YOLO.
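Vivado HLS synthesizes ordinary C/C++ annotated with pragmas into RTL. A minimal sketch of what such a layer function can look like follows; the fixed 16×16 input, 3×3 kernel, and pragma placement are illustrative, not our actual top function (pragmas are shown as comments so the sketch also compiles with a plain C++ compiler):

```cpp
#include <cassert>

// Illustrative HLS-style top function for one convolution layer.
// In the real flow, directives like these guide synthesis:
//   #pragma HLS INTERFACE m_axi port=in   (move data over an AXI bus)
//   #pragma HLS PIPELINE II=1             (pipeline the MAC loop)
void conv_layer_top(const float in[16][16], const float k[3][3],
                    float out[14][14]) {
    for (int r = 0; r < 14; ++r) {
        for (int c = 0; c < 14; ++c) {
            float acc = 0.0f;
            // #pragma HLS PIPELINE II=1
            for (int i = 0; i < 3; ++i)
                for (int j = 0; j < 3; ++j)
                    acc += in[r + i][c + j] * k[i][j];
            out[r][c] = acc;
        }
    }
}
```

Fixed array bounds matter here: HLS needs compile-time sizes to decide how much BRAM and how many DSP slices to allocate.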
PROGRESS
The application was written in the Xilinx SDK.
PROGRESS
Camera interfacing is also in progress, using PetaLinux.
PROGRESS
In parallel, a single layer has been implemented in Verilog to optimize the use of
available resources.
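A bit-accurate C++ model of the fixed-point multiply-accumulate datapath that such a Verilog layer maps onto DSP slices is sketched below; the Q8.8 format, 16-bit widths, and saturation behavior here are assumptions for illustration, not the exact RTL:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fixed-point dot product mirroring a DSP48-style datapath:
// 16-bit Q8.8 operands multiply into 32-bit Q16.16 products, accumulate
// in a wide register, then rescale to Q8.8 and saturate to 16 bits.
int16_t fixed_dot(const int16_t* a, const int16_t* b, std::size_t n) {
    int64_t acc = 0;                                   // wide accumulator
    for (std::size_t i = 0; i < n; ++i)
        acc += static_cast<int32_t>(a[i]) * b[i];      // Q8.8 * Q8.8 = Q16.16
    acc >>= 8;                                         // rescale to Q8.8
    if (acc > INT16_MAX) acc = INT16_MAX;              // saturate high
    if (acc < INT16_MIN) acc = INT16_MIN;              // saturate low
    return static_cast<int16_t>(acc);
}
```

Modeling the datapath in C++ first makes it possible to diff the Verilog simulation output against a known-good reference, value for value.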
PROGRESS
• Unquantized (32-bit float): mAP = 57
• Quantized to 20 bits: mAP = 48
• Quantized to 18 bits: mAP = 40.1
• Quantized to 16 bits: mAP = 30.8
• Quantized to 14 bits: mAP = 25.9
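The mAP loss tracks the shrinking bit budget: each weight is rounded to the nearest representable fixed-point value, and the rounding error grows as bits are removed. A hedged sketch of quantizing one float weight under an N-bit symmetric fixed-point format (the format parameters are illustrative, not the exact scheme used):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Quantize x to a signed fixed-point value with `total_bits` bits,
// `frac_bits` of them fractional, then dequantize so the rounding
// error can be compared against the original float.
float quantize(float x, int total_bits, int frac_bits) {
    const float scale = static_cast<float>(1 << frac_bits);
    const int64_t max_q = (1LL << (total_bits - 1)) - 1;  // e.g. 32767 for 16 bits
    int64_t q = static_cast<int64_t>(std::lround(x * scale));
    if (q > max_q) q = max_q;                  // saturate out-of-range values
    if (q < -max_q - 1) q = -max_q - 1;
    return static_cast<float>(q) / scale;      // nearest representable value
}
```

With 8 fractional bits a weight of 0.30 becomes 77/256 ≈ 0.3008; at fewer bits both the rounding step and the clipping range tighten, which is what erodes mAP.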
PROGRESS

[Figure: detection outputs at 20-, 18-, 16-, and 14-bit quantization]
THINGS TO DO NEXT

We have implemented one layer of our CNN on hardware; next we aim to
complete the remaining layers and then move on to the testing phase.
TIMELINE
THANK YOU
