You are on page 1of 23

INTERNET OF THINGS RESEARCH LAB, DEPARTMENT OF COMPUTER ENGINEERING, SANTA CLARA UNIVERSITY, USA — MARCH 2018 1

Software-defined Radios:
Architecture, State-of-the-art, and Challenges
Rami Akeela, and Behnam Dezfouli
Internet of Things Research Lab, Department of Computer Engineering, Santa Clara University, USA
Email: rakeela@scu.edu, bdezfouli@scu.edu

Abstract—Software-defined Radio (SDR) is a programmable central significance for designers in the industry due to wire-
transceiver with the capability of operating various wireless less protocols that advance rapidly and consistently, hence
arXiv:1804.06564v1 [cs.NI] 18 Apr 2018

communication protocols without the need to change or update requiring a hardware that can keep up with the evolution. For
the hardware. Progress in the SDR field has led to the escalation
of protocol development and a wide spectrum of applications, example, the authors in [5] proposed a platform, OpenRadio,
with more emphasis on programmability, flexibility, portability, for programming both PHY and MAC layers while offering
and energy efficiency, in cellular, WiFi, and M2M communication. a high level of abstraction. Rather than including yet another
Consequently, SDR has earned a lot of attention and is of equipment to deal with a new standard or recurrence band,
great significance to both academia and industry. SDR designers the equipment of a formerly introduced platform can adjust
intend to simplify the realization of communication protocols
while enabling researchers to experiment with prototypes on to the particulars of another standard. In a military scenario,
deployed networks. This paper is a survey of the state-of-the- for example, the needs of these platforms change in light of
art SDR platforms in the context of wireless communication terrible conditions that develop during a mission, which may
protocols. We offer an overview of SDR architecture and its not have been predicted when designed initially, leading to the
basic components, then discuss the significant design trends development and utilization of new protocols.
and development tools. In addition, we highlight key contrasts
between SDR architectures with regards to energy, computing Software-defined Radio (SDR) is a technology for radio
power, and area, based on a set of metrics. We also review existing communication. This technology is based on software-defined
SDR platforms and present an analytical comparison as a guide wireless protocols, as opposed to hardware-based solutions.
to developers. Finally, we recognize a few of the related research This translates to supporting various features and functionali-
topics and summarize potential solutions. ties, such as updating and upgrading through reprogramming,
Index Terms—SDR, Wireless Communication, Programmabil- without the need to replace the hardware on which they
ity, Co-design, LTE, WiFi, IoT. are implemented. This opens the doors to the possibility of
realizing multi-band and multi-functional wireless devices.
I. I NTRODUCTION The driving factors for the high demand of SDR include
DVANCES in wireless technologies have altered con- network interoperability, readiness to adapt to future updates
A sumers communication habits. Wireless technologies are
an essential part of users daily lives, and its effect will become
and new protocols, and more importantly, lower hardware and
development costs. In a report [6], the SDR market is projected
even larger in the future. In a technical report, the World to be more than USD 29 billion by the year 2021. Global
Wireless Research Forum (WWRF) has predicted that 7 trillion Industry Analysts [7] highlights some of the market trends
wireless devices for 7 billion people will be deployed by for SDR as follows: (i) increasing interest from the military
2020 [1]. When these huge numbers of devices are connected sector in building communication systems and large-scale
to the Internet to form an Internt of Things (IoT) network, deployment in developing countries, (ii) growing demand for
the first challenge is to adjust the basic connectivity and public safety and disaster preparedness applications, and (iii)
networking layers to handle the large numbers of end points. building virtualized base stations (BSs). SDRs are also ideal
There is an increasing number of wireless protocols that have for developing future space communications [8]–[10], Global
been developed, such as ZigBee, BLE, LTE, and new WiFi Navigation Satellite System (GNSS) sensors [11], Vehicle-to-
protocols for loT and Machine-to-Machine (M2M) communi- Vehicle (V2V) communication [12]–[14], and IoT applications
cation purposes due to different demanding requirements, one [15], [16], where relatively small and low-power SDRs can be
of which is high energy efficiency [2]. Wireless standards, utilized.
in general, are adapting quickly in order to accommodate SDRs are implemented through employing various types
different user needs and hardware specifications [3], [4]. This of hardware platforms, such as General Purpose Processors
has called for a transceiver design with the ability to handle (GPPs), Graphics Processing Units (GPUs), Digital Signal
several protocols, including the existing ones and those being Processors (DSPs), and Field Programmable Gate Arrays
developed. In order to accomplish this task, one needs to (FPGAs). Each of these platforms is associated with their
realize the protocols’ need for a flexible, re-configurable, and own set of challenges. Some of these challenges are: utilizing
programmable framework. the computational power of the selected hardware platform,
Both consumer enterprise and military frameworks have keeping the power consumption at a minimum, ease of design
a need for programmable platforms. Programmability is of process, and cost of tools and equipment. The research com-
INTERNET OF THINGS RESEARCH LAB, DEPARTMENT OF COMPUTER ENGINEERING, SANTA CLARA UNIVERSITY, USA — MARCH 2018 2

TABLE I Digital Front End RF Front End Antenna


K EY A BBREVIATIONS

ADC Analog-to-Digital Converter Signal Interpolation Digital RF Power


DAC
ASIC Application-specific Integrated Circuit Processing Filter Mixer Up-Converter Amplifier
BS Base Station - FFT/IFFT
CUDA Compute Unified Device Architecture - FIR/IIR Digital
Digital
- Decoding Up
DAC Digital-to-Analog Converter Converter
Local
Oscillator
DSP Digital Signal Processor (DUC)
FFT Fast Fourier Transform
FLOPS Floating Point Operations Per Second (a) Transmitter
FPGA Field Programmable Gate Array
GPP General Purpose Processor
GPU Graphics Processing Unit RF Front End Digital Front End
Antenna
HLS High Level Synthesis
NFV Network Function Virtualization
SDR Software-defined Radio Power RF Digital Lowpass Signal
SDR Software-defined Network ADC
Amplifier Tuner Mixer Filter Processing
SNR Signal-to-noise Ratio
SoC System on Chip Digital
Digital
USRP Universal Software Radio Peripheral Down
Local
Digital Converter
Oscillator
Analog (DDC)

(b) Receiver
munity and industry have both developed SDRs that are based
on the aforementioned hardware platforms. A few examples Fig. 1. SDR architecture. Sub-figure (a) shows SDR from a receiver’s point
include USRP [17], Sora [18], Atomix [19], Airblue [20], of view, and sub-figure (b) shows SDR from a transmitter’s point of view.
and WARP [21]. Each SDR is unique with regards to the
design methodology, development tools, performance, and end
application. digital signal processing, and digital front end, which includes
In this paper, we first present an overview of SDR archi- channel selection and modulation/demodulation, take place
tecture as well as the analog and digital divides of the system in the digital domain. This is usually performed in software
and interconnection of components. We then introduce the running on processors, such as GPPs and DSPs. However, it
criteria based on which the different hardware platforms are can also run on programmable hardware, i.e., FPGAs.
classified. We thoroughly examine the architecture and design In general, from the transmitter point of view, a baseband
approaches employed by these hardware platforms and present waveform needs to be produced, then an Intermediate Fre-
their strengths and weaknesses in the context of SDR imple- quency (IF) waveform, generate an RF waveform, then send
mentation. Furthermore, we provide an analytical comparison it through the antenna. From the receiver point of view, this RF
of hardware platforms as a guide for design decision making. signal is sampled and de-modulated, then decoded. To provide
Moreover, we discuss the use of respective development tools more details to the process, we study the receiving end of the
and present a summary to help explain their functionalities system as follows.
and the platforms they support. Afterwards, we review the The RF signal from the antenna is amplified, with a tuned
SDR platforms developed by both industry and academia, and RF stage that amplifies a range of the frequency band. This
analyze and compare them using the criteria discussed before. amplified RF signal is then converted to an analog IF signal.
Finally, we identify current challenges and the open research The Analog-to-Digital Converter (ADC) digitizes this IF signal
topics related to future SDR development. into digital samples. Then, it is fed into a mixer stage. The
This paper is organized as follows: Section II provides a mixer has another input coming from a local oscillator with a
description of SDR architecture as well as the classification frequency set by the tuning control. The mixer then translates
process used to summarize the various design approaches the input signal to a baseband. The next stage is essentially
adopted. Section IV lists some of the corresponding develop- a FIR filter that permits only one signal. This filter limits the
ment tools and platforms. Section V presents an analysis and signal bandwidth and acts as a decimating low-pass filter. The
comparison of the commercially and academically developed digital down-converter includes a large number of multipliers,
SDR platforms. Research questions and future trends are adders, and shift-registers, in the hardware in order to accom-
highlighted in Section VI. Section VII presents an analysis plish the aforementioned tasks. Next, the signal processing
of the existing literature on SDR surveys. We conclude the stage performs tasks such as de-modulation and decoding.
paper in Section VIII. A list of key abbreviations used in this This stage is typically handled by a dedicated hardware like
paper can be found in Table I. an Application Specific Integrated Circuit (ASIC) or other
programmable alternatives like FPGA or DSP [22].
II. C ONCEPTS AND A RCHITECTURE As shown in Figure 1 (a) and (b), at a high level, a typical
In this section, we examine the general architecture of SDR transceiver consists of the following components: Signal
SDRs, their main components, and their processing require- Processing, Digital Front End, Analog RF Front End, and an
ments. As explained in the previous section, SDRs play a vital antenna.
role in wireless standard development due to their flexibility 1) Antenna: SDR platforms usually employ several anten-
and ease of programmability. This is due to the fact that most nas to cover a wide range of frequency bands [23]. Antennas
INTERNET OF THINGS RESEARCH LAB, DEPARTMENT OF COMPUTER ENGINEERING, SANTA CLARA UNIVERSITY, USA — MARCH 2018 3

are often referred to as ”intelligent” or ”smart” due to their In a SDR transceiver, the following tasks are executed in the
ability to select a frequency band and adapt with mobile digital front end:
tracking or interference cancellation [22], [24]. In the case of – In the transmitting side (Figure 1(a)), the Digital Up
SDRs, an antenna needs to meet a certain list of requirements Converter (DUC) translates the aforementioned baseband
such as self-adaptation (i.e., flexibility to tuning to several signal to IF. The DAC that is connected to the DUC then
bands), self-alignment (i.e., beamforming capability), and self- converts the digital IF samples into an analog IF signal.
healing (i.e., interference rejection) [24]. Afterwards, the RF up-converter converts the analog IF
2) RF Front End: This is a RF circuitry that its main func- signal to RF frequencies.
tion is to transmit and receive the signal at various operating – In the receiving side (Figure 1(b)), the ADC converts
frequencies. Its other function is to change the signal to/from the IF signal into digital samples. These samples are
the Intermediate Frequency (IF). The process of operation is subsequently fed into the next block, which is the Digital
divided into two, depending on the direction of the signal (i.e., Down Converter (DDC). The DDC includes a digital
Tx or Rx mode): mixer and a numerically-controlled oscillator. DDC ex-
tracts the baseband digital signal from ADC, and after
– In the transmission path, digital samples are converted
being processed by the Digital Front End, this digital
into an analog signal by the Digital-to-Analog Converter
baseband signal is forwarded to a high-speed digital
(DAC), which in turn feeds the RF Front End. This analog
signal processing block [29].
signal is mixed with a preset RF frequency, modulated,
and then transmitted. 5) Signal Processing: Signal processing operations, such
– In the receiving path, the antenna captures the RF signal. as encoding/decoding, interleaving/deinterleaving, modula-
The antenna input is connected to the RF Front End tion/demodulation, and scrambling/descrambling, are per-
using a matching circuitry to guarantee an optimum formed in this block. Encoding for the channel serves as an
signal power transfer. It then passes through a Low Noise error correcting code. Specifically, the encoded signal includes
Amplifier (LNA), which resides in a close proximity to redundancy that is utilized by the receiver’s decoder to re-
the antenna, to amplify weak signals and minimize the construct the original signal from the corrupted received sig-
noise level. This amplified signal, with a signal from the nal. Examples of error correcting codes include Convolutional
Local Oscillator (LO), are fed into the mixer in order to Codes, Turbo Codes, and Low Density Parity Check (LDPC)
down convert it to the IF [25]. [30]. The decoder constitutes the most computationally inten-
sive part of the Signal Processing block, due to data transfer
3) Analog-to-Digital and Digital-to-Analog Conversion: and memory schemes [31]. The second part that is regarded
The DAC, as mentioned in the previous section, is responsible as highly complex and expensive (in terms of area and power)
for producing the analog signal to be transmitted from the is the Fast Fourier Transform (FFT) and Inverse FFT (IFFT),
digital samples. On the receiver side, the ADC resides, and is as part of the modulation phase [32].
an essential component in radio receivers. The ADC is respon- The signal processing block is commonly referred to as
sible for converting continuous-time signal to a discrete-time, the baseband processing block. When discussing SDRs, the
binary-coded signals. ADC performance can be described by baseband block is at the heart of the discussion, as it makes
various parameters [26], [27] including: (i) Signal-to-Noise up the bulk of the digital domain of the implementation. This
Ratio (SNR): the ratio of signal power to noise power in implementation runs on top of a hardware circuitry that is
the output, (ii) resolution: number of bits per sample, (iii) capable of processing signals efficiently. Examples include
Spurious-free Dynamic Range (SFDR): the strength ratio of ASICs, FPGAs, DSPs, GPPs, and GPUs. The second part
the carrier signal to the next strongest noise component or spur, of the implementation is the software, which provides the
and (iv) power dissipation. Advances in SDR development functionality and high-level abstractions to execute signal
have provided momentum for ADC performance improve- processing operations. In the next section we examine the
ments. For example, since ADC’s power consumption affects aforementioned hardware platforms and analyze in detail the
the lifetime of battery-powered SDRs, more energy efficient various design approaches.
ADCs have been developed [28].
4) Digital Front End: The Digital Front End has two main
functions [27]: III. D ESIGN A PPROACHES

– Sample Rate Conversion (SRC), which is a functionality In this section, we discuss the classification of the various
to convert the sampling from one rate to another. This is SDR design methodologies of the baseband processing block,
necessary since the two communication parties must be namely GPP, GPU, DSP, FPGA, and co-design based method-
synchronized. ologies. In this classification, we analyze and compare SDR
– Channelization, which includes up/down conversion in platforms based on a set of performance metrics in a criteria
the transmitter and receiver side, respectively. It also which we introduce. This criteria includes:
includes channel filtering, where channels that are divided – Flexibility and Reconfigurability. The capability for the
by frequency are extracted. Examples include interpola- modulation and air-interface algorithms and protocols to
tion and low-pass filters, as can be observed in Figure evolve by merely loading new software onto the platform
1. [10].
INTERNET OF THINGS RESEARCH LAB, DEPARTMENT OF COMPUTER ENGINEERING, SANTA CLARA UNIVERSITY, USA — MARCH 2018 4

– Adaptability. The SDR platform can adjust its capabilities GPUs are processors specifically designed to handle graphics-
as network or traffic operational requirements change. related tasks and they efficiently process large blocks of
– Computational Power. The processing rate of the SDR streaming data in parallel. SDR platforms comprised of both
platform, namely Giga Operations per Second (GOPS). GPPs and GPUs are flexible and have a higher level of pro-
– Energy Efficiency. The total power consumption (typi- cessing power. However, this results in a lower level of power
cally within a few hundreds milli watts), especially for efficiency (e.g., GPP’s power efficiency is ∼9GFLOPS/W for
mobile [33] and IoT deployments. single-precision, compared to 20GFLOPS/W for GPU [42]).
– Cost. The total cost of the SDR platform, including GPUs act as co-processors to GPPs because a GPP is required
time-to-market, development, and hardware costs. to act as the control unit and transfer data from external
memory. After a transfer is completed, signal processing
algorithms are executed by the GPU.
While GPUs are typically used for processing graphics, they
A. GPP-based
are also useful at signal processing algorithms. Over the past
One of the first approaches to realizing SDR platforms is few years, theoretical peak performance for GPUs and GPPs
using a General Purpose Processor (GPP), or the commonly for single and double precision processing has been growing
known generic computer microprocessors such as x86/64 and [43]. For example, comparing Intel Haswell’s 900 GFLOPs
ARM architectures. Examples of SDR platforms that utilize [44] with NVIDIA GTX TITAN’s 4500 GFLOPS [45] for
GPPs include Sora [18], KUAR [34], and USRP [17]. single precision, it is apparent that GPUs have a computational
1) Definition and Uses: A GPP is a digital circuit that is power that far exceeds their GPP counterparts [43]. Their
clock-driven and register-based, and is capable of processing multi-core architectures and parallel processors are the main
different functions and operates on data streams represented attractive features, in addition to their relatively reasonable
in binary system [35]. These GPPs can be used for several prices and credit card sizes. These features makes them
purposes, making them extremely useful for unlimited number good candidates as co-processors in GPP-based SDRs, where
of applications, eliminating the need for building application- they can play a vital role in accelerating computing-intensive
specific circuits, and thus reducing the overall cost of run- blocks [46]. Another advantage is their power efficiency, which
ning applications. GPPs are generally a preferable hardware keeps improving with every new model (e.g., it went from
platform by researchers in academia due to their flexibility, 0.5 to 20GFLOPS/W for single-precision) [42]. To take full
abundance, and ease of programmability, which is one of advantage of GPUs, it is a condition that algorithms conform
the main requirements in SDR platforms [36]. In addition, to their architecture. From an architectural perspective, GPUs
researchers prefer GPPs since they are more familiar with them have a number of advantages that makes them preferable so-
and their software frameworks, compared to DSPs and FPGAs. lutions to applications such as video processing. In particular,
From the performance point of view, GPPs are being enhanced GPUs employ a concept called Single Program Multiple Data
rapidly, credited not only to technological advances in terms of (SPMD) that allows multiple instruction streams to execute
CMOS technology [37], but also to the increase of the average the same program. In addition, due to their multi-threading
number of instructions processed per clock cycle. The latter is scheme, data load instructions are more efficient. GPUs also
achieved through different means, and in particular, utilizing present a high computational density, where cache to ALU
parallelism within and between processors. This has led to the ratio is low [47].
evolution of multi-core GPPs [38]. In Table II, the authors of [46] confirmed that the signal
2) Adoption and GPUs: Architecturally, the instruction set detection algorithm (which includes intensive FFT computa-
of GPPs include instructions for different operations such as tions) shows a faster parallel processing in the case of GPU
Arithmetic and Logic Unit (ALU), data transfer, and I/O. over GPP, while operating in real-time. This is due to the
A GPP processes these instructions in the sequential order. availability of cuFFT library developed for NVIDIA GPUs
Because of sequential processing, GPPs are not convenient for for more efficient FFT processing [48]. With regards to the
high-throughput computing with real-time requirements (i.e., architectural advantage of GPUs, several hundred CUDA cores
high throughput and low latency) [39]. For example, using can perform a single operation at the same time, as opposed
GNU Radio [40] to implement IEEE 802.11 standard, which to a few cores in the case of multi-core GPPs.
requires 20MHz sampling rate, would be challenging, since Examples of using GPUs alongside GPPs to build SDR
GNU Radio is restricted by the limited processing capabilities platforms is the work in [49], where the authors built a
of GPPs. This leads to the GPP cores (of the PC attached) to framework on a desktop PC in addition to using a GPU to
reach saturation and frames become corrupted and discarded. implement an FM receiver. The authors in [46] studied real-
Moreover, wireless protocols require predictable performance time signal detection using an SDR platform composed of
in order to guarantee meeting timing constraints. However, a laptop computer and an NVIDIA Quadro M4000M [45].
conditional branch instructions in GPP’s instruction sets lead Examples of GPUs available in the market can be found
to out-of-order execution, which makes it unfeasible to achieve in Table III. In this table, we show two examples of high
predictability. performing GPUs (> 5500 GFLOPS) suitable for SDRs with
To remedy the limitation of GPPs, researchers have pro- strict timing and performance requirements. We also show
posed multiple solutions, one of which is the addition of two more examples of less powerful and less expensive GPUs
co-processors, such as Graphic Processing Unit (GPU) [41]. suitable for prototyping SDRs in academic environments.
INTERNET OF THINGS RESEARCH LAB, DEPARTMENT OF COMPUTER ENGINEERING, SANTA CLARA UNIVERSITY, USA — MARCH 2018 5

TABLE II
P ERFORMANCE OF S IGNAL D ETECTION A LGORITHM ON GPP AND GPU [46]

Processing Platform of Signal Detection Algorithm


ADC Data Length (ms)
GPP Serial Processing (ms) GPP Parallel Processing (ms) GPU Parallel Processing (ms)
1 13.487 1.254 0.278
10 135.852 12.842 2.846
100 1384.237 131.026 29.358
1000 13946.218 1324.346 321.254

TABLE III
C OMPARISON OF GPU S

NVIDIA GeForce AMD Radeon NVIDIA GeForce AMD Radeon


GTX 980 Ti [45] R9 390X [50] GTX 680 [45] RX 560 [50]
GFLOPS 5632 5913 3090 2611
Power Consumption (W) 250 363 356 180
Frequency (MHz) 1000 1050 1006 1175
Cost (USD) 870 520 300 150

3) Shortcomings: State-of-the-art GPP and GPU-based modulation/demodulation, filtering, and encoding/decoding are
platforms, such as Sora [18] and USRP [17], utilize desktop commonly and frequently used in applications that include
computers to realize the systems. However, these platforms speech recognition, image processing, and communication
consume a significant amount of power for a performance systems. DSPs, however, implement them more quickly and
goal and their form factor is large, which makes it impossible efficiently due to their architecture (e.g., RISC-like architec-
for real-world deployments. It is worth noting that GPPs ture, parallel processing) which is specifically optimized to
and GPUs alike, present scaling limitation while meeting handle arithmetic operations, especially multiplications. Since
Koomey’s Law. This law states that energy efficiency of DSPs are capable of delivering high performance with lower
computers doubles roughly every 18 months [51]. This lim- power, they are better candidates for SDR deployment [58],
itation calls for alternatives that provide higher computing compared to GPPs. Examples of DSPs especially designed
power while keeping the energy efficiency the same. One for SDR platforms are TI TMS320C6657 and TMS320C6655.
alternative is the hybrid or co-design approach, where software These DSPs are both equipped with hardware accelerators for
and hardware implementations are combined. This will be complex functions like the Viterbi and Turbo Decoders [59].
discussed in more details in Section III-D. 2) Adoption: As discussed in the previous section, GPPs
When both GPP and GPU are used for a SDR design, data provide an average performance for a wide range of appli-
transfer operations between GPP and GPU can be bottlenecks cations. Needless to say, this performance level might be
and cause performance loss, especially for meeting real- sufficient for research and academia, but if the system is to
time requirements [52]. However, there are continuous efforts be deployed commercially, certain performance requirements
to reduce or eliminate the time overhead of data transfers must be met. To this end, compared to GPPs, DSPs are tailored
by introducing multi-stream scheduling for pipelining of the for processing digital signals efficiently, utilizing features
memory copy tasks. This would ensure no stalls in the pipeline like combined multiply-accumulate operations (MAC units)
and thus enhancing processing parallelism [53], [54]. Finally, and parallelism [60]. DSP manufacturers usually sell these
although the processing power of microprocessors is being products in two flavors: optimized for performance, and energy
constantly improved, balancing between sufficient computing optimized. Therefore, when used in SDRs, high performance
power and a specific goal for energy consumption and cost, and energy efficient products can be employed in BSs and
stays a very difficult task now and in the future. This is true edge devices, respectively.
especially with the growing need for more data to be processed In terms of the instruction set, DSPs can be categorized into
and blocks that can handle data processing in parallel. two groups: (i) Single Instruction Multiple Data (SIMD) archi-
tecture, and (ii) Multiple Instruction Multiple Data (MIMD)
architecture, as described by Michael J. Flynn in what is
B. DSP-based known as Flynn’s Taxonomy [61], [62]. This taxonomy is a
The DSP-based solution can be considered as a special method of classifying various architectures depending on the
case of GPP-based solutions, but due to its popularity and number of concurrent instructions and data streams, as follows:
unique processing features, it deserve a separate discussion. – A SIMD-based DSP can execute an instruction on mul-
An example of DSP-based SDR is the Atomix platform [19] tiple data streams at the same time. This architecture can
which utilizes TI TMS320C6670 DSP [55]. be very efficient in cases when there exists high data
1) Definition and Uses: DSP is a particular type of mi- parallelism within the algorithm [63]. This indicates that
croprocessor that is optimized to process digital signals [56]. there are similar operations that can be performed on
To help understand how DSPs are distinguished from GPPs, different datasets at the same time. Examples of SIMD-
we should first note that both are capable of implementing based DSPs include the Cell processor presented in [64]
and processing complex arithmetic tasks [57]. Tasks like which supports 256 GFLOPS. More examples of DSPs
INTERNET OF THINGS RESEARCH LAB, DEPARTMENT OF COMPUTER ENGINEERING, SANTA CLARA UNIVERSITY, USA — MARCH 2018 6

that are optimized for low power are NXP CoolFlux DSP C. FPGA-based
[65] and Icera Livanto [66]. A SDR employing a SIMD
DSP is the SODA architecture [67]. It has been a common Another approach towards realizing SDRs is to use a
practice to add more cores in order to achieve a better programmable hardware such as FPGAs. Example of FPGA-
trade off between performance and power. With each based SDR platforms are Airblue [20], Xilinx Zynq-based
extra core utilizing Very Long Instruction Word (VLIW), implementation of IEEE 802.11ah [73], and [74] that used the
a higher level of parallelism can be accomplished as well. same FPGA board to implement a complete communication
– On the other hand, MIMDs have the ability to operate system with channel coding.
on multiple data streams executing multiple instructions 1) Definition and Uses: An FPGA is an array of pro-
at any point in time. This is essentially an extension grammable logic blocks, such as general logic, memory, and
of the SIMD architecture, where different instructions multiplier blocks, that are surrounded by a routing fabric,
or programs run on multiple cores concurrently. This is which is also programmable [75]. This circuit has the ca-
especially important and useful in cases where parallelism pability of implementing any design or function, with ease
is not uniform across different blocks, but MIMD ar- of updating it. Although FPGAs consume more power and
chitecture allows for parallel execution which leads to occupy more area than ASICs, the programmability feature is
speed improvements. Examples of MIMD-based DSPs the reason behind their increasing adoption in a wide range
include Texas Instruments SMJ320C80 and SM320C80 of applications. Furthermore, when the reconfiguration delay
DSPs with 100 MFLOPS [59]. is in the order of milliseconds, the SDR can switch between
different modes and protocols seamlessly [76]. Another major
Since DSPs are customized to meet certain signal difference is that, ASIC fabrication is expensive (at least a
processing-related needs, it is crucial to clarify these cus- few tens of thousands of dollars) and requires a few months,
tomizations in order to understand how DSPs stand out and whereas FPGAs can be quickly reprogrammed, and their cost
how they are successful at not only meeting the requirements, is within a few tens to a few thousands of dollars, at most.
but also becoming a vital player in the wireless communication The low end product cycle, along with attractive hardware
field. These customizations, which are mostly architecture- processing advantages, such as high speed performance, low
related, are as follows. power consumption, and portability, compared to processors
such as GPPs and DSPs, present FPGAs as contenders that
In [68], the authors discuss the energy efficiency of DSPs.
offer the best of both worlds [75].
In general, DSPs consume more power than ASICs, however,
there exists DSPs that are optimized for low power wireless In a study by [77], the authors compared the performance of
implementations, such as TI C674x DSP [59]. One of the Xilinx FPGAs [78] against 16-core GPPs. The calculation of
methods to lower power consumption is using multiple data peak performance for GPPs was performed through multiply-
memory buses (e.g., one for write, and two for reads). This ing the number of floating point function units on each core by
paves the way for higher memory bandwidth, and allows the number of cores and by the clock frequency. For FPGAs,
for multiple operand instructions, which in turn results in performance is calculated through picking a configuration,
fewer cycles. Also, as discussed above, VLIW architectures adding up the Lookup Tables (LUTs), flip-flops, and DSP
along with specialized instructions can provide a higher level slices needed, then multiplying them by the appropriate clock
of efficiency, and hence lower energy consumption. These frequency. The authors calculated the theoretical peaks for 64-
improvements can be seen in DSPs such as TI TMS320C6x bit floating point arithmetic and showed that Xilinx Virtex-7
[59] and ADI TigerSHARC [69]. These techniques, coupled FPGA is about 4.2 times faster than a 16-core GPP. This can
with proven power saving techniques like clock gating and been seen in Figure 2. Even with a one-to-one adder/multiplier
putting parts of or the entire system in sleep mode, further configuration, the V7-2000T achieved 345.35GFLOPS, which
reduce the power consumption. Examples of DSPs available in is better than a 16-core GPP. From Intel [44], Stratix 10
the market can be found in Table IV. In this table, we present FPGAs can achieve a 10 Tera FLOPS peak floating point
three examples of DSPs that do no include co-processors, and performance [79]. This is due to the fixed architecture of the
three DSP-based SoCs that, in addition to DSP cores, include GPP, where not all functional units can be fully utilized, and
extra soft cores as control processors. the inherent parallelism of FPGAs and their dynamic architec-
ture. In addition, despite having lower clock frequencies (up to
3) Shortcomings: Although DSPs have been widely 300MHz), FPGAs can achieve better performances due to their
adopted for SDR implementations for decades [72], they architectures which allow higher levels of parallelism through
present shortcomings as follows. As more applications call for custom design [80]. In a study by [81], the authors compared
increasing parallelism and reconfigurability in order to handle the performance and power efficiency of FPGAs to that of
computationally intensive tasks, DSPs can be insufficient. GPPs and GPUs using double-precision floating point matrix-
In addition, programming DSPs to achieve higher levels of vector multiplication. The results show that FPGAs are capable
parallelism predictability can be challenging. This opened the of outperforming the other platforms, while maintaining their
door for parallel architectures like FPGAs, or multi-core GPPs, flexibility. In another study by [47], the authors thoroughly
or even a hybrid of both, to be adopted for SDRs. In addition, analyzed and compared FPGAs against GPUs via the imple-
power consumption of DSPs is generally higher than FPGAs mentations of various algorithms. The authors concluded that
due to operating at high frequencies. although both architectures support a high level of parallelism,
INTERNET OF THINGS RESEARCH LAB, DEPARTMENT OF COMPUTER ENGINEERING, SANTA CLARA UNIVERSITY, USA — MARCH 2018 7

TABLE IV
C OMPARISON OF DSP S AND DSP- BASED S O C S

DSP only SoC


TI C66x CEVA Analog Devices TI Keystone II Analog Devices Qualcomm Snapdragon 820
(TMS320C6652) [59] (XC-4500) [70] (ADSP-21369) [69] (66AK2G02) [59] (ADSP-SC573) [69] (Hexagon 680) [71]
GFLOPS 9.6 40 2.4 28.8 5.4 No Floating Point
Memory (Kb) 1088 No Info 2000 1024 768 No Info
Frequency (MHz) 600 1300 400 600 450 2000
Cost (USD) ∼25 No Info ∼20 ∼20 ∼20 ∼70
ARM ARM Qualcomm Kyro 385 (CPU)
Soft Core N/A N/A N/A
Cortex-A15 Cortex-A5 Adreno 530 (GPU)

700
through the implementation of several techniques discussed
GPP
600
FPGA (1-to-1)
in [76]. These techniques can be at a system, device, and/or
500
FPGA (Best) architecture level, such as clock gating and glitch reduction.
Table V presents a summary of widely-used FPGA platforms.
GFLOPS

400

V7-2000T
3) Shortcomings: One of the challenges of using FPGAs,
300
however, can be the prior knowledge about target hardware
200
V6-SX475T
architecture and resources that a developer needs to possess
100
V4-LX200
V5-SX240T Sixteen-core in order to design an application efficiently for FPGAs. In the
Six-core
0
Dual-core Quad-core SDR domain, designing the platform has typically been the
90nm 65nm 40-45nm
CMOS Technology
28-32nm
job of software engineers, and thus the process can be time-
consuming and less trivial to incorporate this experience into
Fig. 2. Peak performance of GPPs versus FPGAs when performing 64-bit hardware design. However, as it will be discussed in Section
floating point operations [77]. It can be observed that FPGAs increased their IV-A, adoption of FPGA solutions can be made more feasible
floating point performance by an order of magnitude compared to GPPs.
through HLS tools.

which is crucial to signal processing applications, FPGAs offer D. Hybrid Design (a.k.a., co-design)
a larger increase in parallelism, whereas GPUs have a fixed The fourth approach towards realizing SDRs is the hybrid
parallelism due to their data path and memory system. approach, where both hardware and software-based techniques
2) Adoption: Over the past decade, FPGAs have signifi- are combined into one platform. This is commonly referred
cantly advanced and become more powerful computationally, to as the co-design or hybrid approach. Examples of SDRs
and now exist in many differnet versions such as Xilinx that adopted the co-design approach include WARP [21] and
Kintex UltraScale [78] and Intel Arria 10 [44] [82], [83]. In CODIPHY [92].
addition, the availability of various toolsets gave FPGAs an 1) Definition: Hardware/software co-design as a concept
advantage by making them more accessible. This is supported has been around for over a decade, and it has evolved at a
by the availability of compilers that have the capability of faster rate in the past few years due to an increasing interest
generating Register-transfer Level (RTL) code, such as Verilog in solving integrated circuit design problems with a new and
and VHDL, that is needed to run on FPGAs, from high-level different approach. Even with GPPs becoming more powerful
programming languages. This process is typically referred to than ever, and with multi-core designs, it is clear that in order
as High Level Synthesis (HLS). Examples of such compilers to achieve higher performance and realize applications that
include HDL Coder [84] for MATLAB code [85] and Xilinx demand real-time processing, designers had to shift attention
HLS [86] or Altera Nios II C2H compiler [87] for C, C++, to new design schemes that utilize hardware solutions, namely,
and SystemC. We will explain some of these tools in Section FPGAs and ASICs [93], [94]. Co-design indicates the use
IV. of hardware design methodology, represented by the FPGA
HLS allows software engineers to design and implement fabric, and software methodology, represented by processors.
applications, such as SDRs, on FPGAs using a familiar pro- As more applications, such as automotive, communication,
gramming language to code, namely C, C++, SystemC, and and medical, grow in complexity and size, it has become
MATLAB, without the need to posses a prior rich knowledge a common practice to design systems that integrate both
about the target hardware architecture (refer to Section IV-A). software (like firmware and operating system) and hardware
These compilers can also be used to speed up or accelerate [95]. This has been made feasible in the recent years thanks to
parts of the software code running on a GPP or DSP that the advances in high-level synthesis and developing tools that
are causing slowdowns or setbacks to the overall performance. not only have the capability to produce efficient RTL from
This will be further discussed in Section III-D. Further, FPGAs software codes, but also define the interface between the both
can achieve high performance while still consuming less sides. The industry has realized the huge market for co-design,
energy than previously discussed processors [88] (e.g., Intel and provided various SoC boards, that in addition to the FPGA
Stratix 10 FPGA can achieve up to 100 GFLOPS/W [89], fabric, contain multiple processors. For example, the Xilinx
compared to 23 GFLOPS/W for NVIDIA GeForce GTX 980 Zynq board [78] includes an FPGA fabric as well as two ARM
Ti [45]). In addition, power dissipation can be further lowered Cortex-A9 processors [96]. In addition to the aforementioned
INTERNET OF THINGS RESEARCH LAB, DEPARTMENT OF COMPUTER ENGINEERING, SANTA CLARA UNIVERSITY, USA — MARCH 2018 8

TABLE V
C OMPARISON OF FPGA S AND FPGA- BASED S O C S

FPGA only SoC


Xilinx Kintex-7 Intel Cyclone V GX Lattice ECP3-70 Xilinx Zynq-700 Intel Cyclone V SE SoC Microsemi SmartFuion2
(XC7K70T) [78] (C5) [44] (LFE3-70EA) [90] (Z-7020 XC7Z020) [78] (A5) [44] (M2S090) [91]
Logic Cells (K) 65.6 77 67 85 85 86.31
Memory (Mb) 4.86 4.46 4.42 4.9 3.97 4.488
DSP Slices 240 150 128 220 87 84
Cost (USD) 130 185 80 110 110 155
Dual-core ARM Dual-core ARM
Soft Core N/A N/A N/A ARM Cortex-M3
Cortex-A9 Cortex-A9

14000
advantages, there are other reasons that make co-design even
16nm
more interesting including, faster time-to-market, lower power 12000

consumption (when optimized for this), flexibility, and higher 10000

Number of DSP Slices


processing speeds, as typically hardware in these systems is 8000

used as an acceleration to software bottlenecks [97].


6000 20nm
Adopting the co-design methodology in essence is a mat-
4000 28nm 16nm
ter of partitioning the system into synthesizable hardware 20nm
40nm 28nm 28nm
and executable software blocks. This process depends on a 2000
65nm
90nm
strict criteria that is developed by the designer [98], [99]. 0
Virtex-4 Virtex-5 Virtex-6 Kintex-7 Virtex-7 Zynq-7000 Kintex Virtex Kintex Virtex
The authors in [100] and [101] discuss their partitioning Xilinx FPGA Series
UltraScale UltraScale UltraScale+ UltraScale+

methodologies and present the process of making the proper


architectural decisions. Common methods typically provide Fig. 3. Number of DSP Slices in Xilinx FPGAs [78]. The values on top of
useful information to the designer to help make the best the bars refer to the CMOS technology used.
decision of what to implement in hardware and what to keep
in software. This information can include possible speedups, for handling intensive signal processing tasks, as depicted in
communication overheads, data dependencies, and locality and Figure 3. This also can be seen from USRP E310 model, which
regularity of computations [101]. Examples of SoC boards incorporates a Xilinx Zynq SoC [78].
available in the market can be found in Table V. 3) Shortcomings: A downside of adopting SoCs for co-
2) Adoption: As mentioned in Section II, SDRs can be con- design is that their prices are generally higher, compared
sidered inherently hybrid or heterogeneous systems, thereby to previously mentioned design approaches, due to having
implying the need for both hardware and software blocks. multiple components on the same board, i.e., processor and
This is due to the fact that the control part is usually taken FPGA fabric. Other factors contributing to this include extra
care of by a general processor, and other functions, such memory and sophisticated interfaces. Another challenge of co-
as signal processing, by a specialized processor like DSPs, design is shared memory access, e.g., external DDR memory,
and sometimes are accelerated using dedicated hardware like between the processor and FPGA fabric. The study of [106]
FPGAs [102]. This design approach fits well with SDRs and shows that the number of memory read and write operations
can be fully utilized to meet certain requirements that pertain performed by a GPP is higher than that of FPGAs. This is
to their attractive features. For example, accelerating portions due to the fact that processors perform operation on registers,
of a block or moving it entirely to the FPGA fabric can while FPGAs operate on buffers. Since memory accesses add
help to push the processing time to the limit in order to up to the overall latency, this can cause a bottleneck to the
achieve a real-time performance for real-life deployment. In overall performance. In addition, the authors have developed
addition, through careful implementation of RTL optimiza- a methodology for predicting shared memory bandwidth by
tion techniques, the development of power efficient systems using a functionally-equivalent software. This enables the
for mobile and IoT applications would be possible. On the designers to be aware of any bottlenecks before implementing
other hand, running most of the MAC layer operations on a the entire co-design.
processor, or multi-processors, can be advantageous for easy
reconfiguration. Therefore, different partitioning schemes can E. Comparison
be adopted to meet the requirements of the application at hand. When we covered different design methodologies and hard-
It is worth noting that FPGA vendors, namely Intel [44] ware platforms for a wide selection of SDR platforms, we
and Xilinx [78], are widening their product base with more intended to compare them analytically one-on-one, using a
SoCs and Multi-processor SoCs (MPSoCs) [103], due to the cross-platform implementation of one of the wireless commu-
the growing demand for such devices. An example of an nication protocols. What was available in literature, instead,
SDR realized on an MPSoC is the work by [104]. In a white was a series of abstract comparisons using a set of bench-
paper, National Instruments (the company that owns USRP marks targeting High Performance Computing (HPC), and not
[17]) predicts that the future of SDRs is essentially a co- necessarily SDR applications. It is somewhat difficult to draw
design implementation [105], especially with the introduction conclusions from these numbers alone, since a performance
of FPGAs that are equipped with a large number of DSP slices comparison in the SDR field requires testing them in real-life.
INTERNET OF THINGS RESEARCH LAB, DEPARTMENT OF COMPUTER ENGINEERING, SANTA CLARA UNIVERSITY, USA — MARCH 2018 9

In Table VI, we provide a high-level comparison between and their results lead to the conclusion that GPUs and FPGAs
three major design approaches as a guideline for designers perform better than GPPs. It is obvious from above studies
towards choosing the method that best meets their application that trade-offs are to be expected when a particular design
specifications. In this comparison, we focus on the features methodology is adopted, hence careful analysis should be
that are important to SDR design. However, we do not make carried out beforehand. Other comparative studies include
assumptions on what the best approach is and believe it [111]–[113] with similar results and conclusions.
is the developer’s responsibility to make the best judgment
depending on the application area. Please note that in this table
IV. D EVELOPMENT T OOLS
we did not include GPUs, as they typically act as co-processors
to GPPs and their addition generally improves performance. As we mentioned in Section III-C, HLS is an abstract
We also did not include co-design since it combines GPPs method of designing hardware using a high-level programming
with FPGAs. language. Developers of FPGA and co-design based SDRs
As Table VI shows, while GPPs are easy to program and can benefit from HLS as it requires no prior experience with
extremely flexible, they lack the power to meet specifications hardware design. Unlike the rest of the development tools,
in real-time and are very inefficient in terms of power. To HLS tools share a common theme and offer similar features.
increase their performance, multiple cores with similar instruc- Thus, we first discuss HLS in this section. Next, we review
tion sets are included in the same GPP platform to exploit the common development tools that are typically used in the
parallelism and perform more operations per clock cycle. process of SDR design and implementation for different design
However, hardware replication (i.e., adding more cores to approaches.
GPPs) may not necessarily translate to a higher performance.
GPUs tackle this by offering the same control logic for
A. High Level Synthesis (HLS)
several functional units. The sequential portion of the code
runs on the GPP, which can be optimized on multi-core HLS has been a hot research topic for over a decade,
GPPs, while the computationally intensive portion runs on a with both academia and industry trying to make hardware
several-hundred-core GPU, where the cores operate in parallel. design more accessible by every developer [114]. HLS is
Another example of a customized processor is DSP, which the process of converting an algorithmic specification of the
performs significantly better than GPPs, while at the same design described by a high-level programming language to
time maintaining the ease-of-use feature that GPPs possess, an RTL implementation. HLS provides a new level of design
making them very attractive options. They are also more power abstraction through exploring the micro-architecture and any
efficient and better fit for signal processing applications. On hardware constraints. The resulting RTL is highly optimized,
the other hand, they are more expensive, which is the main in terms of power, throughput and latency, and reasonably
trade-off. Finally, FPGAs combine the flexibility of processors comparable to a hand-tuned code. Figure 4 depicts this pro-
and efficiency of hardware. FPGAs can achieve a high level cess. The major difference between RTL and C is the absence
of parallelism through dynamic reconfiguration, while yielding of timing constraints in the high-level model, which is merely
better power efficiency [42]. FPGAs are typically more suit- a behavioral description of the system with no details about the
able for fixed-point arithmetic, like signal processing tasks, underlying hardware. The second difference is the processing
but in the recent years their floating-point performance has architecture: while GPP architecture is fixed, the best possible
increased significantly [81], [107]. However, the designers are processing architecture is built by the compiler for FPGA
expected to know a lot more about the hardware, which is [115]. In addition, HLS can speed up the development cycle
sometimes a deterring feature. (time to market) to several weeks, down from several months
In a comparative analysis by [108], authors studied the [116]. This is because the task of producing an optimized RTL
performance and energy efficiency of GPUs and FPGAs using is handled by the HLS tool and the developer’s efforts are
a number of benchmarks in terms of targeted applications, focused on describing the system’s algorithmic description.
complexity, and data type. The authors concluded that GPUs In [117] the authors presented LegUP, an open-source HLS
perform better for streaming applications, whereas FPGAs tool. This tool is capable of profiling a code to identify
are more suitable for applications that employ intensive FFT frequently executed sections of the code for hardware acceler-
computations, due to their ability to handle non-sequential ation (i.e., moving them to the FPGA fabric). The authors in
memory accesses in a faster and more energy efficient manner. [118] survey HLS compilers and their capability to provide an
Similarly, in [42], the authors review and report the sustainable accurate estimation of functional area and timing, comparable
performance and energy efficiency for different applications. to results from hand-tuned hardware designs. In an effort to
One of their findings related to SDRs is that FPGAs should be help the developer make the right decision to pick an HLS tool
used for signal processing without floating point, confirming that yields the best results for their application, the authors
aforementioned results. In addition, the authors in [109] report in [116] present a study where they compared three of the
that GPUs are ten times faster than FPGAs with regards to FFT industry tools, namely Vivado HLS [86], Intel FPGA SDK for
processing, while authors in [81] demonstrate that the power OpenCL [119], and MaxCompiler [120], through developing
efficiency of FPGAs is always better than GPUs for matrix LDPC decoders, which are often used as error correcting
operations. Finally, authors in [110] compare GPPs, GPUs, blocks in SDRs. All three tools successfully synthesized LDPC
and FPGAs through the implementation of LDPC decoders, decoders and implemented them on Intel [44] and Xilinx [78]
INTERNET OF THINGS RESEARCH LAB, DEPARTMENT OF COMPUTER ENGINEERING, SANTA CLARA UNIVERSITY, USA — MARCH 2018 10

TABLE VI
C OMPARISON OF SDR D ESIGN A PPROACHES

GPP DSP FPGA


Computation Fixed Arithmetic Engines Fixed Arithmetic Engines User Configurable Logic
Execution Sequential Partially Parallel Highly Parallel
Throughput Low Medium High
Data Rate Low Medium High
Data Width Limited by Bus Width Limited by Bus Width High
Programmability Easy Easy Moderate
Complex Algorithms Easy Easy Moderate
I/O Dedicated Ports Dedicated Ports User Configurable Ports
Cost Moderate Low Moderate
Power Efficiency Low Moderate High
Form Factor Large Medium Small

C / C++ / MATLAB / openCL High-Level Language While some of them are vendor-specific, some tools work with
a variety of FPGA boards. Examples mentioned in the table all
Pipelining / dataflow / unrolling HLS Tool
provide a set of area and timing optimizations such as resource
Verilog / VHDL Synthesizable HDL Code sharing, scheduling, and pipelining. However, not all of them
Vivado / Quartus RTL Synthesis and Place-and-Route Tool are capable of generating testbenches for the design.
FPGA Implementation Bit Stream
B. Tools
Fig. 4. HLS design flow. In this section we review the existing software tools for
SDR development. For each design methodology, we discuss
a compatible development tool and list its features. We also
FPGA boards. The difference, however, was in the logic uti- provide an overall comparison between them to highlight the
lization and performance. Similarly, the authors in [121], com- differences.
pare the same aforementioned list of compilers quantitatively 1) MATLAB and Simulink: Most designers start with mod-
and qualitatively using several financial engineering problems elling and simulating the system using Mathworks MATLAB
(e.g., Monte Carlo-based Option Pricing) and compare the [85] and Simulink [132]. Through the availability of a wide
performance of several FPGA boards. Their results show range of built-in functions and toolboxes, especially for signal
that both Intel FPGA SDK for OpenCL and MaxCompiler processing and communication, developing and testing appli-
performed better than Vivado HLS due to their ability to cations became very common and widely adopted. However,
extract parallelism more effectively. In [122], the authors in order to use these models for different platforms, developers
comprehensively review recent HLS tools and provide through would need to use MATLAB Coder [133] and Simulink Coder
a careful analysis a methodology based on C benchmarks to [134] to generate C/C++ codes. The generated codes can
compare some of these tools and their optimization features. be used with Embedded Coder [135] to optimize them and
The various benchmarks implemented demonstrate that some generate software interfaces with AXI drivers for the sake of
tools are better suited for certain applications than the rest, running on embedded processors and microprocessors, like the
with no specific tool dominating the HLS field. The authors dual ARM cortex A9 MPcore [96] on the ZedBoard [136].
also show that open-source HLS tools such as LegUP [123] Alternatively, developers can use the HDL Coder [84] to
can be as effective as their commercial counterparts. Other generate synthesizable RTL (Verilog or VHDL) code to be
surveys and analyses include [124], [97], [125], which focused implemented on FPGAs or ASICs. It also has support for
on open-source tools, and [126], which studied some of the Xilinx [78] and Intel [44] SoC devices by providing some
trade-offs of HLS-generated designs and how reliable they information and optimizations pertaining to resource utiliza-
are when errors are injected. All of the studies above prove tion and distributed pipelining. Figure 5 shows the design
the feasibility and reliability of HLS tools to generate RTL flow for SoC platforms that the aforementioned tools offer
codes, despite having different development and optimization and how they are connected. Examples of using MATLAB
solutions. and Simulink to develop an SDR are the works by [137] and
Examples of HLS tools include Xilinx Vivado HLS [86] and [138], where the authors used the RTL-SDR very low-cost
SDSoC [127], Intel FPGA SDK for OpenCL [119], Cadence SDR dongle [139] (∼ $20) with a desktop computer to design
Stratus High-level Synthesis (combining Cadence C-to-Silicon an academic curriculum for teaching DSP and communications
and Forte Cynthesizer) [128], Synopsys Synphony C Compiler theory.
[129], Maxeler MaxCompiler [120], MATLAB HDL Coder 2) Vivado HLS and SDSoC: Xilinx Vivado HLS [86] is a
[84], and LegUP [123], which unlike the rest of the tools is design environment for high-level synthesis. This tool offers a
vendor-independent (works with all types of FPGA boards, variety of features to tweak and improve the RTL netlist output
such as Xilinx [78], Intel [44], Lattice [90], and Microsemi that is compatible and optimized for Xilinx FPGA boards.
[91]). It accepts input specifications described in several languages
Table VII presents a summary of commercial HLS tools. (e.g., C, C++, SystemC, and OpenCL), and generates hardware
INTERNET OF THINGS RESEARCH LAB, DEPARTMENT OF COMPUTER ENGINEERING, SANTA CLARA UNIVERSITY, USA — MARCH 2018 11

TABLE VII
HLS T OOLS

Xilinx Vivado Intel FPGA Cadence Stratus Synopsys Synphony Maxeler


[86] [119] [128] [129] [120]
HLS SDK for OpenCL High-level Synthesis C Compiler MaxCompiler
Input C/C++/SystemC C/C++/SystemC C/C++/SystemC C/C++ MaxJ
Output VHDL/Verilog/SystemC VHDL/Verilog VHDL/Verilog VHDL/Verilog/SystemC VHDL
Testbench Yes No Yes Yes No
Optimizations Yes Yes Yes Yes Yes
Compatibility Xilinx FPGA Intel FPGA All All All

TABLE VIII
D EVELOPMENT T OOLS AND P LATFORMS

MATLAB & Simulink [130] Vivado HLS & SDSoC [78] LegUP [123] GNU Radio [40] LabView [131] CUDA [48]
Input MATLAB/Graphical C/C++/SystemC C Graphical/Python/C++ Graphical C/C++/Fortran/Python
Output MATLAB/C++/RTL C/RTL C/RTL C/RTL C/RTL Machine Code
Platform GPP/GPU/DSP/FPGA GPP/FPGA GPP/FPGA GPP/GPU/DSP/FPGA GPP/GPU/DSP/FPGA GPU
Licence commercial commercial open-source open-source commercial commercial

such as decoders, demodulators, and filters. It is also capable


of connecting these blocks and managing data transfer in a
MATLAB/Simulink reliable fashion. In addition, it supports USRP systems [17].
Algorithm Design & Modeling
One of the attractive features of GNU Radio is the ability to
HDL Coder define and add new blocks. This can be done via programming
in C++ or Python. An example of using GNU Radio is the
Embedded System Tool Integration Simulink Coder + Embedded Coder
work by [142], where the author uses it with a USRP to realize
different types of transceivers such as TDMA and CSMA, and
FPGA Bitstream SW Code
showcases some of its capabilities. Similarly, the authors in
Xilinx/Intel SoC Device [143] successfully achieve real-time communication between
two computers using USRP [17] and RTL-SDR [139].
Fig. 5. Mathworks SoC design flow [130].
4) LabVIEW: A widely used tool from National Instru-
ments [131] that offers a visual programming environment for
test, automation and control applications used by both industry
modules in Verilog or VHDL. Developers have several options and academia. It is similar to GNU Radio and Simulink,
to optimize the solution in terms of area and timing through the where the design can be realized schematically by connecting
use of directives (guidelines for the optimization process) and a chain of various blocks together, each of which performs
pragmas for RTL optimization. These optimizations include a certain function. It also offers complete support for USRP
loop unrolling, loop pipelining, and operation chaining. SD- [17] enabling rapid prototyping for communications systems.
SoC [127] is another tool by Xilinx [78]. The major difference Designing different blocks of the system can be achieved
between the two tools is that the latter has the capability using high-level languages such as C or MATLAB, or using
to provide solutions for SoCs. SDSoC is built on top of a graphical dataflow. An SDR platform development using
Vivado HLS and has the same C-to-RTL conversion capability. LabVIEW is the work by [144], where the author describes a
The main advantage of using SDSoC is that it automatically wireless communication course design that incorporates USRP
generates data movers, which are responsible for transferring and LabVIEW, due to their ease of use, to help teach students
data between the software on the processor and the hardware basic concepts. Similarly, in [145] the authors designed an
on the FPGA. SDR platform, namely FRAMED-SOFT, that includes two
A similar tool to SDSoC that is open-source is LegUP types of USRPs and is intended for an academic environment.
[123]. It was developed at the University of Toronto, as part 5) CUDA: Developed by NVIDIA, it issues and man-
of an academic research effort to design an HLS tool that is ages computing platforms and programming models for data-
capable of taking in a C code as an input and providing three parallel computing on GPUs [48]. Developers typically use
possible outputs: a synthesizable RTL code for an FPGA, a CUDA when GPUs are part of the processing architecture as
pure software executable, and a hardware/software co-design co-processors, and want to take full advantage of their power
solution for a SoC. by speeding up applications. As discussed in Section III-A2,
3) GNU Radio: It is an open-source software development in order to identify application components that should run
toolkit that provides signal processing blocks to implement be on GPP and the parts that should be accelerated by the
SDRs [40], [140]. It runs on desktop or laptop computers, GPU, one needs to look at the tasks at hand. Programming
and with the addition of simple hardware such as USRP B200 languages that can be used in CUDA include C, C++, Python,
[17], can build a basic SDR. It is often used by academia Fortran, and MATLAB [85]. The toolkit includes, in addition
and the research community for simulation as well as quick to the rich library for GPU-related acceleration functions, a
setup of SDR platforms. Similar to System Generator tool compiler, development tools, and CUDA runtime, to develop
[141] and Simulink [132], it includes different kinds of blocks applications and optimize them for systems that incorporate
INTERNET OF THINGS RESEARCH LAB, DEPARTMENT OF COMPUTER ENGINEERING, SANTA CLARA UNIVERSITY, USA — MARCH 2018 12

a Xilinx Virtex II Pro P30 FPGA, which has two PowerPC


USB
Controller 405 cores [78]. The main advantage of this platform is the
degree of flexibility that it offers. Developers have the op-
ADC ADC
tion to implement communication standards in three different
Daughterboard Daughterboard
methods: (i) they can fit the entire design onto the FPGA
Rx ADC ADC Rx
and assign few tasks to the host’s GPP (full hardware), (ii) a
FPGA
Board full software implementation, where the design is implemented
entirely on the GPP with minimal FPGA involvement, and (iii)
Daughterboard DAC DAC Daughterboard
a hybrid hardware/software co-design implementation, where
Tx DAC DAC Tx the developer can partition the design in any way that fits their
criteria.
LimeSDR. Resembling the general basic architecture of
Fig. 6. USRP board architecture. RF daughterboard selection depends on the USRP (e.g., USRP B200 [17]), Lime Microsystems [146]
application specifications in terms of frequency coverage.
develops a series of SDRs that is based on Lime Microsys-
tems latest generation of field programmable RF transceiver
GPUs. (FPRF) technology, in addition to an Intel FPGA [44] and a
microcontroller. It is then connected to a computer via USB
V. P LATFORMS 3.0. The SDR platform has the responsibility of delivering
the wireless data, while the GPP (computer) has the task
In this section, we list the different types of SDRs from
of processing the incoming signals and generating the data
the architecture and design point of view. We analyze them,
to be transmitted by the SDR. The GPP in this case is the
examine their strengths and shortcomings, and discuss their
source of computing power for baseband. LimeSDR supports
impact on SDR development.
signal bandwidth ∼30-60MHz. Developers of LimeSDR also
developed LimeSuite software, which is used to model SDRs.
A. GPP-based This tool, source code, firmware, and schematics, are open-
USRP N-Series. Universal Software Radio Peripheral source.
(USRP) is the most common SDR platform known to the Ziria. It is a programming platform that uses a domain-
developers’ community [17]. The cost of this platform is specific language (DSL) named Ziria, and an optimizing
around $4000-5000. It provides a hardware platform for the compiler [147]. The application of Ziria is the implementation
GNU Radio Project [140]. There are two generations available: of the physical layer (PHY) of wireless protocols. Ziria has a
USRP1 and USRP2. USRP1 (released in 2004) is connected to 2-layer design:
a generic computer through USB, with the addition of a small – A lower layer that is an imperative language, which is a
FPGA. The FPGA board has two roles: routing information, mixture of C and MATLAB [85] code, with features of
and limited signal processing. This generation was capable of the two languages carefully chosen to guarantee efficient
supporting a ∼ 3MHz bandwidth due to USB 2.0 limitation. compilation.
The second generation, USRP2, was released in 2008, and it – A higher layer, which is the language used to specify and
supports 25MHz bandwidth by utilizing gigabit Ethernet. It stage stream processors.
includes a Xilinx Spartan 3 FPGA [78] for local processing The Ziria optimizing compiler consists of two parts: the
operations. frontend and the backend. The frontend parses the Ziria
USRP, in general, is a board with ADC and DAC, an RF code, putting it in an abstract representation, then applying
front end, a PC host interface, and an FPGA. This board several optimizations on it. The backend compiles the resulting
consists of a motherboard and typically four daughterboards optimized code into an optimized low-level execution model.
(two transmitters Tx, and two receivers Rx), as depicted The particular benefits of this platform are as follows: The
in Figure 6. The daughterboards process analog operations first one is easy and dynamic reconfiguration, due to the
like filtering and up/down conversions. They are modular so dynamic staging of the control graph. This is unlike GNU
they can deal with applications operating up to 6GHz. The Radio [40] C++ templates that allow limited reconfigurability.
FPGA board, depending on the USRP series, handles a few In addition, its code optimization can operate on data-flow
signal processing operations, and the majority of operations components, and often yields a faster execution on GPPs (e.g.,
are offloaded to the connected host system. USRP platforms through the use of LUTs). In general, a Ziria code is very
can be easily set up to use. However, while their performance concise and easy to use. For example, an implementation of
is suitable for research experiments and quick prototyping, a WiFi scrambler in Ziria needs thirteen lines only. Ziria is
these platforms do not necessarily meet the requirements of an interesting research based on data-flow languages, which
communication standards. In fact, the minimum bandwidth are typically used in embedded systems. It also builds upon
of the RF, PC host, or FPGA component used affects the popular functional reactive programming framework.
throughput and timing characteristics of the platform. Sora. Sora is a fully programmable software radio platform
KUAR. Another platform that is similar to USRP is the on PC architecture. It requires C programming on multi-
Kansas University Agile Radio (KUAR) platform [34]. The core GPP, and yields high performance that includes high
basic architecture is composed of a generic computer and processing speed and low latency. The Sora platform uses
INTERNET OF THINGS RESEARCH LAB, DEPARTMENT OF COMPUTER ENGINEERING, SANTA CLARA UNIVERSITY, USA — MARCH 2018 13

the Ziria language discussed above to write high-level SDR


Antenna FPGA CPU
descriptions, and is tested for real-time operation [18]. Unlike
WARP [21] (which we will explain in Section V-E), Sora can text
Packet
text
Buffer Transfer Buffer GPU
accommodate various RF front ends. Bus

Since PC hardware is not intended for signal processing WARP GPU Server
of wireless protocols, their performance can be limited. We
discussed some of the limitations of GPPs in the context of
Fig. 7. GPU-accelerated SDR Platform using WARP [52]. The GPU server
SDRs in Section III-A. These limitations were the motivation is used for baseband processing, while WARP is used for radio control and
behind the development of Sora. The overall setup of Sora interface. Offloading signal processing tasks to the GPU has significantly
includes a soft-radio stack that combines a multi-core GPP improved the overall performance.
and a radio control board, which consists of a Xilinx Virtex-
5 FPGA [78], PCIe-x8 interface, and DDR2 SDRAM. Sora
communication systems are in a constant state of development
uses both hardware and software techniques to address the
and they increase in complexity and sophistication. This calls
challenges of using PC architectures for high speed SDR. It is
for higher computational performance and, along with it, a
the first SDR platform that enables users to develop high speed
higher level of flexibility. In order to realize Iris on Zynq
wireless implementations entirely in software on a standard PC
platform, the components are first translated into C++ using
architecture.
Cmake tools and then they are ported to the platform. HLS
In Sora, new techniques are proposed for efficient PHY
tools, like Xilinx HLS [86], can be used to accelerate parts of
implementation. Some of these techniques include: (i) ex-
the design that are considered to be the bottlenecks. This is
ploiting large high-speed cache memory to minimize memory
done by running system profiling on the software components,
accesses, (ii) extensive use of LUTs, where they would trade
and one of these profilers is Linux Perf. Acceleration, in
memory for calculation and still well fit into L2 cache, (iii)
particular, refers to running parts of the software (after re-
exploiting data parallelism in PHY, and (iv) utilizing wide-
writing it in RTL) on the FPGA fabric in order to achieve
vector SIMD extension in GPP. One of the main novelties
higher performance.
of Sora is efficiently partitioning and scheduling the PHY
processing across cores in GPP. The second innovation is core
dedication for real-time support. This was accomplished by B. GPU-based
exclusively allocating enough cores for SDR processing in
multi-core systems. They showcased its effectiveness through WiMAX SDR. The authors in [150] built a GPU-based
SoftWiFi, which is a full implementation of IEEE 802.11a/b/g platform to realize a WiMAX system. In their study, they
PHY and CSMA MAC with 9000 lines of C code and real-time also compared the performance of GeForce 9800GTX GPU
performance. They also successfully implemented a 3GPP LTE [45] against a TMS320C6416 DSP [59] via implementing
Physical Uplink Shared Channel (PUSCH), or Soft-LTE, with the Viterbi decoder algorithm. The results indicate that the
5000 lines of C code and a peak rate of 43.8Mbps with throughput of the GPU is 181.6Mbps, with a considerable
20MHz. difference compared to the DSP’s 2.07Mbps.
Some of the critiques to Sora is that its FPGA is not
programmable and its capabilities are not fully utilized. In OFDM for WiFi Uplink SDR. In [52], the authors used the
addition, the authors do not share the details of its internal WARP framework [21] as a basis to realize their NVIDIA
routines. Finally, Sora only works on GPPs, with no clear GPU-based SDR platform. They utilized the inherent paral-
mapping to DSPs. lelism of GPUs, and with the help of CUDA [48], they were
Iris. It is a cross-platform SDR, developed to support highly able to achieve real-time performance on WARP. They used
reconfigurable radio networks [148]. It is built using a plu- this platform to design and implement a Single Input Single
gin architecture, namely components, to achieve modularity. Output (SISO) OFDM system for WiFi uplink communication.
These components process data streams and perform different Figure 7 depicts the architecture of this enhanced (accelerated)
operations on them. An engine is used to run, load and WARP SDR. For this platform, they used: (i) a WARP version
maintain the components. Similar to System Generator [141] 3, which consists of a Xilinx Virtex-6 FPGA [78] for radio
and Simulink [132] tools, Iris engine can be used to link control and interface, and (ii) a GPU server, which consists of
components together to build a complete radio system. XML an Intel i7-3930K six-core 3.2GHz CPU [44] for transceiver
is used to specify the components used in the program, their configuration, and four NVIDIA GTX TITAN graphic cards
parameters, and how they should be linked. The main features [45] for baseband processing. Each TITAN is comprised of
of the Iris architecture include: (i) runtime configurability, (ii) 2880 core Kepler GPU running at 889MHz. The accelerated
support for the entire network stack (all layers), not just the WARP achieves less than 3ms latency and higher than 50Mbps
PHY layer, (iii) support for advanced processing platforms over-the-air throughput.
including FPGAs.
There are multiple studies on using Iris. An example is the Signal Detection SDR. In [46] the authors designed and
work in [149], where the authors discuss the process through studied real-time signal detection using an SDR platform
which they were able to implement Iris on Xilinx Zynq composed of a laptop computer with an Intel Xeon E3-1535M
SoC [78]. The motivation behind this work is the fact that processor [44] and an NVIDIA Quadro M4000M GPU [45].
INTERNET OF THINGS RESEARCH LAB, DEPARTMENT OF COMPUTER ENGINEERING, SANTA CLARA UNIVERSITY, USA — MARCH 2018 14

SDRAM SDRAM SDRAM SDRAM


developers must learn the architecture in order to implement
SDRs.
ARM Ardbeg. In [156], a commercial prototype based on
Streaming Memory System
SODA architecture was presented. The main enhancements
Host Stream
of Ardbeg compared to SODA are optimized SIMD design,
Processor Controller
Stream Register File
Network
Interface
VLIW support, and a few special ASIC accelerators, which are
dedicated to certain algorithms such as Turbo encoder/decoder,
block floating point and arithmetic operations.
Microcontroller

ALU Cluster 0

ALU Cluster 1

ALU Cluster 2

ALU Cluster 3

ALU Cluster 4

ALU Cluster 5

ALU Cluster 6

ALU Cluster 7
Atomix. Typically, programmers need to do one of three
tasks to the software workflow of DSPs: tapping into a signal
processing chain, tweaking a block, or inserting/deleting a
block. To simplify these tasks, modularity is crucial. How-
Fig. 8. Imagine Processor Architecture. This platform employs a real-time ever, designing a modular software for DSPs is challeng-
stream processor for baseband processing. ing considering the particular requirements, such as latency
sensitivity and high throughput, that must be supported. The
main challenge is the need for programmers to define and
For 1000ms long samples, this design achieves around 75% manage everything manually and explicitly. In other words,
reduction in parallel processing time, compared to GPPs. it is necessary to use bare metal features, like moving data
across cores, managing SRAMs, and parallelizing software.
C. DSP-based In order to address these concers, Atomix [19] describes
Imagine Processor-based SDR. Authors of [151] proposed the software in blocks, named atoms. An atom can be used
one of the earliest SDR solutions that is fully based on to implement any operation. This can be signal processing
a DSP. This SDR employs the Imagine stream processor, or system handling. Atoms can be used for realizing blocks,
developed at Stanford University in 2001 [152]. The Stan- flowgraphs, and states in wireless stacks. In addition, simple
ford Imagine project aimed at providing a signal and image control flow makes atoms composable. The easy modification
processor that was C programmable and was able to match feature mentioned above is due to declarative language. It
the high performance and density of an ASIC. It is based on is important to note that an Atomix signal processing block
stream processing [153]–[155], which is similar to dataflow implements a fixed algorithmic function, operates on fixed data
programming in exploiting data parallelism and is suitable for lengths, is associated with a specific processor type, and uses
signal processing applications. This work paved the way to only the memory buffers passed to it during invocation. The
the development of GPUs. blocks will run fixed sets of instructions executing uninter-
As Figure 8 shows, Imagine processor uses VLIW-based rupted on fixed resources using fixed memories. This results
ALU clusters arranged in a SIMD fashion to handle data in having fixed execution times. Atoms can also be composed
streams. At the middle of the architecture, there is the Stream to build larger atoms. Using Atomix, radio software can be
Register File, which stores data from other components, built entirely out of atoms and is easily modifiable. Atomix-
thereby minimizing memory accesses. The performance of this based radio also meets throughput and latency requirements.
platform has been evaluated through implementing complex Developers define the atoms in C. Then, the blocks are
algorithms relevant to W-CDMA cellular system. The results composed into flowgraphs and states using Atomix interface.
show higher performance compared to TI C67 DSP [59]: The next step involves developing a parallelized schedule and
channel estimation and detection are improved by 48x and resource assignment in order to meet latency and throughput
42x, respectively. requirements. The software is then compiled by Atomix into a
SODA. In [67], the authors introduce the Signal-processing low-level C. The compiled code, along with Atomix libraries,
On-Demand Architecture (SODA), which is an SDR platform will be compiled into a binary. Atomix was used to build
based on multi-core DSPs. It offers full programmability and IEEE 802.11a receiver only, namely Atomix11a. Evaluation
targets various radio standards. The SODA design achieves of Atomix11a shows that it exceeds receiver sensitivity re-
high performance, energy efficiency, and programmability. quirements, operates in indoor environments robustly, has low
This is attributed to a combination of features that include processing latency, and atoms have low timing variability.
SIMD parallelism and hardware support for 16bit computa- Although the power consumption reported is 7W, it does not
tions, since most algorithms operate on small values. The basic include the power consumed by the front end, USRP2, which
processing element is an asymmetric processor consisting of a is about 14W. A shortcoming of Atomix is that it is intended
scalar and SIMD pipeline, and a set of distributed scratchpad only for synthesis on a variety of DSPs, not for GPPs, GPUs,
memories that are fully managed in software. SODA is a multi- or FPGAs.
core architecture, with one ARM Cortex-M3 processor [96] BeagleBoard-X15. A collaborative project between Texas
for control purposes and multiple processing elements for DSP Instruments [59], Digi-Key [157], and Newark element14
operations. Using four processing elements can meet the com- [158], BeagleBoard is an open-source SoC computer [159].
putational requirements of 802.11a and W-CDMA. Compared It features TI Sitara AM5728 [59], which includes two C66x
to WARP and Sora, as a single-chip implementation, SODA DSPs [59], two ARM Cortex-A15, two ARM M4 cores [96],
is more appropriate for embedded scenarios. As with WARP, and two PowerVR SGX544 GPUs [160]. With its relatively
INTERNET OF THINGS RESEARCH LAB, DEPARTMENT OF COMPUTER ENGINEERING, SANTA CLARA UNIVERSITY, USA — MARCH 2018 15

low price (∼$270), the DSPs along with the co-processors and experimented with a set of protocol changes. Airblue
make a powerful platform for implementing standalone SDRs. demonstrated that it was easily modifiable and still meets
An example of using the BeagleBoard (an older model but timing requirements. Airblue achieves a higher speed than
same general architecture) is the work by [161], where it was Sora for cross-layer communication (between MAC and PHY
used to implement the Public Safety Cognitive Radio (PSCR) layers), which typically has the latency requirement of a few
[162] through GNU Radio [40]. PSCR is FM radio-based and microseconds.
was developed by the Center for Wireless Telecommunications
(CWT) at Virginia Tech. E. Hybrid Design
USRP Embedded (E) Series. This is the embedded version
of USRP, where they incorporate Xilinx SoCs [78] to develop
D. FPGA-based
standalone SDRs. USRP E310, for example, is based on Xilinx
Airblue. This work [20] introduces an FPGA-based SDR Zynq 7020, which yields high performance and is energy
platform for PHY and MAC layers. Airblue is a method to efficient. This USRP is stand-alone and suitable for mobile
implement radios on FPGA to achieve configurability. This applications. It supports frequency range 70MHz to 6GHz and
is done using an HDL language called Bluespec, through features a 2 × 2 MIMO RF front end.
which all hardware blocks of a radio transceiver are written. WARP. The Wireless open-Access Research Platform is
In Bluespec, a developer describes the execution semantics of another example of a co-design specifically developed to
the design through Term Rewriting Systems (TRS). TRS is prototype wireless protocols [21]. It is programmable and
a computational paradigm based on the repeated application scalable, however, parts of the device are implemented in
of simplification rules [163]. The next step is compiling the ASIC, which makes it less flexible. WARP v3 contains a
TRS into RTL codes. TRS has the capability of generating Xilinx Virtex-6 FPGA [78], which includes two MicroBlaze
efficient hardware designs. The main difference between a processors and a Gigabit Ethernet peripheral. It requires the
Verilog interface and a Bluespec interface is that the former use of Xilinx Embedded Development Kit (EDK) [78] to
is merely a collection of wires with no semantic meaning, design SDRs. It is also open-source, with 802.11 reference
while the latter includes handshake signals for blocks com- design made available to the research community. WARP
munication. Therefore, Bluespec facilitates latency insensitive has become widely used in the research community due to
designs which are essential to system construction via modular its effectiveness in implementing various wireless protocols
composition. Using Airblue, developers may find the need to (e.g., 802.11 g/n PHY and MAC) and achieving real-time
modify the building blocks, or modules, to add new features, performance. It also supports MIMO since it has multiple RF
make algorithmic modifications, tune the performance to meet interfaces.
throughput or timing requirements, or make FPGA-specific PSoC 5LP. Developed by Cypress Semiconductor [164],
optimizations. this SoC is composed of an ARM Cortex-M3 GPP [96], an
In order to reach modular refinements, the design of a analog system, and a digital system that uses Universal Digital
configurable radio must have two design properties, latency- Blocks (UDBs). A UDB is a programmable digital block
insensitivity and data-driven control. In addition to flexibility, based on Programmable Logic Devices (PLDs) for realizing
Airblue meets tight latency requirements by implementing synchronous state machines, i.e., they resemble an FPGA, but
both PHY and the MAC on FPGA and connecting them are smaller. All parts are reconfigurable and programmable
with streaming interfaces, which allows data to be processed by using the PSoC Creator software IDE, which includes a
as soon as it is ready. Another advantage of Airblue is the graphical design editor. A few developers have used it to build
implementation of highly reusable designs through parameter- standalone SDRs with simplicity [165]. In addition to its low
izations (i.e., same RTL block can be instantiated several times price ($15), PSoC is a good candidate to quickly prototype
using different parameter values). Also, several techniques that and get familiar with SDRs.
permit designers to reuse existing designs to implement run- Zynq-based SDR. The authors in [166] developed an
time parameterized designs are developed. SDR using the Xilinx Zynq ZC706 and ZedBoard SoCs to
Airblue essentially is a HLS platform that offers modular implement IEEE 802.11a. For their RF front end, they used
refinement, where modifying one module does not affect Analog Devices FMComm3 AD9361 [69]. They used two
the rest of the system, similar to the approach adopted by Zynq boards to compare their performances. They both in-
Atomix. This is why, when compared to WARP, Airblue clude dual-core ARM Cortex-A9 [96], with ZC706 containing
is more flexible, since WARP was not designed with the Kintex-7 and ZedBoard containing Artix-7 FPGAs. In addition
aforementioned principles in mind. The authors in [20] also to HDL Coder [84] and Embedded Coder [135] to generate
studied HLS tools and compared them to Bluespec. They RTL and software executable, they used Mathworks Simulink
found them to be more effective than Bluespec in early stages [132] to generate the model. They reported an average of 2W
of the design; nevertheless, Bluespec is capable of yielding power consumption for Tx and Rx, compared to 5 to 8W
a more optimized final RTL. They argued that HLS will be reported by Atomix [19].
of limited use in final FPGA implementations, especially for
high performance blocks required by future wireless protocols, F. Comparison
hence Bluespec is the language of choice for Airblue. For Table IX compares the SDR platforms discussed above in
performance evaluation, the authors have implemented 802.11 an effort to provide a reference guide for developers. SDR
INTERNET OF THINGS RESEARCH LAB, DEPARTMENT OF COMPUTER ENGINEERING, SANTA CLARA UNIVERSITY, USA — MARCH 2018 16

platforms are compared according to the following criteria: In [167], the authors introduce a method based on RS-
– Programmablity: As protocols evolve, a platform is re- 422 serial communication and High Level Data Link Control
programmed by simply adding or exchanging parts of (HDLC) protocol to update DSP and FPGA systems. Their
the design. For example, when 3GPP issues an update method is fast, stable and easy to implement. The authors in
for LTE standard, instead of replacing the entire radio, [171] show a new method for remotely updating Xilinx FPGAs
an SDR is reprogrammed to accommodate the upgrade. [78] by storing the new design or code in Serial Peripheral
– Flexibility: A platform is capable of handling future wire- Interface (SPI) flash memories. They utilized the Xilinx Quick
less protocols as requirements become more demanding. Boot application, along with the KC705 Evaluation Kit from
This means an SDR should be able to support tighter Xilinx, in order to develop their method. However, it is not
timing requirements for next generation protocols. always practical or feasible to use a download cable in order
– Portability: A platform is standalone and readily de- to update the system. The authors in [170] tackle this issue
ployed. This is generally a requirement for mobile and and the problem of downtime during an update or in the
IoT applications. case of a failure. With this method, there exists two images,
– Modularity: A design’s components are separated and re- namely a factory mode configuration image, which acts as
combined without internal module changes. For example, the baseline, and an application mode configuration image,
a developer may need to exchange a Viterbi decoder with which is used for some specific functions. They show the
a Turbo decoder without having to worry about the rest capability of updating and running a new application mode
of the design. (design) image, in addition to rolling back into the factory
– Computing Power: Since performance depends on the mode image when no application mode image is available.
protocol used, we merely evaluate the capability of plat- Others have focused their efforts on improving the security
forms to implement a given protocol, and not be limited of remote updates, such as [172] and [173]. While there
to a subset of the protocol (e.g., implementing the Viterbi exists some solutions to the remote update process, a few
decoder only). challenges including partial reconfiguration of FPGAs are not
– Energy Efficiency: Similar to computing power, we evalu- yet resolved.
ate the capability to implement protocols efficiently, while
keeping power consumption at a minimum. B. Centralized Algorithms and Network Slicing
– Cost: The cost of the hardware equipment. If a platform
requires a PC, its cost is not included. Also, when the In order to simplify the design and provisioning of large-
price of the entire setup is unknown, the price of the scale networks, software-defined networking (SDN) has been
basic platform (not including RF front end or interfaces) proposed to centralized network control. In this architecture, a
is shown. controller (or multiple controllers) communicate with network
devices (data plane) to collect their status information and
configure their operation. Recent studies show that wireless
VI. R ELATED R ESEARCH AND P OTENTIAL S OLUTIONS networks can significantly benefit from the central network
Although in the previous sections we have highlighted some view established in the controller to design more efficient
of the challenges of building SDRs, in this section, we present network control algorithms such as channel assignment and
additional research areas that are still being faced by the mobility control.
research and development community. These challenges are Centralized control of network resources is the enabler
technical or practical. of network slicing, which refers to the abstraction, slicing
and sharing of network resources. Three levels of slicing is
applicable to wireless networks: (i) spectrum (a.k.a., link virtu-
A. Remote System Update alization): requires frequency, time or space multiplexing. (ii)
One of the main features and motivations of SDRs is their infrastructure: the slicing of physical devices such as BSs. (iii)
reconfigurability and flexibility. In order to take full advantage network: this refers to the slicing of the network infrastructure.
of this, the process of updating a SDR platform should be Compared to wired networks, slicing the resources of wireless
quick and easy. Remote stand alone SDRs are usually FPGA networks is significantly more challenging due to the variable
and DSP based, with more FPGAs being used. Hence, most of nature of wireless channel. Meanwhile, since SDRs enable the
the research has been focused on remote updates of FPGAs. slicing of resources both at the spectrum and infrastructure
Since FPGAs are volatile, which are configured during system level, they can be used to augment SDNs towards a fine-
power-on through their flash memories, update is traditionally grain allocation of resources. For example, compared to ASIC-
done using Joint Test Action Group (JTAG) method [167]. based transceivers, SDRs provide a significantly higher level
However, with more SDRs deployed and adopted in wireless of control over the parameters of physical and MAC layer.
and cellular networks, remote update becomes necessary and a It should be noted that when centralized network control is
challenge. With remote over the air (OTA) updates, it becomes employed, the delay of communication between the controller
possible to send patches to current designs, or even upload an and SDR platforms as well as the delay of programming and
entirely new and improved design to mobile networks and BSs applying new configurations should be bounded within the
[168], [169]. Some of the challenges faced by the research specification requirements. For example, in a dense and mobile
community include speed, reliability, cost, and security [170]. environment, the controller may employ a centralized channel
INTERNET OF THINGS RESEARCH LAB, DEPARTMENT OF COMPUTER ENGINEERING, SANTA CLARA UNIVERSITY, USA — MARCH 2018 17

TABLE IX
C OMPARISON OF E XISTING SDR P LATFORMS

Programm- Flex- Port- Modul- Computing Energy Cost


Soft Core FPGA
ability ability ability arity Power Efficiency (USD)
Imagine Stream
Imagine-based [151] X × × × Medium Low N/A N/A
Processor
Xilinx ∼ 4 − 5K
USRP X300 [17] X X × X High Low PC
Kintex-7 Total
Dual-core ARM Xilinx ∼ 3K
USRP E310 [17] X X X X High High
Cortex-A9 Artix-4 Total
Pc + Xilinx
KUAR [34] X × × × Medium Low N/A
2× PowerPC cores Virtex II Pro
Intel ∼ 300
LimeSDR [146] X X × X High Low PC
Cyclone IV Board Only
Depends
Ziria [147] X X × × High Low PC N/A
on App
Xilinx ∼ 900
Sora [18] X X × × High Low PC
Virtex-5 Board Only
ARM Cortex-M3
SODA [67] X X X × High High N/A N/A
+ Processing Elements
Dual-core ARM Xilinx ∼ 1.2K
Iris [148] X X X X High High
Cortex-A9 Kintex-4 Total
∼ 200
Atomix [19] X X X X High Medium TI 6670 DSP N/A
DSP Only
2× TI C66x DSPs + ∼ 270
BeagleBoard-X15 [159] X X X X High Medium N/A
2× ARM Cortex-A15 & 2× M4 Board Only
Intel ∼ 1.3K
Airblue [20] X X X X High High N/A
Cyclone IV Board Only
2× Xilinx Xilinx ∼ 7K
WARP v3 [21] X × X X High High
MicroBlaze cores Virtex-6 Total
10
PSoC 5LP [164] X × X X Low High ARM Cortex-M3 N/A
Board Only
Dual-core ARM Xilinx ∼ 1.2K
Zynq-based [166] X X X X High High
Cortex-A9 Kintex-4 Total

and power control algorithm to instruct the nodes adjust their these operations are offloaded to a general computing platform.
channels based on the decisions made centrally. In this case, Such an architecture reduces the load of edge devices and BSs
it is essential to ensure the delay of sending configuration as can benefit from powerful processors to implement complex
messages to multiple SDR platforms meets the requirements signal processing operations. For example, when multiple
of central algorithm. Furthermore, it is essential to ensure all SDRs are connected to a computing platform, sophisticated
the SDRs apply the configuration at the same time, otherwise signal processing algorithms could be developed to cope with
serious interference and collision might happen. Although challenges such as interference.
protocols such as OpenFlow [174] and Netconf [175] have To leverage NFV for SDRs, developers have been attempt-
been designed for interactions between the controller and ing to tie them together to achieve complete flexibility across
data plane, their implications on the performance of wireless the platform [178], [179]. Although several wireless SDN
networks have not been investigated. Specifically, it is essential architectures have been proposed to address the challenges
to evaluate the effect of hardware platform (i.e., GPP, DSP, of wireless communication [180]–[183], most of them do not
FPGA) on the delay of applying configurations. These chal- benefit from the features of SDRs.
lenges have not been addressed yet.
D. Energy Efficiency
C. Network Functions Virtualization (NFV) Battery-powered devices in an IoT network face the chal-
One of the new topics is the concept of Network Functions lenge of minimizing power consumption in order to extend
Virtualization (NFV), which offers an alternative way of battery-life before they are due for a replacement, which is
designing and managing networking functions. The concept a costly operation. Ready-to-deploy SDR systems may use
is very similar to SDRs, in the sense that various network high-performing platforms, such as FPGAs, without providing
functions can run in software on top of different hardware solutions or alternatives to batteries [184]. Even in the case of
platforms. These platforms are typically high-volume servers, BSs with on-grid power sources, it has become crucial to lower
storage devices, and cloud-computing data centers [176]. Some power consumption in order to reduce CO2 emissions [185].
of the functions that are virtualized via this method are load This is particularly important for cellular network operators,
balancing, firewalls, intrusion detection devices and WAN where BSs consume more that 50% of the total energy
accelerators. This flexibility is what makes NFV very attractive consumed in the network [186]. To address these concerns,
for network operators, carriers, and manufacturers, in addition energy harvesting mechanisms have been introduced. Energy
to cost reductions [177]. From the SDR point of view, instead harvesting or scavenging is the process of deriving power
of performing signal processing operations on the platform, from external sources, such as solar and wind energies (also,
INTERNET OF THINGS RESEARCH LAB, DEPARTMENT OF COMPUTER ENGINEERING, SANTA CLARA UNIVERSITY, USA — MARCH 2018 18

referred to as green energy), and stored for consumption along- F. Security


side internal sources (e.g., battery or electrical grid sources) SDRs simplify security provisioning. For example, when a
[187]. As this green energy is a viable option for powering new security mechanism does not require hardware replace-
BSs [188], Ericsson (a major telecommunication company) has ment (e.g., 802.11i’s WPA), it can be implemented through
invested into and designed green-energy-powered BSs moti- reprogramming SDRs. On the other side, the reprogramma-
vated by environmental and financial reasons [189]. Therefore, bility feature of SDRs exposes security threats, whether they
hybrid power operation has been accepted as a solution to are standalone or part of a SDN architecture. For example,
lower electrical grid energy consumption and cost [186]. assume an 802.11 network employs SDR-based BSs (i.e.,
In [187], the authors present a survey on energy harvesting access points). In this case, a denial of service (DoS) attack
technologies with regards to transducers, such as antennas can be implemented by instructing a large number of clients
and solar cells, that can utilize renewable energy sources, and to associate with a BS. Also, if the controller is compromised,
cover some of the applications in the IoT and M2M world. then all the SDRs might be reprogrammed to be nonfunctional.
The authors in [188] overview the cellular network operators Therefore, it is important to identify security threats and
that have adopted the hybrid solution and started using green take them into account when designing SDR-based wireless
energies to power their BSs. The authors in [190] discuss the networks.
issue of implementing an energy harvesting transmitter in a Offloading SDR processing to general processing platforms
cognitive radio. They also derive the optimal spectrum sensing through NFV enables the deployment of sophisticated cen-
policy that maximizes the expected total throughput under tral algorithms for detecting abnormal activities and network
energy causality and collision constraints. Energy causality breaches. For example, by analyzing the signal strength re-
states that the total consumed energy should be equal to or ceived from a client at one or multiple BSs, a denial of service
less than the total harvested energy, whereas the collision attack caused by generating excessive interference could be
constraint states that the probability of accessing the occupied detected. Proposing security mechanisms that benefit from the
spectrum is equal to or less than the target probability of a features of SDRs is an important future direction, especially
collision. The authors concluded that a battery-operated radio for large-scale and public networks.
can operate for a long time by optimizing both the energy and
spectral efficiency. Energy harvesting is often associated with
VII. E XISTING S URVEYS
what is known as ”Green IoT”, which is the new trend for IoT
networks and devices, where the main focus is making them Joseph Mitola III was the first to use the term ”Software
more energy efficient. Another relevant work is [191], [192], Radio” in 1993, and published an important work introducing
where the authors present an overview of the challenges and and explaining the concept of using software rather than
existing solutions of energy-efficient IoT systems. traditionally used hardware for designing radio systems [194].
In an interesting early survey (1999) [195] on the ”then”
emerging technology, namely SDR, the authors made the case
E. Co-Design that SDRs have a great potential to advance and facilitate
the development of communication standards such as WiFi
By definition, co-design is the process of realizing system- and cellular networks. In a review published a few years
level goals through exploiting the trade-offs between software later [22], the author paid close attention to the hardware
and hardware in an integrated design. This yields a higher component of SDRs, such as programmable RF modules and
design quality, and optimizes the cost and design cycle time, high-performance DACs and ADCs, as more technological
which in turn shortens the time-to-market time. As co-design advancement had been made.
is adopted for more applications, developers typically face the There are very few notable works to survey and review
problem of partitioning and scheduling. Finding the optimal different SDR platforms and testbeds. One such survey is
design partition is not trivial. While system profilers can assist the work by [38], where the author discusses the challenges
in providing an insight about the system’s behavior and help SDR developers face. These challenges include size, weight,
identify the parts of the code that can be accelerated on power, software architecture, security, certification and busi-
hardware, partitioning should be an automatic process and ness opportunities. While this work is important for compiling
requires no external involvement. There are several algorithms information about these challenges and presenting them in
proposed to address the challenges of partitioning, such as one place, it stays away from enumerating the different SDR
PSO, FCM, and FCMPSO [193]. These are optimization platforms, implementation approaches and their applications
algorithms used to mapping embedded applications to Directed in the communication standards world.
Acyclic Graphs (DAGs) for multi-core architectures. However, In that regard, the authors of [196] compare the SDR plat-
in SDRs, the problem is even more complicated due to forms developed by the year 2012 and give a brief description
having two layers of operation, namely PHY and MAC, and of what each platform entails. It lacks, however, any detailed
partitioning needs to take into account strict real-time re- comparison based on the different categories of computational
quirements. It is a delicate equilibrium between performance, power, energy efficiency, flexibility, adaptability, and cost. It
cost, and correct operation. Even with the process being more is through these comparisons that a designer is able to make
challenging than other design approaches, the benefits of co- an informed decision on what platform to adopt for their
design for complex systems outweigh the initial cost. specific application. The authors of [197] attempt to list and
INTERNET OF THINGS RESEARCH LAB, DEPARTMENT OF COMPUTER ENGINEERING, SANTA CLARA UNIVERSITY, USA — MARCH 2018 19

review several DSP-based SDR platforms from both academia R EFERENCES


and industry, with more focus on commercial solutions, and [1] “WWRF (2009) Visions and research direction for the wireless world,”
then providing a simple comparison between them in terms Tech. Rep., 2009.
of programmability, power, and flexibility. However, what this [2] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and
M. Ayyash, “Internet of Things: A Survey on Enabling Technologies,
work lacks, in addition to being outdated, is the discussion of Protocols, and Applications,” IEEE Communications Surveys and Tu-
FPGAs (mentioned only one example) and hardware/software torials, 2015.
co-design solutions, and a methodical analysis of different [3] IEEE, “IEEE 802.11, The Working Group Setting the Standards for
Wireless LANs.” [Online]. Available: http://www.ieee802.org/11/
design approaches based on a set of performance metrics. [4] 3GPP, “3GPP - Release 16.” [Online]. Available: http://www.3gpp.
Authors of [198] presented a survey of a few SDR platforms org/release-16
that had been developed more than a decade ago. The authors [5] M. Bansal, J. Mehlman, S. Katti, and P. Levis, “OpenRadio,” in
Proceedings of the first workshop on Hot topics in software defined
of [199] presented a paper that laid out the development and networks - HotSDN ’12, New York, USA, 2012, p. 109.
evolution of SDRs over the past several years and discussed [6] ReportBuyer, “Software Defined Radio Market by Application,
the motivation behind gaining even more attention recently. Component, End User, Type - Global Forecast to 2021,”
Tech. Rep., 2016. [Online]. Available: https://www.reportbuyer.
Another work is the comparison conducted by the authors com/product/4364831/software-defined-radio-market-by-application-
in [200] between the Imec Bear platform [201] and a few component-end-user-type-global-forecast-to-2021.html
multicore-based SDR platforms. Other attempts include the [7] Global Industry Analysts Inc, “Software Defined Radio (SDR)
- Global Strategic Business Report,” Tech. Rep., 2017.
work by [202], where the authors focused on discussing the [Online]. Available: https://www.researchandmarkets.com/research/
analog end of the SDR concerning signal sampling, process- pc9x7g/software defined
ing, and the hardware/software responsible for handling these [8] B. Paillassa and C. Morlet, “Flexible satellites: software radio in the
sky,” in 10th International Conference on Telecommunications, 2003.
tasks. The work by [203] compares several SDR platforms to ICT 2003., vol. 2, pp. 1596–1600.
prove the feasibility and reliability of using SDRs in education, [9] P. Angeletti, R. D. Gaudenzi, M. Lisi, I. Introduction, S. Division, and
industry, and government. Another outdated survey is [204]. C. Scientist, “From Bent Pipes to Software Defined Payloads: Evolution
and Trends of Satellite Communications Systems,” System, no. June,
Other surveys that are relevant to SDR platforms is the work pp. 10 – 12, jun 2008.
by [205], where the author discusses and reviews the state- [10] P. Angeletti, M. Lisi, and P. Tognolatti, “Software Defined Radio: A key
of-the-art in microwave technology in transceivers. The paper technology for flexibility and reconfigurability in space applications,”
in 2014 IEEE Metrology for Aerospace (MetroAeroSpace), may 2014,
explores the development of SDRs using different technologies pp. 399–403.
in radio frequency engineering. It is a comprehensive study of [11] J. Seo, Y.-H. Chen, D. S. De Lorenzo, S. Lo, P. Enge, D. Akos, and
several research topics such as the design of tunable radio J. Lee, “A real-time capable software-defined receiver using GPU for
adaptive anti-jam GPS sensors.” Sensors (Basel, Switzerland), vol. 11,
frequency components, linear and power efficient amplifiers, no. 9, pp. 8966–91, 2011.
linear mixers, and interference rejection techniques. Whereas, [12] W. Xiang, F. Sotiropoulos, and S. Liu, “xRadio: An Novel Software De-
authors in [206] provide a very thorough study on the several fined Radio (SDR) Platform and Its Exemplar Application to Vehicle-
to-Vehicle Communications.” Springer, Cham, 2015, pp. 404–415.
security threats and challenges in SDRs, and go over their [13] K. D. Singh, P. Rawat, and J.-M. Bonnin, “Cognitive radio for vehicular
certification process. As SDR platforms grow in popularity, ad hoc networks (CR-VANETs): approaches and challenges,” EURASIP
and more communication protocols are realized using them, Journal on Wireless Communications and Networking, vol. 2014, no. 1,
p. 49, dec 2014.
it becomes essential that developers are prepared for security [14] M. Kloc, R. Weigel, and A. Koelpin, “SDR implementation of an
issues and well-equipped with tools that protect their systems. adaptive low-latency IEEE 802.11p transmitter system for real-time
wireless applications,” in 2017 IEEE Radio and Wireless Symposium
(RWS), jan 2017, pp. 207–210.
VIII. C ONCLUSION [15] Y. Chen, S. Lu, H.-S. Kim, D. Blaauw, R. G. Dreslinski, and T. Mudge,
“A low power software-defined-radio baseband processor for the In-
In this paper, we provided a comprehensive overview of ternet of Things,” in 2016 IEEE International Symposium on High
the various design approaches and hardware platforms adopted Performance Computer Architecture (HPCA), mar 2016, pp. 40–51.
[16] Y. Park, S. Kuk, I. Kang, and H. Kim, “Overcoming IoT Language
for SDR solutions. This includes GPPs, GPUs, DSPs, FPGAs, Barriers Using Smartphone SDRs,” IEEE Transactions on Mobile
and co-design. We explained the basic architectures and ana- Computing, vol. 16, no. 3, pp. 816–828, mar 2017.
lyzed their advantages and disadvantages. Due to the different [17] “Ettus Research - Networked Software Defined Radio (SDR).”
[Online]. Available: https://www.ettus.com/
features of design approaches, it was important to compare [18] K. Tan, H. Liu, J. Zhang, Y. Zhang, J. Fang, and G. M. Voelker,
them against each other in terms of computational power and “Sora: high-performance software radio using general-purpose multi-
power efficiency. We then reviewed the major current and past core processors,” Communications of the ACM, vol. 54, no. 1, p. 99,
jan 2011.
SDR platforms, whether they were developed by the industry [19] M. Bansal, A. Schulman, and S. Katti, “Atomix: A Framework for
or in academia. Finally, we discussed some of the research Deploying Signal Processing Applications on Wireless Infrastructure,”
challenges and topics that are predicted to improve in the near in 12th USENIX Symposium on Networked Systems Design and Imple-
mentation (NSDI 15), 2015, pp. 173–188.
future, helping to advance SDRs and their wide adoption. [20] M.-C. Ng, K. Fleming, M. Vutukuru, S. Gross, Arvind, and H. Bal-
We believe that SDR solutions are going to be mainstream akrishnan, “Airblue: A system for cross-layer wireless protocol devel-
and that their ability to implement different wireless com- opment,” Symp. Architectures for Networking & Communications Syst.
(ANCS), pp. 1–11, 2010.
munication standards with high levels of flexibility and re- [21] “WARP Project.” [Online]. Available: https://warpproject.org/trac
programmability will be considered the norm. This paper poses [22] A. Haghighat, “A review on essentials and technical challenges of
as an exhaustive overview of this phenomenon, its enabling software defined radio,” in MILCOM 2002. Proceedings, 2002, pp.
377–382.
technologies, applications, and the current research tackling [23] U. L. Rohde and T. T. N. Bucher, Communications Receivers: Princi-
this issue from different angles. ples and Design, 4th ed. McGraw-Hill Education, 1988.
INTERNET OF THINGS RESEARCH LAB, DEPARTMENT OF COMPUTER ENGINEERING, SANTA CLARA UNIVERSITY, USA — MARCH 2018 20

[24] T. J. Rouphael, RF and digital signal processing for software-defined [52] K. Li, M. Wu, G. Wang, and J. R. Cavallaro, “A high performance
radio : a multi-standard multi-mode approach. Newnes, 2009. GPU-based software-defined basestation,” in 2014 48th Asilomar Con-
[25] J. J. Carr, The technician’s radio receiver handbook : wireless and ference on Signals, Systems and Computers, nov 2014, pp. 2060–2064.
telecommunication technology. Newnes, 2001. [53] K. Li, B. Yin, M. Wu, J. R. Cavallaro, and C. Studer, “Accelerating
[26] R. Walden, “Analog-to-digital converter survey and analysis,” IEEE massive MIMO uplink detection on GPU for SDR systems,” in 2015
Journal on Selected Areas in Communications, vol. 17, no. 4, pp. 539– IEEE Dallas Circuits and Systems Conference (DCAS), oct 2015, pp.
550, apr 1999. 1–4.
[27] T. Hentschel, M. Henker, and G. Fettweis, “The digital front-end of [54] J. G. Millage, “GPU Integration into a Software Defined Radio
software radio terminals,” IEEE Personal Communications, vol. 6, Framework,” Ph.D. dissertation, Iowa State University, 2010.
no. 4, pp. 40–46, 1999. [55] “TMS320C6670 Multicore Fixed and Floating-Point System-on-
[28] C. Bowick, J. Blyler, and C. J. Ajluni, RF circuit design. Chip — TI.com.” [Online]. Available: https://www.ti.com/product/
Newnes/Elsevier, 2011. tms320c6670
[29] M. N. O. Sadiku and C. M. Akujuobi, “Software-defined radio: a brief [56] L. R. Rabiner and B. Gold, “Theory and application of digital signal
overview,” IEEE Potentials, vol. 23, no. 4, pp. 14–15, oct 2004. processing,” Englewood Cliffs, N.J., Prentice-Hall, Inc., 1975. 777 p.,
[30] L. C. Choo and Z. Lei, “CRC Codes for Short Control Frames in 1975.
IEEE 802.11ah,” in 2014 IEEE 80th Vehicular Technology Conference [57] S. W. others Smith, The scientist and engineer’s guide to digital signal
(VTC2014-Fall), sep 2014, pp. 1–5. processing. California Technical Pub. San Diego, 1997.
[31] F. Berns, G. Kreiselmaier, and N. Wehn, “Channel decoder architecture [58] S. A. Dyer and B. K. Harms, “Digital Signal Processing,” 1993, pp.
for 3G mobile wireless terminals,” in Proceedings Design, Automation 59–117.
and Test in Europe Conference and Exhibition, pp. 192–197. [59] “SMJ320C80 Digital Signal Processor — TI.com.” [Online]. Available:
[32] P.-Y. Chiueh, Tzi-Dar and Tsai, OFDM baseband receiver design for http://www.ti.com/product/SMJ320C80
wireless communications. John Wiley & Sons, 2008. [60] A. Antoniou, Digital Signal Processing. Mcgraw-Hill, 2016.
[33] B. Dezfouli, M. Radi, and O. Chipara, “REWIMO,” ACM Transactions [61] D. A. Patterson and J. L. Hennessy, “Computer organization and
on Sensor Networks, vol. 13, no. 3, pp. 1–42, aug 2017. design,” Morgan Kaufmann, pp. 474–476, 2007.
[34] G. J. Minden, J. B. Evans, L. Searl, D. DePardo, V. R. Petty, R. Rajban- [62] M. J. Flynn, “Some Computer Organizations and Their Effectiveness,”
shi, T. Newman, Q. Chen, F. Weidling, J. Guffey, D. Datla, B. Barker, IEEE Transactions on Computers, vol. C-21, no. 9, pp. 948–960, sep
M. Peck, B. Cordill, A. M. Wyglinski, and A. Agah, “KUAR: A 1972.
Flexible Software-Defined Radio Development Platform,” in 2007 2nd [63] R. Duncan, “A survey of parallel computer architectures,” Computer,
IEEE International Symposium on New Frontiers in Dynamic Spectrum vol. 23, no. 2, pp. 5–16, feb 1990.
Access Networks, apr 2007, pp. 428–439. [64] M. Gschwind, H. Hofstee, B. Flachs, M. Hopkins, Y. Watanabe, and
[35] K. Kant, Microprocessors and microcontrollers. Prentice-Hall Of T. Yamazaki, “Synergistic Processing in Cell’s Multicore Architecture,”
India, 2014. IEEE Micro, vol. 26, no. 2, pp. 10–24, mar 2006.
[36] T. Kazaz, C. Van Praet, M. Kulin, P. Willemen, and I. Moerman, [65] “nxp — Microcontrollers and Processors.” [Online]. Avail-
able: http://www.nxp.com/products/microcontrollers-and-processors:
“Hardware Accelerated SDR Platform for Adaptive Air Interfaces,”
apr 2017. MICROCONTROLLERS-AND-PROCESSORS
[66] “Livanto ICE8060.” [Online]. Available: http : / /
[37] C.-H. Lin, B. Greene, S. Narasimha, and J. Cai, “High performance
www.icerasemi.com/products/livanto chipsets/livanto soft baseband/
14nm SOI FinFET CMOS technology with 0.0174m2 embedded
Livanto ICE8060/index.html
DRAM and 15 levels of Cu metallization,” in 2014 IEEE International
[67] Y. Lin, H. Lee, M. Woh, Y. Harel, S. Mahlke, T. Mudge, C. Chakrabarti,
Electron Devices Meeting, dec 2014, pp. 3.8.1–3.8.3.
and K. Flautner, “SODA: A low-power architecture for software radio,”
[38] T. Ulversoy, “Software Defined Radio: Challenges and Opportunities,”
in Proceedings - International Symposium on Computer Architecture,
IEEE Communications Surveys & Tutorials, vol. 12, no. 4, pp. 531–
vol. 2006, no. 2, may 2006, pp. 89–100.
550, 2010.
[68] A. Gatherer, T. Stetzler, M. McMahan, and E. Auslander, “DSP-based
[39] R. Kamal, Embedded systems : architecture, programming and design. architectures for mobile communications: past, present and future,”
New Delhi: Tata MaGraw-Hill, 2003. IEEE Communications Magazine, vol. 38, no. 1, pp. 84–90, 2000.
[40] “GNU Radio.” [Online]. Available: https://www.gnuradio.org/ [69] “TigerSHARC Processors — Analog Devices.”
[41] K. Vachhani, “Multiresolution analysis: An unified approach using [70] “CEVA- Leading Licensor of Signal Processing IP.” [Online].
Discrete Wavelet Transform on GNU radio,” in 2015 International Available: https://www.ceva-dsp.com/
Conference on Green Computing and Internet of Things (ICGCIoT), [71] “Qualcomm — Wireless Technology and Innovation.” [Online].
oct 2015, pp. 887–892. Available: https://www.qualcomm.com/
[42] M. Vestias and H. Neto, “Trends of CPU, GPU and FPGA for high- [72] G. Frantz, “Digital signal processor trends,” IEEE Micro, vol. 20, no. 6,
performance computing,” in 2014 24th International Conference on pp. 52–59, 2000.
Field Programmable Logic and Applications (FPL), sep 2014, pp. 1– [73] R. Akeela and Y. El Ziq, “Design and Verification of IEEE 802.11ah
6. for IoT and M2M Applications,” 2017 IEEE International Conference
[43] “CPU vs GPU performance - michaelgalloy.com.” [Online]. Available: on Pervasive Computing and Communications Workshops (PerCom
http://michaelgalloy.com/2013/06/11/cpu-vs-gpu-performance.html Workshops), pp. 491–496, mar 2017.
[44] “Intel Xeon Processors.” [Online]. Available: https://www.intel.com/ [74] X. Cai, M. Zhou, and X. Huang, “Model-Based Design for Software
content/www/us/en/products/processors/xeon.html Defined Radio on an FPGA,” IEEE Access, vol. 5, pp. 8276–8283,
[45] “NVIDIA - Visual Computing Technologies.” [Online]. Available: 2017.
http://www.nvidia.com/content/global/global.php [75] I. Kuon, R. Tessier, and J. Rose, “FPGA Architecture: Survey and
[46] A. Fisne and A. Ozsoy, “Grafik processor accelerated real time soft- Challenges,” Foundations and Trends in Electronic Design Automation,
ware defined radio applications,” in 2017 25th Signal Processing and vol. 2, no. 2, pp. 135–253, 2007.
Communications Applications Conference (SIU), may 2017, pp. 1–4. [76] N. Grover and M. K. Soni, “Reduction of Power Consumption in FP-
[47] B. Cope, P. Y. Cheung, W. Luk, and L. Howes, “Performance Compar- GAs -An Overview,” Information Engineering and Electronic Business,
ison of Graphics Processors to Reconfigurable Logic: A Case Study,” vol. 5, no. 5, pp. 50–69, 2012.
IEEE Transactions on Computers, vol. 59, no. 4, pp. 433–448, apr [77] D. Strenski, C. Kulkarni, J. Cappello, O. Design, P. Sundararajan,
2010. F. Programmable, G. Arrays, B. High, and T. Graphical, “Latest
[48] “CUDA Toolkit Documentation.” [Online]. Available: http://docs. FPGAs Show Big Gains in Floating Point Performance,” pp. 1–
nvidia.com/cuda/ 7, 2014. [Online]. Available: https://www.hpcwire.com/2012/04/16/
[49] P. Szegvari and C. Hentschel, “Scalable Software Defined FM-radio re- latest fpgas show big gains in floating point performance/
ceiver running on desktop computers,” in 2009 IEEE 13th International [78] “Xilinx - All Programmable.” [Online]. Available: https://www.xilinx.
Symposium on Consumer Electronics, may 2009, pp. 535–539. com/
[50] “Amd — processors — graphics and technology.” [79] D. Lewis, G. Chiu, J. Chromczak, D. Galloway, B. Gamsa,
[51] J. Koomey, S. Berard, M. Sanchez, and H. Wong, “Implications of V. Manohararajah, I. Milton, T. Vanderhoek, and J. Van Dyken, “The
Historical Trends in the Electrical Efficiency of Computing,” IEEE Stratix 10 Highly Pipelined FPGA Architecture,” in Proceedings of the
Annals of the History of Computing, vol. 33, no. 3, pp. 46–54, mar 2016 ACM/SIGDA International Symposium on Field-Programmable
2011. Gate Arrays - FPGA ’16, New York, USA, 2016, pp. 159–168.
INTERNET OF THINGS RESEARCH LAB, DEPARTMENT OF COMPUTER ENGINEERING, SANTA CLARA UNIVERSITY, USA — MARCH 2018 21

[80] K. Sano and S. Yamamoto, “FPGA-Based Scalable and Power-Efficient Codesign,” in 2017 Euromicro Conference on Digital System Design
Fluid Simulation using Floating-Point DSP Blocks,” IEEE Transactions (DSD), aug 2017, pp. 533–537.
on Parallel and Distributed Systems, vol. 28, no. 10, pp. 2823–2837, [107] K. Underwood and Keith, “FPGAs vs. CPUs: trends in peak floating-
oct 2017. point performance,” in Proceeding of the 2004 ACM/SIGDA 12th
[81] S. Kestur, J. D. Davis, and O. Williams, “BLAS Comparison on FPGA, international symposium on Field programmable gate arrays - FPGA
CPU and GPU,” in 2010 IEEE Computer Society Annual Symposium ’04, New York, USA, 2004, p. 171.
on VLSI, jul 2010, pp. 288–293. [108] B. Betkaoui, D. B. Thomas, and W. Luk, “Comparing performance
[82] R. Woods and Wiley InterScience, FPGA-based implementation of and energy efficiency of FPGAs and GPUs for high productivity
signal processing systems. John Wiley & Sons, 2008. computing,” in 2010 International Conference on Field-Programmable
[83] U. Meyer-Baese, Digital Signal Processing with Field Programmable Technology, dec 2010, pp. 94–101.
Gate Arrays, ser. Signals and Communication Technology. Berlin, [109] B. Duan, W. Wang, X. Li, C. Zhang, P. Zhang, and N. Sun, “Floating-
Heidelberg: Springer Berlin Heidelberg, 2014. point mixed-radix fft core generation for fpga and comparison with gpu
[84] “HDL Coder.” [Online]. Available: https://www.mathworks.com/ and cpu,” in 2011 International Conference on Field-Programmable
products/hdl-coder.html Technology, Dec 2011, pp. 1–6.
[85] “MATLAB - MathWorks.” [Online]. Available: https://www. [110] M. Owaida, P. Ienne, G. Falcao, J. Andrade, C. Antonopoulos,
mathworks.com/products/matlab.html N. Bellas, M. Purnaprajna, D. Novo, G. Karakonstantis, and A. Burg,
[86] “Vivado High-Level Synthesis.” [Online]. Available: https://www. “Enhancing Design Space Exploration by Extending CPU/GPU Spec-
xilinx.com/products/design-tools/vivado/integration/esl-design.html ifications onto FPGAs,” ACM Transactions on Embedded Computing
[87] “Intel FPGA and SoC.” [Online]. Available: https://www.altera.com/ Systems, vol. 14, no. 2, pp. 1–23, feb 2015.
[88] S. Choi, R. Scrofano, V. K. Prasanna, and J.-W. Jang, “Energy- [111] X. Tian and K. Benkrid, “High-Performance Quasi-Monte Carlo Fi-
efficient signal processing using FPGAs,” in Proceedings of the 2003 nancial Simulation: FPGA vs. GPP vs. GPU,” ACM Transactions on
ACM/SIGDA eleventh international symposium on Field programmable Reconfigurable Technology and Systems, vol. 3, no. 4, pp. 1–22, nov
gate arrays - FPGA ’03, New York, USA, 2003, p. 225. 2010.
[89] Altera, “Achieving One TeraFLOPS with 28-nm FPGAs,” Altera White [112] Q. Wu, Y. Ha, A. Kumar, S. Luo, A. Li, and S. Mohamed, “A
Paper, 2010. heterogeneous platform with GPU and FPGA for power efficient
[90] “Lattice Semiconductor.” [Online]. Available: http://www.latticesemi. high performance computing,” in 2014 International Symposium on
com/ Integrated Circuits (ISIC), dec 2014, pp. 220–223.
[91] “Microsemi — Semiconductor & System Solutions — Power Matters.” [113] H. Giefers, P. Staar, C. Bekas, and C. Hagleitner, “Analyzing the
[Online]. Available: https://www.microsemi.com/ energy-efficiency of sparse matrix multiplication on heterogeneous
systems: A comparative study of GPU, Xeon Phi and FPGA,” in 2016
[92] A. Dutta, D. Saha, D. Grunwald, and D. Sicker, “CODIPHY,” in
IEEE International Symposium on Performance Analysis of Systems
Proceedings of the second workshop on Software radio implementation
and Software (ISPASS), apr 2016, pp. 46–56.
forum - SRIF ’13, New York, USA, 2013, p. 1.
[114] J. M. P. Cardoso and M. Weinhardt, “High-Level Synthesis,” in FPGAs
[93] W. Wolf, “A decade of hardware/software codesign,” Computer,
for Software Programmers. Cham: Springer International Publishing,
vol. 36, no. 4, pp. 38–43, Apr 2003.
2016, pp. 23–47.
[94] G. De Micheli, R. Ernst, and M. Wolf, Readings in hardware/software
[115] R. Tessier, K. Pocek, and A. DeHon, “Reconfigurable Computing
co-design. Morgan Kaufmann Publishers, 2002.
Architectures,” Proceedings of the IEEE, vol. 103, no. 3, pp. 332–354,
[95] J. Teich, “Hardware/Software Codesign: The Past, the Present, and mar 2015.
Predicting the Future,” Proceedings of the IEEE, vol. 100, no. Special [116] J. Andrade, N. George, K. Karras, D. Novo, F. Pratas, L. Sousa,
Centennial Issue, pp. 1411–1430, may 2012. P. Ienne, G. Falcao, and V. Silva, “Design Space Exploration of LDPC
[96] “Architecting a Smarter World Arm.” [Online]. Available: https: Decoders using High-Level Synthesis,” IEEE Access, pp. 1–1, 2017.
//www.arm.com/ [117] A. Canis, J. Choi, M. Aldham, V. Zhang, A. Kammoona, T. Cza-
[97] S. Windh, X. Ma, R. J. Halstead, P. Budhkar, Z. Luna, O. Hussaini, and jkowski, S. D. Brown, and J. H. Anderson, “LegUp: : An open-
W. A. Najjar, “High-level language tools for reconfigurable computing,” source high-level synthesis tool for FPGA-based processor/accelerator
Proceedings of the IEEE, vol. 103, no. 3, 2015. systems,” ACM Transactions on Embedded Computing Systems, vol. 13,
[98] K. B. Chehida and M. Auguin, “HW / SW partitioning approach no. 2, pp. 1–27, sep 2013.
for reconfigurable system design,” in Proceedings of the international [118] J. Cong, Bin Liu, S. Neuendorffer, J. Noguera, K. Vissers, and
conference on Compilers, architecture, and synthesis for embedded Zhiru Zhang, “High-Level Synthesis for FPGAs: From Prototyping
systems - CASES ’02, New York, USA, 2002, p. 247. to Deployment,” IEEE Transactions on Computer-Aided Design of
[99] M. López-Vallejo and J. C. López, “On the hardware-software partition- Integrated Circuits and Systems, vol. 30, no. 4, pp. 473–491, apr 2011.
ing problem,” ACM Transactions on Design Automation of Electronic [119] “Intel FPGA SDK for OpenCL.” [Online]. Available:
Systems, vol. 8, no. 3, pp. 269–297, jul 2003. https://www.altera.com/products/design-software/embedded-software-
[100] L. Zhuo and V. K. Prasanna, “Hardware/Software Co-Design for Matrix developers/opencl/overview.html
Computations on Reconfigurable Computing Systems,” in 2007 IEEE [120] “MaxCompiler — Maxeler Technologies.” [Online]. Available:
International Parallel and Distributed Processing Symposium, 2007, https://www.maxeler.com/products/software/maxcompiler/
pp. 1–10. [121] G. Inggs, S. Fleming, D. Thomas, and W. Luk, “Is high level synthesis
[101] G. Sapienza, I. Crnkovic, and P. Potena, “Architectural Decisions for ready for business? A computational finance case study,” in 2014
HW/SW Partitioning Based on Multiple Extra-Functional Properties,” International Conference on Field-Programmable Technology (FPT),
in 2014 IEEE/IFIP Conference on Software Architecture, apr 2014, pp. dec 2014, pp. 12–19.
175–184. [122] R. Nane, V.-M. Sima, C. Pilato, J. Choi, B. Fort, A. Canis, Y. T.
[102] I. Bolsens, H. De Man, B. Lin, K. Van Rompaey, S. Vercauteren, and Chen, H. Hsiao, S. Brown, F. Ferrandi, J. Anderson, and K. Bertels, “A
D. Verkest, “Hardware/software co-design of digital telecommunication Survey and Evaluation of FPGA High-Level Synthesis Tools,” IEEE
systems,” Proceedings of the IEEE, vol. 85, no. 3, pp. 391–418, mar Transactions on Computer-Aided Design of Integrated Circuits and
1997. Systems, vol. 35, no. 10, pp. 1591–1604, oct 2016.
[103] C. Zhang, Y. Ma, and W. Luk, “HW/SW Partitioning Algorithm [123] “High-Level Synthesis For Any FPGA — LegUp Computing.”
Targeting MPSOC with Dynamic Partial Reconfigurable Fabric,” in [Online]. Available: https://www.legupcomputing.com/
2015 14th International Conference on Computer-Aided Design and [124] W. Meeus, K. Van Beeck, T. Goedemé, J. Meel, and D. Stroobandt,
Computer Graphics (CAD/Graphics), aug 2015, pp. 240–241. “An overview of today’s high-level synthesis tools,” Design Automation
[104] D. K. Halim, S. W. Lee, M. S. Ng, Z. N. Lim, and C. M. Tang, “Ex- for Embedded Systems, vol. 16, no. 3, pp. 31–51, sep 2012.
ploring software-defined radio on Multi-Processor System-on-Chip,” in [125] S. Ravi and M. Joseph, “Open source HLS tools: A stepping stone
2015 3rd International Conference on New Media (CONMEDIA), nov for modern electronic CAD,” in 2016 IEEE International Conference
2015, pp. 1–4. on Computational Intelligence and Computing Research (ICCIC), dec
[105] G. Snider, P. Kuekes, and R. S. Williams, “{CMOS}-like logic in 2016, pp. 1–8.
defective, nanoscale crossbars,” Nanotechnology, vol. 15, no. 8, pp. [126] L. A. Tambara, J. Tonfat, A. Santos, F. Lima Kastensmidt, N. H.
881–891, aug 2004. Medina, N. Added, V. A. P. Aguiar, F. Aguirre, and M. A. G.
[106] M. Gobel, A. Elhossini, and B. Juurlink, “A Methodology for Predict- Silveira, “Analyzing Reliability and Performance Trade-Offs of HLS-
ing Application-Specific Achievable Memory Bandwidth for HW/SW- Based Designs in SRAM-Based FPGAs Under Soft Errors,” IEEE
INTERNET OF THINGS RESEARCH LAB, DEPARTMENT OF COMPUTER ENGINEERING, SANTA CLARA UNIVERSITY, USA — MARCH 2018 22

Transactions on Nuclear Science, vol. 64, no. 2, pp. 874–881, feb [152] B. Khailany, W. J. Dally, U. J. Kapasi, P. Mattson, J. Namkoong,
2017. J. D. Owens, B. Towles, A. Chang, and S. Rixner, “Imagine: media
[127] “SDSoC Development Environment.” [Online]. Available: https: processing with streams,” IEEE Micro, vol. 21, no. 2, pp. 35–46, 2001.
//www.xilinx.com/products/design-tools/software-zone/sdsoc.html [153] B. Khailany, W. J. Dally, S. Rixner, U. J. Kapasi, J. D. Owens, and
[128] “Stratus High-Level Synthesis - Cadence Design Systems.” [Online]. B. Towles, “Exploring the VLSI scalability of stream processors,” in
Available: https://www.cadence.com/content/cadence-www/global/en The Ninth International Symposium on High-Performance Computer
US/home/tools/digital-design-and-signoff/synthesis/stratus-high-level- Architecture, 2003. HPCA-9 2003. Proceedings., 2003, pp. 153–164.
synthesis.html [154] J. Gummaraju and M. Rosenblum, “Stream Programming on General-
[129] “Synphony C Compiler - Synopsys.” [Online]. Available: Purpose Processors,” in 38th Annual IEEE/ACM International Sympo-
https://www.synopsys.com/implementation- and- signoff/rtl- synthesis- sium on Microarchitecture (MICRO’05), 2005, pp. 343–354.
test/synphony-c-compiler.html [155] B. K. Khailany, T. Williams, J. Lin, E. P. Long, M. Rygh, D. W. Tovey,
[130] “MathWorks - Makers of MATLAB and Simulink.” [Online]. and W. J. Dally, “A Programmable 512 GOPS Stream Processor for
Available: https://www.mathworks.com/ Signal, Image, and Video Processing,” IEEE Journal of Solid-State
[131] “LabVIEW - National Instruments.” [Online]. Available: http: Circuits, vol. 43, no. 1, pp. 202–213, jan 2008.
//www.ni.com/en-us/shop/labview.html [156] M. Woh, Y. Lin, S. Seo, S. Mahlke, T. Mudge, C. Chakrabarti, R. Bruce,
[132] “Simulink - Simulation and Model-Based Design.” [Online]. Available: D. Kershaw, A. Reid, M. Wilder, and K. Flautner, “From SODA to
https://www.mathworks.com/products/simulink.html scotch: The evolution of a wireless baseband processor,” in 2008 41st
[133] “MATLAB Coder.” [Online]. Available: https://www.mathworks.com/ IEEE/ACM International Symposium on Microarchitecture, nov 2008,
products/matlab-coder.html pp. 152–163.
[134] “Simulink Coder - MATLAB & Simulink.” [Online]. Available: [157] “DigiKey Electronics - Electronic Components Distributor.” [Online].
https://www.mathworks.com/products/simulink-coder.html Available: https://www.digikey.com/
[135] “Embedded Coder - MATLAB & Simulink.” [Online]. Available: [158] “Newark element14 Electronics — Electronic Components
https://www.mathworks.com/products/embedded-coder.html Distributor.” [Online]. Available: https://www.newark.com
[136] “ZedBoard Zynq-7000 ARM/FPGA SoC Development Board.” [159] “BeagleBoard.org - community supported open hardware computers
[Online]. Available: https://www.xilinx.com/products/boards- and- for making.” [Online]. Available: http://beagleboard.org/
kits/1-elhabt.html [160] “Imagination Technologies - Developing and Licensing IP cores.”
[137] R. W. Stewart, L. Crockett, D. Atkinson, K. Barlee, D. Crawford, [Online]. Available: https://www.imgtec.com/
I. Chalmers, M. Mclernon, and E. Sozer, “A low-cost desktop software [161] A. S. Fayez, “Designing a Software Defined Radio to Run on a
defined radio design environment using MATLAB, simulink, and the Heterogeneous Processor,” Ph.D. dissertation, Virginia Tech, apr 2011.
RTL-SDR,” IEEE Communications Magazine, vol. 53, no. 9, pp. 64– [162] B. Le, F. A. Rodriguez, Q. Chen, B. P. Li, F. Ge, M. ElNainay, T. W.
71, sep 2015. Rondeau, and C. W. Bostian, “A public safety cognitive radio node,”
[138] R. W. Stewart, K. W. Barlee, D. S. W. Atkinson, and L. H. Crockett, in Proceedings of the 2007 SDR forum technical conference, SDRF
Software Defined Radio using MATLAB & Simulink and the RTL-SDR. Google Scholar, 2007.
Strathclyde Academic Media, 2015.
[163] M. M. Bezem, J. W. Klop, R. de. Vrijer, and Terese (Group), Term
[139] “RTL-SDR (RTL2832U).” [Online]. Available: https://www.rtl-
rewriting systems. Cambridge University Press, 2003.
sdr.com/
[164] “32-bit ARM Cortex-M3 PSoC 5LP — Cypress Semiconductor.”
[140] Blossom and Eric, “GNU radio: tools for exploring the radio frequency
[Online]. Available: http://www.cypress.com/products/32- bit- arm-
spectrum,” Linux Journal, vol. 2004, no. 122, p. 4, 2004.
cortex-m3-psoc-5lp
[141] “Vivado System Generator for DSP.” [Online]. Available: https://
[165] “PSoC SDR PAøRWE.” [Online]. Available: http://pa0rwe.nl/?page
www.xilinx.com/products/design-tools/vivado/integration/sysgen.html
id=32
[142] J. Malsbury, “Modular, open-source software transceiver for PHY/MAC
research,” Proceedings of the second workshop on Software radio [166] B. Drozdenko, M. Zimmermann, T. Dao, K. Chowdhury, and
implementation forum - SRIF ’13, 2013. M. Leeser, “Hardware-Software Codesign of Wireless Transceivers on
[143] M. Abirami, V. Hariharan, M. B. Sruthi, R. Gandhiraj, and K. P. Soman, Zynq Heterogeneous Systems,” IEEE Transactions on Emerging Topics
“Exploiting GNU radio and USRP: An economical test bed for real in Computing, pp. 1–1, 2017.
time communication systems,” in 2013 4th International Conference on [167] Z. X. Zhang Xin, T. L.-b. Tang Lin-bo, and J. M.-p. Ji Mei-ping,
Computing, Communications and Networking Technologies, ICCCNT “Remote updating for DSP and FPGA programs,” in IET International
2013, 2013. Radar Conference 2015. Institution of Engineering and Technology,
[144] K. Küçük, “RTWiFi-Lab: A real-time Wi-Fi laboratory platform on 2015, pp. 4 .–4 .
USRP and LabVIEW for wireless communications education and [168] J. Hoffmeyer, Il-Pyung Park, M. Majmundar, and S. Blust, “Radio
research,” Computer Applications in Engineering Education, aug 2017. software download for commercial wireless reconfigurable devices,”
[145] V. P. G. Jimenez, A. L. Serrano, B. G. Guzman, and A. G. Armada, IEEE Communications Magazine, vol. 42, no. 3, pp. S26–S32, mar
“Learning Mobile Communications Standards through Flexible Soft- 2004.
ware Defined Radio Base Stations,” IEEE Communications Magazine, [169] B. Bing, “A fast and secure framework for over-the-air wireless
vol. 55, no. 5, pp. 116–123, may 2017. software download using reconfigurable mobile devices,” IEEE Com-
[146] “Software Defined Radio - Lime Micro.” [Online]. Available: munications Magazine, vol. 44, no. 6, pp. 58–63, jun 2006.
http://www.limemicro.com/products/software-defined-radio/ [170] L. Zhou, Q. Liu, B. Wang, P. Yang, X. Li, and J. Zhang, “Remote
[147] G. Stewart, M. Gowda, G. Mainland, B. Radunovic, D. Vytiniotis, and System Update for System on Programmable Chip Based on Controller
C. L. Agullo, “Ziria,” in Proceedings of the Twentieth International Area Network,” Electronics, vol. 6, no. 2, p. 45, jun 2017.
Conference on Architectural Support for Programming Languages and [171] A. Fernandes, R. C. Pereira, J. Sousa, P. F. Carvalho, M. Correia, A. P.
Operating Systems - ASPLOS ’15, vol. 50, no. 4, New York, USA, Rodrigues, B. B. Carvalho, C. M. B. A. Correia, and B. Goncalves,
2015, pp. 415–428. “FPGA Remote Update for Nuclear Environments,” IEEE Transactions
[148] P. D. Sutton, J. Lotze, H. Lahlou, S. A. Fahmy, K. E. Nolan, B. ??zg??l, on Nuclear Science, vol. 63, no. 3, pp. 1645–1649, jun 2016.
T. W. Rondeau, J. Noguera, and L. E. Doyle, “Iris: An architecture for [172] J. Vliegen, N. Mentens, and I. Verbauwhede, “Secure, Remote, Dy-
cognitive radio networking testbeds,” IEEE Communications Magazine, namic Reconfiguration of FPGAs,” ACM Transactions on Reconfig-
vol. 48, no. 9, pp. 114–122, sep 2010. urable Technology and Systems, vol. 7, no. 4, pp. 1–19, dec 2014.
[149] J. van de Belt, P. D. Sutton, and L. E. Doyle, “Accelerating software [173] H. Kashyap and R. Chaves, “Compact and On-the-Fly Secure Dynamic
radio: Iris on the Zynq SoC,” in 2013 IFIP/IEEE 21st International Reconfiguration for Volatile FPGAs,” ACM Transactions on Reconfig-
Conference on Very Large Scale Integration (VLSI-SoC), oct 2013, pp. urable Technology and Systems, vol. 9, no. 2, pp. 1–22, jan 2016.
294–295. [174] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson,
[150] J. Kim, S. Hyeon, and S. Choi, “Implementation of an SDR system J. Rexford, S. Shenker, and J. Turner, “Openflow: Enabling innovation
using graphics processing unit,” IEEE Communications Magazine, in campus networks,” ACM SIGCOMM Computer Communication
vol. 48, no. 3, pp. 156–162, mar 2010. Review, vol. 38, no. 2, pp. 69–74, 2008.
[151] S. Rajagopal, S. Rixner, and J. R. Cavallaro, “A programmable base- [175] R. Enns, M. Bjorklund, and J. Schoenwaelder, “Network configuration
band processor design for software defined radios,” in The 2002 45th protocol (netconf),” 2011.
Midwest Symposium on Circuits and Systems, 2002. MWSCAS-2002., [176] R. Mijumbi, J. Serrat, J.-L. Gorricho, N. Bouten, F. De Turck, and
vol. 3, 2002, pp. 413–416. R. Boutaba, “Network Function Virtualization: State-of-the-Art and
INTERNET OF THINGS RESEARCH LAB, DEPARTMENT OF COMPUTER ENGINEERING, SANTA CLARA UNIVERSITY, USA — MARCH 2018 23

Research Challenges,” IEEE Communications Surveys & Tutorials, [200] M. Palkovic, P. Raghavan, M. Li, A. Dejonghe, L. Van der Perre, and
vol. 18, no. 1, pp. 236–262, 2016. F. Catthoor, “Future Software-Defined Radio Platforms and Mapping
[177] J. Gil Herrera and J. F. Botero, “Resource Allocation in NFV: A Flows,” IEEE Signal Processing Magazine, vol. 27, no. 2, pp. 22–33,
Comprehensive Survey,” IEEE Transactions on Network and Service mar 2010.
Management, vol. 13, no. 3, pp. 518–532, sep 2016. [201] V. Derudder and B. Bougard and A. Couvreur and A. Dewilde and
[178] S. Sun, M. Kadoch, L. Gong, and B. Rong, “Integrating network S. Dupont and L. Folens and L. Hollevoet and F. Naessens and D.
function virtualization with SDR and SDN for 4G/5G networks,” IEEE Novo and P. Raghavan and T. Schuster and K. Stinkens and J. W.
Network, vol. 29, no. 3, pp. 54–59, may 2015. Weijers and L. Van der Perre, “A 200Mbps+ 2.14nJ/b digital baseband
[179] P. Shome, Muxi Yan, S. M. Najafabad, N. Mastronarde, and multi processor system-on-chip for SDRs,” in 2009 Symposium on VLSI
A. Sprintson, “CrossFlow: A cross-layer architecture for SDR using Circuits, 2009, pp. 292–293.
SDN principles,” in 2015 IEEE Conference on Network Function [202] R. G. Machado and A. M. Wyglinski, “Software-defined radio: Bridg-
Virtualization and Software Defined Network (NFV-SDN), nov 2015, ing the analog-digital divide,” Proceedings of the IEEE, vol. 103, no. 3,
pp. 37–39. pp. 409–423, mar 2015.
[180] J. Schulz-Zander, C. Mayer, B. Ciobotaru, S. Schmid, and A. Feld- [203] A. M. Wyglinski, D. P. Orofino, M. N. Ettus, and T. W. Rondeau,
mann, “Opensdwn: Programmatic control over home and enterprise “Revolutionizing software defined radio: case studies in hardware,
wifi,” in Proceedings of the 1st ACM SIGCOMM Symposium on software, and education,” IEEE Communications Magazine, vol. 54,
Software Defined Networking Research. ACM, 2015, p. 16. no. 1, pp. 68–75, jan 2016.
[181] A. Zubow, S. Zehl, and A. Wolisz, “BIGAP - Seamless handover in [204] R. Farrell, M. Sanchez, and G. Corley, “Software-Defined Radio
high performance enterprise IEEE 802.11 networks,” in in IEEE/IFIP Demonstrators: An Example and Future Trends,” International Journal
Network Operations and Management Symposium (NOMS), April of Digital Multimedia Broadcasting, pp. 1–12, mar 2009.
2016, pp. 445–453. [205] T. Nesimoglu, “A review of Software Defined Radio enabling technolo-
[182] R. Riggio, M. K. Marina, J. Schulz-Zander, S. Kuklinski, and gies,” in 2010 10th Mediterranean Microwave Symposium, aug 2010,
T. Rasheed, “Programming abstractions for software-defined wireless pp. 87–90.
networks,” IEEE Transactions on Network and Service Management, [206] G. Baldini, T. Sturman, A. R. Biswas, R. Leschhorn, G. Godor, and
vol. 12, no. 2, pp. 146–162, June 2015. M. Street, “Security Aspects in Software Defined Radio and Cognitive
[183] J. Vestin and A. Kassler, “Qos enabled wifi mac layer processing as Radio Networks: A Survey and A Way Ahead,” IEEE Communications
an example of a nfv service,” in 1st IEEE Conference on Network Surveys & Tutorials, vol. 14, no. 2, pp. 355–379, 2012.
Softwarization (NetSoft), April 2015, pp. 1–9.
[184] L.-m. Ang, K. P. Seng, L. W. Chew, L. S. Yeong, and W. C. Chia,
Wireless Multimedia Sensor Networks on Reconfigurable Hardware.
Berlin, Heidelberg: Springer Berlin Heidelberg, 2013.
[185] T. Han and N. Ansari, “A Traffic Load Balancing Framework for
Software-Defined Radio Access Networks Powered by Hybrid Energy
Sources,” IEEE/ACM Transactions on Networking, vol. 24, no. 2, pp.
1038–1051, apr 2016.
[186] ——, “On Optimizing Green Energy Utilization for Cellular Networks
with Hybrid Energy Supplies,” IEEE Transactions on Wireless Com-
munications, vol. 12, no. 8, pp. 3872–3882, aug 2013.
[187] M. M. Tentzeris, A. Georgiadis, and L. Roselli, “Energy Harvesting
and Scavenging [Scanning the Issue],” Proceedings of the IEEE, vol.
102, no. 11, pp. 1644–1648, nov 2014.
[188] T. Han and N. Ansari, “Powering mobile networks with green energy,”
IEEE Wireless Communications, vol. 21, no. 1, pp. 90–96, feb 2014.
[189] Ericsson Inc., “Sustainable Energy Use in Mobile Communications
Abstract — TechOnline,” Tech. Rep., 2017.
[190] S. Park, H. Kim, and D. Hong, “Cognitive Radio Networks with Energy
Harvesting,” IEEE Transactions on Wireless Communications, vol. 12,
no. 3, pp. 1386–1397, mar 2013.
[191] F. K. Shaikh, S. Zeadally, and E. Exposito, “Enabling technologies for
green internet of things,” IEEE Systems Journal, 2017.
[192] R. Arshad, S. Zahoor, M. A. Shah, A. Wahid, and H. Yu, “Green IoT:
An Investigation on Energy Saving Practices for 2020 and Beyond,”
IEEE Access, vol. 5, pp. 15 667–15 681, 2017.
[193] I. Mhadhbi, S. Ben Othman, and S. Ben Saoud, “An Efficient
Technique for Hardware/Software Partitioning Process in Codesign,”
Scientific Programming, pp. 1–11, Jul 2016.
[194] J. M. Mitola, “Software Radios Survey, Critical Evaluation and Future
Directions,” IEEE Aerospace and Electronic Systems Magazine, vol. 8,
no. 4, pp. 25–36, apr 1993.
[195] W. H. W. Tuttlebee, “Software-defined radio: facets of a developing
technology,” IEEE Personal Communications, vol. 6, no. 2, pp. 38–44,
apr 1999.
[196] M. Dardaillon, K. Marquet, T. Risset, and A. Scherrer, “Software de-
fined radio architecture survey for cognitive testbeds,” in IWCMC 2012
- 8th International Wireless Communications and Mobile Computing
Conference, aug 2012, pp. 189–194.
[197] O. Anjum, T. Ahonen, F. Garzia, J. Nurmi, C. Brunelli, and H. Berg,
“State of the art baseband DSP platforms for Software Defined Ra-
dio: A survey,” EURASIP Journal on Wireless Communications and
Networking, vol. 2011, no. 1, p. 5, dec 2011.
[198] M. Cummings and T. Cooklev, “Tutorial: Software-defined radio tech-
nology,” in 2007 25th International Conference on Computer Design,
oct 2007, pp. 103–104.
[199] C. Moy and J. Palicot, “Software radio: a catalyst for wireless innova-
tion,” IEEE Communications Magazine, vol. 53, no. 9, pp. 24–30, sep
2015.

You might also like