
RAMDEOBABA COLLEGE OF ENGINEERING AND MANAGEMENT

Report on CORTEX-R4
ENT-553 Embedded System and RTOS
Assignment
Shivani Parhad
Roll No 1
Mtech VLSI Design
1st Semester

Contents

Introduction
1.1 Why Cortex-R4?
1.2 About Processor
Functional Description
2.1 Data processing unit
2.2 Load/store unit
2.3 Prefetch unit
2.4 L1 Memory system
2.5 L2 AXI Interfaces
2.6 Debug
2.7 System Control Coprocessor
2.8 Interrupt Handling
2.9 Power Management
Comparison between Cortex R series
Applications
References

Figure 1 Example Cortex R4 System
Figure 2 Processor Block Diagram
Figure 3 Comparison of Cortex R series processors
Figure 4 3G USB stick
Figure 5 Texas Instruments TMS 570
Figure 6 Infineon MD8710

Introduction

An embedded system can be defined as a piece of computer hardware running software
designed to perform a specific task. This contrasts with what is generally considered a
computer system, that is, one that runs a wide range of general-purpose software and
performs multiple tasks.

1.1 Why Cortex-R4?


Embedded systems are increasingly designed for large and complex applications. Aircraft
contain advanced avionics such as inertial guidance systems and GPS receivers, both
systems with considerable safety requirements. Automotive safety systems include Antilock
Braking System (ABS), Electronic Stability Control (ESC/ESP), Traction Control (TCS)
and automatic four-wheel drive.
In dealing with security, embedded systems may also need to be self-sufficient and able to
deal with a failure of electrical and communication systems. There are many constraints on
embedded systems that can make programming them more of a challenge than writing an
application for a general-purpose PC.
Memory Footprint
In many systems, to minimize cost and power, memory size is limited. This forces you to
consider the size of the program and how to reduce memory usage while it runs.
Power
In many embedded systems the power source is a battery. Programmers and hardware
designers must take great care to minimize the total energy usage of the system.
Real-time behavior
A feature of certain systems is that there are time constraints on responding to external
events. This might be a hard requirement or a soft requirement. An example of a hard
requirement is a car braking system, because it must respond within a certain time
consistently. An example of a soft requirement is an audio processing system: it should
complete within a certain time, but an occasional late result degrades quality rather than
causing failure. The ARM Real-time (R) profile defines an architecture aimed at systems
that require deterministic timing and low interrupt latency.
A system is said to be real-time if the total correctness of an operation depends not only on its
logical correctness, but also on the time in which it is performed. Real-time systems, and their
deadlines, are classified by the consequence of missing a deadline:
Hard: Missing a deadline is a total system failure.
Firm: Infrequent deadline misses are tolerable, but can degrade the system's quality of
service. The usefulness of a result is zero after its deadline.
Soft: The usefulness of a result degrades after its deadline, thereby degrading the
system's quality of service.
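
One way to picture the three classes is as the value a result still has once it arrives. The following is only a minimal sketch: the function name `result_value`, the linear decay for soft deadlines, and the `grace` window are illustrative assumptions, not part of any standard definition.

```python
import math


def result_value(kind: str, finish: float, deadline: float, grace: float = 1.0) -> float:
    """Return the usefulness of a result in [0, 1], or -inf for a system failure.

    kind     -- "hard", "firm" or "soft"
    finish   -- time at which the result was produced
    deadline -- time by which it was required
    grace    -- assumed window over which a late soft result decays to zero
    """
    if finish <= deadline:
        return 1.0                      # on time: full value in every class
    if kind == "hard":
        return -math.inf                # a miss counts as total system failure
    if kind == "firm":
        return 0.0                      # late result is useless, system survives
    if kind == "soft":
        late = finish - deadline
        return max(0.0, 1.0 - late / grace)   # value degrades with lateness
    raise ValueError(f"unknown deadline class: {kind}")
```

With these assumptions, a soft result that is half a grace window late keeps half its value, while the same lateness gives zero for a firm deadline and counts as failure for a hard one.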
The goal of a hard real-time system is to ensure that all deadlines are met, but for soft
real-time systems the goal can be meeting a certain subset of deadlines to optimize an
application-specific criterion. The particular subset of deadlines depends on the
application, but some typical examples might include minimizing the lateness of tasks and
maximizing the number of high-priority tasks meeting their deadlines. Hard real-time
systems are used when it is vital that an event be reacted to within a strict deadline. Such
guarantees are required of systems for which not reacting in a certain interval of time would
cause great loss in some manner, especially damaging the surroundings physically or
threatening human lives, although the strict definition is that missing the deadline
constitutes failure of the system. For example, a car engine control system is a hard
real-time system because any delayed signal might cause engine failure or damage to the
engine. Other examples include medical systems such as heart pacemakers and industrial
process controllers. Hard real-time systems are typically found interacting at a low level
with physical hardware, in embedded systems.
Cortex-R series cores are intended for use in deeply embedded, real-time systems,
providing high-performance computing solutions where reliability, high availability, fault
tolerance, maintainability and real-time responses are required. This requires optimizing
the core for performance while enabling time-critical code to execute in a timely manner.
Cortex-R series processors include features to improve the determinism of the core for
critical code.

1.2 About Processor

The Cortex-R4 processor is a mid-range processor for use in deeply embedded, real-time
systems.
It implements the ARMv7-R architecture.
It includes Thumb-2 technology for optimum code density and processing throughput.
The pipeline has a single Arithmetic Logic Unit (ALU), but implements limited dual
issuing of instructions for efficient utilization of other resources such as the register
file.
The processor has Tightly-Coupled Memory (TCM) ports for low-latency and
deterministic accesses to local RAM, in addition to caches for higher performance to
general memory.
Error Checking and Correction (ECC) is used on the Cortex-R4 processor ports
and in Level 1 (L1) memories to provide improved reliability and address
safety-critical applications.

Figure 1 Example Cortex R4 System

Functional Description

Figure 2 Processor Block Diagram


2.1 Data processing unit
The Data Processing Unit (DPU) holds most of the program-visible state of the processor,
such as general-purpose registers, status registers and control registers. It decodes and
executes instructions, operating on data held in the registers in accordance with the ARM
architecture. Instructions are fed to the DPU from the Prefetch Unit (PFU) through a buffer.
The DPU executes instructions that require data to be transferred to or from the memory
system by interfacing to the Load/Store Unit (LSU).

2.2 Load/store unit


The LSU manages all load and store operations, interfacing the DPU to the TCMs,
caches, and L2 memory interfaces.
2.3 Prefetch unit
The PFU obtains instructions from the instruction cache, the TCMs, or from external memory
and predicts the outcome of branches in the instruction stream. The purpose of the PFU is to:
perform speculative fetch of instructions ahead of the DPU by predicting the outcome of
branch instructions
format instruction data in a way that aids the DPU in efficient execution.
Branch prediction
The branch predictor is a global type that uses history registers and a 256-entry pattern
history table.
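
As a rough illustration of how a global predictor of this size can work, the sketch below indexes a 256-entry table with an 8-bit global history register. Only the table size and the "global" type come from the description above. The 2-bit saturating counters, the history-only indexing, the initial counter value, and the class and function names are assumptions made for this example and may differ from the actual Cortex-R4 design.

```python
class GlobalPredictor:
    """Global-history branch predictor with a 256-entry pattern history table."""

    PHT_ENTRIES = 256          # table size taken from the text
    HISTORY_BITS = 8           # assumption: 2**8 == 256, history alone indexes the table

    def __init__(self) -> None:
        self.history = 0                         # most recent outcome in bit 0
        self.pht = [1] * self.PHT_ENTRIES        # assumption: 2-bit counters, start weakly not-taken

    def predict(self) -> bool:
        return self.pht[self.history] >= 2       # counter value 2 or 3 means predict taken

    def update(self, taken: bool) -> None:
        counter = self.pht[self.history]
        # Saturating increment/decrement keeps the counter in 0..3.
        self.pht[self.history] = min(3, counter + 1) if taken else max(0, counter - 1)
        mask = self.PHT_ENTRIES - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def run(pattern, repeats):
    """Feed a repeating outcome pattern and count correct predictions per pass."""
    predictor = GlobalPredictor()
    correct_per_pass = []
    for _ in range(repeats):
        correct = 0
        for taken in pattern:
            correct += predictor.predict() == taken
            predictor.update(taken)
        correct_per_pass.append(correct)
    return correct_per_pass
```

For a loop-like pattern such as taken, taken, not-taken, this sketch mispredicts while the history and counters warm up and then predicts every outcome of the pattern correctly.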
Return stack
The PFU includes a 4-entry return stack to accelerate returns from procedure calls. For each
procedure call, the return address is pushed onto a hardware stack. When a procedure return
is recognized, the address held in the return stack is popped, and the prefetch unit uses it as
the predicted return address.
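
The push and pop behaviour described above can be sketched as a small fixed-depth stack. The depth of four comes from the text. What happens on overflow (here the oldest entry is dropped) and on an empty stack (here no prediction is made) is an assumption, as are the class and method names.

```python
from collections import deque
from typing import Optional


class ReturnStack:
    """Fixed-depth return-address predictor."""

    def __init__(self, depth: int = 4) -> None:      # 4 entries, as stated in the text
        # Assumption: when full, the oldest entry is silently discarded.
        self._entries = deque(maxlen=depth)

    def on_call(self, return_address: int) -> None:
        """Push the return address when a procedure call is seen."""
        self._entries.append(return_address)

    def on_return(self) -> Optional[int]:
        """Pop the predicted return address, or None if the stack is empty."""
        return self._entries.pop() if self._entries else None
```

With these assumptions, call chains deeper than four levels lose the prediction for the outermost returns, which then have to be resolved without the return stack.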
2.4 L1 Memory system
The processor L1 memory system includes the following features:
separate instruction and data caches
flexible TCM interfaces
64-bit datapaths throughout the memory system
MPU that supports configurable memory region sizes
export of memory attributes for L2 memory system
parity or ECC supported on local memories.
Instruction and data caches
You can configure the processor to include separate instruction and data caches. The caches
have the following features:
Support for independent configuration of the instruction and data cache sizes between
4KB and 64KB.
Pseudo-random cache replacement policy.
8-word cache line length. Cache lines can be either write-back or write-through,
determined by MPU region.
Ability to disable each cache independently.
Streaming of sequential data from LDM and LDRD operations, and sequential instruction
fetches.
Critical word first filling of the cache on a cache miss.
Implementation of all the cache RAM blocks and the associated tag and valid RAM
blocks using standard ASIC RAM compilers.
Parity or ECC supported on local memories.
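
To make the cache parameters concrete, the sketch below works out how an address splits into tag, set index and byte offset for a given cache size. The 8-word line and the 4KB to 64KB range come from the list above, and a word is taken as 4 bytes. The associativity (`ways=4`), the restriction to power-of-two sizes, and the function names are assumptions for illustration only.

```python
def cache_geometry(size_kb: int, ways: int = 4, line_words: int = 8, word_bytes: int = 4):
    """Return (line_bytes, sets, offset_bits, index_bits) for a set-associative cache.

    ways=4 is an assumption for illustration; the text does not state associativity.
    """
    if size_kb not in (4, 8, 16, 32, 64):           # assumption: power-of-two sizes only
        raise ValueError("expected a power-of-two size between 4KB and 64KB")
    line_bytes = line_words * word_bytes            # 8 words x 4 bytes = 32 bytes
    sets = (size_kb * 1024) // (line_bytes * ways)
    offset_bits = line_bytes.bit_length() - 1       # log2 for powers of two
    index_bits = sets.bit_length() - 1
    return line_bytes, sets, offset_bits, index_bits


def split_address(addr: int, size_kb: int, ways: int = 4):
    """Split a 32-bit address into (tag, set index, byte offset)."""
    _, sets, offset_bits, index_bits = cache_geometry(size_kb, ways)
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset
```

Under these assumptions a 16KB cache has 128 sets of 32-byte lines, so the low 5 address bits select the byte within a line and the next 7 bits select the set.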
Memory Protection Unit
An optional MPU provides memory attributes for embedded control applications. You can
configure the MPU to have eight or twelve regions, each with a minimum resolution of 32
bytes.
MPU regions can overlap, and the highest numbered region has the highest priority.
The MPU checks for protection and memory attributes, and some of these can be passed to an
external L2 memory system.
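
The overlap rule can be sketched as a lookup that picks the highest-numbered region containing an address. The `Region` fields, the free-form `attrs` label and the `lookup` function are hypothetical names for this example. The sketch ignores details such as alignment and region enable bits, and returning `None` when no region matches is a simplification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Region:
    number: int        # region number; higher numbers win on overlap
    base: int          # start address
    size: int          # bytes; the text gives 32 bytes as the minimum resolution
    attrs: str         # hypothetical label standing in for memory attributes


def lookup(regions: List[Region], addr: int) -> Optional[Region]:
    """Return the highest-numbered region containing addr, or None if none matches."""
    hits = [r for r in regions if r.base <= addr < r.base + r.size]
    return max(hits, key=lambda r: r.number) if hits else None
```

For example, a large low-numbered region can give default attributes to a block of memory, and a small higher-numbered region placed inside it overrides those attributes for a 32-byte window.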
TCM interfaces
There are two Tightly-Coupled Memory (TCM) interfaces that permit connection to
configurable blocks of TCM (ATCM and BTCM). These ensure high-speed access to code or
data. As an option, the BTCM can have two memory ports for increased bandwidth.
An ATCM typically holds interrupt or exception code that must be accessed at high speed,
without any potential delay resulting from a cache miss.

A BTCM typically holds a block of data for intensive processing, such as audio or video
processing.
The TCMs are external to the processor. This provides flexibility in optimizing the TCM
subsystem for performance, power, and RAM type. The INITRAMA and INITRAMB pins
enable booting from the ATCM or BTCM, respectively. Both the ATCM and BTCM support
wait states.
Error correction and detection
To increase the tolerance of the system to soft memory faults, you can configure the caches
for either:
parity generation and error correction/detection
ECC code generation, single-bit error correction, and two-bit error detection.
Similarly, you can configure the TCM interfaces for:
parity generation and error detection
ECC code generation, single-bit error correction, and two-bit error detection.
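
Single-bit correction with two-bit detection can be illustrated with a toy extended Hamming code over 4 data bits. The text does not specify the code or word width used in the processor, so the 8-bit layout, the function names and the status strings below are assumptions chosen to keep the example small.

```python
def encode(data: int) -> int:
    """Encode a 4-bit value into an 8-bit SEC-DED codeword (extended Hamming code)."""
    d = [(data >> i) & 1 for i in range(4)]
    bits = [0] * 8                          # bits[1..7]: Hamming positions, bits[0]: overall parity
    bits[3], bits[5], bits[6], bits[7] = d  # data goes in the non-power-of-two positions
    for p in (1, 2, 4):
        # Each check bit gives even parity over the positions whose index has bit p set.
        bits[p] = sum(bits[i] for i in range(1, 8) if i & p) & 1
    bits[0] = sum(bits) & 1                 # overall parity makes the whole word even
    return sum(b << i for i, b in enumerate(bits))


def decode(word: int):
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    bits = [(word >> i) & 1 for i in range(8)]
    syndrome = 0
    for p in (1, 2, 4):
        if sum(bits[i] for i in range(1, 8) if i & p) & 1:
            syndrome |= p                   # failing checks spell out the error position
    overall = sum(bits) & 1                 # 0 while the total parity is still even
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        bits[syndrome] ^= 1                 # one bit flipped; syndrome 0 is the parity bit itself
        status = "corrected"
    else:
        return "uncorrectable", None        # checks fail but parity is even: two bits flipped
    data = bits[3] | bits[5] << 1 | bits[6] << 2 | bits[7] << 3
    return status, data
```

Flipping any one bit of an encoded word yields `"corrected"` with the original data, and flipping any two bits yields `"uncorrectable"`. Plain parity, by contrast, can only detect an odd number of flipped bits and cannot locate them.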
2.5 L2 AXI Interfaces
The L2 AXI interfaces enable the L1 memory system to have access to peripherals and to
external memory using an AXI master and AXI slave port.
AXI master interface
The AXI master interface provides a high bandwidth interface to second level caches, on-chip
RAM, peripherals, and interfaces to external memory. It consists of a single AXI port with a
64-bit read channel and a 64-bit write channel for instruction and data fetches.
The AXI master can run at the same frequency as the processor, or at a lower synchronous
frequency. If asynchronous clocking is needed, an external asynchronous AXI slice is
required.
AXI slave interface
The AXI slave interface enables AXI masters, including the AXI master port of the processor,
to access data and instruction cache RAMs and TCMs through the AXI system bus. You can
use this for DMA into and out of the TCM RAMs and for software test of the cache RAMs.
The slave interface can run at the same frequency as the processor or at a lower, synchronous
frequency. If asynchronous clocking is needed, an external asynchronous AXI slice is
required. Bits in the Auxiliary Control Register and Slave Port Control Register can control
access to the AXI slave. Access to the TCM RAMs can be granted to any master, to only
privileged masters, or completely disabled. Access to the cache RAMs can be separately
controlled in a similar way.
2.6 Debug
The processor has a CoreSight compliant Advanced Peripheral Bus version 3 (APBv3) debug
interface. This permits system access to debug resources, for example, the setting of
watchpoints and breakpoints.
The processor provides extensive support for real-time debug and performance profiling.
System performance monitoring
This is a group of counters that you can configure to monitor the operation of the processor
and memory system.
ETM interface
The Embedded Trace Macrocell (ETM) interface enables you to connect an external ETM unit
to the processor for real-time code tracing of the core in an embedded system. The ETM
interface collects various processor signals and drives these signals from the processor. The
interface is unidirectional and runs at the full speed of the processor. The ETM interface
connects directly to the external ETM unit without any additional glue logic. You can disable
the ETM interface for power saving.
Real-time debug facilities
The processor contains debug logic that can be used in a CoreSight system to support the
debug operation. It supports:
up to eight breakpoints
up to eight watchpoints
a Debug Communications Channel (DCC).
Halting debug-mode
On a debug event, such as a breakpoint or watchpoint, the debug logic stops the processor
and forces it into debug state. This enables you to examine the internal state of the processor,
and the external state of the system, independently from other system activity. When the
debugging process completes, the processor and system state are restored, and normal
program execution resumes.
Monitor debug-mode
On a debug event, the processor generates a debug exception instead of entering debug state,
as in halting debug-mode. The exception entry enables a debug monitor program to debug the
processor while enabling critical interrupt service routines to operate on the processor. The
debug monitor program can communicate with the debug host over the DCC or any other
communications interface in the system.
2.7 System Control Coprocessor
The system control coprocessor provides configuration and control of the memory system
and its associated functionality. Other system-level operations, such as cache maintenance
operations, are also managed through the system control coprocessor.
2.8 Interrupt Handling
Interrupt handling in the processor is compatible with previous ARM architectures, but has
several additional features to improve interrupt performance for real-time applications.
VIC port
The core has a dedicated port that enables an external interrupt controller, such as the ARM
PrimeCell Vectored Interrupt Controller (VIC), to supply a vector address along with an
Interrupt Request (IRQ) signal. This provides faster interrupt entry, but you can disable it for
compatibility with earlier interrupt controllers.
Low interrupt latency
On receipt of an interrupt, the processor abandons any pending restartable memory
operations. Restartable memory operations are the multiword transfer instructions LDM, LDRD,
STRD, STM, PUSH, and POP that can access Normal memory.
To minimize the interrupt latency, ARM recommends that you do not perform:
multiple accesses to areas of memory marked as Device or Strongly-ordered
SWP operations to slow areas of memory.
Exception processing
The ARMv7-R architecture contains exception processing instructions to reduce interrupt
handler entry and exit time:
SRS: Save return state to a specified stack frame.
RFE: Return from exception using data from the stack.
CPS: Change processor state, such as interrupt mask setting and clearing, and mode
changes.
2.9 Power Management
The processor includes several microarchitectural features to reduce energy consumption:
Accurate branch and return prediction, reducing the number of incorrect instruction fetch
and decode operations.
The caches use sequential access information to reduce the number of accesses to the tag
RAMs and to unmatched data RAMs.
Extensive use of gated clocks and gates to disable inputs to unused functional blocks.
Because of this, only the logic actively in use to perform a calculation consumes any
dynamic power.
The processor uses four levels of power management:
Run mode: This mode is the normal mode of operation where all of the functionality
of the processor is available.
Dormant mode: The processor can be implemented in such a way as to support Dormant
mode. Dormant mode is a power saving mode in which the processor logic, but not the TCM
and cache RAMs, is powered down. The processor state, apart from the cache and TCM state,
is stored to memory before entry into Dormant mode, and restored after exit. For more
information on preparing the Cortex-R4 to support Dormant mode, contact ARM.
Shutdown mode: This mode has the entire device powered down. All state, including cache
and TCM state, must be saved externally. After power-up, the assertion of
reset returns the processor to the run state.
Standby mode: This mode disables most of the clocks of the device, while keeping the
device powered up. This reduces the power drawn to the static leakage
current and the minimal clock power overhead required to enable the
device to wake up from the Standby mode.

Comparison between Cortex R series

Figure 3 Comparison of Cortex R series processors

Applications

ARM Partners have developed families of devices using the Cortex-R4 processor with
varying feature sets and levels of performance for products such as:

3G USB modem sticks

Figure 4 3G USB stick

Automotive microcontrollers

Figure 5 Texas Instruments TMS 570

Infineon medical device platform, MD8710

Figure 6 Infineon MD8710

References
1. ARM Cortex-R Series Programmers Guide
2. Cortex-R4 and Cortex-R4F Technical Reference Manual Revision: r1p4
3. www.arminfocenter.com
4. www.arm.com
5. www.ti.com
6. www.infineon.com
7. www.google.com
