Report on CORTEX-R4
ENT-553 Embedded System and RTOS
Assignment
Shivani Parhad
Roll No 1
Mtech VLSI Design
1st Semester
Contents
1 Introduction
1.1 Why Cortex-R4?
1.2 About Processor
2 Functional Description
2.1
2.2 Load/store unit
2.3 Prefetch unit
2.4 L1 Memory system
2.5 L2 AXI Interfaces
2.6 Debug
2.7 System Control Coprocessor
2.8 Interrupt Handling
2.9 Power Management
Applications
References
Introduction
computer system, that is, one that runs a wide range of general purpose software running
multiple tasks.
The Cortex-R4 processor is a mid-range processor for use in deeply-embedded, real-time systems. It
implements the ARMv7-R architecture and includes Thumb-2 technology for optimum code density
and processing throughput.
The pipeline has a single Arithmetic Logic Unit (ALU), but implements limited dual
issuing of instructions for efficient utilization of other resources such as the register
file.
The processor has Tightly-Coupled Memory (TCM) ports for low-latency and
deterministic accesses to local RAM, in addition to caches for higher performance to
general memory.
Error Checking and Correction (ECC) is used on the Cortex-R4 processor ports
and in Level 1 (L1) memories to provide improved reliability and address safety-critical applications.
Functional Description
format instruction data in a way that aids the DPU in efficient execution.
Branch prediction
The branch predictor is a global type that uses history registers and a 256-entry pattern
history table.
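A global predictor of this kind can be sketched in C as follows. Only the 256-entry pattern history table comes from the text above. The 8-bit history register, the 2-bit saturating counters, the initial counter value, and all names are assumptions made for illustration, not details of the Cortex-R4 implementation.

```c
#include <stdbool.h>
#include <stdint.h>

#define PHT_ENTRIES 256u  /* pattern history table size, from the text */

/* Hypothetical model: field names and counter width are illustrative only. */
typedef struct {
    uint8_t history;           /* last 8 branch outcomes, newest in bit 0 */
    uint8_t pht[PHT_ENTRIES];  /* 2-bit saturating counters, values 0..3 */
} global_predictor;

void predictor_init(global_predictor *p)
{
    p->history = 0;
    for (unsigned i = 0; i < PHT_ENTRIES; i++)
        p->pht[i] = 1;         /* start weakly not-taken (assumption) */
}

/* Predict taken when the counter selected by the history is 2 or 3. */
bool predictor_predict(const global_predictor *p)
{
    return p->pht[p->history] >= 2;
}

/* Train the selected counter, then shift the outcome into the history. */
void predictor_update(global_predictor *p, bool taken)
{
    uint8_t *ctr = &p->pht[p->history];
    if (taken && *ctr < 3)
        (*ctr)++;
    else if (!taken && *ctr > 0)
        (*ctr)--;
    p->history = (uint8_t)((p->history << 1) | (taken ? 1u : 0u));
}
```

Because the table is indexed by the recent outcome history rather than by the branch address, a branch that always follows the same pattern of earlier branches trains a single counter.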
Return stack
The PFU includes a 4-entry return stack to accelerate returns from procedure calls. For each
procedure call, the return address is pushed onto a hardware stack. When a procedure return
is recognized, the address held in the return stack is popped, and the prefetch unit uses it as
the predicted return address.
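The push and pop behaviour described above can be modelled with a small circular buffer. This is a minimal sketch: the depth of four comes from the text, while the overwrite-oldest policy on overflow, the empty-stack result, and all names are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define RETURN_STACK_DEPTH 4u  /* 4 entries, from the text */

/* Hypothetical model of the hardware return stack. */
typedef struct {
    uint32_t entry[RETURN_STACK_DEPTH];
    unsigned top;    /* index of the next free slot, modulo the depth */
    unsigned count;  /* number of valid entries, 0..4 */
} return_stack;

void rs_init(return_stack *rs)
{
    rs->top = 0;
    rs->count = 0;
}

/* On a procedure call: push the return address.
 * When the stack is full the oldest entry is overwritten (assumption). */
void rs_push(return_stack *rs, uint32_t return_addr)
{
    rs->entry[rs->top] = return_addr;
    rs->top = (rs->top + 1u) % RETURN_STACK_DEPTH;
    if (rs->count < RETURN_STACK_DEPTH)
        rs->count++;
}

/* On a recognized return: pop the predicted return address.
 * Returns false when no prediction is available. */
bool rs_pop(return_stack *rs, uint32_t *predicted)
{
    if (rs->count == 0)
        return false;
    rs->top = (rs->top + RETURN_STACK_DEPTH - 1u) % RETURN_STACK_DEPTH;
    rs->count--;
    *predicted = rs->entry[rs->top];
    return true;
}
```

With only four entries, call chains deeper than four lose their outermost return addresses in this model, so those returns get no prediction.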
2.4 L1 Memory system
The processor L1 memory system includes the following features:
separate instruction and data caches
flexible TCM interfaces
64-bit datapaths throughout the memory system
MPU that supports configurable memory region sizes
export of memory attributes for L2 memory system
parity or ECC supported on local memories.
Instruction and data caches
You can configure the processor to include separate instruction and data caches. The caches
have the following features:
Support for independent configuration of the instruction and data cache sizes between
4KB and 64KB.
Pseudo-random cache replacement policy.
8-word cache line length. Cache lines can be either write-back or write-through,
determined by MPU region.
Ability to disable each cache independently.
Streaming of sequential data from LDM and LDRD operations, and sequential instruction
fetches.
Critical word first filling of the cache on a cache miss.
Implementation of all the cache RAM blocks and the associated tag and valid RAM
blocks using standard ASIC RAM compilers.
Parity or ECC supported on local memories.
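To illustrate the 8-word line length and critical-word-first filling, the sketch below computes the line base address for a missing address and one possible fill order. It assumes 32-bit words and a word-granular wrapping order. The real fill uses the 64-bit datapath and its exact beat order is not stated in this report. All names are hypothetical.

```c
#include <stdint.h>

#define WORDS_PER_LINE 8u                     /* from the text */
#define LINE_BYTES     (WORDS_PER_LINE * 4u)  /* 32 bytes, assuming 32-bit words */

/* Address of the first byte of the cache line containing addr. */
uint32_t line_base(uint32_t addr)
{
    return addr & ~(LINE_BYTES - 1u);
}

/* Index (0..7) of the word within its cache line. */
unsigned word_in_line(uint32_t addr)
{
    return (addr & (LINE_BYTES - 1u)) >> 2;
}

/* Fill order for a miss: the requested word first, then the rest of the
 * line in wrapping order (assumed ordering, for illustration). */
void critical_word_first_order(uint32_t miss_addr,
                               unsigned order[WORDS_PER_LINE])
{
    unsigned first = word_in_line(miss_addr);
    for (unsigned i = 0; i < WORDS_PER_LINE; i++)
        order[i] = (first + i) % WORDS_PER_LINE;
}
```

For a miss at address 0x1014, the line base is 0x1000 and word 5 is fetched first, so the load that missed can complete before the rest of the line arrives.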
Memory Protection Unit
An optional MPU provides memory attributes for embedded control applications. You can
configure the MPU to have eight or twelve regions, each with a minimum resolution of 32
bytes.
MPU regions can overlap, and the highest numbered region has the highest priority.
The MPU checks for protection and memory attributes, and some of these can be passed to an
external L2 memory system.
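The priority rule for overlapping regions can be sketched as a lookup that scans from the highest-numbered region downwards. The region count of eight and the 32-byte minimum come from the text. The structure layout, the opaque attribute word, and the function name are assumptions, not the actual register interface.

```c
#include <stdbool.h>
#include <stdint.h>

#define MPU_REGIONS          8u   /* 8 or 12 per the text; 8 chosen here */
#define MPU_MIN_REGION_BYTES 32u  /* minimum resolution, from the text */

/* Hypothetical software view of one MPU region. */
typedef struct {
    bool     enabled;
    uint32_t base;   /* first address covered by the region */
    uint32_t size;   /* size in bytes, at least 32 */
    uint32_t attrs;  /* opaque attribute bits (illustrative encoding) */
} mpu_region;

/* Return the number of the region that governs addr, or -1 if none.
 * Scanning downwards makes the highest-numbered match win. */
int mpu_lookup(const mpu_region regions[MPU_REGIONS], uint32_t addr)
{
    for (int i = (int)MPU_REGIONS - 1; i >= 0; i--) {
        const mpu_region *r = &regions[i];
        if (r->enabled && r->size >= MPU_MIN_REGION_BYTES &&
            addr >= r->base && (addr - r->base) < r->size)
            return i;
    }
    return -1;
}
```

A common use of this rule is a large low-numbered background region with a small higher-numbered region on top of it, for example to protect a 32-byte guard area inside a RAM block.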
TCM interfaces
There are two Tightly-Coupled Memory (TCM) interfaces that permit connection to
configurable blocks of TCM (ATCM and BTCM). These ensure high-speed access to code or
data. As an option, the BTCM can have two memory ports for increased bandwidth.
An ATCM typically holds interrupt or exception code that must be accessed at high speed,
without any potential delay resulting from a cache miss.
A BTCM typically holds a block of data for intensive processing, such as audio or video
processing.
The TCMs are external to the processor. This provides flexibility in optimizing the TCM
subsystem for performance, power, and RAM type. The INITRAMA and INITRAMB pins
enable booting from the ATCM or BTCM, respectively. Both the ATCM and BTCM support
wait states.
Error correction and detection
To increase the tolerance of the system to soft memory faults, you can configure the caches
for either:
parity generation and error correction/detection
ECC code generation, single-bit error correction, and two-bit error detection.
Similarly, you can configure the TCM interfaces for:
parity generation and error detection
ECC code generation, single-bit error correction, and two-bit error detection.
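The idea of single-bit correction with two-bit detection can be shown on a toy scale. The sketch below protects a 4-bit value with an extended Hamming code (three check bits plus an overall parity bit). This is an assumption chosen for brevity: the code the Cortex-R4 actually applies to its wider cache and TCM data is not described in this report, and all names are hypothetical.

```c
#include <stdint.h>

typedef enum { ECC_OK, ECC_CORRECTED, ECC_UNCORRECTABLE } ecc_status;

/* Parity (XOR of all bits) of an 8-bit value. */
static unsigned parity8(uint8_t v)
{
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;
}

/* Encode a 4-bit value. Bits 1..7 hold a Hamming(7,4) codeword with check
 * bits at positions 1, 2 and 4; bit 0 holds the overall parity. */
uint8_t secded_encode(uint8_t nibble)
{
    unsigned d0 = nibble & 1u, d1 = (nibble >> 1) & 1u;
    unsigned d2 = (nibble >> 2) & 1u, d3 = (nibble >> 3) & 1u;
    uint8_t cw = 0;
    cw |= (uint8_t)((d0 ^ d1 ^ d3) << 1);  /* check bit for positions 1,3,5,7 */
    cw |= (uint8_t)((d0 ^ d2 ^ d3) << 2);  /* check bit for positions 2,3,6,7 */
    cw |= (uint8_t)(d0 << 3);
    cw |= (uint8_t)((d1 ^ d2 ^ d3) << 4);  /* check bit for positions 4,5,6,7 */
    cw |= (uint8_t)(d1 << 5);
    cw |= (uint8_t)(d2 << 6);
    cw |= (uint8_t)(d3 << 7);
    cw |= (uint8_t)parity8(cw);            /* make the total parity even */
    return cw;
}

/* Decode a codeword, correcting one flipped bit or reporting two. */
ecc_status secded_decode(uint8_t cw, uint8_t *nibble)
{
    unsigned syndrome = 0;
    for (unsigned pos = 1; pos <= 7; pos++)
        if (cw & (1u << pos))
            syndrome ^= pos;               /* XOR of set bit positions */

    ecc_status st = ECC_OK;
    if (parity8(cw)) {
        /* Odd overall parity: treat as a single-bit error. A syndrome of 0
         * means the overall parity bit (bit 0) itself was flipped. */
        cw ^= (uint8_t)(1u << syndrome);
        st = ECC_CORRECTED;
    } else if (syndrome != 0) {
        return ECC_UNCORRECTABLE;          /* even parity, bad syndrome */
    }
    *nibble = (uint8_t)(((cw >> 3) & 1u) | (((cw >> 5) & 1u) << 1) |
                        (((cw >> 6) & 1u) << 2) | (((cw >> 7) & 1u) << 3));
    return st;
}
```

Plain parity, by contrast, is just the single overall parity bit on its own: it can report that an odd number of bits changed but cannot locate the bit.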
2.5 L2 AXI Interfaces
The L2 AXI interfaces enable the L1 memory system to have access to peripherals and to
external memory using an AXI master and AXI slave port.
AXI master interface
The AXI master interface provides a high bandwidth interface to second level caches, on-chip
RAM, peripherals, and interfaces to external memory. It consists of a single AXI port with a
64-bit read channel and a 64-bit write channel for instruction and data fetches.
The AXI master can run at the same frequency as the processor, or at a lower synchronous
frequency. Asynchronous clocking requires an external asynchronous AXI slice.
AXI slave interface
The AXI slave interface enables AXI masters, including the AXI master port of the processor,
to access data and instruction cache RAMs and TCMs through the AXI system bus. You can
use this for DMA into and out of the TCM RAMs and for software test of the cache RAMs.
The slave interface can run at the same frequency as the processor or at a lower, synchronous
frequency. Asynchronous clocking requires an external asynchronous AXI slice.
Bits in the Auxiliary Control Register and Slave Port Control Register can control
access to the AXI slave. Access to the TCM RAMs can be granted to any master, to only
privileged masters, or completely disabled. Access to the cache RAMs can be separately
controlled in a similar way.
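The three access levels described for the slave port can be expressed as a simple policy check. This is only a model of the decision: the enumerator names and the function are hypothetical, and the actual bit positions in the Auxiliary Control Register and Slave Port Control Register are not given here.

```c
#include <stdbool.h>

/* Hypothetical encoding of the three access levels named in the text. */
typedef enum {
    SLAVE_ACCESS_DISABLED,
    SLAVE_ACCESS_PRIVILEGED_ONLY,
    SLAVE_ACCESS_ANY
} slave_access_policy;

/* Decide whether a master may access the RAMs through the slave port. */
bool slave_access_allowed(slave_access_policy policy, bool master_is_privileged)
{
    switch (policy) {
    case SLAVE_ACCESS_ANY:
        return true;
    case SLAVE_ACCESS_PRIVILEGED_ONLY:
        return master_is_privileged;
    case SLAVE_ACCESS_DISABLED:
    default:
        return false;
    }
}
```

Since TCM and cache RAM access are controlled separately, a system would hold one such policy value for each of the two RAM groups.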
2.6 Debug
The processor has a CoreSight-compliant Advanced Peripheral Bus version 3 (APBv3) debug
interface. This permits system access to debug resources, for example, the setting of
watchpoints and breakpoints.
The processor provides extensive support for real-time debug and performance profiling.
System performance monitoring
This is a group of counters that you can configure to monitor the operation of the processor
and memory system.
ETM interface
The Embedded Trace Macrocell (ETM) interface enables you to connect an external ETM unit
to the processor for real-time code tracing of the core in an embedded system. The ETM
interface collects various processor signals and drives these signals from the processor. The
interface is unidirectional and runs at the full speed of the processor. The ETM interface
connects directly to the external ETM unit without any additional glue logic. You can disable
the ETM interface for power saving.
Real-time debug facilities
The processor contains debug logic that can be used in a CoreSight system to support the
debug operation. It supports:
up to eight breakpoints
up to eight watchpoints
a Debug Communications Channel (DCC).
Halting debug-mode
On a debug event, such as a breakpoint or watchpoint, the debug logic stops the processor
and forces it into debug state. This enables you to examine the internal state of the processor,
and the external state of the system, independently from other system activity. When the
debugging process completes, the processor and system state are restored, and normal
program execution resumes.
Monitor debug-mode
On a debug event, the processor generates a debug exception instead of entering debug state,
as in halting debug-mode. The exception entry enables a debug monitor program to debug the
processor while enabling critical interrupt service routines to operate on the processor. The
debug monitor program can communicate with the debug host over the DCC or any other
communications interface in the system.
2.7 System Control Coprocessor
The system control coprocessor provides configuration and control of the memory system
and its associated functionality. Other system-level operations, such as cache maintenance
operations, are also managed through the system control coprocessor.
2.8 Interrupt Handling
Interrupt handling in the processor is compatible with previous ARM architectures, but has
several additional features to improve interrupt performance for real-time applications.
VIC port
The core has a dedicated port that enables an external interrupt controller, such as the ARM
PrimeCell Vectored Interrupt Controller (VIC), to supply a vector address along with an
Interrupt Request (IRQ) signal. This provides faster interrupt entry, but you can disable it for
compatibility with earlier interrupt controllers.
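The effect of the VIC port on interrupt entry can be sketched as a choice of entry address. The offset 0x18 is the conventional ARM IRQ vector offset. The function name, its parameters, and the idea of passing the vector base explicitly are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define IRQ_VECTOR_OFFSET 0x18u  /* conventional ARM IRQ vector offset */

/* Hypothetical model of where execution starts when an IRQ is taken.
 * With the VIC port enabled, the controller supplies the handler address
 * directly. Otherwise every IRQ enters through the shared IRQ vector and
 * software must query the interrupt controller to find the source. */
uint32_t irq_entry_address(bool vic_port_enabled,
                           uint32_t vector_base,
                           uint32_t vic_vector_addr)
{
    if (vic_port_enabled)
        return vic_vector_addr;
    return vector_base + IRQ_VECTOR_OFFSET;
}
```

The latency saving comes from skipping the software dispatch step that the shared vector requires.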
Low interrupt latency
On receipt of an interrupt, the processor abandons any pending restartable memory
operations.
Restartable memory operations are the multiword transfer instructions LDM, LDRD, STRD,
STM, PUSH, and POP that can access Normal memory.
To minimize the interrupt latency, ARM recommends that you do not perform:
multiple accesses to areas of memory marked as Device or Strongly-ordered
SWP operations to slow areas of memory.
Exception processing
The ARMv7-R architecture contains exception processing instructions to reduce interrupt
handler entry and exit time:
SRS: Save return state to a specified stack frame.
RFE: Return from exception using data from the stack.
CPS: Change processor state, such as interrupt mask setting and clearing, and mode
changes.
2.9 Power Management
The processor includes several microarchitectural features to reduce energy consumption:
Accurate branch and return prediction, reducing the number of incorrect instruction fetch
and decode operations.
The caches use sequential access information to reduce the number of accesses to the tag
RAMs and to unmatched data RAMs.
Extensive use of gated clocks and gates to disable inputs to unused functional blocks.
Because of this, only the logic actively in use to perform a calculation consumes any
dynamic power.
The processor uses four levels of power management:
Run mode: This mode is the normal mode of operation where all of the functionality
of the processor is available.
Dormant mode: The processor can be implemented in such a way as to support Dormant
mode. Dormant mode is a power saving mode in which the processor logic, but not the TCM
and cache RAMs, is powered down. The processor state, apart from the cache and TCM state,
is stored to memory before entry into Dormant mode, and restored after exit. For more
information on preparing the Cortex-R4 to support Dormant mode, contact ARM.
Shutdown mode: This mode has the entire device powered down. All state, including cache
and TCM state, must be saved externally. After power-up, the assertion of
reset returns the processor to the run state.
Standby mode: This mode disables most of the clocks of the device, while keeping the
device powered up. This reduces the power drawn to the static leakage
current and the minimal clock power overhead required to enable the
device to wake up from the Standby mode.
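The four modes differ in what stays powered and what software must save beforehand. The table-like sketch below summarizes the descriptions above. The type and field names are hypothetical, and "main clocks running" is a simplification of "most of the clocks" being disabled in Standby mode.

```c
#include <stdbool.h>

typedef enum { PM_RUN, PM_STANDBY, PM_DORMANT, PM_SHUTDOWN } power_mode;

/* Hypothetical summary of one power mode, derived from the text. */
typedef struct {
    bool logic_powered;        /* processor logic has power */
    bool rams_powered;         /* TCM and cache RAMs have power */
    bool main_clocks_running;  /* false when most clocks are disabled */
    bool save_core_state;      /* processor state must be saved first */
    bool save_ram_state;       /* cache and TCM state must be saved first */
} power_profile;

power_profile power_profile_for(power_mode mode)
{
    switch (mode) {
    case PM_STANDBY:   /* clocks mostly off, everything still powered */
        return (power_profile){ true, true, false, false, false };
    case PM_DORMANT:   /* logic off, RAMs retained */
        return (power_profile){ false, true, false, true, false };
    case PM_SHUTDOWN:  /* entire device off */
        return (power_profile){ false, false, false, true, true };
    case PM_RUN:
    default:           /* full functionality */
        return (power_profile){ true, true, true, false, false };
    }
}
```

Read this way, the modes trade deeper power savings for more state that has to be saved and restored around the transition.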
Applications
ARM Partners have developed families of devices using the Cortex-R4 processor with
varying feature sets and levels of performance for products such as:
3G modems
USB sticks
automotive microcontrollers
References
1. ARM Cortex-R Series Programmers Guide
2. Cortex-R4 and Cortex-R4F Technical Reference Manual Revision: r1p4
3. www.arminfocenter.com
4. www.arm.com
5. www.ti.com
6. www.infineon.com
7. www.google.com