Embedded System Design Metrics

Embedded System Design
06EC82
Sub: Embedded System Design    Sem: VIII
Sub code: 06EC82

PART A

UNIT 1 INTRODUCTION: Overview of embedded systems, embedded system design challenges, common design metrics and optimizing them. Survey of different embedded system design technologies, trade-offs. Custom Single-Purpose Processors, Design of custom single-purpose processors. (4 Hours)

UNIT 2 SINGLE-PURPOSE PROCESSORS: Hardware: Combinational Logic, Sequential Logic, RT-level Combinational and Sequential Components, Optimizing single-purpose processors. Single-Purpose Processors: Software: Basic Architecture, Operation, Programmer's View, Development Environment, ASIPs. (6 Hours)

UNIT 3 Standard Single-Purpose Peripherals: Timers, Counters, UART, PWM, LCD Controllers, Keypad Controllers, Stepper Motor Controller, A to D Converters, Examples. (6 Hours)

UNIT 4 MEMORY: Introduction, Common Memory Types, Composing Memory, Memory Hierarchy and Cache, Advanced RAM. Interfacing: Communication Basics, Microprocessor Interfacing, Arbitration, Advanced Communication Principles, Protocols (Serial, Parallel). (8 Hours)

PART B

UNIT 5 INTERRUPTS: Basics, Shared Data Problem, Interrupt Latency. Survey of Software Architectures: Round Robin, Round Robin with Interrupts, Function-Queue Scheduling, RTOS Architecture. (8 Hours)

UNIT 6 INTRODUCTION TO RTOS, MORE OS SERVICES: Tasks, States, Data, Semaphores and Shared Data. More Operating System Services: Message Queues, Mailboxes, Timers, Events, Memory Management. (8 Hours)

UNIT 7 & 8 Basic Design Using RTOS: Principles, An Example, Encapsulating Semaphores and Queues. Hard Real-Time Scheduling Considerations, Saving Memory Space and Power. Hardware/Software Co-design Aspects in Embedded Systems. (12 Hours)
ECE, SJBIT
INDEX SHEET
SL.NO TOPIC

UNIT 1 INTRODUCTION
01 Embedded systems overview
02 Design challenges, common design metrics
03 Processor technology
04 IC technology
05 Design technology
06 Trade-offs
07 Recommended questions and solutions

UNIT 2 CUSTOM SINGLE-PURPOSE PROCESSORS
Hardware:
01 Introduction, combinational logic
02 Sequential logic
03 Custom single-purpose processor design
04 RT-level processor design
05 Optimizing custom processors
Software:
06 Basic architecture
07 Operation
08 Programmer's view
09 Development environment
10 ASIPs
11 Recommended questions and solutions

UNIT 3 STANDARD SINGLE-PURPOSE PROCESSORS: PERIPHERALS
01 Introduction, timers, counters, watchdog timers
02 UART, PWM
03 LCD controllers, stepper motor controllers
04 Analog-to-digital converters, RTC
05 Recommended questions and solutions

UNIT 4 MEMORY AND MICROPROCESSOR INTERFACING
01 Introduction, memory write ability
02 Common memory types
03 Composing memory
04 Memory hierarchy and cache
05 Advanced RAM
06 Communication basics
07 Microprocessor interfacing
08 Arbitration
09 Multilevel bus architectures
10 Advanced communication principles
11 Recommended questions and solutions

UNIT 5 INTERRUPTS AND SURVEY OF SOFTWARE ARCHITECTURES
01 Shared data problem
02 Round robin
03 Function queues
04 RTOS architecture
05 Recommended questions and solutions

UNIT 6 INTRODUCTION TO RTOS, MORE ON OS SERVICES
01 Tasks, states, data
02 Semaphores
03 Message queues, mailboxes
04 Events, memory management
05 Recommended questions and solutions

UNIT 7 & 8 BASIC DESIGN USING RTOS
01 Principles
02 Encapsulating semaphores
03 Hard real-time scheduling considerations
04 Saving memory and power
05 Recommended questions and solutions
PART A UNIT- 1
INTRODUCTION: Overview of embedded systems, embedded system design challenges, common design metrics and optimizing them. Survey of different embedded system design technologies, trade-offs. Custom Single-Purpose Processors, Design of custom single purpose processors.
4 Hours
TEXT BOOKS:
1. Embedded System Design: A Unified Hardware/Software Introduction, Frank Vahid and Tony Givargis, John Wiley & Sons, Inc., 2002.
2. An Embedded Software Primer, David E. Simon, Pearson Education, 1999.

REFERENCE BOOKS:
1. Embedded Systems: Architecture and Programming, Raj Kamal, TMH, 2008.
2. Embedded Systems Architecture: A Comprehensive Guide for Engineers and Programmers, Tammy Noergaard, Elsevier Publication, 2005.
3. Embedded C Programming, Barnett, Cox & O'Cull, Thomson, 2005.
EMBEDDED SYSTEM DESIGN

UNIT 1 INTRODUCTION
1.1 Embedded systems overview
1.2 Design challenges
1.3 Processor technology
1.4 IC technology
1.5 Design technology

1.1 Embedded systems overview
An embedded system is nearly any computing system other than a desktop computer. It is a dedicated system that performs its desired function repeatedly from the moment it is powered up. Embedded systems are found in a variety of common electronic devices, such as: (a) consumer electronics: cell phones, pagers, digital cameras, camcorders, videocassette recorders, portable video games, calculators and personal digital assistants; (b) home appliances: microwave ovens, answering machines, thermostats, home security systems, washing machines and lighting systems; (c) office automation: fax machines, copiers, printers and scanners; (d) business equipment: cash registers, curbside check-in, alarm systems, card readers, product scanners and automated teller machines; (e) automobiles: transmission control, cruise control, fuel injection, anti-lock brakes and active suspension.
Common characteristics of embedded systems: Embedded systems have several common characteristics that distinguish them from other computing systems:
1. Single-functioned: An embedded system executes a single program repeatedly; the entire program runs in a loop over and over again.
2. Tightly constrained: It should cost little, be fast enough to process data in real time, fit on a single chip and consume as little power as possible.
3. Reactive and real-time: It should continuously react to changes in its environment and compute results in real time, without delay.
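The single-functioned, reactive behaviour described above can be illustrated with a minimal sketch of a thermostat controller. Everything in it is hypothetical: the set point, the hysteresis band and the sensor readings are assumed values, and a real device would typically run an endless loop (usually in C) that reads an actual sensor, rather than iterating over a Python list.

```python
from typing import List, Sequence

SET_POINT_C = 22.0    # hypothetical target temperature (degrees Celsius)
HYSTERESIS_C = 0.5    # hypothetical dead band that prevents rapid on/off switching


def control_step(temperature_c: float, heater_on: bool) -> bool:
    """One pass of the control program: decide the new heater state."""
    if temperature_c < SET_POINT_C - HYSTERESIS_C:
        return True           # too cold: switch the heater on
    if temperature_c > SET_POINT_C + HYSTERESIS_C:
        return False          # too warm: switch the heater off
    return heater_on          # within the dead band: keep the current state


def run(readings: Sequence[float]) -> List[bool]:
    """Execute the same control step for each reading, as the device's loop would.

    A real device would loop forever, reading its sensor on every pass;
    here a finite list of readings stands in for the sensor.
    """
    heater_on = False
    states = []
    for temperature_c in readings:
        heater_on = control_step(temperature_c, heater_on)
        states.append(heater_on)
    return states


if __name__ == "__main__":
    print(run([20.0, 21.8, 22.4, 22.6, 22.0, 21.4]))
```

For the readings shown, the heater state sequence is [True, True, True, False, False, True]: the same small program reacts to each new input, which is the essence of the single-functioned, reactive structure.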
Fig 1.1 An embedded system example: a digital camera
1.2 Design challenges

Design metrics: A design metric is a measurable feature of a system's implementation, such as its cost, size, performance or power. An embedded system:
- must cost little;
- must be sized to fit on a single chip;
- must perform in real time (response time);
- must consume minimum power.
The embedded system designer must construct an implementation that fulfils the desired functionality; apart from meeting the functionality, the designer should also optimize numerous design metrics.
Common design metrics that a design engineer should consider:

NRE (non-recurring engineering cost): the one-time monetary cost of designing the system.
Unit cost: the monetary cost of manufacturing each copy of the system, excluding NRE cost.
Size: the physical space required by the system, often measured in bytes for software and in number of gates for hardware.
Performance: the execution or response time of the system.
Power: the amount of power consumed by the system, which may determine the lifetime of a battery and the cooling requirements of the IC; more power means more heat.
Flexibility: the ability to change the functionality of the system without incurring heavy NRE cost.
Time-to-prototype: the time needed to build a working version of the system.
Time-to-market: the time required to develop a system to the point that it can be released and sold to customers.
Maintainability: the ability to modify the system after its release to the market.
Correctness: our confidence that we have implemented the system's functionality correctly.
Safety: the probability that the system will not cause harm.
Metrics typically compete with one another: improving one often worsens another. For example, reducing an implementation's size may reduce its performance.
Fig : 1.2 Design metric competition
1.2.1 Time to Market Design Metric :
The time to market: Introducing an embedded system to the market early can make a big difference to the system's profitability. Market windows are generally very narrow, often on the order of a few months, and missing this window can mean a significant loss in sales.
Fig 1.3 Time to Market (A) Market window (B) simplified revenue model for computing revenue loss
Let us investigate the revenue loss that can occur when a product enters the market late. We use a simple triangle model: the y-axis represents market rise (revenue) and the x-axis represents time, on which the point of market entry is marked. The revenue for an on-time market entry is the area of the triangle labelled "On time", and the revenue for a delayed entry is the area of the triangle labelled "Delayed". The revenue loss for a delayed entry is the difference between these two areas.

Let W be the time at which the market peaks (with a 45° market rise, W is also the height of the on-time triangle), D the delay of market entry (in weeks or months), and 2W the product's lifetime. Then:

$$\%\ \text{revenue loss} = \frac{A_{\text{on time}} - A_{\text{delayed}}}{A_{\text{on time}}} \times 100\%, \qquad A = \tfrac{1}{2} \times \text{base} \times \text{height}$$

$$A_{\text{on time}} = \tfrac{1}{2} \times 2W \times W = W^2, \qquad A_{\text{delayed}} = \tfrac{1}{2} \times (W - D + W) \times (W - D)$$

$$\%\ \text{revenue loss} = \frac{D\,(3W - D)}{2W^2} \times 100\%$$

Example: for a product lifetime of 52 weeks (W = 26) and a market-entry delay of 4 weeks, the percentage revenue loss is 4 × (78 - 4) / (2 × 26²) × 100 ≈ 21.9%, i.e. about 22%.
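The triangle-model calculation can be checked with a short sketch. It computes the loss both directly from the two triangle areas and from the simplified formula, assuming (as the model does) that the delay D does not exceed W. The function names are illustrative only.

```python
def revenue_loss_percent(lifetime: float, delay: float) -> float:
    """Percentage revenue loss for a late entry, from the two triangle areas.

    lifetime -- product lifetime 2W (e.g. in weeks)
    delay    -- market-entry delay D in the same unit; assumed 0 <= D <= W
    """
    w = lifetime / 2                                    # market peaks at W
    on_time_area = 0.5 * (2 * w) * w                    # 1/2 * 2W * W = W^2
    delayed_area = 0.5 * (w - delay + w) * (w - delay)  # base 2W - D, height W - D
    return (on_time_area - delayed_area) / on_time_area * 100


def revenue_loss_closed_form(lifetime: float, delay: float) -> float:
    """Same quantity from the simplified formula D(3W - D) / (2W^2) * 100."""
    w = lifetime / 2
    return delay * (3 * w - delay) / (2 * w * w) * 100


if __name__ == "__main__":
    print(f"{revenue_loss_percent(52, 4):.1f}%")      # 21.9%
    print(f"{revenue_loss_closed_form(52, 4):.1f}%")  # 21.9%
```

Both functions agree, which confirms the algebra that reduces the difference of the two areas to D(3W - D)/2.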
1.2.2 The NRE and Unit cost Design metrics:
Unlike other design metrics, the best technology choice here depends on the number of units to be produced. Consider three technologies A, B and C with the following costs:

Technology   NRE cost    Unit cost
A            $2,000      $100
B            $30,000     $30
C            $100,000    $2
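To see how the best choice depends on volume, the sketch below applies the relations total cost = NRE cost + unit cost × number of units and per-product cost = total cost / number of units to the three technologies in the table. The function names are illustrative, not part of any library.

```python
# NRE cost and unit cost in dollars for each technology, from the table above.
TECHNOLOGIES = {
    "A": (2_000, 100),
    "B": (30_000, 30),
    "C": (100_000, 2),
}


def total_cost(nre: float, unit_cost: float, units: int) -> float:
    """Total cost = NRE cost + unit cost * number of units."""
    return nre + unit_cost * units


def per_product_cost(nre: float, unit_cost: float, units: int) -> float:
    """Per-product cost = NRE cost / number of units + unit cost."""
    return nre / units + unit_cost


def cheapest(units: int) -> str:
    """Technology with the lowest total (hence per-product) cost at this volume."""
    return min(TECHNOLOGIES, key=lambda t: total_cost(*TECHNOLOGIES[t], units))


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        costs = {t: round(per_product_cost(*TECHNOLOGIES[t], n), 2) for t in TECHNOLOGIES}
        print(n, costs, "cheapest:", cheapest(n))
```

At 100 units technology A has the lowest per-product cost ($120, versus $330 for B and $1,002 for C); at 1,000 units B is cheapest ($60); at 10,000 units C is cheapest ($12). Equating total costs gives the crossover volumes: A and B cost the same at 400 units, and B and C at 2,500 units.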
Total cost = NRE cost + unit cost × number of units
Per-product cost = total cost / number of units = NRE cost / number of units + unit cost

1.2.3 The performance design metric: The performance of a system is a measure of how long the system takes to execute our desired tasks. There are several measures of performance; the two main ones are:
- Latency, or response time: the time between the start and the end of a task's execution.
- Throughput: the number of tasks processed per unit time.
Speedup is a way of comparing the performance of two systems:
Speedup of A over B = performance of A / performance of B
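A small numerical sketch of throughput and speedup follows. The camera latencies are hypothetical, and the sketch assumes each system handles one task at a time, so throughput is simply the reciprocal of latency; pipelined or parallel designs can achieve higher throughput than this assumption gives.

```python
def speedup(perf_a: float, perf_b: float) -> float:
    """Speedup of A over B = performance of A / performance of B."""
    return perf_a / perf_b


def throughput(tasks: int, seconds: float) -> float:
    """Throughput = number of tasks completed per unit time (here, per second)."""
    return tasks / seconds


if __name__ == "__main__":
    # Hypothetical latencies (seconds per image) for two camera designs,
    # each assumed to process one image at a time.
    latency_a, latency_b = 0.25, 1.0
    images = 20
    tp_a = throughput(images, images * latency_a)   # 4.0 images per second
    tp_b = throughput(images, images * latency_b)   # 1.0 image per second
    print(speedup(tp_a, tp_b))                      # 4.0
    # When latency is the measure, smaller is better, so performance is
    # taken as the reciprocal of latency before forming the ratio.
    print(speedup(1 / latency_a, 1 / latency_b))    # 4.0
```

Note that the ratio of latencies must be inverted: design A has one quarter of B's latency, which is a speedup of 4, not 0.25.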
Technologies used in embedded systems:

Technology is a manner of accomplishing a task. Three types of technology are central to embedded system design: processor technologies, IC technologies and design technologies.
Processor technology relates to the architecture of the computation engine used to implement a system's desired functionality. Although the term processor is usually associated with programmable software processors, many non-programmable digital systems can also be thought of as processors.
Single-purpose processor: a digital system designed to execute exactly one program. Its performance may be good, but its flexibility is poor.
Application-specific processor: may serve as a compromise between single-purpose and general-purpose processors. An ASIP is a programmable processor optimized for a particular class of applications having common characteristics, such as embedded control, digital signal processing or telecommunications. It provides flexibility while achieving good performance, low power and small size.
General-purpose processor: the designer of a general-purpose processor, or microprocessor, builds a programmable device that is suitable for a variety of applications, so as to maximize the number of devices sold. Design considerations:
- it should accommodate different kinds of programs;
- it should provide a general datapath to handle a variety of computations.
Design technology: design technology involves converting our concept of the desired functionality into an implementation. The implementation should optimize the design metrics, and it should also be produced quickly. Variations of a top-down design process have become popular.
1.3.1 Processor Technologies:

1. General-purpose processors (software)
2. Single-purpose processors (hardware)
3. Application-specific processors: application-specific instruction-set processors (ASIPs)
1. General Purpose Processors Software

They are programmable devices used in a variety of applications and are also known as microprocessors. They have a program memory and a general datapath with a large register file and a general ALU; the datapath must be large enough to handle a variety of computations. The programmer writes a program that carries out the required functionality, stores it in the program memory, and uses the features (instructions) provided by the general datapath. This is called the software portion of the system. Using a general-purpose processor brings several design-metric benefits. Design time and NRE cost are low, because the designer only has to write a program and need not do any digital design; this also keeps time-to-market low. Flexibility is high, because changing functionality requires only changing the program. Unit cost may be relatively low in small quantities, since the processor manufacturer sells large quantities to other customers and hence distributes the NRE cost over many units. Performance may be fast for computation-intensive applications if a fast processor is used, owing to advanced architecture features and leading-edge IC technology. There are also some design-metric drawbacks: unit cost may be too high for large quantities, performance may be slow for certain applications, and size and power may be large because of unnecessary processor hardware. By contrast, Figure 1.4(d) illustrates the use of a single-purpose processor in our embedded system example, representing an exact fit of the desired functionality: nothing more, nothing less.
Fig 1.4 Processors vary in their customization for the problem at hand: (a) desired functionality, (b) general-purpose processor, (c) application-specific processor, (d) single-purpose processor.
Fig 1.5 Implementing the desired functionality on a general-purpose processor
2. Single Purpose Processors Hardware:

This is a digital circuit designed to execute exactly one program. It contains only the components needed to execute that single program and has no program memory, so the user cannot change the functionality of the chip. Such processors are fast, low-power and small. An embedded system designer creates a single-purpose processor by designing a custom digital circuit. Using a single-purpose processor in an embedded system results in several design-metric benefits and drawbacks, which are essentially the inverse of those for general-purpose processors. Performance may be fast, size and power may be small, and unit cost may be low for large quantities; on the other hand, design time and NRE cost may be high, flexibility is low, unit cost may be high for small quantities, and performance may not match general-purpose processors for some applications.
Fig 1.6 Implementing the desired functionality on a single-purpose processor
3. Application-Specific Processors: Application-Specific Instruction-Set Processors (ASIPs): An ASIP is a programmable processor optimized for a particular class of applications having common characteristics, such as digital-signal processing, telecommunications or embedded control. ASIPs strike a compromise between general-purpose and single-purpose processors. They have a program memory, an optimized datapath and special functional units, and they offer good performance, some flexibility, and reasonable size and power. The designer of such a processor can optimize the datapath for the application class, perhaps adding special functional units for common operations and eliminating other infrequently used units.
Fig 1.7 Implementing the desired functionality on an application-specific processor
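One kind of special functional unit an ASIP may add is the multiply-accumulate (MAC) unit common in digital-signal processors, which computes T = T + M[i]*k in a single instruction. The Python sketch below only illustrates the arithmetic and is not DSP code: it writes out, one loop iteration per MAC step, the computation a MAC unit would accelerate when producing one output of a finite impulse response (FIR) filter, a typical signal-filtering task. Here the constant k becomes a per-tap coefficient; the function name and coefficient values are hypothetical.

```python
from typing import Sequence


def fir_output(samples: Sequence[float], coeffs: Sequence[float]) -> float:
    """Compute one FIR filter output: the sum of coeffs[j] * samples[n - j].

    samples -- recent input samples, newest last; needs len(samples) >= len(coeffs)
    coeffs  -- filter coefficients (tap weights)
    """
    if len(samples) < len(coeffs):
        raise ValueError("need at least as many samples as coefficients")
    acc = 0.0                                # the accumulator T
    for j, c in enumerate(coeffs):
        acc = acc + samples[-1 - j] * c      # one multiply-accumulate step
    return acc


if __name__ == "__main__":
    # Hypothetical 4-tap moving-average filter.
    print(fir_output([1, 2, 3, 4], [0.25, 0.25, 0.25, 0.25]))   # 2.5
```

A general-purpose processor typically spends separate multiply, add and load instructions on each iteration of this loop; a DSP with a MAC unit and parallel data fetch can complete each iteration in roughly one instruction.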
Digital-signal processors (DSPs) are a common class of ASIP and so demand special mention. A DSP is a processor designed to perform common operations on digital signals, which are the digital encodings of analog signals such as video and audio. These operations carry out common signal-processing tasks such as signal filtering, transformation or combination. Such operations are usually math-intensive, including operations like multiply-and-add or shift-and-add. To support them, a DSP may have special-purpose datapath components such as a multiply-accumulate unit, which can perform a computation like T = T + M[i]*k using only one instruction. Because DSP programs often manipulate large arrays of data, a DSP may also include special hardware to fetch sequential data-memory locations in parallel with other operations, further speeding up execution.

Highlight the merits and demerits of single-purpose processors and general-purpose processors.

Single-Purpose Processors:
Merits:
1. They are fast.
2. They consume low power.
3. They have small size.
4. Unit cost may be low for large quantities.
Demerits:
1. NRE costs may be high.
2. Flexibility is low.
3. Unit cost is high for small quantities.
4. Performance may not match general-purpose processors for some applications.

General-Purpose Processors:
Merits:
1. High flexibility.
2. Low NRE costs.
3. Low time-to-market.
4. Performance may be fast for computation-intensive applications.
Demerits:
1. Unit cost may be relatively high for large quantities.
2. Performance may be slow for certain applications.
3. Size and power may be large due to unnecessary processor hardware.
How is a single-purpose processor distinctly different from a general-purpose processor?

Sl. No. | Single-Purpose Processor | General-Purpose Processor
1 | Executes exactly one program. | Executes any program written by the user.
2 | Its functionality cannot be changed. | Its functionality can be changed by the user by writing the required program.
3 | It has no program memory. | It has program memory.
4 | It has no flexibility and contains only the resources required for that particular functionality. | It has a very large amount of resources, which may or may not be used for a particular functionality, as decided by the user.
5 | Merits: fast, low power consumption, small size, and unit cost may be low for large quantities. | Merits: high flexibility, low NRE cost, low time-to-market, and performance may be fast for computation-intensive applications.
1.4 IC Technology
Every processor must eventually be implemented on an IC. IC technology involves the manner in which we map a digital (gate-level) implementation onto an IC. An IC (Integrated Circuit), often called a chip, is a semiconductor device consisting of a set of connected transistors and other devices. A number of different processes exist to build semiconductors, the most popular of which is CMOS (Complementary Metal Oxide Semiconductor). The IC technologies differ by how customized the IC is for a particular implementation. IC technology is independent from processor technology; any type of processor can be mapped to any type of IC technology.
Fig 1.8 The independence of processor and IC technologies: any processor technology can be mapped to any IC technology.
To understand the differences among IC technologies, we must first recognize that semiconductors consist of numerous layers. The bottom layers form the transistors. The middle layers form logic gates. The top layers connect these gates with wires. One way to create these layers is by depositing photo-sensitive chemicals on the chip surface and then shining light through masks to change regions of the chemicals. Thus, the task of building the layers is actually one of designing appropriate masks. A set of masks is often called a layout. The narrowest line that we can create on a chip is called the feature size, which today is well below one micrometer (sub-micron).
1.4.1 Full-custom/VLSI
In a full-custom IC technology, we optimize all layers for our particular embedded system's digital implementation. Such optimization includes placing the transistors to minimize interconnection lengths, sizing the transistors to optimize signal transmissions, and routing wires among the transistors. Once we complete all the masks, we send the mask specifications to a fabrication plant that builds the actual ICs. Full-custom IC design, often referred to as VLSI (Very Large Scale Integration) design, has very high NRE cost and long turnaround times (typically months) before the IC becomes available, but can yield excellent performance with small size and power. It is usually used only in high-volume or extremely performance-critical applications.
1.4.2 Semi-custom ASIC (gate array and standard cell)

In an ASIC (Application-Specific IC) technology, the lower layers are fully or partially built, leaving us to finish the upper layers. In a gate array technology, the masks for the transistor and gate levels are already built (i.e., the IC already consists of arrays of gates). The remaining task is to connect these gates to achieve our particular implementation. In a standard cell technology, logic-level cells (such as an AND gate or an AND-OR-INVERT combination) have their mask portions pre-designed, usually by hand. Thus, the remaining task is to arrange these portions into complete masks for the gate level, and then to connect the cells. ASICs are by far the most popular IC technology, as they provide for good performance and size, with much less NRE cost than full-custom ICs.
1.4.3 PLD
In a PLD (Programmable Logic Device) technology, all layers already exist, so we can purchase the actual IC. The layers implement a programmable circuit, where programming has a lower-level meaning than a software program. The programming that takes place may consist of creating or destroying connections between wires that connect gates, either by blowing a fuse or by setting a bit in a programmable switch. Small devices called programmers, connected to a desktop computer, can typically perform such programming. We can divide PLDs into two types, simple and complex. One type of simple PLD is a PLA (Programmable Logic Array), which consists of a programmable array of AND gates and a programmable array of OR gates. Another type is a PAL (Programmable Array Logic), which uses just one programmable array to reduce the number of expensive programmable components. One type of complex PLD, which has grown very rapidly in popularity over the past decade, is the FPGA (Field Programmable Gate Array). It offers more general connectivity among blocks of logic, rather than just arrays of logic as with PLAs and PALs, and is thus able to implement far more complex designs. PLDs offer very low NRE cost and almost instant IC availability. However, they are typically bigger than ASICs, may have higher unit cost, may consume more power, and may be slower (especially FPGAs). They still provide reasonable performance, though, and so are especially well suited to rapid prototyping.
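The structure of a PLA can be illustrated with a small behavioural model: the AND array selects which true or complemented inputs feed each product term, and the OR array selects which product terms feed each output. This is a hypothetical sketch that ignores the electrical details (fuses, switches); the half-adder programming is only an example.

```python
from typing import Dict, List, Sequence

# One product term of the AND array: input index -> required value.
# 1 means the true input is connected, 0 means the complemented input is
# connected; inputs not listed are left unconnected ("don't care").
ProductTerm = Dict[int, int]


class PLA:
    """Behavioural model of a programmed PLA (AND array followed by OR array)."""

    def __init__(self, and_array: List[ProductTerm], or_array: List[List[int]]):
        self.and_array = and_array    # programmable AND array
        self.or_array = or_array      # for each output, indices of the product terms to OR

    def evaluate(self, inputs: Sequence[int]) -> List[int]:
        products = [
            int(all(inputs[i] == value for i, value in term.items()))
            for term in self.and_array
        ]
        return [int(any(products[p] for p in terms)) for terms in self.or_array]


# Hypothetical programming: a half adder with inputs (a, b).
half_adder = PLA(
    and_array=[{0: 1, 1: 0}, {0: 0, 1: 1}, {0: 1, 1: 1}],  # a.b', a'.b, a.b
    or_array=[[0, 1], [2]],                                # sum = a XOR b, carry = a AND b
)

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, half_adder.evaluate([a, b]))   # [sum, carry]
```

In a PAL, the OR array would be fixed rather than programmable, so only the `and_array` part of this model could be changed by the user.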
1.5 DESIGN TECHNOLOGY:

Design technology involves the manner in which we convert our concept of desired system functionality into an implementation. We must not only design the implementation to optimise design metrics, but we must do so quickly.
Variations of a top-down design process have become popular in the past decade; an ideal form is illustrated in Fig 1.9. The designer refines the system through several abstraction levels. At the system level, the designer describes the desired functionality in an executable language like C; this is called the system specification. The designer refines this specification by distributing portions of it among several general- and/or single-purpose processors, yielding behavioural specifications for each processor. The designer refines these specifications into register-transfer (RT) specifications by converting behaviour on general-purpose processors to assembly code, and behaviour on single-purpose processors to a connection of register-transfer components and state machines. The designer then refines the RT-level specification into a logic specification. Finally, the designer refines the remaining specifications into an implementation consisting of machine code for general-purpose processors and a gate-level netlist for single-purpose processors.
Fig 1.9 Ideal top-down design process, and productivity improvers.
There are three main approaches to improving the design process for increased productivity, which we label as compilation/synthesis, libraries/IP, and test/verification. Several other approaches also exist.
1.5.1 Compilation/Synthesis
Compilation/synthesis lets a designer specify desired functionality in an abstract manner and automatically generates lower-level implementation details. Describing a system at high abstraction levels can improve productivity by reducing the amount of detail, often by an order of magnitude, that a designer must specify. A logic synthesis tool converts Boolean expressions into a connection of logic gates (called a netlist). A register-transfer (RT) synthesis tool converts finite-state machines and register transfers into a datapath of RT components and a controller of Boolean equations. A behavioral synthesis tool converts a sequential program into finite-state machines and register transfers. Likewise, a software compiler converts a sequential program to assembly code, which is essentially register-transfer code. Finally, a system synthesis tool converts an abstract system specification into a set of sequential programs on general- and single-purpose processors. The relatively recent maturation of RT and behavioral synthesis tools has enabled a unified view of the design process for single-purpose and general-purpose processors. Design for the former is commonly known as hardware design, and design for the latter as software design. In the past, the two design processes were radically different: software designers wrote sequential programs, while hardware designers connected components.
Fig 1.10 The co-design ladder: recent maturation of synthesis enables a unified view of hardware and software.
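To make logic synthesis concrete, here is a deliberately naive, hypothetical sketch that turns a Boolean function (given as a Python function over 0/1 inputs) into a two-level sum-of-minterms netlist of NOT, AND and OR gates, and then simulates that netlist. Real logic synthesis tools also minimize the logic and map it onto a library of cells; neither step is attempted here. The majority function in the usage lines is an arbitrary example.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# A gate in the netlist: (output wire, gate type, input wires).
Gate = Tuple[str, str, List[str]]


def synthesize_sop(names: List[str], f: Callable[..., int]) -> List[Gate]:
    """Naive two-level synthesis: one AND gate per minterm, ORed into wire 'out'.

    Input names are assumed not to clash with the generated wire names
    ('<name>_n', 'm<bits>', 'out').
    """
    netlist: List[Gate] = [(f"{n}_n", "NOT", [n]) for n in names]  # inverters
    minterms: List[str] = []
    for bits in product((0, 1), repeat=len(names)):
        if f(*bits):
            wire = "m" + "".join(str(b) for b in bits)
            literals = [n if b else f"{n}_n" for n, b in zip(names, bits)]
            netlist.append((wire, "AND", literals))
            minterms.append(wire)
    netlist.append(("out", "OR", minterms))
    return netlist


def simulate(netlist: List[Gate], inputs: Dict[str, int]) -> Dict[str, int]:
    """Evaluate gates in list order (valid because each gate's inputs come first)."""
    ops = {
        "NOT": lambda v: 1 - v[0],
        "AND": lambda v: int(all(v)),
        "OR": lambda v: int(any(v)),
    }
    wires = dict(inputs)
    for out, kind, ins in netlist:
        wires[out] = ops[kind]([wires[w] for w in ins])
    return wires


if __name__ == "__main__":
    names = ["a", "b", "c"]
    majority = synthesize_sop(names, lambda a, b, c: int(a + b + c >= 2))
    for gate in majority:
        print(gate)
    print(simulate(majority, {"a": 1, "b": 0, "c": 1})["out"])   # 1
```

For the three-input majority function, the netlist has 8 gates: 3 inverters, 4 AND gates (one for each input combination with at least two 1s) and 1 OR gate. A minimizing tool would reduce this to three 2-input AND gates feeding an OR gate, which illustrates why real synthesis goes beyond this canonical form.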
1.5.2 Libraries/IP
Libraries involve re-use of pre-existing implementations. Using libraries of existing implementations can improve productivity if the time it takes to find, acquire, integrate and test a library item is less than that of designing the item oneself. A logic-level library may consist of layouts for gates and cells. An RT-level library may consist of layouts for RT components, like registers, multiplexors, decoders and functional units. A behavioral-level library may consist of commonly used components, such as compression components, bus interfaces, display controllers and even general-purpose processors. The advent of system-level integration has caused a great change in this level of library.
1.5.3 Test/Verification
Test/verification involves ensuring that functionality is correct. Such assurance can prevent time-consuming debugging at low abstraction levels and iterating back to high abstraction levels. Simulation is the most common method of testing for correct functionality, although more formal verification techniques are growing in popularity. At the logic level, gate-level simulators provide output signal timing waveforms given input signal waveforms. Likewise, general-purpose processor simulators execute machine code. At the RT level, hardware description language (HDL) simulators execute RT-level descriptions and provide output waveforms given input waveforms. At the behavioral level, HDL simulators simulate sequential programs, and co-simulators connect HDL and general-purpose processor simulators to enable hardware/software co-verification. At the system level, a model simulator simulates the initial system specification using an abstract computation model, independent of any processor technology, to verify correctness and completeness of the specification.
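Testing by simulation can be sketched at the logic level: a gate-level full adder is simulated on every input combination and its outputs are compared with a behavioural specification that simply adds the bits. This exhaustive approach is only practical for small circuits, since the number of input vectors doubles with each added input. The function names and the deliberately buggy variant are hypothetical examples.

```python
from itertools import product
from typing import Callable, List, Tuple

Bits = Tuple[int, ...]


def xor(a: int, b: int) -> int:
    return a ^ b


def and_(a: int, b: int) -> int:
    return a & b


def or_(a: int, b: int) -> int:
    return a | b


def full_adder_gates(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Gate-level implementation: two XOR, two AND and one OR gate."""
    p = xor(a, b)
    total = xor(p, cin)
    carry = or_(and_(a, b), and_(p, cin))
    return total, carry


def full_adder_buggy(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Deliberately wrong variant: the (a XOR b) AND cin carry term is missing."""
    return xor(xor(a, b), cin), and_(a, b)


def full_adder_spec(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Behavioural specification: add the three bits, return (sum, carry)."""
    s = a + b + cin
    return s % 2, s // 2


def verify(impl: Callable[..., Tuple[int, int]],
           spec: Callable[..., Tuple[int, int]], n_inputs: int) -> List[Bits]:
    """Simulate every input vector; return those where impl and spec disagree."""
    return [bits for bits in product((0, 1), repeat=n_inputs)
            if impl(*bits) != spec(*bits)]


if __name__ == "__main__":
    print(verify(full_adder_gates, full_adder_spec, 3))   # []: no mismatches
    print(verify(full_adder_buggy, full_adder_spec, 3))   # [(0, 1, 1), (1, 0, 1)]
```

The empty list for the correct design and the two failing vectors for the buggy one show how simulation exposes an error before the design moves to lower abstraction levels.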
1.5.4 More productivity improvers

There are numerous additional approaches to improving designer productivity. Standards focus on developing well-defined methods for specification, synthesis and libraries. Such standards can reduce the problems that arise when a designer uses multiple tools, or retrieves or provides design information from or to other designers. Common standards include language standards, synthesis standards and library standards. Languages focus on capturing desired functionality with minimum designer effort. For example, the sequential programming language of C is giving way to the object oriented language of C++, which in turn has given some ground to Java. As another example, state-machine languages permit direct capture of functionality as a set of states and transitions, which can then be translated to other languages like C. Frameworks provide a software environment for the application of numerous tools throughout the design process and management of versions of implementations. For example, a framework might generate the UNIX directories needed for various simulators and synthesis tools, supporting application of those tools through menu selections in a single graphical user interface.
RECOMMENDED QUESTIONS UNIT 1
Overview of embedded systems

1. What is an embedded system? Why is it so hard to define an embedded system?
2. List and define the three main characteristics of embedded systems that distinguish them from other computing systems.
3. What is a design metric?
4. List a pair of design metrics that may compete with one another, providing an intuitive explanation of the reason.
5. What is a market window, and why is it so important to reach the market early in this window?
6. What is NRE cost?
7. List and define the three main processor technologies. What are the benefits of using each processor technology?
8. List the main IC technologies and their benefits.
9. List the three main design technologies and explain how each is helpful to designers.
10. Provide a definition of Moore's law.
11. Compute the annual growth rate of IC capacity and of designer productivity.
12. What is the design gap?
13. What is a renaissance engineer, and why is such an engineer so important in the current market?
14. Define what is meant by the mythical man-month.
QUESTION PAPER SOLUTION UNIT 1

Q1. Highlight the merits and demerits of single-purpose processors and general-purpose processors.

Single-Purpose Processors:
Merits:
1. They are fast.
2. They consume low power.
3. They have small size.
4. Unit cost may be low for large quantities.
Demerits:
1. NRE costs may be high.
2. Flexibility is low.
3. Unit cost is high for small quantities.
4. Performance may not match general-purpose processors for some applications.

General-Purpose Processors:
Merits:
1. High flexibility.
2. Low NRE costs.
3. Low time-to-market.
4. Performance may be fast for computation-intensive applications.
Demerits:
1. Unit cost may be relatively high for large quantities.
2. Performance may be slow for certain applications.
3. Size and power may be large due to unnecessary processor hardware.
Q2. How is a single-purpose processor distinctly different from a general-purpose processor?

Sl. No. | Single-Purpose Processor | General-Purpose Processor
1 | Executes exactly one program. | Executes any program written by the user.
2 | Its functionality cannot be changed. | Its functionality can be changed by the user by writing the required program.
3 | It has no program memory. | It has program memory.
4 | It has no flexibility and contains only the resources required for that particular functionality. | It has a very large amount of resources, which may or may not be used for a particular functionality, as decided by the user.
5 | Merits: fast, low power consumption, small size, and unit cost may be low for large quantities. | Merits: high flexibility, low NRE cost, low time-to-market, and performance may be fast for computation-intensive applications.
Q3. Briefly explain the three processor technologies.

1. General-Purpose Processors (Software): These are programmable devices used in a variety of applications, also known as microprocessors. They have a program memory and a general datapath with a large register file and a general ALU. The datapath must be large enough to handle a variety of computations. The programmer writes the program that carries out the required functionality into program memory, using the features (instructions) provided by the general datapath. This is called the software portion of the system. Such a processor offers low time-to-market, low NRE cost and high flexibility.
2. Single-Purpose Processors (Hardware): A single-purpose processor is a digital circuit designed to execute exactly one program. It contains only the components needed to execute that program and has no program memory, so the user cannot change the functionality of the chip. Such processors are fast, low-power and small.
3. Application-Specific Processors: Application-specific instruction-set processors (ASIPs) are programmable processors optimized for a particular class of applications having common characteristics. They strike a compromise between general-purpose and single-purpose processors. They have a program memory, an optimized datapath and special functional units. They offer good performance, size and power, with some flexibility.
Q4. What are the common design metrics that a design engineer should consider?
NRE (non-recurring engineering) cost: the one-time monetary cost of designing the system.
Unit cost: the monetary cost of manufacturing each copy of the system, excluding NRE cost.
Size: the physical space required by the system, often measured in bytes for software and in number of gates for hardware.
Performance: the execution or response time of the system.
Power: the amount of power consumed by the system, which may determine the lifetime of a battery and the cooling requirements of the IC, since more power means more heat.
Flexibility: the ability to change the functionality of the system without incurring heavy NRE cost.
Time-to-prototype: the time needed to build a working version of the system.
Time-to-market: the time required to develop the system to the point that it can be released to the market.
Maintainability: the ability to modify the system after its release to the market.
Correctness: our confidence that we have implemented the system's functionality correctly.
Safety: the probability that the system will not cause any harm.
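Metrics typically compete with one another: improving one often leads to worsening of another. For example, a custom single-purpose processor may improve performance and power but increases NRE cost and time-to-market.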
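The competition between NRE cost and unit cost can be made concrete. If N units are made, each unit effectively costs NRE/N + unit cost. The C sketch below computes this per-product cost. The dollar figures in the usage note are hypothetical, chosen only to show that a technology with high NRE and low unit cost wins only at large volumes.

```c
#include <assert.h>

/* Per-product cost: NRE amortized over n units plus the recurring unit cost. */
static double per_product_cost(double nre, double unit_cost, unsigned long n)
{
    assert(n > 0); /* per-product cost is undefined when no units are made */
    return nre / (double)n + unit_cost;
}
```

Consider hypothetical technology A (NRE $2,000, unit cost $100) and technology B (NRE $30,000, unit cost $30). At 100 units, per_product_cost gives $120 for A versus $330 for B. At 400 units the two cost the same, $105 each. At 1,000 units, A costs $102 and B costs $60.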
Q5. Write short notes on IC technology

Every processor must eventually be implemented on an IC. IC technology involves the manner in which we map a digital (gate-level) implementation onto an IC. An IC (Integrated Circuit), often called a chip, is a semiconductor device consisting of a set of connected transistors and other devices. A number of different processes exist to build semiconductors, the most popular of which is CMOS (Complementary Metal Oxide Semiconductor). The IC technologies differ by how customized the IC is for a particular implementation. IC technology is independent from processor technology; any type of processor can be mapped to any type of IC technology.
The independence of processor and IC technologies: any processor technology can be mapped to any IC technology. To understand the differences among IC technologies, we must first recognize that semiconductors consist of numerous layers. The bottom layers form the transistors. The middle layers form logic gates. The top layers connect these gates with wires. One way to create these layers is by depositing photosensitive chemicals on the chip surface and then shining light through masks to change regions of the chemicals. Thus, the task of building the layers is actually one of designing appropriate masks. A set of masks is often called a layout. The narrowest line that we can create on a chip is called the feature size, which today is well below one micrometer (sub-micron). For each IC technology, all layers must eventually be built to get a working IC; the question is who builds each layer and when.
Q6. Derive the equation for percentage revenue loss for any market rise. A product was delayed by 4 weeks in releasing to market. The peak revenue for on-time entry to market would occur after 20 weeks, for a market rise angle of 45°. Find the percentage revenue loss.
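Ans: Let us investigate the loss of revenue that can occur due to delayed entry of a product into the market. We can use a simple triangle model. The y-axis represents the market rise (revenue), and the x-axis represents time, including the point of entry to the market. The revenue for an on-time market entry is the area of the triangle labeled On time. The revenue for a delayed entry is the area of the triangle labeled Delayed. The revenue loss for a delayed entry is the difference between these two areas.

Let:
W = time to the market peak; with a 45° market rise, this is also the height of the on-time triangle
D = delay of entry (in weeks or months)
2W = product lifetime

$$\text{Area}_{\text{on time}} = \tfrac{1}{2} \times \text{base} \times \text{height} = \tfrac{1}{2} \times 2W \times W = W^2$$

$$\text{Area}_{\text{delayed}} = \tfrac{1}{2}\,(W - D + W)(W - D)$$

$$\%\ \text{revenue loss} = \frac{\text{Area}_{\text{on time}} - \text{Area}_{\text{delayed}}}{\text{Area}_{\text{on time}}} \times 100\% = \frac{W^2 - \tfrac{1}{2}(2W - D)(W - D)}{W^2} \times 100\% = \frac{D(3W - D)}{2W^2} \times 100\%$$

Example: the product lifetime is 52 weeks (so W = 26 weeks), and the delay of entry to the market is 4 weeks. Percentage revenue loss = 4 × (78 - 4) / (2 × 26²) × 100% ≈ 22%.

For the data in the question (W = 20 weeks, D = 4 weeks): percentage revenue loss = 4 × (60 - 4) / (2 × 20²) × 100% = 224/800 × 100% = 28%.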
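The derivation can be checked numerically. The C sketch below evaluates the two triangle areas directly rather than using the simplified formula, so it also confirms the algebra. It assumes 0 ≤ D ≤ W, the range in which the delayed triangle of this model is meaningful.

```c
#include <assert.h>

/* Percentage revenue loss in the triangle market model.
 * w: time to the market peak (half the product lifetime)
 * d: delay in market entry, in the same time unit as w */
static double revenue_loss_percent(double w, double d)
{
    assert(w > 0.0 && d >= 0.0 && d <= w); /* model only covers 0 <= D <= W */
    double on_time = 0.5 * (2.0 * w) * w;         /* = W^2 */
    double delayed = 0.5 * (w - d + w) * (w - d); /* area of the delayed triangle */
    return (on_time - delayed) / on_time * 100.0;
}
```

revenue_loss_percent(26.0, 4.0) returns about 21.9 (the 52-week example). revenue_loss_percent(20.0, 4.0) returns 28, up to floating-point rounding.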
Q7. Compare GPP, SPP and ASSP along with their block diagrams.
1. General-Purpose Processors: Software
They are programmable devices used in a variety of applications, also known as microprocessors. They have a program memory and a general datapath with a large register file and a general ALU. The datapath must be large enough to handle a variety of computations. The programmer writes the program that carries out the required functionality into program memory, using the features (instructions) provided by the general datapath. This is called the software portion of the system. Using a general-purpose processor has several design-metric benefits. Design time and NRE cost are low, because the designer must only write a program and need not do any digital design. Flexibility is high, because changing functionality requires only changing the program. Unit cost may be relatively low in small quantities, since the processor manufacturer sells large quantities to other customers and hence distributes the NRE cost over many units. Performance may be fast for computation-intensive applications, if using a fast processor, due to advanced architecture features and leading-edge IC technology. There are also some design-metric drawbacks: unit cost may be too high for large quantities, performance may be slow for certain applications, and size and power may be large due to unnecessary processor hardware. Figure 1.4(d) illustrates the use of a single-purpose processor in our embedded system example, representing an exact fit of the desired functionality, nothing more, nothing less.
Fig 1.4 Processors vary in their customization for the problem at hand: (a) desired functionality, (b) general-purpose processor, (c) application-specific processor, (d) single-purpose processor.
Fig 1.5 Implementing the desired functionality on a general-purpose processor
2. Single-Purpose Processors: Hardware

This is a digital circuit designed to execute exactly one program. It contains only the components needed to execute that single program and contains no program memory, so the user cannot change the functionality of the chip. Such processors are fast, low-power and small.
An embedded system designer creates a single-purpose processor by designing a custom digital circuit. Using a single-purpose processor in an embedded system results in several design metric benefits and drawbacks, which are essentially the inverse of those for general purpose processors. Performance may be fast, size and power may be small, and unit-cost may be low for large quantities, while design time and NRE costs may be high, flexibility is low, unit cost may be high for small quantities, and performance may not match general-purpose processors for some applications.
Fig 1.6 Implementing the desired functionality on a single-purpose processor
3. Application-Specific Processors: Application-Specific Instruction-Set Processors (ASIPs)

An application-specific instruction-set processor (ASIP) is a programmable processor optimized for a particular class of applications having common characteristics, such as digital-signal processing, telecommunications or embedded control. An ASIP strikes a compromise between general-purpose and single-purpose processors. It has a program memory, an optimized datapath and special functional units, giving good performance, size and power with some flexibility. The designer of such a processor can optimize the datapath for the application class, perhaps adding special functional units for common operations and eliminating other infrequently used units.
Fig 1.7 Implementing the desired functionality on an application-specific processor
Digital-signal processors (DSPs) are a common class of ASIP and so demand special mention. A DSP is a processor designed to perform common operations on digital signals, which are the digital encodings of analog signals like video and audio. These operations carry out common signal-processing tasks like signal filtering, transformation or combination. Such operations are usually math-intensive, including operations like multiply-and-add or shift-and-add. To support such operations, a DSP may have special-purpose datapath components such as a multiply-accumulate unit, which can perform a computation like T = T + M[i]*k using only one instruction. DSP programs often manipulate large arrays of data, so a DSP may also include special hardware to fetch sequential data memory locations in parallel with other operations, to further speed execution.
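The statement T = T + M[i]*k is the multiply-accumulate pattern. The portable C sketch below shows it inside a loop over an array. On a DSP with a MAC unit, a compiler may map each iteration's multiply-and-add to a single instruction. A general-purpose processor typically needs separate multiply and add instructions. The function name and integer widths are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply-accumulate over an array: T = T + M[i] * k for each i.
 * A 64-bit accumulator keeps the 32-bit products from overflowing quickly. */
static int64_t mac_scale_sum(const int32_t *m, size_t n, int32_t k)
{
    assert(m != NULL || n == 0);
    int64_t t = 0;
    for (size_t i = 0; i < n; i++) {
        t = t + (int64_t)m[i] * k; /* one multiply-accumulate per element */
    }
    return t;
}
```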
Q8. Suggest two methods to improve productivity.

There are numerous additional approaches to improving designer productivity. Standards focus on developing well-defined methods for specification, synthesis and libraries. Such standards can reduce the problems that arise when a designer uses multiple tools, or retrieves or provides design information from or to other designers. Common standards include language standards, synthesis standards and library standards. Languages focus on capturing desired functionality with minimum designer effort. For example, the sequential programming language of C is giving way to the object oriented language of C++, which in turn has given some ground to Java. As another example, state-machine languages permit direct capture of functionality as a set of states and transitions, which can then be translated to other languages like C. Frameworks provide a software environment for the application of numerous tools throughout the design process and management of versions of implementations. For example, a framework might generate the UNIX directories needed for various simulators and synthesis tools, supporting application of those tools through menu selections in a single graphical user interface.
UNIT 2
SINGLE-PURPOSE PROCESSORS: Hardware, Combinational Logic, Sequential Logic, RT level Combinational and Sequential Components, Optimizing single-purpose processors. Single-Purpose Processors: Software, Basic Architecture, Operation, Programmer's View, Development Environment, ASIPs. 6 Hours
TEXT BOOKS: 1. Embedded System Design: A Unified Hardware/Software Introduction, Frank Vahid, Tony Givargis, John Wiley & Sons, Inc., 2002.
REFERENCE BOOKS:
1. Embedded Systems: Architecture and Programming, Raj Kamal, TMH, 2008.
2. Embedded Systems Architecture: A Comprehensive Guide for Engineers and Programmers, Tammy Noergaard, Elsevier, 2005.
3. Embedded C Programming, Barnett, Cox & O'Cull, Thomson, 2005.
UNIT 2 CUSTOM SINGLE PURPOSE PROCESSORS: HARDWARE

2.1 INTRODUCTION:
A processor is a digital circuit designed to perform computation tasks. A processor consists of a datapath capable of storing and manipulating data and a controller capable of moving data through the datapath. A general-purpose processor is designed to carry out a wide variety of computation tasks. A single-purpose processor is designed specifically to carry out a particular computational task. A custom single-purpose processor may be fast, small and low-power, at the cost of high NRE, longer time-to-market and less flexibility.
2.2 COMBINATIONAL LOGIC:

1. Transistors and logic gates
2. Basic combinational logic design
3. RT-level combinational components
Transistors and Logic Gates:

A transistor is the basic electrical component in digital systems. A transistor acts as a simple on/off switch. The most widely used transistor technology is CMOS.
Fig 2.1 View of a CMOS transistor on silicon
The CMOS transistor consists of a gate, a source and a drain; the gate controls the current flow from source to drain. A high voltage, typically +3 V or +5 V, is treated as logic 1. A low voltage, typically ground, is treated as logic 0.
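For an nMOS transistor, when logic 1 is applied to the gate, the transistor conducts, so current flows; when logic 0 is applied to the gate, the transistor does not conduct. A pMOS transistor behaves in the opposite way: it conducts when logic 0 is applied to its gate.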
Fig 2.2 (a) & (b) CMOS transistor implementation
Fig 2.2 (a), (b) & (c) CMOS transistor implementation of inverter, NAND and NOR gates
Digital system designers work at the abstraction level of logic gates, where each gate is represented symbolically with a Boolean equation, as shown in Figure 2.3.
Fig 2.3 Basic logic gates
Combinational logic design:

A combinational circuit is a digital circuit whose output is purely a function of its present inputs. Such a circuit has no memory of past inputs. An example is shown below.
Fig 2.4 Combinational logic design: problem, truth table, output equations, minimized output equations, final circuit
RT level combinational components:

Designing complex digital circuits using only logic gates takes a long time. Designers therefore use register-transfer (RT) level combinational components such as multiplexers, decoders, adders, comparators and ALUs.
Fig 2.5 combinational components
2.3 Sequential logic
a. Flip-flops
b. RT-level sequential components
c. Sequential logic design
2.3.1 Flip flops
A sequential circuit is a digital circuit whose outputs are a function of the present as well as previous input values. The basic sequential circuit is the flip-flop, which stores a single bit.
D flip-flop: It has two inputs, D and clock. When clock is 1, the value of D is stored in the flip-flop and appears at output Q. When clock is 0, the previously stored bit is maintained and continues to appear at Q.
SR flip-flop: It has three inputs: S, R and clock. When clock is 1, inputs S and R are examined. If S is 1, a 1 is stored. If R is 1, a 0 is stored. If both S and R are 0, there is no change. If both are 1, the behavior is undefined. Thus S stands for set and R for reset.
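The two flip-flop descriptions above can be captured as small next-state functions. The C sketch below models one evaluation of each element exactly as the text describes it. The model is level-sensitive, storing while clock is 1; edge-triggered flip-flops instead sample their inputs only at a clock edge. This is a behavioural model, not a gate-level design, and the function names are assumptions. The undefined S = R = 1 case is rejected with an assertion rather than given a value.

```c
#include <assert.h>
#include <stdbool.h>

/* D flip-flop as described: while clk is 1, Q takes D; while clk is 0, Q holds. */
static bool d_ff_step(bool q, bool d, bool clk)
{
    return clk ? d : q;
}

/* SR flip-flop: with clk = 1, S sets Q to 1, R resets Q to 0, S = R = 0 keeps Q.
 * S = R = 1 is undefined in hardware, so this model refuses it. */
static bool sr_ff_step(bool q, bool s, bool r, bool clk)
{
    if (!clk) {
        return q;          /* clock low: stored bit is maintained */
    }
    assert(!(s && r));     /* undefined input combination */
    if (s) return true;    /* set */
    if (r) return false;   /* reset */
    return q;              /* no change */
}
```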
Fig 2.6 Sequential components
2.3.2 RT level sequential components:

Registers, shift registers and counters are the common RT-level sequential components. A register stores n bits from its n-bit data input I, with those stored bits appearing at its output Q; all bits are stored in parallel. A shift register also stores n bits, but these bits are not stored in parallel; instead, they are shifted into the register serially. A shift register has one data input I and two control inputs, clock and shift. A counter is a register that can also increment (add binary 1 to) its stored binary value. A synchronous input's value only has an effect during a clock edge. An asynchronous input's value affects the circuit independently of the clock. All these components are shown in Figure 2.6.
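The three RT-level sequential components can be modelled the same way, with each call standing for one active clock edge. Only synchronous behaviour is modelled; asynchronous inputs such as a clear are not. The width n is a parameter from 1 to 32 bits. The shift direction is an assumption of this sketch: the register shifts toward the most significant bit, with the serial input entering at bit 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask keeping only the low n bits of a value (1 <= n <= 32). */
static uint32_t low_bits_mask(unsigned n)
{
    assert(n >= 1 && n <= 32);
    return n == 32 ? UINT32_MAX : ((UINT32_C(1) << n) - 1u);
}

/* n-bit register: on a clock edge with load = 1, all n input bits are stored in parallel. */
static uint32_t reg_step(uint32_t q, uint32_t in, bool load, unsigned n)
{
    return load ? (in & low_bits_mask(n)) : q;
}

/* n-bit shift register: on a clock edge with shift = 1, the serial input bit
 * enters at bit 0 and the most significant bit is shifted out. */
static uint32_t shift_reg_step(uint32_t q, bool in, bool shift, unsigned n)
{
    return shift ? (((q << 1) | (in ? 1u : 0u)) & low_bits_mask(n)) : q;
}

/* n-bit counter: on a clock edge with count = 1, adds one, wrapping to 0 after 2^n - 1. */
static uint32_t counter_step(uint32_t q, bool count, unsigned n)
{
    return count ? ((q + 1u) & low_bits_mask(n)) : q;
}
```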
2.3.3 Sequential logic design

Sequential logic design can be carried out using a straightforward technique, which is illustrated below.
Fig 2.7 (a), (b), (c), (d) Sequential logic design
Fig 2.7 (e), (f) Sequential logic design

2.4 Custom single purpose processor design:

A basic processor consists of a controller and a datapath. The datapath stores and manipulates a system's data. The controller carries out the configuration of the datapath. It sets the datapath control inputs, like register load and multiplexer select signals, of the register units, functional units and connection units to obtain the desired configuration of the datapath at a particular time.
Fig 2.8 A basic processor(a) controller and datapath (b) view inside the controller and datapath
Example program: first create the algorithm. Then convert the algorithm to a complex state machine, known as an FSMD (finite-state machine with datapath). Templates can be used to perform such a conversion.
Fig 2.9 Example program: GCD
Creating the datapath:
1. Create a register for any declared variable.
2. Create a functional unit for each arithmetic operation.
3. Connect the ports, registers and functional units, based on reads and writes; use multiplexors where there are multiple sources.
4. Create a unique identifier for each datapath component control input and output.
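Fig 2.9 is not reproduced here. Assuming it is the usual subtraction-based GCD program, the sketch below writes that algorithm in C as an explicit state machine, the form an FSMD takes. Each case is one state. The variables x and y play the role of datapath registers, and the comparisons x != y and x < y are the status signals the controller tests. The state names are hypothetical, and the go_i start handshake of a complete design is omitted.

```c
#include <assert.h>

/* Hypothetical FSMD states for a subtraction-based GCD. */
enum gcd_state { LOAD, TEST_NE, TEST_LT, SUB_Y, SUB_X, DONE };

static unsigned gcd_fsmd(unsigned x_i, unsigned y_i)
{
    assert(x_i > 0 && y_i > 0);  /* subtraction GCD never terminates on 0 */
    unsigned x = 0, y = 0;       /* datapath registers */
    enum gcd_state state = LOAD;

    while (state != DONE) {      /* each pass models one clock cycle */
        switch (state) {
        case LOAD:    x = x_i; y = y_i;  state = TEST_NE; break;
        case TEST_NE: state = (x != y) ? TEST_LT : DONE;  break;
        case TEST_LT: state = (x < y) ? SUB_Y : SUB_X;    break;
        case SUB_Y:   y = y - x;         state = TEST_NE; break;
        case SUB_X:   x = x - y;         state = TEST_NE; break;
        case DONE:    break;
        }
    }
    return x;                    /* output d_o = x */
}
```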
Templates for creating the state diagram:

Once the datapath is finished, we have a state table for the next-state and control logic, and all that is left is combinational logic design. This is not an optimized design, but it shows the basic steps.
Fig 2.10 : Templates for creating state diagram
2.5 RT level Custom Single Purpose processor Design:

We often start with a state machine rather than an algorithm, because cycle timing is often too central to the functionality. Example: a bus bridge that converts a 4-bit bus to an 8-bit bus. We start with an FSMD, a description known as the register-transfer (RT) level. Exercise: complete the design.
Fig 2.13 RT level Custom Single Purpose processor Design example
2.6 Optimizing Custom single-purpose processors

Optimization is the task of making design metric values the best possible. Optimization opportunities exist in the original program, the FSMD, the datapath and the FSM.
Optimizing the original program

Analyze program attributes and look for areas of possible improvement: the number of computations, the size of variables, time and space complexity, and the operations used (multiplication and division are very expensive).
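As an illustration of the number of computations as an optimization target, the sketch below counts loop iterations for two versions of GCD. One is the subtraction-based loop. The other uses the remainder (modulo) operation, which is Euclid's algorithm, swapped in here purely as one example of a program-level rewrite. The modulo version needs far fewer iterations, but each iteration needs a divider, which the list above flags as expensive. The choice is therefore a trade-off, not a free win.

```c
#include <assert.h>

/* Subtraction-based GCD; *iters receives the number of loop iterations. */
static unsigned gcd_sub(unsigned x, unsigned y, unsigned *iters)
{
    assert(x > 0 && y > 0);
    unsigned n = 0;
    while (x != y) {
        if (x < y) y -= x; else x -= y;
        n++;
    }
    *iters = n;
    return x;
}

/* Remainder-based GCD (Euclid): fewer iterations, but each needs a modulo unit. */
static unsigned gcd_mod(unsigned x, unsigned y, unsigned *iters)
{
    assert(x > 0 && y > 0);
    unsigned n = 0;
    while (y != 0) {
        unsigned r = x % y;
        x = y;
        y = r;
        n++;
    }
    *iters = n;
    return x;
}
```

For x = 42 and y = 8, both return 2. gcd_sub takes 8 iterations and gcd_mod takes 2.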
Fig 2.15 optimizing the program
Optimizing the FSMD:

Areas of possible improvement:
1. Merge states: states with constants on their transitions can be eliminated, since the transition taken is already known; states with independent operations can be merged.
2. Separate states: states that require complex operations (such as a*b*c*d) can be broken into smaller states to reduce hardware size.
3. Scheduling.
Fig 2.16 Optimizing the FSMD for GCD
Optimizing the datapath:
Sharing of functional units: the one-to-one mapping used previously is not necessary. If the same operation occurs in different states, those states can share a single functional unit.
Multi-functional units: ALUs support a variety of operations, so one ALU can be shared among operations occurring in different states.
Optimizing the FSM:

State encoding: the task of assigning a unique bit pattern to each state in an FSM. The size of the state register and of the combinational logic varies with the encoding chosen, so encoding can be treated as an ordering problem.
State minimization: the task of merging equivalent states into a single state. Two states are equivalent if, for all possible input combinations, they generate the same outputs and make transitions to the same next state.
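State encoding determines the width of the state register. The sketch below computes the minimum width for a binary (dense) encoding, ceil(log2 N) bits for N states. For comparison, it also gives the width of one-hot encoding (one flip-flop per state), a common alternative that these notes do not discuss. Which pattern goes to which state (the ordering problem) is not modelled, only register size.

```c
#include <assert.h>

/* Minimum state-register width for binary encoding: ceil(log2(n_states)).
 * A one-state FSM needs no register bits at all. */
static unsigned binary_state_bits(unsigned n_states)
{
    assert(n_states >= 1);
    unsigned bits = 0;
    while ((1ull << bits) < n_states) {
        bits++;
    }
    return bits;
}

/* One-hot encoding: one flip-flop per state. */
static unsigned one_hot_state_bits(unsigned n_states)
{
    assert(n_states >= 1);
    return n_states;
}
```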
GENERAL-PURPOSE PROCESSORS: SOFTWARE
A general-purpose processor is a processor designed for a variety of computation tasks. Its unit cost is low, in part because the manufacturer spreads the NRE over large numbers of units; for example, Motorola sold half a billion 68HC05 microcontrollers in 1996 alone. Because a higher NRE is acceptable, it is carefully designed, and so it can yield good performance, size and power. For the embedded system designer it offers low NRE cost, short time-to-market and time-to-prototype, and high flexibility: the user just writes software and does no processor design. A general-purpose processor is also known as a microprocessor. "Micro" was used when such processors were implemented on one or a few chips rather than entire rooms.
Basic Architecture:
A general-purpose processor, sometimes called a CPU, consists of a datapath and a control unit linked with memory. Note the similarity to a single-purpose processor. The key differences are that the datapath is general, and that the control unit does not store the algorithm; the algorithm is programmed into the memory.
Datapath Operations:
Load: read a memory location into a register.
ALU operation: input certain registers through the ALU and store the result back in a register.
Store: write a register to a memory location.
Fig 2.17 GPP basic architecture
Control unit :
Control unit: configures the datapath operations. The sequence of desired operations (instructions) is stored in memory as the program. Each instruction cycle is broken into several sub-operations, each taking one clock cycle, e.g.:
Fetch: get the next instruction into the IR.
Decode: determine what the instruction means.
Fetch operands: move data from memory to a datapath register.
Execute: move data through the ALU.
Store results: write data from a register to memory.
Control Unit Sub-Operations:

Fetch: get the next instruction into the IR. The PC (program counter) always points to the next instruction; the IR (instruction register) holds the fetched instruction.
Decode: determine what the instruction means.
Fetch operands: move data from memory to a datapath register.
Execute: move data through the ALU. This particular instruction does nothing during this sub-operation.
Store results: write data from a register to memory. This particular instruction does nothing during this sub-operation.
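To make the sub-operations concrete, the sketch below is a tiny instruction-set simulator written in C. The 16-bit instruction format, the opcodes and the register and memory sizes are all hypothetical. They are invented for illustration and are not taken from any real processor or from Fig 2.24. Each loop iteration first fetches (IR = mem[PC], then PC is incremented) and decodes (splits IR into fields). It then performs the fetch-operand, execute or store-result step that the opcode needs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-bit format (not a real ISA):
 * bits 15..12 opcode, 11..8 register Rn, 7..0 memory address or register Rm. */
enum { OP_LOAD = 0x0, OP_STORE = 0x1, OP_ADD = 0x2, OP_HALT = 0xF };

#define MEM_WORDS 256
#define NUM_REGS  16

struct cpu {
    uint16_t pc;             /* program counter: address of next instruction */
    uint16_t ir;             /* instruction register */
    uint16_t reg[NUM_REGS];  /* register file */
    uint16_t mem[MEM_WORDS]; /* shared program/data memory (Princeton style) */
};

/* Runs until HALT or max_steps; returns the number of instructions executed. */
static unsigned cpu_run(struct cpu *c, unsigned max_steps)
{
    unsigned executed = 0;
    while (executed < max_steps) {
        assert(c->pc < MEM_WORDS);
        c->ir = c->mem[c->pc++];        /* fetch: next instruction into IR */
        unsigned op  = c->ir >> 12;     /* decode: split IR into fields */
        unsigned rn  = (c->ir >> 8) & 0xF;
        unsigned arg = c->ir & 0xFF;
        if (op == OP_HALT) break;
        switch (op) {
        case OP_LOAD:  c->reg[rn] = c->mem[arg]; break; /* fetch operand from memory */
        case OP_STORE: c->mem[arg] = c->reg[rn]; break; /* store result to memory */
        case OP_ADD:   c->reg[rn] = (uint16_t)(c->reg[rn] + c->reg[arg & 0xF]); break; /* execute in ALU */
        default:       assert(0 && "illegal opcode"); break;
        }
        executed++;
    }
    return executed;
}
```

For example, the program LOAD R0,[0x10]; LOAD R1,[0x11]; ADD R0,R1; STORE R0,[0x12]; HALT is encoded as 0x0010, 0x0111, 0x2001, 0x1012, 0xF000. It adds the words at addresses 0x10 and 0x11 and stores the sum at 0x12.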
Memory:
Program information consists of the sequence of instructions that cause the processor to carry out the desired system functionality. Data information represents the values being input, output and transformed by the program. We can store program and data together or separately. In a Princeton architecture, data and program words share the same memory space. The Princeton architecture may result in a simpler hardware connection to memory, since only one connection is necessary. In a Harvard architecture, the program memory space is distinct from the data memory space. A Harvard architecture, while requiring two connections, can perform instruction and data fetches simultaneously, so it may result in improved performance. Most machines have a Princeton architecture. The Intel 8051 is a well-known Harvard architecture.
Figure 2.19: Two memory architectures: (a) Harvard, (b) Princeton
Memory may be read-only memory (ROM) or readable and writable memory (RAM). ROM is usually much more compact than RAM. An embedded system often uses ROM for program memory since, unlike in desktop systems, an embedded system's program does not change. Constant data may be stored in ROM, but other data of course requires RAM. Memory may be on-chip or off-chip. On-chip memory resides on the same IC as the processor, while off-chip memory resides on a separate IC. The processor can usually access on-chip memory much faster than off-chip memory, perhaps in just one cycle. However, finite IC capacity limits the amount of on-chip memory.
Figure 2.20: Cache memory
To reduce the time needed to access (read or write) memory, a local copy of a portion of memory may be kept in a small but especially fast memory called cache. Cache memory often resides on-chip. It often uses fast but expensive static RAM technology rather than slower but cheaper dynamic RAM. Cache memory is based on the principle that if a processor accesses a particular memory location at a particular time, it will likely access that location and its immediate neighbors in the near future.
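The locality principle can be illustrated with a small simulation. The C sketch below models a hypothetical direct-mapped cache with 8 lines of 4 words each; the sizes are chosen arbitrarily. It counts hits. On a miss, the whole block containing the address is loaded, so later accesses to neighbouring words hit. Replacement policy, writes and timing are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical direct-mapped cache: 8 lines, each holding a 4-word block. */
#define CACHE_LINES 8u
#define BLOCK_WORDS 4u

struct cache {
    bool valid[CACHE_LINES];
    unsigned tag[CACHE_LINES];
};

/* Returns true on a hit; on a miss, records the block containing addr as cached. */
static bool cache_access(struct cache *c, unsigned addr)
{
    unsigned block = addr / BLOCK_WORDS; /* neighbouring words share a block */
    unsigned line  = block % CACHE_LINES; /* direct mapping: exactly one possible line */
    unsigned tag   = block / CACHE_LINES;
    if (c->valid[line] && c->tag[line] == tag) {
        return true;
    }
    c->valid[line] = true;               /* miss: block fetched from main memory */
    c->tag[line] = tag;
    return false;
}

/* Counts hits for n accesses starting at address 0 with the given stride (in words). */
static unsigned count_hits(unsigned n, unsigned stride)
{
    assert(stride > 0);
    struct cache c = {{false}, {0}};
    unsigned hits = 0;
    for (unsigned i = 0; i < n; i++) {
        hits += cache_access(&c, i * stride) ? 1u : 0u;
    }
    return hits;
}
```

Reading 32 consecutive words (stride 1) gives 24 hits, because only the first word of each block misses. A stride of 4 touches a new block every time and gives no hits.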
Operation:
Instruction execution:
1. Fetch instruction: the task of reading the next instruction from memory into the instruction register.
2. Decode instruction: the task of determining what operation the instruction in the instruction register represents (e.g., add, move, etc.).
3. Fetch operands: the task of moving the instruction's operand data into appropriate registers.
4. Execute operation: the task of feeding the appropriate registers through the ALU and back into an appropriate register.
5. Store results: the task of writing a register into memory.
If each stage takes one clock cycle, then a single instruction may take several cycles to complete.
Pipelining
Pipelining is a common way to increase the instruction throughput of a microprocessor. As a simple analogy, consider two people washing and drying 8 dishes. In one approach, the first person washes all 8 dishes, and then the second person dries all 8 dishes. Assuming 1 minute per dish per person, this approach requires 16 minutes. The approach is clearly inefficient, since at any time only one person is working and the other is idle. A better approach is for the second person to begin drying the first dish immediately after it has been washed. This approach requires only 9 minutes: 1 minute for the first dish to be washed, and then 8 more minutes until the last dish is finally dry. We refer to this latter approach as pipelined.
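The dish arithmetic generalizes. Suppose there are s stages of one time unit each and n items. The non-pipelined approach takes n × s units. The pipelined one takes s + (n - 1) units: the pipeline fills once, then one item completes per unit. The C sketch below encodes both formulas. It assumes equal stage times and ignores the stalls (hazards) that real instruction pipelines suffer.

```c
#include <assert.h>

/* Time units to process n items through s one-unit stages, one item at a time. */
static unsigned long sequential_time(unsigned long n, unsigned long s)
{
    return n * s;
}

/* Pipelined: fill the pipeline once (s units), then one item finishes per unit. */
static unsigned long pipelined_time(unsigned long n, unsigned long s)
{
    assert(s >= 1);
    if (n == 0) return 0;
    return s + (n - 1);
}
```

sequential_time(8, 2) gives the 16 minutes above, and pipelined_time(8, 2) gives the 9 minutes. For five instruction stages and 100 instructions, the figures are 500 and 104 cycles.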
Figure 2.21: Pipelining: (a) non-pipelined dish cleaning, (b) pipelined dish cleaning, (c) pipelined instruction execution.
Each dish is like an instruction, and the two tasks of washing and drying are like the five stages listed above. By using a separate unit (each akin to a person) for each stage, we can pipeline instruction execution. After the instruction fetch unit fetches the first instruction, the decode unit decodes it while the instruction fetch unit simultaneously fetches the next instruction.
Superscalar and VLIW Architectures: Performance can be improved by:
1. A faster clock (but there is a limit).
2. Pipelining: slice up each instruction into stages and overlap the stages.
3. Multiple ALUs to support more than one instruction stream.
Superscalar (scalar means non-vector operations): fetches instructions in batches and executes as many as possible; may require extensive hardware to detect independent instructions.
VLIW (very long instruction word): each word in memory has multiple independent instructions; relies on the compiler to detect and schedule instructions; currently growing in popularity.
Programmer's View
The programmer does not need a detailed understanding of the architecture; instead, the programmer needs to know what instructions can be executed. There are two levels of instructions: assembly level, and structured languages (C, C++, Java, etc.). Most development today is done using structured languages, but some assembly-level programming may still be necessary. Drivers are the portions of a program that communicate with and/or control (drive) another device. They often have detailed timing considerations and extensive bit manipulation, so assembly level may be best for them.
Fig 2.22 Instruction stored in memory

Instruction Set:
Defines the legal set of instructions for that processor:
Data transfer: memory/register, register/register, I/O, etc.
Arithmetic/logical: move registers through the ALU and back.
Branches: determine the next PC value when it is not just PC+1.
Addressing Modes:
Fig 2.23 Addressing modes
Fig 2.24 A Simple (Trivial) Instruction Set
Program and data memory space
The embedded systems programmer must be aware of the size of the available memory for program and for data, and must not exceed these limits. In addition, the programmer will probably want to be aware of on-chip program and data memory capacity, taking care to fit the necessary program and data in on-chip memory if possible.
Registers
The assembly-language programmer must know how many registers are available for general-purpose data storage, and must also be familiar with any registers that have special functions. For example, a processor may provide a base register. The programmer can then use a data-transfer instruction in which the processor adds an operand field to the base register to obtain an actual memory address.
I/O
The programmer should be aware of the processor's input and output (I/O) facilities, with which the processor communicates with other devices. One common I/O facility is parallel I/O, in which the programmer can read or write a port (a collection of external pins) by reading or writing a special-function register. Another common I/O facility is a system bus, consisting of address and data ports that are automatically activated by certain addresses or types of instructions.
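Reading or writing a port through a special-function register usually comes down to bit manipulation on a memory-mapped location. The C sketch below assumes an 8-bit port register. The functions take a pointer so they can be exercised on an ordinary variable. On a real microcontroller, the pointer would refer to the device's register address, which is processor-specific and not given in these notes.

```c
#include <assert.h>
#include <stdint.h>

/* Helpers for a parallel I/O port exposed as an 8-bit special-function register.
 * volatile forces every read and write to actually reach the register. */
static void port_set_pin(volatile uint8_t *port, unsigned pin)
{
    assert(pin < 8);
    *port = (uint8_t)(*port | (1u << pin));   /* drive pin high, keep the others */
}

static void port_clear_pin(volatile uint8_t *port, unsigned pin)
{
    assert(pin < 8);
    *port = (uint8_t)(*port & ~(1u << pin));  /* drive pin low, keep the others */
}

static int port_read_pin(const volatile uint8_t *port, unsigned pin)
{
    assert(pin < 8);
    return (int)((*port >> pin) & 1u);        /* 1 if the pin is high, else 0 */
}
```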
Interrupts
An interrupt causes the processor to suspend execution of the main program and instead jump to an Interrupt Service Routine (ISR) that fulfills a special, short-term processing need. In particular, the processor stores the current PC and sets it to the address of the ISR. After the ISR completes, the processor resumes execution of the main program by restoring the PC. The programmer should be aware of the types of interrupts supported by the processor (several types are described in a subsequent chapter) and must write ISRs when necessary. The assembly-language programmer places each ISR at a specific address in program memory. The structured-language programmer must do so also. Some compilers allow a programmer to force a procedure to start at a particular memory location, while others recognize pre-defined names for particular ISRs. For example, we may need to record the occurrence of an event from a peripheral device, such as the pressing of a button. We record the event by setting a variable in memory when that event occurs, although the user's main program may not process that event until later. Rather than requiring the user to insert checks for the event throughout the main program, the programmer need only write an interrupt service routine and associate it with an input pin connected to the button. The processor will then call the routine automatically when the button is pressed.
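The button example can be sketched in portable C. The ISR name button_isr and the flag are hypothetical. How a routine is actually bound to an interrupt (fixed address, compiler-specific attribute or pre-defined name, as described above) varies by processor and compiler, so that step is omitted. When testing, the ISR is simply called directly. The sketch also ignores the shared-data problem covered in Unit 5: two presses that arrive before the main program checks the flag are counted as one event.

```c
#include <stdbool.h>

/* Flag shared between the ISR and the main program. volatile tells the compiler
 * the value can change outside normal program flow, so it is re-read every time. */
static volatile bool button_pressed = false;

/* Hypothetical ISR for the button input pin: only records the event, keeping it short. */
static void button_isr(void)
{
    button_pressed = true;
}

/* Called from the main loop whenever convenient; returns true if an event was pending. */
static bool handle_button_event(void)
{
    if (!button_pressed) {
        return false;
    }
    button_pressed = false; /* acknowledge; the real processing happens here, outside the ISR */
    return true;
}
```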
Operating System
An optional software layer providing low-level services to a program (application), such as:
File management and disk access.
Keyboard/display interfacing.
Scheduling multiple programs for execution, or even just multiple threads from one program.
The program makes system calls to the OS.
Development Environment
Development processor: the processor on which we write and debug our programs, usually a PC.
Target processor: the processor that the program will run on in our embedded system, often different from the development processor.
Software Development Process

Tools used in the software development process include compilers (a cross compiler runs on one processor but generates code for another), assemblers, linkers, debuggers and profilers.
Fig 2.25 Software Development Process
Running a Program:
If the development processor is different from the target, how can we run our compiled code? There are two options: download to the target processor, or simulate.
Simulation: one method uses a hardware description language, but it is slow and not always available. Another method uses an instruction set simulator (ISS), which runs on the development processor but executes instructions of the target processor. An ISS gives us control over time: we can set breakpoints, look at register values, set values, execute step by step, and so on. However, it does not interact with the real environment.
Download to board: a device programmer is used. The program runs in the real environment, but is not controllable.
Compromise: an emulator runs in the real environment, at or near speed, and supports some controllability from the PC.
Testing and Debugging:
Fig 2.26 software design process

Application-Specific Instruction-Set Processors (ASIPs): General-purpose processors are sometimes too general to be effective in demanding applications. For example, video processing requires huge video buffers and operations on large arrays of data, which are inefficient on a GPP. But a single-purpose processor has high NRE and is not programmable. ASIPs are targeted to a particular domain, such as embedded control, digital signal processing, video processing, network processing or telecommunications. They contain architectural features specific to that domain, yet are still programmable.
A common ASIP: the microcontroller. Microcontrollers are used for embedded control applications, which read sensors and set actuators. Such applications mostly deal with events (bits); data is present, but not in huge amounts. Examples include a VCR, disk drive, digital camera (assuming an SPP for image compression), washing machine and microwave oven. Microcontroller features include:
1. On-chip peripherals, such as timers, analog-digital converters and serial communication, tightly integrated for the programmer and typically part of the register space.
2. On-chip program and data memory.
3. Direct programmer access to many of the chip's pins.
4. Specialized instructions for bit manipulation and other low-level operations.
Digital Signal Processors (DSP)

DSPs are ASIPs for signal-processing applications, which involve large amounts of digitized data, often streaming, to which data transformations must be applied fast; e.g., a cell-phone voice filter, digital TV or music synthesizer.
DSP features: several instruction execution units; a single-cycle multiply-accumulate (MAC) instruction and other specialized instructions; efficient vector operations (e.g., adding two arrays) through vector ALUs, loop buffers, etc.
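The multiply-accumulate operation is the core of filters such as the FIR filter sketched below: each output sample is a sum of products of coefficients and recent input samples. On a DSP the loop body maps onto a single-cycle MAC instruction; in portable C it is simply an accumulate loop. The function name and the integer sample format are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* One output sample of an FIR filter: y[n] = sum over k of h[k] * x[n-k].
   x_recent[0] is the newest input sample, x_recent[k] the one k steps older.
   Each loop iteration is one multiply-accumulate (MAC). */
int64_t fir_output(const int16_t *h, const int16_t *x_recent, size_t taps) {
    int64_t acc = 0;                          /* wide accumulator avoids overflow */
    for (size_t k = 0; k < taps; k++) {
        acc += (int32_t)h[k] * x_recent[k];   /* multiply-accumulate */
    }
    return acc;
}
```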
Selecting a Microprocessor
Issues: Technical issues include speed, power, size and cost; other issues include the development environment, prior expertise, licensing, etc.
Speed: How do we evaluate a processor's speed? Clock speed is one measure, but the number of instructions per cycle may differ between processors. Instructions per second is another, but the work done per instruction may differ. Dhrystone is a synthetic benchmark developed in 1984; results are given in Dhrystones per second. MIPS: 1 MIPS = 1757 Dhrystones per second (based on Digital's VAX-11/780). This is also known as Dhrystone MIPS and is commonly used today. So, 750 MIPS = 750 * 1757 = 1,317,750 Dhrystones per second. SPEC is a set of more realistic benchmarks, but oriented to desktops. EEMBC (EDN Embedded Benchmark Consortium) provides suites of benchmarks for automotive, consumer electronics, networking, office automation and telecommunications.
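The Dhrystone MIPS conversion is simple arithmetic; this small sketch reproduces the 750 MIPS example (the function names are illustrative).

```c
#include <stdint.h>

/* Dhrystone MIPS is normalized to the VAX-11/780, which achieved
   1757 Dhrystones per second. */
#define VAX_DHRYSTONES_PER_SEC 1757u

uint64_t mips_to_dhrystones(uint64_t dmips) {
    return dmips * VAX_DHRYSTONES_PER_SEC;
}

double dhrystones_to_mips(double dhrystones_per_sec) {
    return dhrystones_per_sec / VAX_DHRYSTONES_PER_SEC;
}
```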
Designing a General Purpose Processor

This is not something an embedded system designer would normally do, but it is instructive to see how simply we can build one top-down. Remember that real processors aren't usually built this way: they are much more optimized and use a much more bottom-up design.
Fig:2.27 A simple microprocessor
RECOMMENDED QUESTIONS UNIT 2 (Hardware)

1. What is a single-purpose processor? What are the benefits of choosing a single-purpose processor over a general-purpose processor?
2. How do nMOS and pMOS transistors differ?
3. Build a 3-input NAND gate using a minimum number of CMOS transistors.
4. Build a 3-input NOR gate using a minimum number of CMOS transistors.
5. Build a 2-input AND gate using a minimum number of CMOS transistors.
6. Build a 2-input OR gate using a minimum number of CMOS transistors.
7. Explain why NAND and NOR gates are more common than AND and OR gates.
8. Distinguish between combinational and sequential circuits.
9. Design a 2-bit comparator with a single output, "less than", using the combinational design technique.
10. Design a 3 x 8 decoder with truth table and K-maps.
11. What is the difference between synchronous and asynchronous circuits?
12. What is the purpose of the datapath and the control path?
13. Design a single-purpose processor that outputs Fibonacci numbers up to n places. Start with a function computing the desired result, translate it into a state diagram, and sketch a probable datapath.
UNIT 2 (Software)
1. Describe why a general-purpose processor could cost less than a single-purpose processor.
2. Create a table listing the address spaces for 8-, 16-, 24-, 32- and 64-bit address sizes.
3. Illustrate how program and data memory fetches can be overlapped in a Harvard architecture.
4. For a microcontroller, create a table listing five existing variations, stressing the features that differ from the basic version.
QUESTION PAPER SOLUTION UNIT 2
Q1. Write an algorithm for GCD with more time complexity, write the FSMD, and determine the total number of steps required for GCD.
First, create the algorithm. Then convert the algorithm to a complex state machine, known as an FSMD (finite-state machine with datapath). Templates can be used to perform such a conversion.
GCD
Creating the datapath: create a register for any declared variable; create a functional unit for each arithmetic operation; connect the ports, registers and functional units based on reads and writes, using multiplexors where there are multiple sources; and create a unique identifier for each datapath component control input and output.
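The datapath rules above can be sketched in C by separating the datapath (one register per variable, one functional unit per operation, muxes where a register has several sources, uniquely named control signals) from the controller (a state register plus next-state and control logic). This is a simplified, hypothetical FSMD for the subtraction-based GCD: it merges the compare and subtract steps into one state, so it has fewer states than a diagram produced mechanically from the templates. All type, signal and state names are illustrative.

```c
#include <stdbool.h>

/* Algorithm being implemented (subtraction-based GCD, inputs non-zero):
 *   while (x != y) { if (x < y) y = y - x; else x = x - y; }  d = x;
 */

typedef struct { unsigned x, y, d; } Datapath;      /* one register per variable */

typedef struct {                                    /* control inputs */
    bool x_sel, y_sel;        /* mux select: false = external input, true = subtractor */
    bool x_ld, y_ld, d_ld;    /* register load enables */
} Control;

typedef struct { bool x_neq_y, x_lt_y; } Status;    /* comparator outputs */

Status dp_status(const Datapath *dp) {
    Status s = { dp->x != dp->y, dp->x < dp->y };
    return s;
}

/* One clock edge: each enabled register loads its mux output. */
void dp_clock(Datapath *dp, Control c, unsigned x_i, unsigned y_i) {
    unsigned x_minus_y = dp->x - dp->y;             /* subtractor 1 */
    unsigned y_minus_x = dp->y - dp->x;             /* subtractor 2 */
    unsigned x_next = c.x_sel ? x_minus_y : x_i;    /* x input mux */
    unsigned y_next = c.y_sel ? y_minus_x : y_i;    /* y input mux */
    if (c.d_ld) dp->d = dp->x;
    if (c.x_ld) dp->x = x_next;
    if (c.y_ld) dp->y = y_next;
}

typedef enum { S_LOAD, S_STEP, S_DONE } State;

/* Controller: writes the GCD to *d_o and returns the clock cycles used. */
unsigned gcd_fsmd(unsigned x_i, unsigned y_i, unsigned *d_o) {
    Datapath dp = { 0, 0, 0 };
    State state = S_LOAD;
    unsigned cycles = 0;
    while (state != S_DONE) {
        Control c = { false, false, false, false, false };
        Status s = dp_status(&dp);
        State next = state;
        if (state == S_LOAD) {              /* x = x_i; y = y_i */
            c.x_ld = c.y_ld = true;
            next = S_STEP;
        } else if (!s.x_neq_y) {            /* x == y: d = x, finish */
            c.d_ld = true;
            next = S_DONE;
        } else if (s.x_lt_y) {              /* y = y - x */
            c.y_sel = c.y_ld = true;
        } else {                            /* x = x - y */
            c.x_sel = c.x_ld = true;
        }
        dp_clock(&dp, c, x_i, y_i);
        state = next;
        cycles++;
    }
    *d_o = dp.d;
    return cycles;
}
```

For this particular design the step count is one load cycle, one cycle per subtraction, and one final cycle; for x = 42, y = 8 that is 1 + 8 + 1 = 10 cycles. A design with separate compare and subtract states would take more cycles for the same inputs.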
Templates for creating the state diagram:

We have finished the datapath, and we have a state table for the next-state and control logic. All that's left is combinational logic design. This is not an optimized design, but we see the basic steps.
Templates for creating state diagram
Q2. Explain the different methods to optimize the FSMD.
Optimization is the task of making design metric values the best possible. Optimization opportunities exist in the original program, the FSMD, the datapath and the FSM.
Optimizing the original program

Analyze program attributes and look for areas of possible improvement: the number of computations, the size of variables, time and space complexity, and the operations used (multiplication and division are very expensive).
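As an illustration of optimizing the original program, the sketch below compares the subtraction-based GCD with a remainder-based (Euclidean) version that computes the same result. The remainder version needs far fewer loop iterations when the inputs differ greatly in size, at the cost of a divider (modulo unit) in the datapath, which is exactly the kind of trade-off listed above. Both functions are illustrative and require non-zero inputs.

```c
/* Both versions count loop iterations in *iters; inputs must be non-zero. */
unsigned gcd_subtract(unsigned x, unsigned y, unsigned *iters) {
    unsigned n = 0;
    while (x != y) {
        if (x < y) y -= x;
        else       x -= y;
        n++;
    }
    *iters = n;
    return x;
}

unsigned gcd_modulo(unsigned x, unsigned y, unsigned *iters) {
    unsigned n = 0;
    while (y != 0) {
        unsigned r = x % y;   /* needs a modulo unit in hardware */
        x = y;
        y = r;
        n++;
    }
    *iters = n;
    return x;
}
```

For example, gcd(1000, 1) takes 999 iterations by subtraction but a single iteration with the remainder operation.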
Q3. Explain the different memory architectures

Program information consists of the sequence of instructions that cause the processor to carry out the desired system functionality. Data information represents the values being input, output and transformed by the program. We can store program and data together or separately. In a Princeton architecture, data and program words share the same memory space. The Princeton architecture may result in a simpler hardware connection to memory, since only one connection is necessary. In a Harvard architecture, the program memory space is distinct from the data memory space. A Harvard architecture, while requiring two connections, can perform instruction and data fetches simultaneously, so it may result in improved performance. Most machines have a Princeton architecture. The Intel 8051 is a well-known Harvard architecture.
Figure 2.19: Two memory architectures: (a) Harvard, (b) Princeton
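A deliberately idealized model makes the performance difference between the two architectures concrete. It assumes every memory access takes one cycle, that a data access can overlap any instruction fetch in the Harvard case, and that nothing else stalls; real processors are far more complicated, so treat the numbers as a rough bound only.

```c
/* Memory-access cycles for a program with `fetches` instruction fetches
   and `data_accesses` data reads/writes (idealized model). */
unsigned long princeton_cycles(unsigned long fetches, unsigned long data_accesses) {
    return fetches + data_accesses;   /* one shared memory port: accesses serialize */
}

unsigned long harvard_cycles(unsigned long fetches, unsigned long data_accesses) {
    /* separate program and data ports: data accesses overlap fetches */
    return fetches > data_accesses ? fetches : data_accesses;
}
```

For 100 instructions of which 40 access data, the model gives 140 cycles for Princeton and 100 for Harvard.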
Memory may be read-only memory (ROM) or readable and writable memory (RAM). ROM is usually much more compact than RAM. An embedded system often uses ROM for program memory since, unlike in desktop systems, an embedded system's program does not change. Constant data may be stored in ROM, but other data of course requires RAM. Memory may be on-chip or off-chip. On-chip memory resides on the same IC as the processor, while off-chip memory resides on a separate IC. The processor can usually access on-chip memory much faster than off-chip memory, perhaps in just one cycle, but finite IC capacity of course implies only a limited amount of on-chip memory.
Q4. Explain pipelining for instruction execution with dish cleaning.

Pipelining is a common way to increase the instruction throughput of a microprocessor. We first make a simple analogy of two people approaching the chore of washing and drying 8 dishes. In one approach, the first person washes all 8 dishes, and then the second person dries all 8 dishes. Assuming 1 minute per dish per person, this approach requires 16 minutes. The approach is clearly inefficient, since at any time only one person is working and the other is idle. A better approach is for the second person to begin drying the first dish immediately after it has been washed. This approach requires only 9 minutes: 1 minute for the first dish to be washed, and then 8 more minutes until the last dish is finally dry.
Fig: Pipelining: (a) non-pipelined dish cleaning, (b) pipelined dish cleaning, (c) pipelined instruction execution.
Each dish is like an instruction, and the two tasks of washing and drying are like the five stages of instruction execution (fetch instruction, decode instruction, fetch operands, execute operation, store results). By using a separate unit (each akin to a person) for each stage, we can pipeline instruction execution. After the instruction fetch unit fetches the first instruction, the decode unit decodes it while the instruction fetch unit simultaneously fetches the next instruction.
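The timing in the dish example generalizes to any k-stage pipeline. The sketch assumes every stage takes one time unit and ignores stalls and hazards: without pipelining, n items take n * k units; with pipelining, the first item takes k units and each later item completes one unit after the previous one.

```c
/* Idealized timing for n items through k stages of one time unit each. */
unsigned long time_non_pipelined(unsigned long n, unsigned long k) {
    return n * k;
}

unsigned long time_pipelined(unsigned long n, unsigned long k) {
    return n == 0 ? 0 : k + (n - 1);   /* fill the pipeline, then one result per unit */
}
```

With n = 8 dishes and k = 2 tasks this gives 16 and 9 minutes, matching the text; with 100 instructions on a five-stage pipeline it gives 500 versus 104 time units.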
Q5. Explain the software development process.
Software Development Process

Tools used include compilers; cross compilers, which run on one processor but generate code for another; assemblers; linkers; debuggers; and profilers.
Fig 2.25 Software Development Process
Running a Program:
If the development processor is different from the target processor, how can we run our compiled code? There are two options: download the code to the target processor, or simulate it.
Simulation: One method is to model the processor in a hardware description language, but this is slow and such a model is not always available. Another method is an instruction-set simulator (ISS), which runs on the development processor but executes instructions of the target processor. An ISS gives us control over time: we can set breakpoints, look at and set register values, execute step by step, and so on. However, an ISS does not interact with the real environment.
Download to board: Use a device programmer. The program then runs in the real environment, but is not controllable.
Compromise: An emulator runs in the real environment, at or near full speed, and supports some controllability from the PC.
Testing and Debugging:
Fig: Software design process
optimizing the program
Optimizing the FSMD:

Areas of possible improvement: Merge states: states with constants on transitions can be eliminated, since the transition taken is already known; states with independent operations can be merged. Separate states: states that require complex operations (e.g., a*b*c*d) can be broken into smaller states to reduce hardware size. Scheduling.
Fig: Optimizing the FSMD for GCD
Optimizing the datapath:

Sharing of functional units: the one-to-one mapping used previously is not necessary; if the same operation occurs in different states, those states can share a single functional unit. Multi-functional units: ALUs support a variety of operations, so a single ALU can be shared among operations occurring in different states.
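Functional-unit sharing can be sketched as one subtractor whose operands come through two multiplexors. In the GCD example, x - y and y - x are needed in different states, so a single subtractor can serve both, trading one subtractor for two 2-to-1 muxes. The function and select-signal names are hypothetical.

```c
#include <stdbool.h>

/* sel = false computes x - y; sel = true computes y - x.
   Valid only because the two operations never occur in the same state. */
unsigned shared_subtract(unsigned x, unsigned y, bool sel) {
    unsigned a = sel ? y : x;   /* left-operand mux */
    unsigned b = sel ? x : y;   /* right-operand mux */
    return a - b;               /* the single shared subtractor */
}
```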
Optimizing the FSM:

State encoding: the task of assigning a unique bit pattern to each state in an FSM. The size of the state register and of the combinational logic varies with the encoding, so it can be treated as an ordering problem.
State minimization: the task of merging equivalent states into a single state. Two states are equivalent if, for all possible input combinations, they generate the same outputs and transition to the same next state.
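How the encoding affects the state register can be seen by comparing two common encodings, binary and one-hot (chosen here for illustration; the text does not name specific encodings). Binary encoding needs the fewest flip-flops, ceil(log2(n)); one-hot uses one flip-flop per state, which often simplifies the next-state logic at the cost of a wider register.

```c
/* State-register width for n states under two encodings. */
unsigned binary_encoding_bits(unsigned n_states) {
    unsigned bits = 0;
    while ((1u << bits) < n_states) bits++;   /* smallest b with 2^b >= n */
    return bits;
}

unsigned one_hot_encoding_bits(unsigned n_states) {
    return n_states;                          /* one flip-flop per state */
}
```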
UNIT 3
Standard Single-Purpose Peripherals, Timers, Counters, UART, PWM, LCD Controllers, Keypad controllers, Stepper Motor Controller, A to D Converters, Examples. 6 Hours
UNIT 3 STANDARD SINGLE PURPOSE PERIPHERALS

3.1 Introduction
A single-purpose processor is a digital system intended to solve a specific computation task. The processor may be a standard one, intended for use in a wide variety of applications in which the same task must be performed; the manufacturer of such an off-the-shelf processor sells the device in large quantities. On the other hand, the processor may be a custom one, built by a designer to implement a task specific to a particular application. Advantages of using a standard single-purpose processor: performance may be fast, since the processor is customized for the particular task at hand; size may be small, since a single-purpose processor does not require a program memory; and unit cost may be low, due to the manufacturer spreading NRE cost over many units.
3.2 Timers, counters, and watchdog timers:

A timer is a device that generates a signal pulse at specified time intervals. A time interval is a "real-time" measure of time, such as 3 milliseconds. These devices are extremely useful in systems in which a particular action, such as sampling an input signal or generating an output signal, must be performed every X time units. A simple timer may consist of a register, a counter, and an extremely simple controller. The register holds a count value representing the number of clock cycles that equals the desired real-time value. This number can be computed using the simple formula:
Number of clock cycles = Desired real-time value / Clock cycle
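Applying the formula in code avoids unit slips. The sketch works in integer nanoseconds to avoid floating-point rounding; the 16-bit counter width used for the range is an assumption, since the text does not fix one. At 10 MHz, one clock period (the resolution) is 100 ns, a 3 ms interval needs 30,000 cycles, and a 16-bit counter covers up to 65,535 * 100 ns = 6.5535 ms.

```c
#include <stdint.h>

/* Number of clock cycles = desired interval / clock period
                          = interval * clock frequency.
   Large products may overflow 64 bits; fine for typical timer values. */
uint64_t timer_cycles(uint64_t interval_ns, uint64_t clock_hz) {
    return interval_ns * clock_hz / 1000000000u;
}

/* Resolution: one clock period (truncated to whole nanoseconds). */
uint64_t timer_resolution_ns(uint64_t clock_hz) {
    return 1000000000u / clock_hz;
}

/* Range: the largest interval an n-bit counter can measure. */
uint64_t timer_range_ns(unsigned counter_bits, uint64_t clock_hz) {
    return ((1ull << counter_bits) - 1) * timer_resolution_ns(clock_hz);
}
```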

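A counter is nearly identical to a timer, except that instead of counting clock cycles (pulses on the clock signal), a counter counts pulses on some other input signal. A watchdog timer can be thought of as having the inverse functionality of a regular timer. We configure a watchdog timer with a real-time value, just as with a regular timer. However, instead of the timer generating a signal for us every X time units, we must generate a signal for the timer every X time units. If we fail to do so in time, the timer "times out" and generates a signal indicating the failure, which is commonly used to reset the processor.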
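The inverse relationship can be simulated in a few lines. This is a hypothetical down-counting watchdog: software must call `wdt_kick()` before the count reaches zero, otherwise `wdt_tick()` reports a timeout, which on real hardware would typically reset the processor. The structure and function names are illustrative, not a particular device's interface.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t reload;   /* real-time value expressed in clock ticks */
    uint32_t count;
} Watchdog;

void wdt_init(Watchdog *w, uint32_t reload_ticks) {
    w->reload = reload_ticks;
    w->count = reload_ticks;
}

/* Software "signal to the timer": restart the countdown. */
void wdt_kick(Watchdog *w) {
    w->count = w->reload;
}

/* Called once per clock tick; returns true when the watchdog times out. */
bool wdt_tick(Watchdog *w) {
    if (w->count > 0) w->count--;
    return w->count == 0;
}
```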
Fig 3.1 Timer structure: basic timer, counter, timer with count, timer with prescaler.
3.3 UART
A UART (Universal Asynchronous Receiver/Transmitter) receives serial data and stores it as parallel data (usually one byte), and takes parallel data and transmits it as serial data. Such serial communication is beneficial when we need to communicate bytes of data between devices separated by long distances, or when we simply have few available I/O pins. Internally, a simple UART may possess a baud-rate configuration register and two independently operating processors, one for receiving and the other for transmitting. The transmitter may possess a register, often called a transmit buffer, that holds data to be sent. This register is a shift register, so the data can be transmitted one bit at a time by shifting at the appropriate rate.
Fig 3.2 : serial transmission using UARTs
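The transmit shift register and its receiving counterpart can be sketched as framing and unframing one byte. The sketch assumes the common 8N1 format (one start bit of 0, eight data bits sent least-significant bit first, no parity, one stop bit of 1), which the text does not specify; the function names are illustrative.

```c
#include <stdint.h>

/* Transmitter: produce the 10 bits shifted out for one byte (8N1). */
void uart_frame_8n1(uint8_t data, uint8_t bits[10]) {
    bits[0] = 0;                                  /* start bit */
    for (int i = 0; i < 8; i++) {
        bits[1 + i] = (uint8_t)((data >> i) & 1u); /* shift out, LSB first */
    }
    bits[9] = 1;                                  /* stop bit (line idles high) */
}

/* Receiver: shift bits back into a parallel byte; -1 on framing error. */
int uart_unframe_8n1(const uint8_t bits[10]) {
    if (bits[0] != 0 || bits[9] != 1) return -1;
    int data = 0;
    for (int i = 0; i < 8; i++) {
        data |= (bits[1 + i] & 1) << i;
    }
    return data;
}
```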

To use a UART, we must configure its baud rate by writing to the configuration register, and then we must write data to the transmit register and/or read data from the receive register. For example, for the 8051 microcontroller the baud rate is given by
$$\text{Baudrate} = \frac{2^{smod}}{32} \times \frac{oscfreq}{12 \times (256 - TH1)}$$
where smod is a bit in a special-function register (setting it doubles the baud rate), oscfreq is the frequency of the oscillator, and TH1 is an 8-bit rate register of a built-in timer.
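The baud-rate equation can be evaluated and inverted in code. This sketch assumes the built-in timer runs in 8-bit auto-reload mode, as the equation implies, and uses an 11.0592 MHz crystal in the check below; that frequency is an assumption, chosen because it divides down to standard baud rates exactly. The function names are illustrative.

```c
/* Baud = (2^smod / 32) * oscfreq / (12 * (256 - TH1)), smod in {0, 1}. */
double baud_8051(unsigned smod, double oscfreq_hz, unsigned th1) {
    return ((double)(1u << smod) / 32.0) * oscfreq_hz / (12.0 * (256 - th1));
}

/* Solve for TH1 (nearest integer reload); returns -1 if no 8-bit value fits. */
int th1_for_baud(unsigned smod, double oscfreq_hz, double baud) {
    double reload = (double)(1u << smod) * oscfreq_hz / (384.0 * baud); /* 32 * 12 = 384 */
    long counts = (long)(reload + 0.5);          /* round to nearest */
    if (counts < 1 || counts > 256) return -1;   /* TH1 must fit in 8 bits */
    return (int)(256 - counts);
}
```

With a crystal that does not divide evenly, the rounded TH1 gives a baud rate slightly off target; recomputing `baud_8051` with the chosen TH1 shows the resulting error.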
3.4 Pulse width modulator

A pulse-width modulator (PWM) generates an output signal that repeatedly switches between high and low. We control the duration of the high value and of the low value by indicating the desired period and the desired duty cycle, which is the percentage of time the signal is high compared to the signal's period. A square wave has a duty cycle of 50%. The pulse's width corresponds to the pulse's time high. PWM functionality could be implemented on a dedicated general-purpose processor, or integrated with another program's functionality, but the single-purpose processor approach has the benefits of efficiency and simplicity. Another use of a PWM is to encode control commands in a single signal for use by another device.
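A common way to build a PWM, assumed here rather than taken from the text, is a free-running counter compared against a duty register: the counter cycles through 0 to period - 1, and the output is high while the counter is below the compare value, so the duty cycle equals compare / period. The function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Output level for the current counter value. */
bool pwm_output(uint16_t counter, uint16_t compare) {
    return counter < compare;
}

/* Compare value for a desired duty cycle in percent (0..100). */
uint16_t pwm_compare_for_duty(uint16_t period, unsigned duty_percent) {
    return (uint16_t)(((uint32_t)period * duty_percent) / 100u);
}
```

When such a signal drives a DC motor, the motor responds roughly to the average voltage, approximately the duty cycle times the supply voltage, which is why changing the duty cycle changes the speed.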
3.5 LCD controller

An LCD (Liquid crystal display) is a low-cost, low-power device capable of displaying text and images. LCDs are extremely common in embedded systems, since such systems often do not have video monitors standard for desktop systems. LCDs can be found in numerous common devices like watches, fax and copy machines, and calculators.
RS  R/W  DB7  DB6  DB5  DB4  DB3  DB2  DB1  DB0   Description
0   0    0    0    0    0    0    0    0    1     Clears all display, returns cursor home
0   0    0    0    0    0    0    0    1    *     Returns cursor home
0   0    0    0    0    0    0    1    I/D  S     Sets cursor move direction and/or specifies not to shift display
0   0    0    0    0    0    1    D    C    B     ON/OFF of all display (D), cursor ON/OFF (C), and blink position (B)
0   0    0    0    0    1    S/C  R/L  *    *     Moves cursor and shifts display
0   0    0    0    1    DL   N    F    *    *     Sets interface data length, number of display lines, and character font
1   0    WRITE DATA                               Writes data

CODES: I/D = 1: cursor moves left; I/D = 0: cursor moves right; S = 1: with display shift; S/C = 1: display shift; S/C = 0: cursor movement; R/L = 1: shift to right; R/L = 0: shift to left; DL = 1: 8-bit; DL = 0: 4-bit; N = 1: 2 rows; N = 0: 1 row; F = 1: 5x10 dots; F = 0: 5x7 dots; * = don't care
Fig 3.3 example of LCD initialization
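The command bytes in the instruction table (DB7 to DB0, sent with RS = 0 and R/W = 0) can be built from their flag bits, as sketched below. The initialization sequence is an assumed example (8-bit interface, 2 rows, 5x7 font, display on with cursor off, I/D = 1 and S = 0, then clear), not the only valid one. The routine that drives the RS, R/W and enable pins and the required power-on delays are hardware-specific and omitted; consult the controller's datasheet.

```c
#include <stdint.h>

/* Command bytes built from the instruction table; flags are 0 or 1. */
uint8_t lcd_clear(void) { return 0x01; }
uint8_t lcd_home(void)  { return 0x02; }
uint8_t lcd_entry_mode(unsigned id, unsigned s) {
    return (uint8_t)(0x04 | (id << 1) | s);
}
uint8_t lcd_display_ctrl(unsigned d, unsigned c, unsigned b) {
    return (uint8_t)(0x08 | (d << 2) | (c << 1) | b);
}
uint8_t lcd_shift(unsigned sc, unsigned rl) {
    return (uint8_t)(0x10 | (sc << 3) | (rl << 2));
}
uint8_t lcd_function_set(unsigned dl, unsigned n, unsigned f) {
    return (uint8_t)(0x20 | (dl << 4) | (n << 3) | (f << 2));
}

/* Fill seq[] with an example initialization sequence; returns its length. */
int lcd_init_sequence(uint8_t seq[4]) {
    seq[0] = lcd_function_set(1, 1, 0);  /* DL = 1, N = 1, F = 0 */
    seq[1] = lcd_display_ctrl(1, 0, 0);  /* D = 1, C = 0, B = 0 */
    seq[2] = lcd_entry_mode(1, 0);       /* I/D = 1, S = 0 */
    seq[3] = lcd_clear();
    return 4;
}
```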

The basic principle of one type of LCD (reflective) works as follows. First, incoming light passes through a polarizing plate. Next, that polarized light encounters liquid crystal material. If we excite a region of this material, we cause the material's molecules to align, which in turn causes the polarized light to pass through the material. Otherwise, the light does not pass through. Finally, light that has passed through hits a mirror and reflects back, so the excited region appears to light up. Another type of LCD (absorption) works similarly, but uses a black surface instead of a mirror. The surface below the excited region absorbs light, thus appearing darker than the other regions.
One of the simplest LCDs is the 7-segment LCD. Each of the 7 segments can be activated independently, so the display can show any digit or one of several letters and symbols.
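Driving a 7-segment display usually comes down to a lookup table of segment patterns. The sketch assumes bit 0 drives segment a, bit 1 segment b, and so on up to bit 6 for segment g, with 1 turning a segment on; actual wiring and polarity vary between displays.

```c
#include <stdint.h>

static const uint8_t SEVEN_SEG[10] = {
    0x3F, /* 0: a b c d e f   */
    0x06, /* 1: b c           */
    0x5B, /* 2: a b d e g     */
    0x4F, /* 3: a b c d g     */
    0x66, /* 4: b c f g       */
    0x6D, /* 5: a c d f g     */
    0x7D, /* 6: a c d e f g   */
    0x07, /* 7: a b c         */
    0x7F, /* 8: all segments  */
    0x6F, /* 9: a b c d f g   */
};

/* Pattern for a decimal digit, or 0 (all segments off) if out of range. */
uint8_t seven_seg_pattern(unsigned digit) {
    return digit < 10 ? SEVEN_SEG[digit] : 0;
}
```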
3.6 Keypad controller

A keypad consists of a set of buttons that may be pressed to provide input to an embedded system. Again, keypads are extremely common in embedded systems, since such systems may lack the keyboard that comes standard with desktop systems. A simple keypad has buttons arranged in an N-column by M-row grid. The device has N outputs, each corresponding to a column, and another M outputs, each corresponding to a row. When we press a button, one column output and one row output go high, uniquely identifying the pressed button. To read such a keypad from software, we must scan the column and row outputs. The scanning may instead be performed by a keypad controller.
Fig 3.4 Internal keypad structure with N = 4 and M = 4
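The decoding step that software or a keypad controller performs can be sketched as follows. The bitmask interface is an assumption: `col_bits` has bit c set when column output c is high, `row_bits` has bit r set when row output r is high, and keys are numbered r * N + c. A real controller would also debounce the buttons.

```c
/* Index of the single set bit, or -1 if zero or several bits are set. */
static int single_bit_index(unsigned bits) {
    if (bits == 0 || (bits & (bits - 1)) != 0) return -1;
    int i = 0;
    while ((bits & 1u) == 0) { bits >>= 1; i++; }
    return i;
}

/* Key index for an N-column keypad, or -1 if no single key is pressed. */
int keypad_decode(unsigned col_bits, unsigned row_bits, unsigned n_cols) {
    int c = single_bit_index(col_bits);
    int r = single_bit_index(row_bits);
    if (c < 0 || r < 0 || (unsigned)c >= n_cols) return -1;
    return r * (int)n_cols + c;
}
```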

3.7 Stepper motor controller:

A stepper motor is an electric motor that rotates a fixed number of degrees whenever we apply a "step" signal. In contrast, a regular electric motor rotates continuously whenever power is applied, coasting to a stop when power is removed. Internally, a stepper motor typically has four coils. To rotate the motor one step, we pass current through one or two of the coils; the particular coils depend on the present orientation of the motor. Thus, rotating the motor 360° requires applying current to the coils in a specified sequence. Applying the sequence in reverse causes reversed rotation.
Sequence  1  2  3  4  5
A         +  -  -  +  +
B         +  +  -  -  +
A'        -  +  +  -  -
B'        -  -  +  +  -
Fig 3.5 : Controlling a stepper motor using a driver - hardware
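Stepping through the coil sequence in software can be sketched as a four-entry table. It assumes the two-coil full-step sequence (A, B), (B, A'), (A', B'), (B', A), after which the pattern repeats, packed as bit 3 = A, bit 2 = B, bit 1 = A', bit 0 = B' with 1 meaning "+". Walking the table backwards reverses the rotation. Names are illustrative.

```c
#include <stdint.h>

static const uint8_t STEP_SEQ[4] = { 0xC, 0x6, 0x3, 0x9 };

/* Advance one step: dir > 0 forward, otherwise reverse.
   *index tracks the present position; returns the coil pattern to drive. */
uint8_t stepper_step(unsigned *index, int dir) {
    *index = (*index + (dir > 0 ? 1u : 3u)) % 4u;   /* +3 mod 4 is -1 mod 4 */
    return STEP_SEQ[*index];
}
```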

3.8 Analog-digital converters

An analog-to-digital converter (ADC, A/D or A2D) converts an analog signal to a digital signal, and a digital-to-analog converter (DAC, D/A or D2A) does the opposite. Such conversions are necessary because, while embedded systems deal with digital values, an embedded system's surroundings typically involve many analog signals. "Analog" refers to continuously-valued signals, such as temperature or speed represented by a voltage between 0 and 100, with infinitely many possible values in between. "Digital" refers to discretely-valued signals, such as integers, and in computing systems, these signals are encoded in binary. By converting between analog and digital signals, we can use digital processors in an analog environment.
Fig 3.6 Conversion: proportionality ADC and DAC

Analog-to-digital conversion using successive approximation
Fig 3.7 example for successive approximation
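Successive approximation is a binary search: starting from the most significant bit, the converter sets a bit, lets an internal DAC produce the corresponding voltage, and keeps the bit only if that voltage does not exceed the input. The software model below assumes the proportionality rule (Vin - Vmin) / (Vmax - Vmin) = d / (2^n - 1) and therefore truncates toward the lower code; some converters and texts use 2^n or rounding instead.

```c
/* DAC output for a trial code under the proportionality rule. */
static double dac_voltage(unsigned code, unsigned n_bits, double vmin, double vmax) {
    return vmin + code * (vmax - vmin) / (double)((1u << n_bits) - 1u);
}

/* n-bit successive-approximation conversion of vin. */
unsigned sar_convert(double vin, unsigned n_bits, double vmin, double vmax) {
    unsigned code = 0;
    for (int bit = (int)n_bits - 1; bit >= 0; bit--) {
        unsigned trial = code | (1u << bit);                 /* guess this bit = 1 */
        if (dac_voltage(trial, n_bits, vmin, vmax) <= vin)   /* comparator */
            code = trial;                                     /* keep the bit */
    }
    return code;
}
```

For a 4-bit converter with a 0 to 7.5 V range, an input of 5 V converts to 1010; for an 8-bit converter with a -5 to +5 V range, 1.2 V converts to 158 (10011110).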
3.9 Real-time clocks

Much like a digital wristwatch, a real-time clock (RTC) keeps the time and date in an embedded system. Real-time clocks are typically composed of a crystal-controlled oscillator, numerous cascaded counters, and a battery backup. The crystal-controlled oscillator generates a very consistent high-frequency digital pulse that feeds the cascaded counters. The first counter typically counts these pulses up to the oscillator frequency, which corresponds to exactly one second. At this point, it generates a pulse that feeds the next counter. This counter counts up to 59, at which point it generates a pulse feeding the minute counter. The hour, date, month and year counters work in similar fashion. In addition, real-time clocks adjust for leap years. The rechargeable backup battery is used to keep the real-time clock running while the system is powered off.
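The cascaded counters can be modelled in software as below: each call represents one pulse from the first (seconds-generating) counter, and each counter that wraps produces the carry for the next one, with the date counter aware of month lengths and leap years. The structure and names are illustrative.

```c
#include <stdbool.h>

typedef struct { unsigned sec, min, hour, day, month, year; } RtcTime;

static bool is_leap(unsigned y) {
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

static unsigned days_in_month(unsigned m, unsigned y) {   /* m = 1..12 */
    static const unsigned dim[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    return (m == 2 && is_leap(y)) ? 29 : dim[m - 1];
}

/* One pulse per second; each wrap carries into the next counter. */
void rtc_tick_second(RtcTime *t) {
    if (++t->sec < 60) return;                                 /* seconds: 0..59 */
    t->sec = 0;
    if (++t->min < 60) return;                                 /* minutes */
    t->min = 0;
    if (++t->hour < 24) return;                                /* hours */
    t->hour = 0;
    if (++t->day <= days_in_month(t->month, t->year)) return;  /* date */
    t->day = 1;
    if (++t->month <= 12) return;                              /* month */
    t->month = 1;
    ++t->year;                                                 /* year */
}
```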
RECOMMENDED QUESTIONS UNIT - 3
1. A timer has a clock frequency of 10 MHz. Determine its range and resolution, and the terminal count needed to measure 3 ms intervals.
2. A watchdog timer that uses two cascaded 16-bit up counters is connected to an 11.981 MHz oscillator. A timeout should occur if the watchdog reset function is not called within 5 minutes. What value should be loaded into the up-counter pair when the function is called?
3. Determine the values of smod and TH1 to generate a baud rate of 9600 using the 8051 microcontroller baud rate equation, assuming an 11.981 MHz oscillator.
4. Using a PWM circuit, compute the value assigned to PWM1 to achieve an RPM of 8050, assuming the input voltage needed is 4.375 V.
5. Write a function in pseudocode that initializes the LCD.
6. Compute the memory needed in bytes to store a 4-bit digital encoding of a 3-second analog audio signal sampled every 10 milliseconds.
7. Given an analog input signal whose voltage ranges from -5 to 5 V, and an 8-bit digital encoding, calculate the correct encoding for 1.2 V, and then trace the successive approximation approach to find the correct encoding.
8. Extend the ratio and resolution equations of analog-to-digital conversion to any voltage range between Vmin and Vmax rather than 0 to Vmax.
SOLUTIONS FOR UNIT - 3
Q1. Explain the working of a timer.

A timer is a device that generates a signal pulse at specified time intervals. A time interval is a "real-time" measure of time, such as 3 milliseconds. These devices are extremely useful in systems in which a particular action, such as sampling an input signal or generating an output signal, must be performed every X time units. A simple timer may consist of a register, a counter, and an extremely simple controller. The register holds a count value representing the number of clock cycles that equals the desired real-time value. This number can be computed using the simple formula:
Number of clock cycles = Desired real-time value / Clock cycle

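A counter is nearly identical to a timer, except that instead of counting clock cycles (pulses on the clock signal), a counter counts pulses on some other input signal. A watchdog timer can be thought of as having the inverse functionality of a regular timer. We configure a watchdog timer with a real-time value, just as with a regular timer. However, instead of the timer generating a signal for us every X time units, we must generate a signal for the timer every X time units. If we fail to do so in time, the timer "times out" and generates a signal indicating the failure, which is commonly used to reset the processor.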
Fig: Timer structure: basic timer, counter, timer with count, timer with prescaler.
Q3. Explain the working of stepper motor using a driver.

A stepper motor is an electric motor that rotates a fixed number of degrees whenever we apply a "step" signal. In contrast, a regular electric motor rotates continuously whenever power is applied, coasting to a stop when power is removed. Internally, a stepper motor typically has four coils. To rotate the motor one step, we pass current through one or two of the coils; the particular coils depend on the present orientation of the motor. Thus, rotating the motor 360° requires applying current to the coils in a specified sequence. Applying the sequence in reverse causes reversed rotation.
Sequence  1  2  3  4  5
A         +  -  -  +  +
B         +  +  -  -  +
A'        -  +  +  -  -
B'        -  -  +  +  -
Controlling a stepper motor using a driver - hardware
Q4. Describe the working of a PWM unit

A pulse-width modulator (PWM) generates an output signal that repeatedly switches between high and low. We control the duration of the high value and of the low value by indicating the desired period and the desired duty cycle, which is the percentage of time the signal is high compared to the signal's period. A square wave has a duty cycle of 50%. The pulse's width corresponds to the pulse's time high. PWM functionality could be implemented on a dedicated general-purpose processor, or integrated with another program's functionality, but the single-purpose processor approach has the benefits of efficiency and simplicity. Another use of a PWM is to encode control commands in a single signal for use by another device.
Q5. The analog input range for an 8-bit ADC is -5 V to +5 V. Determine the resolution of this ADC and also the digital output in binary when the input is -2 V. Also trace the successive approximation steps for verification.
An analog-to-digital converter (ADC, A/D or A2D) converts an analog signal to a digital signal, and a digital-to-analog converter (DAC, D/A or D2A) does the opposite. Such conversions are necessary because, while embedded systems deal with digital values, an embedded system's surroundings typically involve many analog signals. "Analog" refers to continuously-valued signals, such as temperature or speed represented by a voltage between 0 and 100, with infinitely many possible values in between. "Digital" refers to discretely-valued signals, such as integers, and in computing systems, these signals are encoded in binary. By converting between analog and digital signals, we can use digital processors in an analog environment.
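The specific numbers asked for in Q5 can be worked out as follows. This assumes the proportionality rule (Vin - Vmin)/(Vmax - Vmin) = d/(2^n - 1) and truncation of a fractional code; texts that divide by 2^n instead obtain slightly different values.

$$
\text{Resolution} = \frac{V_{max} - V_{min}}{2^n - 1} = \frac{5 - (-5)}{255} \approx 0.0392\ \text{V}
$$

$$
d = \frac{-2 - (-5)}{10} \times 255 = 76.5 \;\Rightarrow\; d = 76 = 01001100_2
$$

Successive approximation trace, with DAC voltage V_DAC = -5 + trial * 10/255 and a bit kept only if V_DAC <= -2 V:

$$
\begin{array}{cccc}
\text{Bit} & \text{Trial code} & V_{DAC}\ (\text{V}) & \text{Keep?} \\
7 & 10000000 & +0.020 & \text{no} \\
6 & 01000000 & -2.490 & \text{yes} \\
5 & 01100000 & -1.235 & \text{no} \\
4 & 01010000 & -1.863 & \text{no} \\
3 & 01001000 & -2.176 & \text{yes} \\
2 & 01001100 & -2.020 & \text{yes} \\
1 & 01001110 & -1.941 & \text{no} \\
0 & 01001101 & -1.980 & \text{no}
\end{array}
$$

The final code, 01001100, agrees with the direct calculation.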
example for successive approximation
UNIT 4
MEMORY: Introduction, Common memory Types, Compulsory memory, Memory Hierarchy and Cache, Advanced RAM. Interfacing, Communication Basics, Microprocessor Interfacing, Arbitration, Advanced Communication Principles, Protocols - Serial, Parallel and Wireless.
8 Hours
UNIT- 4 MEMORY AND MICROPROCESSOR INTERFACING

Any embedded system's functionality consists of three aspects: processing, storage and communication. Processing is the transformation of data, storage is the retention of data for later use, and communication is the transfer of data. Processors are used to implement processing, memories to implement storage, and buses to implement communication. A memory stores large numbers of bits. These bits exist as m words of n bits each, for a total of m*n bits. We refer to such a memory as an m x n ("m-by-n") memory. Identifying a particular word requires log2(m) address input signals. To read a memory means to retrieve the word at a particular address, while to write a memory means to store a word at a particular address.
Fig 1: (a) Memory (words and bits per word), (b) memory block diagram
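The m x n relationships translate directly into arithmetic. When m is not a power of two, the number of address lines is rounded up, so this sketch computes ceil(log2(m)); the function names are illustrative.

```c
#include <stdint.h>

/* Address inputs needed to select one of m words: ceil(log2(m)). */
unsigned address_lines(uint64_t m_words) {
    unsigned k = 0;
    while ((1ull << k) < m_words) k++;   /* smallest k with 2^k >= m */
    return k;
}

/* Total storage of an m x n memory, in bits. */
uint64_t total_bits(uint64_t m_words, unsigned n_bits) {
    return m_words * n_bits;
}
```

For example, a 4096 x 8 memory needs 12 address lines and stores 32,768 bits, and an 8 x 4 ROM needs 3 address lines.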
5.2 Write ability/storage permanence:

Traditional ROM/RAM distinctions: ROM is read only, and its bits are stored without power; RAM is read and write, and loses its stored bits without power. These traditional distinctions have become blurred: advanced ROMs can be written to (e.g., EEPROM), and advanced RAMs can hold bits without power (e.g., NVRAM). Write ability is the manner and speed with which a memory can be written. Storage permanence is the ability of a memory to hold stored bits after they are written.
Fig 5.2: Write ability and storage permanence
Write ability:

Ranges of write ability: At the high end, the processor writes to memory simply and quickly (e.g., RAM). In the middle range, the processor writes to memory, but more slowly (e.g., FLASH, EEPROM). In the lower range, special equipment, a programmer, must be used to write to memory (e.g., EPROM, OTP ROM). At the low end, bits are stored only during fabrication (e.g., mask-programmed ROM). In-system programmable memory can be written to by a processor in the embedded system using the memory; this covers memories in the high end and middle range of write ability.
Storage permanence:
Range of storage permanence: At the high end, memory essentially never loses bits (e.g., mask-programmed ROM). In the middle range, memory holds bits days, months or years after the memory's power source is turned off (e.g., NVRAM). In the lower range, memory holds bits as long as power is supplied to it (e.g., SRAM). At the low end, memory begins to lose bits almost immediately after they are written (e.g., DRAM). Nonvolatile memory holds bits after power is no longer supplied; this covers the high end and middle range of storage permanence.
Common Memory Types
ROM:

ROM is nonvolatile memory that can be read from, but not written to, by a processor in an embedded system. It is traditionally written to (programmed) before being inserted into the embedded system. Uses: storing the software program for a general-purpose processor (program instructions can occupy one or more ROM words), storing constant data needed by the system, and implementing combinational circuits.
Example: 8 x 4 ROM
Horizontal lines represent words and vertical lines represent data; lines are connected only at the circles. The decoder sets word 2's line to 1 if the address input is 010. Data lines Q3 and Q1 are set to 1 because there is a programmed connection with word 2's line; word 2 is not connected with data lines Q2 and Q0. The output is 1010.
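Functionally, the 8 x 4 ROM above behaves like a constant lookup table: the 3-bit address selects one word (the decoder) and the stored 4 bits appear on Q3 to Q0. Only word 2 (1010) is given in the example; the other contents in this sketch are hypothetical placeholders. The same idea is why a ROM can implement a combinational circuit: its contents are simply the circuit's truth table.

```c
#include <stdint.h>

/* 8 x 4 ROM contents; only word 2 comes from the example. */
static const uint8_t ROM_8x4[8] = {
    0x0, 0x0, 0xA /* word 2: Q3=1 Q2=0 Q1=1 Q0=0 */, 0x0,
    0x0, 0x0, 0x0, 0x0,
};

uint8_t rom_read(unsigned address) {
    return ROM_8x4[address & 0x7u] & 0xFu;   /* 3 address bits, 4 data bits */
}
```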
Mask-programmed ROM: Connections are programmed at fabrication using a set of masks. It has the lowest write ability (only once) and the highest storage permanence (bits never change unless damaged). It is typically used for the final design of high-volume systems, spreading out the NRE cost for a low unit cost.
OTP ROM: One-time programmable ROM
Connections are programmed after manufacture by the user. The user provides a file of the desired contents of the ROM, and the file is input to a machine called a ROM programmer. Each programmable connection is a fuse, and the ROM programmer blows fuses where connections should not exist. OTP ROM has very low write ability (it is typically written only once and requires a ROM programmer device) and very high storage permanence (bits don't change unless the device is reconnected to the programmer and more fuses are blown). It is commonly used in final products, since it is cheaper and harder to inadvertently modify.
EPROM: Erasable programmable ROM

The programmable component is a MOS transistor with a floating gate surrounded by an insulator. (a) Negative charges form a channel between source and drain, storing a logic 1. (b) A large positive voltage at the gate causes negative charges to move out of the channel and get trapped in the floating gate, storing a logic 0. (c) (Erase) Shining UV rays on the surface of the floating gate causes negative charges to return to the channel from the floating gate, restoring the logic 1. (d) An EPROM package shows the quartz window through which UV light can pass. EPROM has better write ability (it can be erased and reprogrammed thousands of times) but reduced storage permanence (a program lasts about 10 years but is susceptible to radiation and electric noise). It is typically used during design development.
EEPROM: Electrically erasable programmable ROM. EEPROM is programmed and erased electronically, typically by using a higher than normal voltage, and can program and erase individual words. It has better write ability: it can be in-system programmable, with a built-in circuit to provide the higher than normal voltage; a built-in memory controller is commonly used to hide details from the memory user; writes are very slow due to erasing and programming, and a "busy" pin indicates to the processor that the EEPROM is still writing; and it can be erased and programmed tens of thousands of times. Its storage permanence is similar to that of EPROM (about 10 years). It is far more convenient than EPROM, but more expensive.
Flash Memory: Flash is an extension of EEPROM that uses the same floating-gate principle, with the same write ability and storage permanence. It offers fast erase: large blocks of memory are erased at once, rather than one word at a time, and blocks are typically several thousand bytes large. Writes to single words may be slower, since the entire block must be read, the word updated, and then the entire block written back. Flash is used in embedded systems that store large data items in nonvolatile memory, e.g., digital cameras, TV set-top boxes and cell phones.
RAM: Random-access memory. RAM is typically volatile memory: bits are not held without a power supply. It is read and written easily by the embedded system during execution. Its internal structure is more complex than that of ROM: a word consists of several memory cells, each storing 1 bit; each input and output data line connects to each cell in its column; rd/wr is connected to every cell; and when a row is enabled by the decoder, each cell has logic that stores the input data bit when rd/wr indicates write, or outputs the stored bit when rd/wr indicates read.
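The slow single-word flash write described above follows a read-modify-erase-write pattern on the whole block. This sketch simulates it with tiny blocks; the sizes and names are hypothetical (real blocks are typically thousands of bytes), erasing sets a block to all 1s (0xFF), and programming is modelled as a plain copy into the freshly erased block.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16u
#define NUM_BLOCKS 4u
static uint8_t flash[NUM_BLOCKS][BLOCK_SIZE];

static void flash_erase_block(unsigned b) { memset(flash[b], 0xFF, BLOCK_SIZE); }
static void flash_program_block(unsigned b, const uint8_t *src) { memcpy(flash[b], src, BLOCK_SIZE); }

/* Updating one byte costs a whole-block read, erase and write-back. */
void flash_write_byte(unsigned addr, uint8_t value) {
    unsigned b = addr / BLOCK_SIZE, off = addr % BLOCK_SIZE;
    uint8_t buf[BLOCK_SIZE];
    memcpy(buf, flash[b], BLOCK_SIZE);   /* 1. read entire block  */
    buf[off] = value;                    /* 2. update the word    */
    flash_erase_block(b);                /* 3. erase entire block */
    flash_program_block(b, buf);         /* 4. write block back   */
}

uint8_t flash_read_byte(unsigned addr) {
    return flash[addr / BLOCK_SIZE][addr % BLOCK_SIZE];
}
```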
Basic types of RAM:

SRAM: Static RAM. The memory cell uses a flip-flop to store a bit and requires 6 transistors. It holds data as long as power is supplied.
DRAM: Dynamic RAM. The memory cell uses a MOS transistor and a capacitor to store a bit, which is more compact than SRAM. Refresh is required due to capacitor leakage: a word's cells are refreshed when read, and a typical refresh rate is 15.625 microseconds. DRAM is slower to access than SRAM.
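As a rough illustration of where a figure like 15.625 microseconds can come from, assume (hypothetically) a DRAM with 4096 rows, each of which must be refreshed once every 64 ms, with row refreshes spread evenly over that period:

$$
t_{row} = \frac{64\ \text{ms}}{4096\ \text{rows}} = 15.625\ \mu\text{s per row}
$$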
RAM variations:
PSRAM (pseudo-static RAM): DRAM with a built-in memory refresh controller; a popular low-cost, high-density alternative to SRAM. NVRAM (nonvolatile RAM): holds data after external power is removed. It comes in two forms. Battery-backed RAM is SRAM with its own permanently connected battery; writes are as fast as reads, and there is no limit on the number of writes, unlike nonvolatile ROM-based memory. SRAM with EEPROM or flash stores the complete RAM contents to the EEPROM or flash before power is turned off.
Example RAM/ROM devices: Low-cost, low-capacity memory devices are commonly used in 8-bit microcontroller-based embedded systems. The first two numeric digits of the part number indicate the device type (RAM: 62, ROM: 27), and the subsequent digits indicate the capacity in kilobits (e.g., the 6264 is a 64-Kbit RAM and the 27C256 a 256-Kbit ROM). Example: the TC55V2325FF-100 is a 2-megabit synchronous pipelined burst SRAM designed to be interfaced with 32-bit processors; it is capable of fast sequential reads and writes as well as single-byte I/O.
Composing memory
The memory size needed often differs from the size of readily available memories. When the available memory is larger, simply ignore the unneeded high-order address bits and the higher data lines.
When the available memory is smaller, compose several smaller memories into one larger memory. Connect them side by side to increase the width of words. Connect them top to bottom to increase the number of words; an added high-order address line then selects, through a decoder, the smaller memory that contains the desired word. The two techniques can be combined to increase both the number and the width of words.
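The top-to-bottom and side-by-side compositions can be modeled in C. This sketch assumes four hypothetical 1K x 8 chips stacked to form a 4K x 8 memory (a 12-bit address whose two high-order bits drive a 2-to-4 decoder) and two 1K x 8 chips placed side by side to form a 1K x 16 memory; the array and function names are illustrative only.

```c
#include <stdint.h>

#define CHIP_WORDS 1024u /* each small memory: 1K words x 8 bits */
#define NUM_CHIPS  4u    /* four chips stacked "top to bottom"   */

static uint8_t chips[NUM_CHIPS][CHIP_WORDS];

/* 2-to-4 decoder: the two added high-order address bits pick one chip. */
static unsigned decode_chip(uint16_t addr) {
    return (addr >> 10) & 0x3u;
}

/* The low 10 address bits go to every chip in parallel. */
static uint16_t chip_offset(uint16_t addr) {
    return addr & 0x3FFu;
}

void mem4k_write(uint16_t addr, uint8_t data) {
    chips[decode_chip(addr)][chip_offset(addr)] = data;
}

uint8_t mem4k_read(uint16_t addr) {
    return chips[decode_chip(addr)][chip_offset(addr)];
}

/* Side by side: two 1K x 8 chips share one address and form a 16-bit word. */
static uint8_t lo_chip[CHIP_WORDS], hi_chip[CHIP_WORDS];

void mem16_write(uint16_t addr, uint16_t data) {
    lo_chip[addr & 0x3FFu] = (uint8_t)(data & 0xFFu); /* data lines 7..0  */
    hi_chip[addr & 0x3FFu] = (uint8_t)(data >> 8);    /* data lines 15..8 */
}

uint16_t mem16_read(uint16_t addr) {
    return (uint16_t)((hi_chip[addr & 0x3FFu] << 8) | lo_chip[addr & 0x3FFu]);
}
```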
Memory hierarchy: We want memory that is both inexpensive and fast. Main memory: large, inexpensive, slow memory that stores the entire program and data. Cache: small, expensive, fast memory that stores a copy of the likely-accessed parts of the larger memory. There can be multiple levels of cache.
Cache:
Usually designed with SRAM, which is faster but more expensive than DRAM. Usually on the same chip as the processor; space is limited, so it is much smaller than off-chip main memory, and access is faster (1 cycle vs. several cycles for main memory). Cache operation: on a request for main memory access (read or write), first check the cache for a copy.
Cache hit: the copy is in the cache, so access is quick. Cache miss: the copy is not in the cache, so read the address and possibly its neighbors into the cache. Several cache design choices exist: cache mapping, replacement policies, and write techniques.
Cache mapping
There are far fewer cache addresses available than main memory addresses, so the question is whether an address's contents are in the cache. Cache mapping is used to assign a main memory address to a cache address and to determine hit or miss. Three basic techniques: direct mapping, fully associative mapping, and set-associative mapping. Caches are partitioned into indivisible blocks, or lines, of adjacent memory addresses, usually 4 or 8 addresses per line.
Direct mapping
The main memory address is divided into fields. Index: the cache address; its number of bits is determined by the cache size. Tag: compared with the tag stored in the cache at the address indicated by the index; if the tags match, check the valid bit.
The valid bit indicates whether the data in the slot has been loaded from memory. Offset: used to find the particular word in the cache line.
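A direct-mapped lookup can be sketched in C as follows. The geometry (16-bit word addresses, 64 lines, 4 addresses per line, hence 2 offset bits, 6 index bits, and 8 tag bits) is an assumption chosen for illustration, and only tags and valid bits are modeled, not the cached data itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry: 16-bit addresses, 64 lines, 4 addresses per line. */
#define OFFSET_BITS 2u
#define INDEX_BITS  6u
#define NUM_LINES   (1u << INDEX_BITS)

typedef struct {
    bool     valid;
    uint16_t tag;
} Line;

static Line cache[NUM_LINES];

/* Split an address into offset | index | tag (low bits to high bits). */
uint16_t addr_offset(uint16_t a) { return a & ((1u << OFFSET_BITS) - 1u); }
uint16_t addr_index(uint16_t a)  { return (a >> OFFSET_BITS) & (NUM_LINES - 1u); }
uint16_t addr_tag(uint16_t a)    { return a >> (OFFSET_BITS + INDEX_BITS); }

/* Returns true on a hit; on a miss, loads the line (sets tag and valid bit). */
bool dm_access(uint16_t addr) {
    Line *l = &cache[addr_index(addr)];
    if (l->valid && l->tag == addr_tag(addr)) {
        return true;         /* tags match and valid bit set */
    }
    l->valid = true;         /* miss: line fetched from main memory */
    l->tag = addr_tag(addr);
    return false;
}
```

Addresses 0x1234 and 0x1334 share an index but differ in tag, so alternating between them misses every time; this kind of conflict is what set-associative mapping relieves.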
Fully associative mapping

The complete main memory address is stored in each cache address. All addresses stored in the cache are simultaneously compared with the desired address. The valid bit and offset are the same as in direct mapping.
Set-associative mapping
A compromise between direct mapping and fully associative mapping. The index is the same as in direct mapping, but each cache address contains the contents and tags of 2 or more memory address locations. The tags of that set are simultaneously compared, as in fully associative mapping. A cache with set size N is called N-way set-associative; 2-way, 4-way, and 8-way are common.
Cache-replacement policy
The technique for choosing which block to replace when a fully associative cache is full, or when a set in a set-associative cache is full. A direct-mapped cache has no choice. Random: replace a block chosen at random. LRU (least recently used): replace the block not accessed for the longest time. FIFO (first in, first out): push a block onto a queue when it is loaded into the cache, and choose the block to replace by popping the queue.
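A 2-way set-associative lookup with LRU replacement might look like this in C. The geometry (16-bit addresses, 32 sets, 4 addresses per line) and all names are assumptions for illustration; with only two ways, recording the index of the least recently used way in each set is enough.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry: 16-bit addresses, 32 sets, 2 ways, 4 addresses per line. */
#define OFFSET_BITS 2u
#define SET_BITS    5u
#define NUM_SETS    (1u << SET_BITS)
#define WAYS        2u

typedef struct {
    bool     valid[WAYS];
    uint16_t tag[WAYS];
    unsigned lru; /* index of the least recently used way */
} Set;

static Set sets[NUM_SETS];

/* Returns true on a hit. On a miss, fills an empty way or evicts the LRU way. */
bool sa_access(uint16_t addr) {
    unsigned idx = (addr >> OFFSET_BITS) & (NUM_SETS - 1u);
    uint16_t tag = (uint16_t)(addr >> (OFFSET_BITS + SET_BITS));
    Set *s = &sets[idx];

    /* Compare the tags of every way in the set (in parallel in hardware). */
    for (unsigned w = 0; w < WAYS; w++) {
        if (s->valid[w] && s->tag[w] == tag) {
            s->lru = 1u - w; /* 2-way only: the other way becomes LRU */
            return true;
        }
    }

    /* Miss: prefer an invalid way, otherwise replace the LRU way. */
    unsigned victim = !s->valid[0] ? 0u : (!s->valid[1] ? 1u : s->lru);
    s->valid[victim] = true;
    s->tag[victim] = tag;
    s->lru = 1u - victim;
    return false;
}
```

Addresses 0x0000, 0x0080, and 0x0100 all map to set 0 in this geometry; after the first two are loaded and 0x0000 is touched again, a miss on 0x0100 evicts 0x0080, the least recently used block.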
Cache write techniques

When the data cache is written, main memory must also be updated. Write-through: write to main memory whenever the cache is written to. It is the easiest to implement, but the processor must wait for the slower main memory write, and there is potential for unnecessary writes. Write-back: main memory is written only when a dirty block is replaced. An extra dirty bit for each block is set when the cache block is written to. This reduces the number of slow main memory writes.
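The difference in main-memory traffic between the two write techniques can be shown with a toy C model. The WriteCache struct and its functions are hypothetical; the model tracks one cache block that the processor writes repeatedly before it is replaced, and counts only the slow main-memory writes.

```c
#include <stdbool.h>

/* Hypothetical single-block cache used only to count main-memory writes. */
typedef struct {
    bool write_back;     /* false = write-through, true = write-back */
    bool dirty;          /* extra bit used only by write-back        */
    unsigned mem_writes; /* number of slow main-memory writes        */
} WriteCache;

/* Processor writes a word that is already in the cache block. */
void cache_write(WriteCache *c) {
    if (c->write_back) {
        c->dirty = true;  /* defer: just mark the block dirty */
    } else {
        c->mem_writes++;  /* write-through: update main memory now */
    }
}

/* The block is replaced; write-back flushes it only if it is dirty. */
void cache_evict(WriteCache *c) {
    if (c->write_back && c->dirty) {
        c->mem_writes++;
        c->dirty = false;
    }
}
```

Ten writes to the same block cost ten main-memory writes under write-through, but only one (at replacement) under write-back.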
Cache impact on system performance

The most important parameters in terms of performance are: total size of the cache (the total number of data bytes the cache can hold; tag, valid, and other housekeeping bits are not included in the total), degree of associativity, and data block size. Larger caches achieve lower miss rates but have a higher access cost. For example:
2 Kbyte cache: miss rate = 15%, hit cost = 2 cycles, miss cost = 20 cycles; avg. cost of memory access = (0.85 * 2) + (0.15 * 20) = 4.7 cycles.
4 Kbyte cache: miss rate = 6.5%, hit cost = 3 cycles, miss cost unchanged; avg. cost of memory access = (0.935 * 3) + (0.065 * 20) = 4.105 cycles (improvement).
8 Kbyte cache: miss rate = 5.565%, hit cost = 4 cycles, miss cost unchanged; avg. cost of memory access = (0.94435 * 4) + (0.05565 * 20) = 4.8904 cycles (worse).
Cache performance trade-offs: to improve the cache hit rate without increasing the size, increase the line size or change the set-associativity.
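The average-cost formula used in the 2, 4, and 8 Kbyte examples, avg = (1 - miss rate) × hit cost + miss rate × miss cost, can be written as a small C function. The CacheConfig table reuses the figures from those examples; the function and type names are illustrative.

```c
#include <stddef.h>

/* Average memory-access cost in cycles. */
double avg_access_cost(double miss_rate, double hit_cost, double miss_cost) {
    return (1.0 - miss_rate) * hit_cost + miss_rate * miss_cost;
}

typedef struct {
    unsigned size_kb;
    double miss_rate;
    double hit_cost;  /* cycles */
    double miss_cost; /* cycles */
} CacheConfig;

/* Figures from the 2, 4 and 8 Kbyte cache examples in the text. */
static const CacheConfig configs[] = {
    {2, 0.15,    2.0, 20.0},
    {4, 0.065,   3.0, 20.0},
    {8, 0.05565, 4.0, 20.0},
};

/* Returns the index of the configuration with the lowest average cost. */
size_t best_config(const CacheConfig *cfg, size_t n) {
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        double c_i = avg_access_cost(cfg[i].miss_rate, cfg[i].hit_cost, cfg[i].miss_cost);
        double c_b = avg_access_cost(cfg[best].miss_rate, cfg[best].hit_cost, cfg[best].miss_cost);
        if (c_i < c_b) {
            best = i;
        }
    }
    return best;
}
```

With these figures the 4 Kbyte cache wins: doubling again to 8 Kbytes lowers the miss rate only slightly, while the extra hit cycle is paid on every access.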
Advanced RAM
DRAMs are commonly used as main memory in processor-based embedded systems because of their high capacity and low cost. Many DRAM variations have been proposed to keep pace with processor speeds: FPM DRAM (fast page mode DRAM), EDO DRAM (extended data out DRAM), SDRAM/ESDRAM (synchronous and enhanced synchronous DRAM), and RDRAM (Rambus DRAM).
Basic DRAM: The address bus is multiplexed between row and column components. Row and column addresses are latched in sequentially by strobing the ras and cas signals, respectively. Refresh circuitry can be external or internal to the DRAM device; it periodically strobes consecutive memory addresses, causing the memory contents to be refreshed. Refresh circuitry is disabled during a read or write operation.
Fast Page Mode DRAM (FPM DRAM): Each row of the memory bit array is viewed as a page. A page contains multiple words, and individual words are addressed by column address. Timing diagram: the row (page) address is sent, then 3 words are read consecutively by sending a column address for each. An extra cycle is eliminated on each read/write of words from the same page.
Extended data out DRAM (EDO DRAM): An improvement of FPM DRAM. An extra latch before the output buffer allows cas to be strobed before the data read operation has completed, which reduces read/write latency by an additional cycle.
(S)ynchronous and Enhanced Synchronous (ES) DRAM

SDRAM latches data on the active edge of the clock, eliminating the time needed to detect the ras/cas and rd/wr signals. A counter is initialized to the column address, then incremented on the active edge of the clock to access consecutive memory locations. ESDRAM improves on SDRAM: added buffers enable overlapping of column addressing, so faster clocking and lower read/write latency are possible.
Rambus DRAM (RDRAM): More of a bus interface architecture than a DRAM architecture. Data is latched on both the rising and falling edges of the clock. The memory is broken into 4 banks, each with its own row decoder, so 4 pages can be open at a time. Capable of very high throughput.
DRAM integration problem: SRAM is easily integrated on the same chip as the processor, but DRAM is more difficult because the chip-making process differs between DRAM and conventional logic. The goal of conventional logic (IC) designers is to minimize parasitic capacitance to reduce signal propagation delays and power consumption; the goal of DRAM designers is to create capacitor cells that retain stored information. Integration processes are beginning to appear.
Memory Management Unit (MMU): Duties of the MMU: it handles DRAM refresh, bus interface, and arbitration, and takes care of memory sharing among multiple processors.
It also translates logical memory addresses from the processor to physical memory addresses of the DRAM. Modern CPUs often come with an MMU built in; single-purpose processors can also be used.
Introduction to interfacing:
Embedded system functionality aspects: Processing is the transformation of data, implemented using processors. Storage is the retention of data, implemented using memory. Communication is the transfer of data between processors and memories, implemented using buses; this is called interfacing.
A simple bus
Wires: uni-directional or bi-directional; one line in a diagram may represent multiple wires. Bus: a set of wires with a single function (e.g., address bus, data bus), or the entire collection of wires (address, data, and control), together with an associated protocol, i.e., the rules for communication.
Ports
A conducting device on the periphery that connects a bus to a processor or memory. Often referred to as a pin: actual pins on the periphery of an IC package that plug into a socket on a printed-circuit board, sometimes metallic balls instead of pins, and today also metal pads connecting processors and memories within a single IC. A port is a single wire or a set of wires with a single function, e.g., a 12-wire address port.
Timing Diagrams
The most common method for describing a communication protocol. Time proceeds to the right on the x-axis. A control signal is low or high; it may be active low (e.g., go', /go, or go_L). Use the terms assert (active) and deassert; for an active-low signal, asserting go means go = 0. A data signal is either not valid or valid. A protocol may have subprotocols, called bus cycles (e.g., read and write), each of which may take several clock cycles. Read example:
rd/wr is set low and the address is placed on addr for at least t_setup before enable is asserted; enable triggers the memory to place data on the data wires by time t_read.
Basic protocol concepts

Actor: a master initiates, a servant (slave) responds. Direction: sender, receiver. Addresses: a special kind of data that specifies a location in memory, a peripheral, or a register within a peripheral. Time multiplexing: share a single set of wires for multiple pieces of data; this saves wires at the expense of time.
Basic protocol concepts: control methods
A strobe/handshake compromise
ISA bus protocol memory access

ISA (Industry Standard Architecture): common in 80x86 systems. Features: 20-bit address; compromise strobe/handshake control; 4 cycles by default, unless CHRDY is deasserted, resulting in additional wait cycles (up to 6).
Microprocessor interfacing: I/O addressing

A microprocessor communicates with other devices using some of its pins. Port-based I/O (parallel I/O): the processor has one or more N-bit ports, and the processor's software reads and writes a port just like a register, e.g., P0 = 0xFF; v = P1.2; where P0 and P1 are 8-bit ports. Bus-based I/O: the processor has address, data, and control ports that form a single bus. The communication protocol is built into the processor, and a single instruction carries out the read or write protocol on the bus.
Compromises/extensions
Parallel I/O peripheral: used when the processor supports only bus-based I/O but parallel I/O is needed. Each port on the peripheral is connected to a register within the peripheral that is read/written by the processor. Extended parallel I/O: used when the processor supports port-based I/O but more ports are needed. One or more processor ports interface with a parallel I/O peripheral, extending the total number of ports available for I/O (e.g., extending 4 ports to 6 ports in the figure).
Types of bus-based I/O: memory-mapped I/O and standard I/O

The processor talks to both memory and peripherals using the same bus, and there are two ways to talk to peripherals. Memory-mapped I/O: peripheral registers occupy addresses in the same address space as memory; e.g., with a 16-bit address bus, the lower 32K addresses may correspond to memory and the upper 32K addresses to peripherals. Standard I/O (I/O-mapped I/O): an additional pin (M/IO) on the bus indicates whether an access is to memory or to a peripheral; e.g., with a 16-bit address bus, all 64K addresses correspond to memory when M/IO is set to 0, and all 64K addresses correspond to peripherals when M/IO is set to 1.
Memory-mapped I/O vs. Standard I/O

Memory-mapped I/O: requires no special instructions; assembly instructions involving memory, like MOV and ADD, work with peripherals as well. Standard I/O requires special instructions (e.g., IN, OUT) to move data between peripheral registers and memory. Standard I/O: no memory addresses are lost to peripherals, and simpler address decoding logic in peripherals is possible. When the number of peripherals is much smaller than the address space, high-order address bits can be ignored, allowing smaller and/or faster comparators.
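The two address-decoding schemes from the 16-bit example can be expressed as small C predicates. This is a sketch under stated assumptions: the 32K/32K split follows the memory-mapped example in the text, while the 8-peripheral limit used in io_selects and the PERIPH_REG macro are hypothetical. On real hardware, memory-mapped registers are usually accessed through a volatile pointer like PERIPH_REG so the compiler does not optimize the accesses away.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accessor for a memory-mapped register; only meaningful on real
 * hardware, where the address actually decodes to a peripheral. */
#define PERIPH_REG(addr) (*(volatile uint8_t *)(uintptr_t)(addr))

/* Memory-mapped I/O: lower 32K = memory, upper 32K = peripherals,
 * so address bit A15 alone decides which side responds. */
bool mmio_is_peripheral(uint16_t addr) {
    return (addr & 0x8000u) != 0;
}

/* Standard (I/O-mapped) I/O: the M/IO pin, not the address, picks the space. */
typedef enum { SPACE_MEMORY = 0, SPACE_IO = 1 } Space;

Space io_mapped_space(unsigned m_io_pin) {
    return m_io_pin ? SPACE_IO : SPACE_MEMORY;
}

/* With standard I/O and (assumed) at most 8 peripherals, a peripheral's
 * decoder can compare only the low 3 address bits and ignore the rest. */
bool io_selects(uint16_t addr, uint8_t my_port) {
    return (addr & 0x7u) == (my_port & 0x7u);
}
```

The last function shows the "smaller comparator" advantage: a 3-bit compare instead of a 16-bit one, which is only safe because no memory shares the I/O address space.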
ISA bus
ISA supports standard I/O: /IOR, distinct from /MEMR, is used for peripheral reads and /IOW for writes. There is a 16-bit address space for I/O vs. a 20-bit address space for memory. Otherwise, the protocol is very similar to the memory protocol.
A basic memory protocol

Interfacing an 8051 to external memory: ports P0 and P2 support port-based I/O when the 8051's internal memory is being used; those ports serve as data/address buses when external memory is being used. The 16-bit address and 8-bit data are time multiplexed, so the low 8 bits of the address must be latched with the aid of the ALE signal.
A more complex memory protocol
Generates control signals to drive the TC55V2325FF memory chip in burst mode. Addr0 is the starting address input to the device; GO is the enable/disable input to the device.
Microprocessor interfacing: interrupts

Suppose a peripheral intermittently receives data, which must be serviced by the processor. The processor can poll the peripheral regularly to see if data has arrived, but this is wasteful. Alternatively, the peripheral can interrupt the processor when it has data. This requires an extra pin or pins: Int. If Int is 1, the processor suspends the current program and jumps to an Interrupt Service Routine (ISR). This is known as interrupt-driven I/O. Essentially, polling of the interrupt pin is built into the hardware, so it costs no extra time.
Microprocessor interfacing: interrupts
What is the address (interrupt address vector) of the ISR? Fixed interrupt: the address is built into the microprocessor and cannot be changed; either the ISR is stored at that address, or, if not enough bytes are available, a jump to the actual ISR is stored there. Vectored interrupt: the peripheral must provide the address; this is common when the microprocessor has multiple peripherals connected by a system bus. Compromise: the interrupt address table.
Interrupt-driven I/O using fixed ISR location
Interrupt address table

A compromise between fixed and vectored interrupts: one interrupt pin, and a table in memory holding ISR addresses (maybe 256 words). The peripheral doesn't provide the ISR address, but rather an index into the table. Fewer bits are sent by the peripheral, and the ISR location can be moved without changing the peripheral.
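An interrupt address table is essentially an array of function pointers indexed by the number the peripheral supplies. The C sketch below assumes a 256-entry table and hypothetical ISRs (uart_isr, timer_isr) at hypothetical indices; on a real processor the table lookup and jump are done by hardware, and dispatch_interrupt only models that step.

```c
#include <stdint.h>

#define TABLE_SIZE 256u /* "maybe 256 words", as in the text */

typedef void (*Isr)(void);

static Isr isr_table[TABLE_SIZE];  /* table in memory holding ISR addresses */
static unsigned last_serviced = 0; /* demo bookkeeping, not part of the idea */

static void default_isr(void) { last_serviced = 0xFFFFu; }
static void uart_isr(void)    { last_serviced = 1u; } /* hypothetical ISR */
static void timer_isr(void)   { last_serviced = 2u; } /* hypothetical ISR */

void isr_table_init(void) {
    for (unsigned i = 0; i < TABLE_SIZE; i++) {
        isr_table[i] = default_isr;
    }
    isr_table[3] = uart_isr;  /* index values are hypothetical */
    isr_table[7] = timer_isr;
}

/* The peripheral supplies only an 8-bit index, not a full ISR address. */
void dispatch_interrupt(uint8_t index) {
    isr_table[index]();
}

/* Moving an ISR only requires updating the table, not the peripheral. */
void isr_table_install(uint8_t index, Isr handler) {
    isr_table[index] = handler;
}
```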
Additional interrupt issues

Maskable vs. non-maskable interrupts. Maskable: the programmer can set a bit that causes the processor to ignore the interrupt; this is important when in the middle of time-critical code. Non-maskable: a separate interrupt pin that can't be masked; typically reserved for drastic situations, like a power failure requiring immediate backup of data to non-volatile memory.
Jump to ISR: some microprocessors treat the jump the same as a call of any subroutine, so the complete state (PC, registers) is saved, which may take hundreds of cycles. Others save only partial state, such as the PC only; thus the ISR must not modify registers, or else must save them first, and the assembly-language programmer must be aware of which registers are stored.
Direct memory access

Buffering: temporarily storing data in memory before processing. Data accumulated in peripherals is commonly buffered. The microprocessor could handle this with an ISR, but storing and restoring the microprocessor state is inefficient, and the regular program must wait. A DMA controller is more efficient: it is a separate single-purpose processor to which the microprocessor relinquishes control of the system bus. The microprocessor can meanwhile execute its regular program, with no inefficient storing and restoring of state due to an ISR call; the regular program need not wait unless it requires the system bus. A Harvard architecture processor can fetch and execute instructions as long as they don't access data memory; if they do, the processor stalls.
Peripheral to memory transfer without DMA, using vectored interrupt
Peripheral to memory transfer with DMA
Arbitration: Priority arbiter

Consider the situation where multiple peripherals request service from a single resource (e.g., a microprocessor or DMA controller) simultaneously: which gets serviced first? Priority arbiter: a single-purpose processor. Peripherals make requests to the arbiter, and the arbiter makes requests to the resource. The arbiter is connected to the system bus for configuration only.
Arbitration using a priority arbiter
1. Microprocessor is executing its program.
2. Peripheral1 needs servicing so asserts Ireq1. Peripheral2 also needs servicing so asserts Ireq2.
3. Priority arbiter sees at least one Ireq input asserted, so asserts Int.
4. Microprocessor stops executing its program and stores its state.
5. Microprocessor asserts Inta.
6. Priority arbiter asserts Iack1 to acknowledge Peripheral1.
7. Peripheral1 puts its interrupt address vector on the system bus.
8. Microprocessor jumps to the address of the ISR read from the data bus; the ISR executes and returns (and completes the handshake with the arbiter).
9. Microprocessor resumes executing its program.
Arbitration: Priority arbiter
Types of priority. Fixed priority: each peripheral has a unique rank, and the highest rank is chosen first among simultaneous requests; preferred when there is a clear difference in rank between peripherals. Rotating priority (round-robin): priority is changed based on the history of servicing; this gives a better distribution of servicing, especially among peripherals with similar priority demands.
Arbitration: Daisy-chain arbitration
Arbitration is done by the peripherals, either built into each peripheral or via added external logic: a req input and an ack output are added to each peripheral. Peripherals are connected to each other in daisy-chain fashion: one peripheral is connected to the resource, and all others are connected upstream. A peripheral's req flows downstream to the resource, and the resource's ack flows upstream to the requesting peripheral. The closest peripheral has the highest priority.
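The fixed and rotating priority schemes can be sketched in C, with bit i of the request word standing for Ireq(i), so the two-peripheral example in the numbered steps sets bits 1 and 2. The function names and the 4-peripheral size are assumptions; a hardware arbiter would implement the same choice in combinational logic.

```c
#include <stdint.h>

#define NUM_PERIPH 4u
#define NONE (-1)

/* Fixed priority: lower index = higher rank. Returns the peripheral to acknowledge. */
int fixed_priority_grant(uint8_t req) {
    for (unsigned i = 0; i < NUM_PERIPH; i++) {
        if (req & (1u << i)) {
            return (int)i;
        }
    }
    return NONE;
}

/* Rotating (round-robin) priority: the search starts just after the peripheral
 * serviced last, so a busy peripheral cannot starve the others.
 * *last must start in the range 0..NUM_PERIPH-1. */
int rotating_priority_grant(uint8_t req, int *last) {
    for (unsigned k = 1; k <= NUM_PERIPH; k++) {
        unsigned i = ((unsigned)*last + k) % NUM_PERIPH;
        if (req & (1u << i)) {
            *last = (int)i;
            return (int)i;
        }
    }
    return NONE;
}
```

With both Ireq1 and Ireq2 held asserted, the fixed scheme always grants Peripheral1, while the rotating scheme alternates between 1 and 2. A daisy chain behaves like the fixed-priority case, with rank given by position in the chain.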
Arbitration: Daisy-chain arbitration
Pros/cons: easy to add/remove a peripheral, with no system redesign needed; does not support rotating priority; one broken peripheral can cause loss of access to other peripherals.
Network-oriented arbitration: used when multiple microprocessors share a bus (sometimes called a network). Arbitration is typically built into the bus protocol. Separate processors may try to write simultaneously, causing collisions, and the data must then be resent; since we don't want the senders to start sending again at the same time, statistical methods can be used to reduce the chances of a repeat collision. Typically used for connecting multiple distant chips; the trend is to use it to connect multiple on-chip processors.
Multilevel bus architectures

We don't want one bus for all communication: peripherals would need a high-speed, processor-specific bus interface (excess gates, power consumption, and cost; less portable), and too many peripherals slow down the bus. Processor-local bus: high speed, wide, most frequent communication; connects the microprocessor, cache, memory controllers, etc. Peripheral bus: lower speed, narrower, less frequent communication.
The peripheral bus is typically an industry-standard bus (ISA, PCI) for portability. Bridge: a single-purpose processor that converts communication between busses.
Advanced communication principles:

Layering: break the complexity of a communication protocol into pieces that are easier to design and understand. Lower levels provide services to higher levels; a lower level might work with bits while a higher level might work with packets of data. Physical layer: the lowest level in the hierarchy; the medium that carries data from one actor (device or node) to another. Parallel communication: the physical layer is capable of transporting multiple bits of data at once. Serial communication: the physical layer transports one bit of data at a time. Wireless communication: no physical connection is needed for transport at the physical layer.
Parallel communication
Multiple data, control, and possibly power wires, with one bit per wire. High data throughput over short distances. Typically used when connecting devices on the same IC or the same circuit board. The bus must be kept short: long parallel wires result in high capacitance values, which require more time to charge/discharge, and data misalignment between wires increases as length increases. Higher cost, bulky.
Serial communication
A single data wire, possibly also control and power wires. Words are transmitted one bit at a time. Higher data throughput over long distances: less average capacitance, so more bits per unit of time. Cheaper, less bulky. More complex interfacing logic and communication protocol: the sender needs to decompose a word into bits, and the receiver needs to recompose bits into a word. Control signals are often sent on the same wire as data, increasing protocol complexity.
Wireless communication
Infrared (IR): electromagnetic wave frequencies just below the visible light spectrum. A diode emits infrared light to generate the signal; an infrared transistor detects the signal, conducting when exposed to infrared light. Cheap to build, but needs line of sight and has limited range.
Radio frequency (RF):
Electromagnetic wave frequencies in the radio spectrum. Analog circuitry and an antenna are needed on both sides of the transmission. Line of sight is not needed; transmitter power determines range.
Error detection and correction

Often part of a bus protocol. Error detection: the ability of the receiver to detect errors during transmission. Error correction: the ability of the receiver and transmitter to cooperate to correct the problem, typically done by an acknowledgement/retransmission protocol. Bit error: a single bit is inverted. Burst of bit errors: consecutive bits are received incorrectly. Parity: an extra bit sent with a word, used for error detection. Odd parity: the data word plus parity bit contains an odd number of 1s. Even parity: the data word plus parity bit contains an even number of 1s. Parity always detects single-bit errors, but not all burst bit errors. Checksum: an extra word sent with a data packet of multiple words, e.g., an extra word containing the XOR sum of all data words in the packet.
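Parity generation and the XOR checksum described for error detection can be sketched in C. The function names are illustrative; the parity routine simply counts 1 bits modulo 2.

```c
#include <stddef.h>
#include <stdint.h>

/* Even-parity bit: chosen so that data plus parity has an even number of 1s,
 * i.e. it is 1 exactly when the data has an odd number of 1s. */
uint8_t even_parity_bit(uint8_t data) {
    uint8_t p = 0;
    while (data) {
        p ^= (uint8_t)(data & 1u);
        data >>= 1;
    }
    return p;
}

/* Odd-parity bit is simply the complement of the even-parity bit. */
uint8_t odd_parity_bit(uint8_t data) {
    return (uint8_t)(even_parity_bit(data) ^ 1u);
}

/* Checksum word: XOR of all data words in the packet. */
uint8_t xor_checksum(const uint8_t *words, size_t n) {
    uint8_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum ^= words[i];
    }
    return sum;
}
```

Flipping one bit of 0x07 (giving 0x17) changes the parity, so the error is detected; flipping two adjacent bits (giving 0x1F) leaves the parity unchanged, which is why parity misses some burst errors.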
Serial protocols: I2C
I2C (Inter-IC): a two-wire serial bus protocol developed by Philips Semiconductors in the early 1980s. It enables peripheral ICs to communicate using simple communication hardware. Data transfer rates of up to 100 kbits/s in standard mode, 400 kbits/s in fast mode, and 3.4 Mbits/s in high-speed mode; 7-bit addressing is standard, with 10-bit addressing available as an extension. Common devices capable of interfacing to the I2C bus: EPROMs, flash and some RAM memory, real-time clocks, watchdog timers, and microcontrollers.
Serial protocols: CAN

CAN (controller area network): a protocol for real-time applications, developed by Robert Bosch GmbH, originally for communication among components of cars. Applications now using CAN include elevator controllers, copiers, telescopes, production-line control systems, and medical instruments. Data transfer rates of up to 1 Mbit/s and 11-bit addressing. Common devices interfacing with CAN: the 8051-compatible 8592 processor and standalone CAN controllers. The actual physical design of the CAN bus is not specified in the protocol; it requires devices to transmit/detect dominant and recessive signals to/from the bus (e.g., 1 = dominant, 0 = recessive if a single data wire is used). The bus guarantees that a dominant signal prevails over a recessive signal if both are asserted simultaneously.
Serial protocols: FireWire

FireWire (a.k.a. i.LINK, Lynx, IEEE 1394): a high-performance serial bus developed by Apple Computer Inc., designed for interfacing independent electronic components (e.g., desktop, scanner). Data transfer rates from 12.5 to 400 Mbits/s; 64-bit addressing. Plug-and-play capabilities. Packet-based layered design structure. Applications using FireWire include disk drives, printers, scanners, and cameras. Capable of supporting a LAN similar to Ethernet. 64-bit address: 10 bits for network IDs (1023 subnetworks), 6 bits for node IDs (each subnetwork can have 63 nodes), and 48 bits for the memory address (each node can have 281 terabytes of distinct locations).
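The 10/6/48-bit split of the FireWire address can be checked with a few shifts and masks. This C sketch follows the field widths given in the text; the struct and function names are hypothetical. The field widths also explain the counts quoted: 10 bits give 1024 values and 6 bits give 64, presumably with one value of each field reserved for special use, leaving 1023 and 63, while 2^48 = 281,474,976,710,656, about 281 terabytes.

```c
#include <stdint.h>

/* 64-bit address layout from the text:
 *   bits 63..54  network (bus) id, 10 bits
 *   bits 53..48  node id,           6 bits
 *   bits 47..0   memory address,   48 bits */
typedef struct {
    uint16_t bus_id;
    uint8_t  node_id;
    uint64_t offset;
} FwAddress;

FwAddress fw_split(uint64_t a) {
    FwAddress f;
    f.bus_id  = (uint16_t)(a >> 54);
    f.node_id = (uint8_t)((a >> 48) & 0x3Fu);
    f.offset  = a & 0xFFFFFFFFFFFFull;
    return f;
}

uint64_t fw_join(uint16_t bus_id, uint8_t node_id, uint64_t offset) {
    return ((uint64_t)(bus_id & 0x3FFu) << 54) |
           ((uint64_t)(node_id & 0x3Fu) << 48) |
           (offset & 0xFFFFFFFFFFFFull);
}
```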
Serial protocols: USB

USB (Universal Serial Bus): easier connection between a PC and monitors, printers, digital speakers, modems, scanners, digital cameras, joysticks, and multimedia game equipment. 2 data rates: 12 Mbps for increased-bandwidth devices and 1.5 Mbps for lower-speed devices (joysticks, game pads). A tiered star topology can be used: one USB device (a hub) is connected to the PC; a hub can be embedded in devices like a monitor, printer, or keyboard, or can be standalone; and multiple USB devices can be connected to a hub. Up to 127 devices can be connected this way. USB host controller: manages and controls the bandwidth and driver software required by each peripheral, and dynamically allocates power downstream according to the devices connected/disconnected.
Parallel protocols: PCI Bus

PCI Bus (Peripheral Component Interconnect): a high-performance bus that originated at Intel in the early 1990s. The standard was adopted by industry and is administered by the PCISIG (PCI Special Interest Group). It interconnects chips, expansion boards, and processor memory subsystems. Data transfer rates of 127.2 to 508.6 Mbytes/s and 32-bit addressing, later extended to 64-bit while maintaining compatibility with 32-bit schemes. Synchronous bus architecture with multiplexed data/address lines.
Parallel protocols: ARM Bus

ARM Bus: designed and used internally by ARM Corporation; interfaces with the ARM line of processors. Many IC design companies have their own bus protocol. The data transfer rate is a function of clock speed: if the clock speed of the bus is X, the transfer rate is 16 × X bits/s. 32-bit addressing.
Wireless protocols: IrDA

IrDA: a protocol suite that supports short-range point-to-point infrared data transmission, created and promoted by the Infrared Data Association (IrDA). Data transfer rates between 9.6 kbps and 4 Mbps. IrDA hardware is deployed in notebook computers, printers, PDAs, digital cameras, public phones, and cell phones. A lack of suitable drivers has slowed use by applications; Windows 2000/98 now include support, and support is becoming available on popular embedded OSs.
Wireless protocols: Bluetooth

Bluetooth: a new, global standard for wireless connectivity based on a low-cost, short-range radio link. A connection is established when devices are within 10 meters of each other. No line of sight is required, e.g., to connect to a printer in another room.
Wireless Protocols: IEEE 802.11

IEEE 802.11: a standard for wireless LANs. It specifies parameters for the PHY and MAC layers of the network. PHY layer (physical layer): handles transmission of data between nodes; provisions for data transfer rates of 1 or 2 Mbps; operates in the 2.4 to 2.4835 GHz frequency band (RF) or 300 to 428,000 GHz (IR). MAC layer (medium access control layer): the protocol responsible for maintaining order in the shared medium, using collision avoidance/detection.
RECOMMENDED QUESTIONS UNIT 4 MEMORY AND MICROPROCESSOR INTERFACING

1. Briefly define each of the following: mask-programmed ROM, PROM, EPROM, EEPROM, flash EEPROM, RAM, SRAM, DRAM, PSRAM, NVRAM.
2. Sketch the internal design of a 4x3 ROM.
3. Sketch the internal design of a 4x3 RAM.
4. Compose 1kx8 ROMs into a 1kx32 ROM (note: 1k actually means 1024 words).
5. Compose 1kx8 ROMs into an 8kx8 ROM.
6. Compose 1kx8 ROMs into a 2kx16 ROM.
7. Show how to use a 1kx8 ROM to implement a 512x6 ROM.
8. Draw the timing diagram for a bus protocol that's handshaked, non-addressed, and transfers 8 bits of data over a 4-bit data bus.
9. Explain the difference between port-based I/O and bus-based I/O.
10. Discuss the advantages and disadvantages of using memory-mapped I/O versus standard I/O.
11. Explain the benefits that an interrupt address table has over fixed and vectored interrupt methods.
12. (a) Draw a block diagram of a processor, memory, and peripheral connected with a system bus, in which the peripheral gets serviced by using vectored interrupt. Assume servicing moves data from the peripheral to the memory. Show all relevant control and data lines of the bus, and label component inputs/outputs clearly. Use symbolic values for addresses. (b) Provide a timing diagram illustrating what happens over the system bus during the interrupt.
13. (a) Draw a block diagram of a processor, memory, peripheral and DMA controller connected with a system bus, in which the peripheral transfers 100 bytes of data to the memory using DMA. Show all relevant control and data lines of the bus, and label component inputs/outputs clearly.
14. (a) Draw a block diagram of a processor, memory, two peripherals and a priority arbiter, in which the peripherals request servicing using vectored interrupt. Show all relevant control and data lines of the bus, and label component inputs/outputs clearly. (b) List the steps that occur for such an interrupt.
15. List the three main transmission media described. Give two common applications for each.
SOLUTIONS FOR UNIT - 4
Q1. Explain the features of flash memory, SRAM and OTP ROM.
OTP ROM: One-time programmable ROM
Connections are programmed after manufacture by the user: the user provides a file of the desired contents of the ROM, and the file is input to a machine called a ROM programmer. Each programmable connection is a fuse, and the ROM programmer blows fuses where connections should not exist. Very low write ability: typically written only once, and requires a ROM programmer device. Very high storage permanence: bits don't change unless the device is reconnected to the programmer and more fuses are blown. Commonly used in final products because it is cheaper and harder to modify inadvertently.
EEPROM (electrically erasable programmable ROM): Programmed and erased electronically, typically using a higher-than-normal voltage; individual words can be programmed and erased. Better write ability: it can be in-system programmable, with a built-in circuit to provide the higher-than-normal voltage, and a built-in memory controller is commonly used to hide these details from the memory user. Writes are very slow because of erasing and programming; a busy pin indicates to the processor that the EEPROM is still writing. It can be erased and programmed tens of thousands of times. Storage permanence is similar to EPROM (about 10 years). Far more convenient than EPROMs, but more expensive.
Flash memory: An extension of EEPROM using the same floating-gate principle, with the same write ability and storage permanence. Erase is fast because large blocks of memory are erased at once rather than one word at a time; blocks are typically several thousand bytes large. Writes to single words may be slower: the entire block must be read, the word updated, and then the entire block written back. Used in embedded systems that store large data items in nonvolatile memory, e.g., digital cameras, TV set-top boxes, cell phones.
RAM (random-access memory): Typically volatile, so bits are not held without a power supply. Read and written easily by the embedded system during execution. Its internal structure is more complex than ROM: a word consists of several memory cells, each storing 1 bit; each input and output data line connects to each cell in its column; rd/wr is connected to every cell; when a row is enabled by the decoder, each cell has logic that stores the input data bit when rd/wr indicates write, or outputs the stored bit when rd/wr indicates read.
Basic types of RAM:

SRAM (static RAM): The memory cell uses a flip-flop to store a bit and requires 6 transistors; it holds data as long as power is supplied. DRAM (dynamic RAM): The memory cell uses a MOS transistor and a capacitor to store a bit, so it is more compact than SRAM. Refresh is required because the capacitor leaks; a word's cells are also refreshed when the word is read. Typical refresh interval: 15.625 µs. Slower to access than SRAM.
RAM variations:
PSRAM (pseudo-static RAM): DRAM with a built-in memory refresh controller; a popular low-cost, high-density alternative to SRAM. NVRAM (nonvolatile RAM): holds data after external power is removed. It comes in two forms. Battery-backed RAM is SRAM with its own permanently connected battery; writes are as fast as reads, and there is no limit on the number of writes, unlike nonvolatile ROM-based memory. SRAM with EEPROM or flash stores the complete RAM contents to the EEPROM or flash before power is turned off.
Example RAM/ROM devices: Low-cost, low-capacity memory devices are commonly used in 8-bit microcontroller-based embedded systems. The first two numeric digits of the part number indicate the device type (RAM: 62, ROM: 27), and the subsequent digits indicate the capacity in kilobits (e.g., the 6264 is a 64-Kbit RAM and the 27C256 a 256-Kbit ROM). Example: the TC55V2325FF-100 is a 2-megabit synchronous pipelined burst SRAM designed to be interfaced with 32-bit processors; it is capable of fast sequential reads and writes as well as single-byte I/O.
Q2. Describe set associative cache mapping technique. What are its merits and demerits?
Cache:
Cache is usually designed with SRAM, which is faster but more expensive than DRAM. It is usually on the same chip as the processor, where space is limited, so it is much smaller than off-chip main memory, and access is faster (1 cycle vs. several cycles for main memory).
Cache operation: on a request for a main memory access (read or write), the cache is first checked for a copy. On a cache hit, the copy is in the cache and access is quick. On a cache miss, the copy is not in the cache, so the address and possibly its neighbors are read into the cache. There are several cache design choices: cache mapping, replacement policies, and write techniques.
Cache mapping
The cache has far fewer addresses than main memory, so the question is whether a given address's contents are in the cache. Cache mapping is used to assign a main memory address to a cache address and to determine a hit or miss. There are three basic techniques: direct mapping, fully associative mapping, and set-associative mapping. Caches are partitioned into indivisible blocks, or lines, of adjacent memory addresses, usually 4 or 8 addresses per line.
Direct mapping
The main memory address is divided into three fields. The index is the cache address; its number of bits is determined by the cache size. The tag is compared with the tag stored in the cache at the address indicated by the index; if the tags match, the valid bit is checked. The valid bit indicates whether the data in the slot has been loaded from memory. The offset is used to find the particular word in the cache line.
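To make the tag/index/offset split concrete, here is a minimal C sketch of a direct-mapped lookup. The geometry (16-bit addresses, 64 lines of 4 bytes) and the names CacheLine, dm_hit, and dm_fill are assumptions chosen for illustration, and the data transfer on a fill is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical geometry: 16-bit addresses, 64 lines of 4 bytes each. */
#define OFFSET_BITS 2u
#define INDEX_BITS  6u
#define NUM_LINES   (1u << INDEX_BITS)

typedef struct {
    bool     valid;   /* has this line been loaded from memory? */
    uint16_t tag;     /* upper address bits of the block held here */
} CacheLine;          /* data bytes omitted: only hit/miss is modelled */

/* Split an address into the offset, index, and tag fields. */
unsigned addr_offset(uint16_t addr) { return addr & ((1u << OFFSET_BITS) - 1u); }
unsigned addr_index(uint16_t addr)  { return (addr >> OFFSET_BITS) & (NUM_LINES - 1u); }
unsigned addr_tag(uint16_t addr)    { return (unsigned)addr >> (OFFSET_BITS + INDEX_BITS); }

/* Only one line can hold the address, so a single tag compare decides. */
bool dm_hit(const CacheLine cache[NUM_LINES], uint16_t addr) {
    const CacheLine *line = &cache[addr_index(addr)];
    return line->valid && line->tag == addr_tag(addr);
}

/* On a miss, claim the line for the new block (evicting any old one). */
void dm_fill(CacheLine cache[NUM_LINES], uint16_t addr) {
    CacheLine *line = &cache[addr_index(addr)];
    line->valid = true;
    line->tag   = (uint16_t)addr_tag(addr);
}
```

With this geometry, addresses 0x1234 and 0x2234 share index 13 but have different tags, so they evict each other even when the rest of the cache is empty; that conflict is what set-associative mapping relieves.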
Fully associative mapping

The complete main memory address is stored in each cache address. All addresses stored in the cache are compared simultaneously with the desired address. The valid bit and offset are the same as in direct mapping.
Set-associative mapping
A compromise between direct mapping and fully associative mapping. The index is the same as in direct mapping, but each cache address contains the content and tags of 2 or more memory address locations. The tags of that set are compared simultaneously, as in fully associative mapping. A cache with set size N is called N-way set-associative; 2-way, 4-way, and 8-way are common.
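A minimal C sketch of a 2-way set-associative lookup follows, assuming 32 sets of 4-byte lines and 16-bit addresses; the names are hypothetical. The loop compares the tags of one set sequentially, whereas hardware compares them in parallel, and the choice of way on a fill is left to the replacement policy.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical geometry: 16-bit addresses, 4-byte lines, 2 ways x 32 sets. */
#define WAYS        2u
#define OFFSET_BITS 2u
#define SET_BITS    5u
#define NUM_SETS    (1u << SET_BITS)

typedef struct {
    bool     valid;
    uint16_t tag;
} Way;

typedef struct {
    Way way[WAYS];      /* the blocks that share one index */
} CacheSet;

unsigned sa_set(uint16_t addr) { return (addr >> OFFSET_BITS) & (NUM_SETS - 1u); }
unsigned sa_tag(uint16_t addr) { return (unsigned)addr >> (OFFSET_BITS + SET_BITS); }

/* Return the way holding addr, or -1 on a miss.  Hardware compares the
   tags of the set in parallel; this loop compares them one by one. */
int sa_lookup(const CacheSet sets[NUM_SETS], uint16_t addr) {
    const CacheSet *s = &sets[sa_set(addr)];
    for (unsigned w = 0; w < WAYS; w++)
        if (s->way[w].valid && s->way[w].tag == sa_tag(addr))
            return (int)w;
    return -1;
}

/* Load addr's block into way w of its set; choosing w is the job of
   the replacement policy. */
void sa_fill(CacheSet sets[NUM_SETS], uint16_t addr, unsigned w) {
    Way *slot = &sets[sa_set(addr)].way[w];
    slot->valid = true;
    slot->tag   = (uint16_t)sa_tag(addr);
}
```

Here 0x0000 and 0x0080 map to the same set (set 0) but can be held at the same time, one per way; a third block with that set index would force a replacement.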
Cache-replacement policy
The technique for choosing which block to replace when a fully associative cache is full, or when a set-associative cache's set is full. A direct-mapped cache has no choice. Random: replace a block chosen at random. LRU (least recently used): replace the block not accessed for the longest time. FIFO (first in, first out): push a block onto a queue when it is loaded into the cache, and choose the block to replace by popping the queue.
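The difference between LRU and FIFO shows up only when a block is reused. The C sketch below counts misses for a single 2-way set under either policy; it treats FIFO as ordering blocks by the time they were loaded (hits do not reorder them), and the function and type names are assumptions for illustration.

```c
#include <stdbool.h>

#define WAYS 2

typedef enum { POLICY_LRU, POLICY_FIFO } Policy;

/* Count misses for one 2-way set.  Each way keeps a time stamp: the last
   access for LRU, or the load time for FIFO; the victim is the way with
   the oldest stamp (an empty way is used first). */
int count_misses(const int *blocks, int n, Policy policy) {
    int  tag[WAYS];
    int  stamp[WAYS];
    bool valid[WAYS] = {false};
    int  misses = 0;

    for (int t = 0; t < n; t++) {
        int hit = -1;
        for (int w = 0; w < WAYS; w++)
            if (valid[w] && tag[w] == blocks[t]) hit = w;

        if (hit >= 0) {
            if (policy == POLICY_LRU) stamp[hit] = t;   /* FIFO ignores hits */
            continue;
        }

        misses++;
        int victim = 0;
        for (int w = 0; w < WAYS; w++) {
            if (!valid[w]) { victim = w; break; }       /* fill empty ways first */
            if (stamp[w] < stamp[victim]) victim = w;
        }
        tag[victim]   = blocks[t];
        stamp[victim] = t;
        valid[victim] = true;
    }
    return misses;
}
```

For the block sequence 1, 2, 1, 3, 1, LRU keeps the reused block 1 and misses 3 times, while FIFO evicts block 1 as the oldest load and misses 4 times.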
Cache write techniques

When data in the cache is written, main memory must also be updated.
Write-through: write to main memory whenever the cache is written to. This is the easiest to implement, but the processor must wait for the slower main memory write, and there is potential for unnecessary writes.
Write-back: main memory is written only when a dirty block is replaced. An extra dirty bit for each block is set when the cache block is written to. This reduces the number of slow main memory writes.
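The following C sketch counts main-memory writes under each technique for a cache reduced to a single line, so that writing a different block evicts the current one. Write-allocate (a write miss loads the block) and a final flush of the last dirty block are assumptions of the model, not statements from the notes; the function name is hypothetical.

```c
#include <stdbool.h>

/* Count main-memory writes caused by a stream of processor writes to the
   given block numbers, for a one-line cache with write-allocate. */
int memory_writes(const int *blocks, int n, bool write_back) {
    bool valid = false, dirty = false;
    int  cached = 0, writes = 0;

    for (int i = 0; i < n; i++) {
        if (!valid || cached != blocks[i]) {            /* miss: replace the line */
            if (write_back && valid && dirty) writes++; /* write back the dirty victim */
            valid  = true;
            dirty  = false;
            cached = blocks[i];
        }
        if (write_back) dirty = true;   /* defer the main-memory write */
        else            writes++;       /* write-through: update memory now */
    }
    if (write_back && valid && dirty) writes++;         /* final flush of the last block */
    return writes;
}
```

Six writes to blocks 5, 5, 5, 5, 7, 7 cost six main-memory writes with write-through but only two with write-back (one when block 5 is evicted, one when block 7 is flushed).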
Cache impact on system performance

The most important parameters in terms of performance are: the total size of the cache (the total number of data bytes the cache can hold; tag, valid, and other housekeeping bits are not included in the total), the degree of associativity, and the data block size.
Larger caches achieve lower miss rates but have a higher access cost. For example:
2 Kbyte cache: miss rate = 15%, hit cost = 2 cycles, miss cost = 20 cycles; average cost of memory access = (0.85 * 2) + (0.15 * 20) = 4.7 cycles.
4 Kbyte cache: miss rate = 6.5%, hit cost = 3 cycles, miss cost unchanged; average cost of memory access = (0.935 * 3) + (0.065 * 20) = 4.105 cycles (improvement).
8 Kbyte cache: miss rate = 5.565%, hit cost = 4 cycles, miss cost unchanged; average cost of memory access = (0.94435 * 4) + (0.05565 * 20) = 4.8904 cycles (worse).
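The worked example can be checked with a few lines of C. The sketch below evaluates the same formula, average cost = (1 - miss rate) * hit cost + miss rate * miss cost, and picks the cheapest configuration; the CacheConfig type and the function names are illustrative.

```c
/* Average cost of one memory access in cycles, as in the example above. */
double avg_access_cost(double miss_rate, double hit_cost, double miss_cost) {
    return (1.0 - miss_rate) * hit_cost + miss_rate * miss_cost;
}

typedef struct {
    unsigned size_kbytes;
    double   miss_rate, hit_cost, miss_cost;
} CacheConfig;

/* Index of the configuration with the lowest average access cost. */
int best_config(const CacheConfig *cfg, int n) {
    int best = 0;
    for (int i = 1; i < n; i++)
        if (avg_access_cost(cfg[i].miss_rate, cfg[i].hit_cost, cfg[i].miss_cost) <
            avg_access_cost(cfg[best].miss_rate, cfg[best].hit_cost, cfg[best].miss_cost))
            best = i;
    return best;
}
```

Feeding in the three configurations from the example selects the 4 Kbyte cache (index 1), matching the conclusion above.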
Q3. Explain the different protocol concepts and control methods.
Basic protocol concepts: control methods
A strobe/handshake compromise
Q4. What are the two types of bus-based I/O? Explain.
1. Memory-mapped I/O and standard I/O

The processor talks to both memory and peripherals using the same bus, in one of two ways.
Memory-mapped I/O: peripheral registers occupy addresses in the same address space as memory. E.g., if the bus has a 16-bit address, the lower 32K addresses may correspond to memory and the upper 32K addresses to peripherals.
Standard I/O (I/O-mapped I/O): an additional pin (M/IO) on the bus indicates whether an access is to memory or to a peripheral. E.g., if the bus has a 16-bit address, all 64K addresses correspond to memory when M/IO is set to 0, and all 64K addresses correspond to peripherals when M/IO is set to 1.
2. Memory-mapped I/O vs. Standard I/O

Memory-mapped I/O requires no special instructions: assembly instructions involving memory, like MOV and ADD, work with peripherals as well. Standard I/O requires special instructions (e.g., IN, OUT) to move data between peripheral registers and memory.
Standard I/O: no memory addresses are lost to peripherals, and simpler address decoding logic in the peripherals is possible. When the number of peripherals is much smaller than the address space, high-order address bits can be ignored, giving smaller and/or faster comparators.
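The address decoding in both schemes can be sketched in C as below, using the 16-bit bus from the example. The Target enum and the function names are hypothetical, and the last function assumes a system with at most four peripherals on the I/O space to illustrate ignoring high-order address bits.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { TARGET_MEMORY, TARGET_PERIPHERAL } Target;

/* Memory-mapped I/O: address bit 15 splits the single 64K space into
   memory (0x0000-0x7FFF) and peripherals (0x8000-0xFFFF). */
Target decode_memory_mapped(uint16_t addr) {
    return (addr & 0x8000u) ? TARGET_PERIPHERAL : TARGET_MEMORY;
}

/* Standard I/O: the M/IO pin alone picks the space (0 = memory,
   1 = peripheral), so every address is usable in both spaces. */
Target decode_standard_io(bool m_io) {
    return m_io ? TARGET_PERIPHERAL : TARGET_MEMORY;
}

/* Partial decoding on the I/O space: with at most four peripherals, each
   one compares only the two low address bits with its own number and
   ignores the high-order bits, so a 2-bit comparator suffices. */
bool io_device_selected(uint16_t io_addr, unsigned device_no) {
    return (io_addr & 0x3u) == (device_no & 0x3u);
}
```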
Q5. How is data transferred from a peripheral to memory without DMA and with DMA? Explain with diagrams.
a. Peripheral to memory transfer without DMA, using vectored interrupt
b. Peripheral to memory transfer with DMA
Q6. What is arbitration? Explain the types of arbitration.
Arbitration: Priority arbiter

Consider the situation where multiple peripherals request service from a single resource (e.g., a microprocessor or DMA controller) simultaneously: which one gets serviced first? A priority arbiter is a single-purpose processor that resolves this. Peripherals make requests to the arbiter, and the arbiter makes requests to the resource. The arbiter is connected to the system bus for configuration only.
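A priority arbiter's decision can be sketched in C as follows, assuming up to eight request lines packed into one byte (bit i = peripheral i, lower index = higher priority). The rotating-priority variant is a common arbiter option, included for comparison; both function names are hypothetical.

```c
#include <stdint.h>

/* Fixed priority: bit i of req is peripheral i's request line and a lower
   index means higher priority.  Returns the granted index, or -1. */
int arbitrate_fixed(uint8_t req) {
    for (int i = 0; i < 8; i++)
        if (req & (1u << i)) return i;
    return -1;
}

/* Rotating priority: start the search just after the peripheral granted
   last time, so a steady requester cannot starve the others.
   Pass last_granted = -1 before the first grant. */
int arbitrate_rotating(uint8_t req, int last_granted) {
    for (int k = 1; k <= 8; k++) {
        int i = (last_granted + k) % 8;
        if (req & (1u << i)) return i;
    }
    return -1;
}
```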
Arbitration: Daisy-chain arbitration
Arbitration is done by the peripherals themselves, either built into each peripheral or through added external logic: a req input and an ack output are added to each peripheral. The peripherals are connected to each other in a daisy chain: one peripheral is connected to the resource, and all others are connected upstream. A peripheral's req flows downstream to the resource, and the resource's ack flows upstream to the requesting peripheral. The peripheral closest to the resource has the highest priority.
Daisy-chain arbitration, pros and cons: it is easy to add or remove a peripheral, with no system redesign needed. However, it does not support rotating priority, and one broken peripheral can cause loss of access to other peripherals.
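The following C sketch models one arbitration round in a daisy chain, with peripheral 0 wired to the resource. Modelling a broken peripheral as one that stops passing ack upstream is an assumption made to illustrate the last drawback; the function name is hypothetical.

```c
#include <stdbool.h>

/* One arbitration round.  req[i] is peripheral i's request; index 0 is
   wired to the resource.  broken[i] marks a faulty peripheral that no
   longer passes ack upstream.  Returns the index of the peripheral that
   receives ack, or -1 if nobody is granted. */
int daisy_chain_grant(const bool *req, const bool *broken, int n) {
    bool req_reaches_resource = false;
    for (int i = n - 1; i >= 0; i--)            /* req flows downstream */
        req_reaches_resource = req_reaches_resource || req[i];
    if (!req_reaches_resource)
        return -1;

    for (int i = 0; i < n; i++) {               /* ack flows upstream */
        if (req[i]) return i;                   /* closest requester keeps ack */
        if (broken[i]) return -1;               /* ack is lost at a faulty link */
    }
    return -1;
}
```

With requests from peripherals 1 and 3, peripheral 1 wins because it is closer to the resource; if peripheral 0 is broken, neither is served even though both are working.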
Network-oriented arbitration: used when multiple microprocessors share a bus (sometimes called a network). Arbitration is typically built into the bus protocol. Separate processors may try to write simultaneously, causing collisions, and the data must then be resent. Since the senders should not start sending again at the same time, statistical methods can be used to reduce the chance of another collision. This is typically used for connecting multiple distant chips; the trend is to also use it to connect multiple on-chip processors.
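The notes do not name a particular statistical method. One well-known example is the truncated binary exponential backoff used by Ethernet, sketched below in C: after the k-th consecutive collision a sender waits a random number of slot times between 0 and 2^min(k,10) - 1. The function name is hypothetical, and rand() stands in for a proper random source.

```c
#include <stdlib.h>

/* Random delay, in slot times, after `collisions` consecutive collisions.
   Doubling the range after each collision makes it increasingly unlikely
   that two colliding senders pick the same delay again. */
unsigned backoff_slots(unsigned collisions) {
    unsigned k = collisions < 10u ? collisions : 10u;  /* truncate at 10 */
    unsigned range = 1u << k;                          /* number of choices */
    return (unsigned)rand() % range;
}
```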
Q7. Write the features of the CAN, I2C, Bluetooth, PCI bus, and IrDA protocols.
Serial protocols: I2C
I2C (Inter-IC) is a two-wire serial bus protocol developed by Philips Semiconductors in the early 1980s. It enables peripheral ICs to communicate using simple communication hardware. Data transfer rates of up to 100 kbit/s with 7-bit addressing are possible in normal (standard) mode; faster modes reach 400 kbit/s (fast mode) and 3.4 Mbit/s (high-speed mode), and 10-bit addressing is also supported.
Common devices capable of interfacing to the I2C bus: EPROMs, flash and some RAM memory, real-time clocks, watchdog timers, and microcontrollers.
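The address phase of an I2C transfer can be sketched in C as below. In 7-bit addressing the first byte carries the address in bits 7..1 and the R/W bit in bit 0 (1 = read); in 10-bit addressing the first byte is 11110 followed by address bits 9..8 and R/W, with the low eight address bits in a second byte. Only the byte values are computed; START, ACK, and the repeated-START sequence used for 10-bit reads are omitted, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* First byte after START for a 7-bit address. */
uint8_t i2c_addr_byte_7bit(uint8_t addr7, bool read) {
    return (uint8_t)(((addr7 & 0x7Fu) << 1) | (read ? 1u : 0u));
}

/* Two address bytes for a 10-bit address: 11110 A9 A8 R/W, then A7..A0. */
void i2c_addr_bytes_10bit(uint16_t addr10, bool read, uint8_t out[2]) {
    out[0] = (uint8_t)(0xF0u | ((addr10 >> 7) & 0x06u) | (read ? 1u : 0u));
    out[1] = (uint8_t)(addr10 & 0xFFu);
}
```

For example, a device at 7-bit address 0x50 is addressed with 0xA0 for a write and 0xA1 for a read.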
Serial protocols: CAN

CAN (Controller Area Network) is a protocol for real-time applications, developed by Robert Bosch GmbH, originally for communication among the components of cars. Applications now using CAN include elevator controllers, copiers, telescopes, production-line control systems, and medical instruments. Data transfer rates are up to 1 Mbit/s, with 11-bit addressing. Common devices interfacing with CAN: the 8051-compatible 8592 processor and standalone CAN controllers. The actual physical design of the CAN bus is not specified in the protocol; it requires devices to transmit and detect dominant and recessive signals to and from the bus. If a single data wire is used in the usual wired-AND arrangement, 0 is dominant and 1 is recessive. The bus guarantees that a dominant signal prevails over a recessive signal if both are asserted simultaneously.
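Because a dominant bit overrides a recessive one, CAN resolves simultaneous transmissions bit by bit. The C sketch below models that for 11-bit identifiers on a wired-AND bus (0 dominant): every node sends its identifier MSB first and drops out when it sends recessive but reads back dominant, so the lowest identifier wins. Distinct identifiers, at most 32 nodes, and the function name are assumptions.

```c
#include <stdint.h>

/* Simulated bitwise arbitration among n nodes (n <= 32) that start sending
   at the same time.  Returns the index of the node that wins the bus. */
int can_arbitrate(const uint16_t *ids, int n) {
    uint32_t active = (n >= 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);

    for (int bit = 10; bit >= 0; bit--) {               /* MSB first */
        unsigned bus = 1u;                               /* recessive unless driven */
        for (int i = 0; i < n; i++)
            if ((active >> i) & 1u)
                bus &= (ids[i] >> bit) & 1u;             /* wired-AND: any 0 wins */
        for (int i = 0; i < n; i++)
            if (((active >> i) & 1u) && ((ids[i] >> bit) & 1u) != bus)
                active &= ~(1u << i);                    /* sent 1, read 0: back off */
    }
    for (int i = 0; i < n; i++)
        if ((active >> i) & 1u) return i;
    return -1;
}
```

With identifiers 0x123, 0x0F0, and 0x200, the node sending 0x0F0 wins; the losers simply retry after the winning frame, without any frame being destroyed.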
Serial protocols: FireWire

FireWire (a.k.a. i.LINK, Lynx, IEEE 1394) is a high-performance serial bus developed by Apple Computer Inc., designed for interfacing independent electronic components, e.g., a desktop and a scanner. Data transfer rates range from 12.5 to 400 Mbit/s, with 64-bit addressing. It offers plug-and-play capabilities and a packet-based, layered design structure. Applications using FireWire include disk drives, printers, scanners, and cameras.
FireWire is capable of supporting a LAN similar to Ethernet. Its 64-bit address is divided into 10 bits for network ids (1023 subnetworks), 6 bits for node ids (each subnetwork can have 63 nodes), and 48 bits for the memory address (each node can have 281 terabytes of distinct locations).
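The 10/6/48-bit split can be expressed with shifts and masks. This C sketch assumes the network (bus) id occupies the most significant bits, followed by the node id and the 48-bit offset; the type and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    unsigned bus_id;    /* 10 bits: network id */
    unsigned node_id;   /* 6 bits: node within that network */
    uint64_t offset;    /* 48 bits: memory address within the node */
} Ieee1394Addr;

Ieee1394Addr ieee1394_split(uint64_t addr) {
    Ieee1394Addr a;
    a.bus_id  = (unsigned)(addr >> 54) & 0x3FFu;
    a.node_id = (unsigned)(addr >> 48) & 0x3Fu;
    a.offset  = addr & 0xFFFFFFFFFFFFull;
    return a;
}

uint64_t ieee1394_join(unsigned bus_id, unsigned node_id, uint64_t offset) {
    return ((uint64_t)(bus_id & 0x3FFu) << 54) |
           ((uint64_t)(node_id & 0x3Fu) << 48) |
           (offset & 0xFFFFFFFFFFFFull);
}
```

Since 2^48 = 281,474,976,710,656, the 48-bit offset gives each node the 281 terabytes quoted above.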
Serial protocols: USB

USB (Universal Serial Bus) allows easier connection between a PC and monitors, printers, digital speakers, modems, scanners, digital cameras, joysticks, and multimedia game equipment. There are 2 data rates: 12 Mbit/s for increased-bandwidth devices and 1.5 Mbit/s for lower-speed devices (joysticks, game pads). A tiered star topology can be used: one USB device (a hub) is connected to the PC; the hub can be embedded in devices like a monitor, printer, or keyboard, or can be standalone; and multiple USB devices can be connected to the hub. Up to 127 devices can be connected this way. The USB host controller manages and controls the bandwidth and driver software required by each peripheral, and dynamically allocates power downstream according to the devices connected or disconnected.
Parallel protocols: PCI Bus

PCI Bus (Peripheral Component Interconnect) is a high-performance bus that originated at Intel in the early 1990s. The standard was adopted by industry and is administered by the PCI-SIG (PCI Special Interest Group). It interconnects chips, expansion boards, and processor memory subsystems. Data transfer rates are 127.2 to 508.6 Mbit/s, with 32-bit addressing; the bus was later extended to 64 bits while maintaining compatibility with 32-bit schemes. It has a synchronous bus architecture with multiplexed data/address lines.
Parallel protocols: ARM Bus

ARM Bus: designed and used internally by ARM Corporation, it interfaces with the ARM line of processors (many IC design companies have their own bus protocol). The data transfer rate is a function of clock speed: if the clock speed of the bus is X, the transfer rate is 16 x X bits/s. It uses 32-bit addressing.
Wireless protocols: IrDA

IrDA is a protocol suite that supports short-range, point-to-point infrared data transmission, created and promoted by the Infrared Data Association (IrDA). Data transfer rates range from 9.6 kbit/s to 4 Mbit/s. IrDA hardware is deployed in notebook computers, printers, PDAs, digital cameras, public phones, and cell phones. A lack of suitable drivers has slowed its use by applications; Windows 2000/98 now include support, and support is becoming available on popular embedded OSs.
Wireless protocols: Bluetooth

Bluetooth is a new, global standard for wireless connectivity, based on a low-cost, short-range radio link. A connection is established when devices are within 10 meters of each other. No line of sight is required, e.g., a device can connect to a printer in another room.
PART - B UNIT - 5 INTERRUPTS: Basics - Shared Data Problem - Interrupt latency. Survey of Software Architecture, Round Robin, Round Robin with Interrupts - Function Queues - scheduling - RTOS architecture. 8 Hours
TEXT BOOKS: 1. An Embedded Software Primer, David E. Simon, Pearson Education, 1999. REFERENCE BOOKS: 1. Embedded Systems: Architecture and Programming, Raj Kamal, TMH, 2008. 2. Embedded Systems Architecture: A Comprehensive Guide for Engineers and Programmers, Tammy Noergaard, Elsevier, 2005. 3. Embedded C Programming, Barnett, Cox & O'Cull, Thomson, 2005.
PART B UNIT 5 INTERRUPTS

Embedded systems encompass aspects of control (or more broadly, signal processing), computing and communications. In each arena, the embedded system normally manages multiple tasks with hard real-time deadlines and interacts with various sensors and actuators through on-chip or on-board peripheral devices, and often with other processors over one or several networks. Communication may be wired or wireless, such as the remote keyless entry on your car, or a Bluetooth-enabled consumer device. The performance requirements of the embedded control system dictate its computing platform, I/O and software architecture. Often, the quality of embedded software is determined by how well the interfaces between communicating tasks, devices and other networked systems are handled. The following sections will discuss the management of shared data among cooperating tasks and software architectures for embedded control systems.
Shared Data Problem

When data are shared between cooperating tasks that operate at different rates, care must be taken to maintain the integrity of the calculations that use the shared information. For example, consider the situation where an interrupt routine acquires from an A/D converter, data that are used in the main loop of the pseudocode:
volatile int ADC_channel[3];   /* shared: written by the ISR, read by main() */

void ISR_ReadData(void)
{
    ADC_channel[0] = read_adc(0);   /* read_adc() is a hypothetical A/D read */
    ADC_channel[1] = read_adc(1);
    ADC_channel[2] = read_adc(2);
}

int delta, offset;

void main(void)
{
    while (TRUE) {
        ...
        delta = ADC_channel[0] - ADC_channel[1];
        offset = delta * ADC_channel[2];
    }
}
The interrupt routine can suspend the main loop and execute at any time. Consider an interrupt that occurs between the calculations of delta and offset: on the return from interrupt, the data in ADC_channel[0-2] may cause an unintended value to be assigned to the calculated variable offset if the values have changed since the previous data acquisition. More subtly, the calculation of delta may also be affected because, as we'll see, even a single line of code may be interrupted.
Atomic Code and Critical Sections

Recall that assembly language is the human-readable form of the binary machine language code that eventually gets executed by the embedded processor. Assembly language, unlike high-level languages such as C and C++, is very closely associated with the processor hardware. Typical assembly language instructions reference memory locations or special purpose registers. An assembly language instruction typically consists of three components:
Label: memory address where the code is located (optional).
Op-Code: mnemonic for the instruction to be executed.
Operands: registers, addresses or data operated on by the instruction.
The following are examples of assembly instructions for the Freescale MPC5553 microprocessor:
add   r7, r8, r9     ; add the contents of registers 8 and 9, place the result in register 7
and   r2, r5, r3     ; bitwise AND the contents of registers 5 and 3, place the result in register 2
lwz   r6, 0x4(r5)    ; load the word at the memory address formed by the sum of 0x4 and the contents of register 5 into register 6
lwzx  r9, r5, r8     ; load the word at the memory location formed by the sum of the contents of registers 5 and 8 into register 9
stwx  r13, r8, r9    ; store the value in register 13 in the memory location formed by the sum of the contents of registers 8 and 9
The important point about assembly instructions with respect to shared data is that they are atomic, that is, an assembly instruction, because it is implementing fundamental machine operations (data moves between registers, and memory), cannot be interrupted. Now, consider the following assembler instructions with the equivalent C code temp = temp - offset:
lwz   r5, 0(r10)     ; read temp stored at 0(r10) and put it in r5
li    r6, offset     ; put the offset value into r6
sub   r4, r5, r6     ; subtract the offset and put the result into r4
stw   r4, 0(r10)     ; store the result back in memory

Thus, our single line of C code gets compiled into multiple lines of assembler. Consequently, whereas a single atomic assembler instruction cannot be interrupted, one line of C code can be. This means that our pseudocode fragment

void main(void)
{
    while (TRUE) {
        ...
        delta = ADC_channel[0] - ADC_channel[1];
        offset = delta * ADC_channel[2];
        ...
    }
}

can be interrupted anywhere. In particular, it can be interrupted in the middle of the delta calculation, with the result that the variable may be determined with one new and one old data value; undoubtedly not what the programmer intended. We shall refer to a section of code that must be atomic to execute correctly as a critical section. It is incumbent upon the programmer to protect critical code sections to maintain data coherency. All microprocessors implement instructions to enable and disable interrupts, so the obvious approach is to simply not permit critical sections to be interrupted:
void main(void)
{
    while (TRUE) {
        ...
        disable();   /* start of critical section */
        delta = ADC_channel[0] - ADC_channel[1];
        offset = delta * ADC_channel[2];
        enable();    /* end of critical section */
        ...
    }
}
It must be kept in mind that code in the interrupt service routine has high priority for a reason: something needs to be done immediately. Consequently, it's important to disable interrupts sparingly, and only when absolutely necessary (and, naturally, to remember to enable interrupts again after the section of critical code). Other methods of maintaining data coherency will be discussed in the section on real-time operating systems.
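Real interrupts cannot be reproduced portably, so the C sketch below simulates them to show both the problem and the disable()/enable() fix. raise_irq() runs the ISR immediately unless interrupts are disabled, in which case the request is latched and serviced by enable(); compute_delta() forces an interrupt between the two channel reads, the worst case described above. All names other than ADC_channel, disable(), and enable() are hypothetical, and the sample values are made up so that channel 0 always exceeds channel 1 by 10.

```c
#include <stdbool.h>

volatile int ADC_channel[3];
static bool irq_enabled = true;   /* simulated interrupt-enable flag */
static bool irq_pending = false;  /* request latched while disabled */
static int  sample_set  = 0;

/* Simulated ISR: each call stores a new, self-consistent sample set. */
void ISR_ReadData(void) {
    sample_set++;
    ADC_channel[0] = 100 * sample_set;
    ADC_channel[1] = 100 * sample_set - 10;
    ADC_channel[2] = 2;
}

/* Simulated hardware: run the ISR now, or latch it until enable(). */
void raise_irq(void) {
    if (irq_enabled) ISR_ReadData();
    else             irq_pending = true;
}

void disable(void) { irq_enabled = false; }

void enable(void) {
    irq_enabled = true;
    if (irq_pending) { irq_pending = false; ISR_ReadData(); }
}

/* delta = ADC_channel[0] - ADC_channel[1], with an interrupt forced
   between the two reads to mimic the worst-case timing. */
int compute_delta(bool protect) {
    if (protect) disable();
    int first = ADC_channel[0];
    raise_irq();
    int second = ADC_channel[1];
    if (protect) enable();
    return first - second;
}
```

After an initial ISR_ReadData() call, compute_delta(false) returns -90 (an old channel 0 value minus a new channel 1 value), while compute_delta(true) returns the expected 10; the interrupt is not lost, only delayed until enable().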
Interrupt latency
How fast will a system react to interrupts? It depends on:
1. The maximum time during which interrupts are disabled.
2. The maximum time taken to execute higher-priority interrupt routines.
3. The time taken by ISR invocation (context save, etc.) and return (context restore).
4. The work time in the ISR to generate a response.
Values: for item 3, see the processor documentation. For the others, count instructions (this does not work well for processors with a cache!). General rule: WRITE SHORT INTERRUPT SERVICE ROUTINES!
Disabling Interrupts
Example system: the task code must disable interrupts for 125 µs to process pressure variables, and for 250 µs to manage the timer. The system must respond to a network interrupt within 600 µs, and the network ISR takes 300 µs to execute.
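The latency budget for this example can be added up in C. The sketch assumes that only the longest disabled period matters (the task code never disables interrupts twice back to back), that no higher-priority ISR is pending, that ISR entry and exit overhead is negligible, and that the whole 300 µs network ISR counts as response work. The type and function names are hypothetical.

```c
/* Worst-case latency = sum of the four components listed above, in µs. */
typedef struct {
    unsigned disabled_us;      /* 1. longest period with interrupts disabled */
    unsigned higher_isrs_us;   /* 2. higher-priority ISRs that may run first */
    unsigned overhead_us;      /* 3. ISR entry (context save) and return */
    unsigned response_us;      /* 4. ISR work needed to generate the response */
} LatencyBudget;

unsigned worst_case_latency_us(LatencyBudget b) {
    return b.disabled_us + b.higher_isrs_us + b.overhead_us + b.response_us;
}

/* Only the longest disabled stretch counts under the stated assumption. */
unsigned longest_us(const unsigned *periods, int n) {
    unsigned m = 0;
    for (int i = 0; i < n; i++)
        if (periods[i] > m) m = periods[i];
    return m;
}
```

Under these assumptions the worst case is 250 + 300 = 550 µs, inside the 600 µs requirement; lengthening the timer code's disabled period beyond 300 µs would break it.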
Alternative to disabling IT-s

int iTempAs[2];
int iTempBs[2];
bool fUsingB = FALSE;

void interrupt vReadTemps(void)
{
    if (fUsingB) {
        iTempAs[0] = ...;   /* read from HW */
        iTempAs[1] = ...;   /* read from HW */
    } else {
        iTempBs[0] = ...;   /* read from HW */
        iTempBs[1] = ...;   /* read from HW */
    }
}
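The fragment above shows only the interrupt side of this double-buffering technique. A minimal, host-runnable C sketch of both sides follows, assuming that the task checks whichever buffer fUsingB selects and then flips the flag, while the ISR always writes the other buffer. The ISR is modelled as an ordinary function that receives the two temperatures as parameters instead of reading hardware, and vTaskCheckTemps is a hypothetical name.

```c
#include <stdbool.h>

volatile int  iTempAs[2];
volatile int  iTempBs[2];
volatile bool fUsingB = false;   /* true: the task is using the B buffer */

/* Stands in for the ISR: always fills the buffer the task is NOT using. */
void vReadTemps(int temp0, int temp1) {
    if (fUsingB) { iTempAs[0] = temp0; iTempAs[1] = temp1; }
    else         { iTempBs[0] = temp0; iTempBs[1] = temp1; }
}

/* One pass of the task: compare the two temperatures in the buffer it is
   using, then hand that buffer back to the ISR by flipping fUsingB.
   Returns true if the temperatures differ (e.g. to sound an alarm). */
bool vTaskCheckTemps(void) {
    bool differ = fUsingB ? (iTempBs[0] != iTempBs[1])
                          : (iTempAs[0] != iTempAs[1]);
    fUsingB = !fUsingB;
    return differ;
}
```

Because the ISR only ever writes the buffer the task is not reading, the task never compares one new and one old temperature, and no interrupts have to be disabled.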
Survey of Software Architectures

Software architecture, according to ANSI/IEEE Standard 1471-2000, is defined as the fundamental organization of a system, embodied in its components, their relationships to each other and the environment, and the principles governing its design and evolution. Embedded software, as we've said, must interact with the environment through sensors and actuators, and often has hard, real-time constraints. The organization of the software, or its architecture, must reflect these realities. Usually, the critical aspect of an embedded control system is its speed of response, which is a function of (among other things) the processor speed and the number and complexity of the tasks to be accomplished, as well as the software architecture. Clearly, embedded systems with not much to do, and plenty of time in which to do it, can employ a simple software organization (a vending machine, for example, or the power seat in your car). Systems that must respond rapidly to many different events with hard real-time deadlines generally require a more complex software architecture (the avionics systems in an aircraft, engine and transmission control, traction control and antilock brakes in your car). Most often, the various tasks managed by an embedded system have different priorities: some things have to be done immediately (fire the spark plug precisely 20° before the piston reaches top-dead-center in the cylinder), while other tasks may have less severe time constraints.
The architectures surveyed here are: round robin; round robin with interrupts; function queue scheduling; and real-time operating systems (RTOS).
Round Robin
The simplest possible software architecture is called round robin. Round robin architecture has no interrupts; the software organization consists of one main loop wherein the processor simply polls each attached device in turn, and provides service if any is required. After all devices have been serviced, it starts over from the top. Graphically, round robin looks like Figure 1. Round robin pseudocode looks something like this:
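A minimal C sketch of one pass of a round-robin loop follows. The device status arrays and service_device() are hypothetical stand-ins for polling and servicing real device registers; a real main() would call round_robin_pass() forever, with no interrupts involved.

```c
#include <stdbool.h>

#define NUM_DEVICES 3

/* Hypothetical device status; real code would read device registers. */
bool needs_service[NUM_DEVICES];
int  times_serviced[NUM_DEVICES];

static void service_device(int d) {
    needs_service[d] = false;   /* e.g. read data, update an output */
    times_serviced[d]++;
}

/* One trip around the main loop: poll every device in turn and service
   the ones that need it, then return to start over from the top. */
void round_robin_pass(void) {
    for (int d = 0; d < NUM_DEVICES; d++)
        if (needs_service[d])
            service_device(d);
}
```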
One can think of many examples where round robin is a perfectly capable architecture: a vending machine, ATM, or household appliance such as a microwave oven (check for a button push, decrement the timer, update the display, and start over). Basically, it suits anything where the processor has plenty of time to get around the loop, and the user won't notice the delay (usually microseconds) between a request for service and the processor response (the time between pushing a button on your microwave and the update of the display, for example). The main advantage of round robin is that it's very simple, and often it's good enough. On the other hand, there are several obvious disadvantages. If a device has to be serviced in less time than it takes the processor to get around the loop, then it won't work. In fact, the worst case response time for round robin is the sum of the execution times for all of the task code. It's also fragile: suppose you added one more device, or some additional processing, to a loop that was almost at its chronometric limit; then you could be in trouble. Some additional performance can be coaxed from the round robin architecture, however. If one or more tasks have more stringent deadlines than the others (they have higher priority), they may simply be checked more often:
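The response-time arithmetic can be sketched in C. The first function gives the plain round-robin worst case described above; the second assumes the high-priority device A is polled between every other task (A, B, A, C, A, D, ...), so A waits at most for the longest other task plus its own service time. The function names and the example times are hypothetical.

```c
/* Plain round robin: a request that arrives just after its device was
   polled waits for every task in the loop. */
unsigned rr_worst_case_us(const unsigned *task_us, int n) {
    unsigned total = 0;
    for (int i = 0; i < n; i++) total += task_us[i];
    return total;
}

/* Device A checked between every other task: A waits at most for the
   longest other task, then for its own service time. */
unsigned rr_interleaved_worst_case_us(unsigned a_us, const unsigned *other_us, int n) {
    unsigned longest = 0;
    for (int i = 0; i < n; i++)
        if (other_us[i] > longest) longest = other_us[i];
    return longest + a_us;
}
```

With hypothetical times of 50 µs for A and 200, 100, and 150 µs for the others, the worst case for A drops from 500 µs to 250 µs; the price is that A's code now runs several times per loop, which lengthens the loop for every other device.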
Round Robin with Interrupts

Round robin is simple, but that's pretty much its only advantage. One step up on the performance scale is round robin with interrupts. Here, urgent tasks get handled in an interrupt service routine, possibly with a flag set for follow-up processing in the main loop. If nothing urgent happens (an emergency stop button pushed, or an intruder detected), then the processor continues to operate round robin, managing more mundane tasks in order around the loop. Possible pseudocode:
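A minimal C sketch of this pattern follows. The ISR is an ordinary function so that the sketch can run on a host, and the device, flag, and counter names are hypothetical. Note that a single flag records only that at least one interrupt happened; if every interrupt needs its own follow-up, a counter or queue is needed instead.

```c
#include <stdbool.h>

volatile bool fDeviceA = false;  /* set by the ISR, cleared by the main loop */
int urgent_done    = 0;          /* work done inside the ISR */
int followups_done = 0;          /* work done later in the main loop */

/* Hypothetical ISR for device A: do only the urgent part, flag the rest. */
void vDeviceA_ISR(void) {
    urgent_done++;               /* e.g. read the device's data register */
    fDeviceA = true;
}

/* One trip around the main loop. */
void main_loop_pass(void) {
    if (fDeviceA) {
        fDeviceA = false;        /* clear first so a new interrupt is not lost */
        followups_done++;        /* slower follow-up processing for device A */
    }
    /* ...poll and service the non-urgent devices in turn here... */
}
```

Two interrupts arriving before the main loop gets around produce two urgent actions but only one follow-up, which is exactly the flag behavior noted above.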
The obvious advantage of round robin with interrupts is that the response time to high-priority tasks is improved, since the ISR always has priority over the main loop (the main loop will always stop whatever it's doing to service the interrupt), and yet it remains fairly simple. The worst case response time for a low priority task is the sum of the execution times for all of the code in the main loop plus all of the interrupt service routines. With the introduction of interrupts, the problem of shared data may arise: as in the previous example, if the interrupted low priority function is in the middle of a calculation using data that are supplied or modified by the high priority interrupting function, care must be taken that on the return from interrupt the low priority function's data are still valid (by disabling interrupts around critical code sections, for example).
Function Queue Scheduling

Function queue scheduling provides a method of assigning priorities to interrupts. In this architecture, interrupt service routines accomplish urgent processing from interrupting devices, but then put a pointer to a handler function on a queue for follow-up processing. The main loop simply checks the function queue, and if it's not empty, calls the first function on the queue. Priorities are assigned by the order of the function in the queue: there's no reason that functions have to be placed in the queue in the order in which the interrupts occurred. They may just as easily be placed in the queue in priority order: high priority functions at the top of the queue, and low priority functions at the bottom. The worst case timing for the highest priority function is the execution time of the longest function in the queue (think of the case of the processor just starting to execute the longest function right before an interrupt places a high priority task at the front of the queue). The worst case timing for the lowest priority task is infinite: it may never get executed if higher priority code is always being inserted at the front of the queue. The advantage of function queue scheduling is that priorities can be assigned to tasks; the disadvantages are that it's more complicated than the other architectures discussed previously, and it may be subject to shared data problems.
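The queue discipline described above can be sketched in C with a small array kept sorted by priority. The names, the queue size, and the two logging handlers are hypothetical; on real hardware the main loop would have to disable interrupts while removing an entry, since fq_post() is called from ISRs.

```c
#include <stdbool.h>

typedef void (*Handler)(void);

typedef struct {
    Handler fn;
    int     priority;            /* larger number = more urgent */
} QueueEntry;

#define QUEUE_SIZE 8
static QueueEntry queue[QUEUE_SIZE];
static int queue_len = 0;

/* Called from an ISR after its urgent work: insert the follow-up handler so
   the queue stays sorted by priority (equal priorities keep arrival order).
   Returns false if the queue is full. */
bool fq_post(Handler fn, int priority) {
    if (queue_len == QUEUE_SIZE) return false;
    int i = queue_len;
    while (i > 0 && queue[i - 1].priority < priority) {
        queue[i] = queue[i - 1];         /* shift lower-priority entries back */
        i--;
    }
    queue[i].fn = fn;
    queue[i].priority = priority;
    queue_len++;
    return true;
}

/* Called from the main loop: remove and run the front handler, if any. */
bool fq_run_next(void) {
    if (queue_len == 0) return false;
    Handler fn = queue[0].fn;
    for (int i = 1; i < queue_len; i++) queue[i - 1] = queue[i];
    queue_len--;
    fn();
    return true;
}

/* Example handlers that just record the order in which they ran. */
char run_log[16];
int  run_log_len = 0;
void vHandleLow(void)  { if (run_log_len < 15) run_log[run_log_len++] = 'L'; }
void vHandleHigh(void) { if (run_log_len < 15) run_log[run_log_len++] = 'H'; }
```

Posting Low, High, Low and then draining the queue runs the handlers in the order H, L, L: the high-priority handler jumps ahead even though it was queued second.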
Real-time Operating System (RTOS)

A real-time operating system is complicated, potentially expensive, and takes up precious memory in our almost always cost- and memory-constrained embedded system. Why use one? There are two main reasons: flexibility and response time. The elemental component of a real-time operating system is a task, and it's straightforward to add new tasks or delete obsolete ones because there is no main loop: the RTOS schedules when each task is to run based on its priority. The scheduling of tasks by the RTOS is referred to as multi-tasking. In a preemptive multi-tasking system, the RTOS can suspend a low priority task at any time to execute a higher priority one; consequently, the worst case response time for a high priority task is almost zero. (In a non-preemptive multi-tasking system, the low priority task finishes executing before the high priority task starts.) In the simplest RTOS, a task can be in one of three states:
Running: The task code is being executed by the processor. Only one task may be running at any time.
Ready: All necessary data are available and the task is prepared to run when the processor is available. Many tasks may be ready at any time, and will run in priority order.
Blocked: A task may be blocked waiting for data or for an event to occur. A task, if it is not preempted, will block after running to completion. Many tasks may be blocked at one time.
The part of the RTOS called a scheduler keeps track of the state of each task, and decides which one should be running. The scheduler is a simple-minded device: it simply looks at all the tasks in the ready state and chooses the one with the highest priority. Tasks can block themselves if they run out of things to do, and they can unblock and become ready if an event occurs, but it's the job of the scheduler to move tasks between the ready and running states based on their priorities.
Tasks that share data can protect it with a semaphore. Since only one task can possess the semaphore at any time, coherency is assured by taking and releasing a semaphore around the shared data: if the 10 ms task attempts to take the semaphore before the 50 ms task has released it, the faster task will block until the semaphore is available. Problems, however, may arise if care is not taken in the use of semaphores: specifically, priority inversion and deadlock.
Priority inversion, as the name implies, refers to a situation in which a semaphore inadvertently causes a high priority task to block while lower priority tasks run to completion. Consider the case where a high priority task and a low priority task share a semaphore, and there are tasks of intermediate priority between them. Initially, the low priority task is running and takes a semaphore; all other tasks are blocked. Should the high priority task unblock and attempt to take the semaphore before the low priority task releases it, it will block again until the semaphore is available.
If, in the meantime, intermediate priority tasks have unblocked, the simple-minded RTOS will run each one in priority order, completing all the intermediate priority tasks before finally running the low priority function to the point where it gives up its semaphore, permitting the high priority task to run again. The task priorities have been inverted: all of the lower priority tasks have run before the highest priority task gets to complete.
Different real-time operating systems employ different algorithms, or resource access protocols, to request and release semaphores in order to avoid priority inversion. A common method is called priority inheritance. In this protocol, whenever a lower priority task blocks a higher priority task, it inherits the priority of the blocked task. Reconsider our priority inversion problem, this time with the priority inheritance protocol as illustrated in Figure 5. Once again, the low priority task is running and takes a semaphore; all other tasks are blocked. Again the high priority task unblocks and attempts to take the semaphore before the low priority task has released it, blocking again until the semaphore is available. In the meantime the intermediate priority tasks have unblocked, but with priority inheritance, the low priority task has inherited the priority of the blocked high priority task. Consequently, the RTOS will schedule the blocking task with its promoted priority first; it runs until the semaphore is released, at which time the high priority task takes the semaphore and runs, and the promoted task is reassigned its initial low priority. As a result, all tasks will run in the correct priority order.
Note that if the high priority task accesses multiple shared resources (that is, there is more than one semaphore), it may potentially block as many times as there are semaphores. Furthermore, the priority inheritance protocol does nothing to mitigate deadlock.
A more complex algorithm is the priority ceiling protocol. In the priority ceiling protocol, each task is assigned a static priority, and each semaphore, or resource, is assigned a ceiling priority greater than or equal to the maximum priority of all the tasks that use it. At run time, a task assumes a priority equal to its static priority or the ceiling value of its resource, whichever is larger: if a task requires a resource, the priority of the task will be raised to the ceiling priority of the resource; when the task releases the resource, the priority is reset. It can be shown that this scheme minimizes the time that the highest priority task will be blocked, and eliminates the potential for deadlock.
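The inversion and its cure by priority inheritance can be reproduced with a small tick-based C simulation. Each hypothetical task is a string of steps ('T' take the semaphore, 'C' compute, 'G' give it back) with a base priority and an arrival tick; the scheduler runs the highest-priority ready task for one step per tick. The task set, the step encoding, and all names are assumptions for illustration, and only priority inheritance (not the priority ceiling protocol) is modelled.

```c
#include <stdbool.h>

typedef struct {
    const char *steps;   /* 'T' take, 'C' compute, 'G' give; one per tick */
    int  base_prio;      /* larger = more urgent */
    int  arrival;        /* tick at which the task becomes ready */
    int  pc;             /* index of the next step */
    int  prio;           /* effective priority (may be inherited) */
    bool blocked;        /* waiting for the semaphore */
} Task;

/* Run the tasks against one binary semaphore.  order[k] receives the
   index of the k-th task to finish.  With inherit set, a task that blocks
   on the semaphore lends its priority to the current holder. */
void simulate(Task *t, int n, bool inherit, int *order) {
    int holder = -1, done = 0;
    for (int i = 0; i < n; i++) {
        t[i].pc = 0;
        t[i].prio = t[i].base_prio;
        t[i].blocked = false;
    }
    for (int tick = 0; done < n && tick < 1000; tick++) {
        for (;;) {
            int run = -1;                       /* scheduler: best ready task */
            for (int i = 0; i < n; i++)
                if (t[i].arrival <= tick && t[i].steps[t[i].pc] != '\0' &&
                    !t[i].blocked && (run < 0 || t[i].prio > t[run].prio))
                    run = i;
            if (run < 0) break;                 /* nothing ready: idle tick */

            char step = t[run].steps[t[run].pc];
            if (step == 'T' && holder >= 0 && holder != run) {
                t[run].blocked = true;          /* semaphore busy: block */
                if (inherit && t[holder].prio < t[run].prio)
                    t[holder].prio = t[run].prio;
                continue;                       /* pick another task this tick */
            }
            if (step == 'T') holder = run;
            if (step == 'G') {
                holder = -1;
                t[run].prio = t[run].base_prio; /* drop any inherited priority */
                for (int i = 0; i < n; i++)
                    t[i].blocked = false;       /* waiters retry the take */
            }
            t[run].pc++;
            if (t[run].steps[t[run].pc] == '\0')
                order[done++] = run;
            break;                              /* one step per tick */
        }
    }
}
```

With a low priority task L (arrives at tick 0, steps TCCCG), a medium priority task M (tick 2, CCCC), and a high priority task H (tick 1, TCG), the finishing order is M, L, H without inheritance, showing the inversion, and L, H, M with inheritance.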
There are methods other than reliance on RTOS resource access protocols to assure data coherency. Though often used, such methods do not, in general, constitute good programming practice. If the shared data consist of only a single variable, a local copy may be assigned in the low priority task, thus assuring its integrity.
RECOMMENDED QUESTIONS UNIT -5
INTERRUPTS AND SURVEY OF Software architecture
1. Explain the microprocessor architecture.
2. With a block diagram, explain the interrupt hardware.
3. How are interrupts disabled?
4. How does the microprocessor know where to find the interrupt routine when an interrupt occurs?
5. Describe the shared data problem with an example. Show how disabling/enabling interrupts can be used to solve this problem.
6. Explain atomic code and critical sections in interrupts.
7. Explain the interrupt handling procedure, context switching, and critical sections.
8. Define interrupt latency, and explain how to make interrupt routines short.
9. Explain the steps to disable interrupts.
10. Describe the round robin architecture for a communication bridge.
11. Explain function queue scheduling.
12. Explain the priority levels for the RTOS architecture.
13. How do you select an architecture for a system?
Unit-5 SOLUTIONS FOR UNIT-5
INTERRUPTS
Q1. Describe shared data problem with an example.
Shared Data Problem

When data are shared between cooperating tasks that operate at different rates, care must be taken to maintain the integrity of the calculations that use the shared information. For example, consider the situation where an interrupt routine acquires from an A/D converter, data that are used in the main loop of the pseudocode:
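volatile int ADC_channel[3];   /* shared: written by the ISR, read by main() */

void ISR_ReadData(void)
{
    ADC_channel[0] = read_adc(0);   /* read_adc() is a hypothetical A/D read */
    ADC_channel[1] = read_adc(1);
    ADC_channel[2] = read_adc(2);
}

int delta, offset;

void main(void)
{
    while (TRUE) {
        ...
        delta = ADC_channel[0] - ADC_channel[1];
        offset = delta * ADC_channel[2];
    }
}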
The interrupt routine can suspend the main loop and execute at any time. Consider an interrupt that occurs between the calculations of delta and offset: on the return from interrupt, the data in ADC_channel[0-2] may cause an unintended value to be assigned to the calculated variable offset if the values have changed since the previous data acquisition. More subtly, the calculation of delta may also be affected because, as we'll see, even a single line of code may be interrupted.
Q2. Describe the round robin architecture with the example of a digital multimeter.
Round Robin
The simplest possible software architecture is called round robin. The round robin architecture has no interrupts. The software consists of one main loop in which the processor simply polls each attached device in turn and provides service if any is required. After all devices have been serviced, it starts over from the top. Graphically, round robin looks like Figure 1. In pseudocode it is simply an endless loop that checks each device in turn and services it if required.
One can think of many examples where round robin is a perfectly capable architecture: a vending machine, an ATM, or a household appliance such as a microwave oven (check for a button push, decrement the timer, update the display, and start over). Basically, it suits anything where the processor has plenty of time to get around the loop, and where the user won't notice the delay (usually microseconds) between a request for service and the processor's response (for example, the time between pushing a button on your microwave and the update of the display).
The main advantage of round robin is that it's very simple, and often it's good enough. On the other hand, there are several obvious disadvantages. If a device has to be serviced in less time than it takes the processor to get around the loop, round robin won't work. In fact, the worst-case response time for round robin is the sum of the execution times for all of the task code. It's also fragile: if you add one more device, or some additional processing, to a loop that was already almost at its timing limit, you could be in trouble. Some additional performance can be coaxed from the round robin architecture, however. If one or more tasks have more stringent deadlines than the others (they have higher priority), they may simply be checked more often, for example by polling them several times in each pass through the loop.
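The round robin loop described above can be sketched in C, including the trick of polling an urgent device more often. The Device structure and its needs_service/service hooks are hypothetical. On a real system each hook would read a status register and run the device's service code.

```c
#include <stdbool.h>

/* Hypothetical device hooks: needs_service() polls the device,
 * service() does the work it requires. */
typedef struct {
    bool (*needs_service)(void);
    void (*service)(void);
} Device;

/* One pass of the round robin loop. The urgent device is polled
 * before every ordinary device, so it is checked n_devices times per
 * pass instead of once. */
void round_robin_pass(const Device *urgent,
                      const Device *devices, int n_devices)
{
    for (int i = 0; i < n_devices; i++) {
        if (urgent->needs_service())
            urgent->service();
        if (devices[i].needs_service())
            devices[i].service();
    }
}

/* The architecture itself: no interrupts, just repeat forever. */
void round_robin(const Device *urgent, const Device *devices, int n)
{
    for (;;)
        round_robin_pass(urgent, devices, n);
}
```

Polling the urgent device more often shortens its worst-case wait, but every extra check lengthens the loop for all the other devices.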
Q3. Explain priority inversion with an example.

Priority inversion, as the name implies, refers to a situation in which a semaphore inadvertently causes a high priority task to block while lower priority tasks run to completion. Consider the case where a high priority task and a low priority task share a semaphore, and there are tasks of intermediate priority between them.
Initially, the low priority task is running and takes a semaphore; all other tasks are blocked. Should the high priority task unblock and attempt to take the semaphore before the low priority task releases it, it will block again until the semaphore is available. If, in the meantime, intermediate priority tasks have unblocked, the simple-minded RTOS will run each one in priority order, completing all the intermediate priority tasks before finally running the low priority function to the point where it gives up its semaphore, permitting the high priority task to run again. The task priorities have been inverted: all of the lower priority tasks have run before the highest priority task gets to complete.
Different real-time operating systems employ different algorithms, or resource access protocols, to request and release semaphores in order to avoid priority inversion. A common method is called priority inheritance. In this protocol, whenever a lower priority task blocks a higher priority task, it inherits the priority of the blocked task. Reconsider our priority inversion problem, this time with the priority inheritance protocol, as illustrated in Figure 5. Once again, the low priority task is running and takes a semaphore, and all other tasks are blocked. Again the high priority task unblocks and attempts to take the semaphore before the low priority task has released it, so it blocks until the semaphore is available. In the meantime the intermediate priority tasks have unblocked. With priority inheritance, however, the low priority task has inherited the priority of the blocked high priority task.
Q4. Explain deadlock situation.

The RTOS will schedule the blocking task at its promoted priority first. It runs until it releases the semaphore; at that point the high priority task takes the semaphore and runs, and the promoted task is reassigned its initial low priority. Consequently, all tasks will run in the correct priority order. Note that if the high priority task accesses multiple shared resources (that is, there is more than one semaphore), it may potentially block as many times as there are semaphores. Furthermore, the priority inheritance protocol does nothing to mitigate deadlock.
A more complex algorithm is the priority ceiling protocol. In this protocol, each task is assigned a static priority, and each semaphore (or resource) is assigned a ceiling priority greater than or equal to the maximum priority of all the tasks that use it. At run time, a task assumes its static priority or the ceiling value of its resource, whichever is larger. When a task requires a resource, its priority is raised to the ceiling priority of the resource; when the task releases the resource, its priority is reset. It can be shown that this scheme minimizes the time that the highest priority task will be blocked, and eliminates the potential for deadlock.
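The two protocols differ only in how they compute the priority a resource holder actually runs at. The sketch below computes that effective priority under each rule. The Resource structure and both function names are hypothetical. Larger numbers mean higher priority here, which is an assumption, since many RTOSes use the opposite convention.

```c
#define MAX_WAITERS 8

typedef struct {
    int ceiling;                  /* ceiling priority (ceiling protocol) */
    int n_waiters;
    int waiter_prio[MAX_WAITERS]; /* priorities of tasks blocked on it */
} Resource;

static int max_int(int a, int b) { return a > b ? a : b; }

/* Priority inheritance: the holder is raised only when someone is
 * actually blocked on a resource it holds, and then to the highest
 * waiting priority. */
int inherited_priority(int base, const Resource *held[], int n_held)
{
    int p = base;
    for (int r = 0; r < n_held; r++)
        for (int w = 0; w < held[r]->n_waiters; w++)
            p = max_int(p, held[r]->waiter_prio[w]);
    return p;
}

/* Priority ceiling: the holder is raised to the ceiling of every
 * resource it holds as soon as it takes it, whether or not anyone
 * is waiting. */
int ceiling_priority(int base, const Resource *held[], int n_held)
{
    int p = base;
    for (int r = 0; r < n_held; r++)
        p = max_int(p, held[r]->ceiling);
    return p;
}
```

Under the ceiling rule the low priority task in the example is already running at the high priority before the high priority task ever tries to take the semaphore. That is why intermediate tasks cannot slip in between.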
UNIT 6
INTRODUCTION TO RTOS: MORE OS SERVICES: Tasks - States - Data - Semaphores and shared data. More operating system services - Message Queues - Mail Boxes - Timers - Events - Memory Management. 8 Hours
TEXT BOOKS: 1. An Embedded Software Primer - David E. Simon, Pearson Education, 1999.
REFERENCE BOOKS: 1. Embedded Systems: Architecture and Programming, Raj Kamal, TMH, 2008. 2. Embedded Systems Architecture: A Comprehensive Guide for Engineers and Programmers, Tammy Noergaard, Elsevier Publication, 2005. 3. Embedded C Programming, Barnett, Cox & O'Cull, Thomson (2005).
UNIT -6 INTRODUCTION TO RTOS & MORE OS SERVICES

6.1 Tasks 1
Issues: (1) the scheduler and the tasks exchange signals, via RTOS function calls, to block and unblock tasks; (2) if all tasks are blocked, the scheduler idles forever (not desirable!); (3) two or more tasks with the same priority level may be in the Ready state at once (handled by time-slicing or FIFO order). Example: the scheduler switches from the processor-hog vLevelsTask to vButtonTask when the user presses a push-button. main() initializes the RTOS, sets the priority levels, and starts the RTOS (See Fig 6.2, Fig 6.3, Fig 6.4).
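The core decision the scheduler makes in the vLevelsTask/vButtonTask example can be sketched as one function. It picks the highest-priority task that is not blocked. The Task structure is hypothetical. Larger numbers mean higher priority (an assumption), and ties go to the lower index as a simple stand-in for FIFO order.

```c
typedef enum { TASK_RUNNING, TASK_READY, TASK_BLOCKED } TaskState;

typedef struct {
    const char *name;
    int priority;        /* assumption: larger number = higher priority */
    TaskState state;
} Task;

/* Return the index of the highest-priority task that is able to run
 * (Ready or already Running), or -1 if every task is blocked, which is
 * the "scheduler idles forever" case. */
int pick_next_task(const Task tasks[], int n)
{
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (tasks[i].state == TASK_BLOCKED)
            continue;
        if (best < 0 || tasks[i].priority > tasks[best].priority)
            best = i;
    }
    return best;
}
```

When the button interrupt unblocks vButtonTask, the next call returns it, even though vLevelsTask never gave up the processor voluntarily.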
6.3 Tasks and Data

Each task has its own private context (registers, stack, etc.) that is not shared. In addition, several tasks may share common data: the data are declared globally in one task's module and referenced with extern from another task. Shared data cause the shared-data problem unless they are protected with the solutions discussed in Chapter 4, or the functions that use them are reentrant (See Fig 6.5, Fig 6.6, Fig 6.7, and Fig 6.8).
6.2 Tasks 2
Reentrancy: a reentrant function works correctly regardless of the number of tasks that call it between interrupts. Characteristics of reentrant functions:
- It uses variables non-atomically only if they are on the stack of the calling task; otherwise it accesses them atomically.
- It calls only reentrant functions.
- It uses system hardware (a shared resource) only atomically.
Inspecting code to determine reentrancy: see Fig 6.9. Where are data stored in C: shared, non-shared, or on the stack?
See Fig 6.10. Is it reentrant? What about the variable fError? Is printf reentrant? If shared variables are not protected, could they be accessed using single assembly instructions (which would guarantee atomicity)?
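The storage question above can be made concrete with two versions of the same hypothetical helper, which converts a number to text. The first keeps its result in a static buffer shared by every caller. The second uses only its parameters, its locals and a buffer the caller owns. Library calls such as printf are deliberately avoided, since their reentrancy is exactly what is in doubt.

```c
#include <stddef.h>

/* NOT reentrant: the result lives in one static buffer shared by
 * every caller. If task A is switched out after the call but before
 * it uses the string, task B's call overwrites it. */
const char *uint_to_text_shared(unsigned value)
{
    static char buf[24];
    char *p = buf + sizeof buf - 1;
    *p = '\0';
    do { *--p = (char)('0' + value % 10); value /= 10; } while (value);
    return p;
}

/* Reentrant: it touches only its parameters, its locals (on the
 * calling task's stack) and the caller's buffer, and it calls no
 * other functions. Returns NULL if the buffer is too small. */
char *uint_to_text(unsigned value, char *buf, size_t len)
{
    char tmp[24];
    size_t n = 0;
    do { tmp[n++] = (char)('0' + value % 10); value /= 10; } while (value);
    if (n + 1 > len)
        return NULL;
    for (size_t i = 0; i < n; i++)     /* digits were produced reversed */
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return buf;
}
```

Calling uint_to_text_shared(12) and then uint_to_text_shared(7) leaves the first pointer reading "17", because the second call overwrote the shared buffer. That is the same hazard a task switch would create.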
6.3 Semaphores and Shared Data
Semaphores are a new tool for atomicity. A semaphore is a variable/lock/flag used to control access to a shared resource (to avoid shared-data problems in an RTOS). Protection at the start of a critical section is via a primitive function, called take, indexed by the semaphore. Protection at the end is via a primitive function, called release, indexed similarly. Simple binary semaphores are often adequate for shared data problems in an RTOS (See Fig 6.12 and Fig 6.13).
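Take and release can be sketched with POSIX semaphores and threads standing in for an RTOS and its tasks. TakeSemaphore/ReleaseSemaphore are thin hypothetical wrappers around sem_wait/sem_post, and tank_level is a hypothetical shared variable. Without the semaphore, the `+=` read-modify-write in the two tasks could interleave and lose updates.

```c
#include <pthread.h>
#include <semaphore.h>

/* The take/release primitives, mapped onto POSIX semaphores for this
 * sketch; an RTOS would supply its own calls. */
static sem_t semTankData;
static void TakeSemaphore(void)    { sem_wait(&semTankData); }
static void ReleaseSemaphore(void) { sem_post(&semTankData); }

/* Shared data: a tank level that two tasks update. */
static long tank_level = 0;

static void *level_task(void *arg)
{
    long delta = *(const long *)arg;
    for (int i = 0; i < 100000; i++) {
        TakeSemaphore();          /* enter critical section */
        tank_level += delta;      /* read-modify-write, not atomic */
        ReleaseSemaphore();       /* leave critical section */
    }
    return NULL;
}

/* Initialize the semaphore (value 1 = resource free) BEFORE any task
 * can use it, then run two competing tasks to completion. */
long run_tasks(void)
{
    static const long plus = 1, minus = -1;
    pthread_t a, b;
    sem_init(&semTankData, 0, 1);
    tank_level = 0;
    pthread_create(&a, NULL, level_task, (void *)&plus);
    pthread_create(&b, NULL, level_task, (void *)&minus);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    sem_destroy(&semTankData);
    return tank_level;
}
```

Because every update happens inside a take/release pair, the +1 and -1 updates always cancel and run_tasks() returns 0.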
6.3 Semaphores and Shared Data 1

RTOS Semaphores & Initializing Semaphores: binary semaphores can be used to solve the tank monitoring problem (See Fig 6.12 and Fig 6.13). The nuclear reactor system illustrates the issue of initialization. The semaphore variable must be initialized in a dedicated place (not in one of the competing tasks) before the OS starts timing the tasks; otherwise the order in which tasks run, and priority overrides, can undermine the effect of the semaphore. Solution: call OSSemInit() before OSInit() (See Fig 6.14).
6.3 Semaphores and Shared Data 2
Reentrancy, semaphores, multiple semaphores and device signaling:
- Fig 6.15 shows a reentrant function that protects a shared data item, cErrors, in a critical section.
- Each shared data item (resource/device) requires a separate semaphore for individual protection. This lets multiple tasks share data, resources and devices exclusively, while keeping the implementation efficient and response times short.
- Fig 6.16 shows a printer device signaled via a semaphore by a report-buffering task, on each print of the lines that make up the formatted and buffered report.
6.3 Semaphores and Shared Data 3
Semaphore problems (ways of messing up with semaphores):
- The initial values of semaphores are not set properly, or are set in the wrong place.
- Takes and releases are not symmetric: each take must have a corresponding release somewhere in the embedded application.
- The wrong semaphore is taken unintentionally (an issue with multiple semaphores).
- A semaphore is held for too long, causing waiting tasks to miss their deadlines.
- Priorities are inverted, which is usually solved by priority inheritance/promotion (See Fig 6.17).
- The deadly embrace problem (cycles) arises (See Fig 6.18).
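The deadly embrace happens when one task holds A and waits for B while another holds B and waits for A. One common remedy, shown here as a sketch rather than the method of Fig 6.18, is a fixed global order: every task takes A before B, so the cycle can never form. POSIX mutexes stand in for the RTOS semaphores, and all names are hypothetical.

```c
#include <pthread.h>

/* Two resources that some tasks need together. */
static pthread_mutex_t mtxA = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mtxB = PTHREAD_MUTEX_INITIALIZER;
static int shared_a = 0, shared_b = 0;

/* The ordering rule lives in one place: always A first, then B;
 * release in the reverse order. */
static void take_both(void)    { pthread_mutex_lock(&mtxA); pthread_mutex_lock(&mtxB); }
static void release_both(void) { pthread_mutex_unlock(&mtxB); pthread_mutex_unlock(&mtxA); }

static void *task1(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        take_both();
        shared_a++; shared_b--;       /* keeps a + b constant */
        release_both();
    }
    return NULL;
}

static void *task2(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        take_both();                  /* same order as task1: A then B */
        shared_b++; shared_a--;
        release_both();
    }
    return NULL;
}

/* Run both tasks to completion and return a + b (should stay 0). */
int run_both(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, task1, NULL);
    pthread_create(&t2, NULL, task2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return shared_a + shared_b;
}
```

If task2 instead locked B before A, the two tasks could each hold one mutex and wait forever for the other.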
6.3 Semaphores and Shared Data 4
Variants:
- Binary semaphores protect a single resource, used one at a time, alternately.
- Counting semaphores manage multiple instances of a resource by incrementing/decrementing an integer semaphore variable.
- A mutex protects shared data while also dealing with the priority inversion problem.
Summary of ways to protect shared data in an RTOS:
- Disabling/enabling interrupts works for both task code and interrupt routines, and is the fastest.
- Taking/releasing semaphores cannot be used in interrupt routines. It is slower, but affects the response times only of those tasks that need the semaphore.
- Disabling task switches has no effect on interrupt routines, but holds up the response of all other tasks.
PART B IN UNIT 6
7.0 MORE OS SERVICES
7.1 Message Queues, Mailboxes and Pipes
The basic techniques for inter-task communication and data sharing are enabling/disabling interrupts and using semaphores, e.g., in the tank monitoring tasks and in the serial port and printer handling tasks. An RTOS also supports message queues, mailboxes and pipes.
Example of a message queue (See Fig 7.1): Task1 and Task2 (guaranteed to be reentrant) compute separate functions. Both use the services of vLogError and ErrorsTask: vLogError enqueues errors for ErrorsTask to process. vLogError is supported by the AddToQueue function, which keeps a queue of integers that the RTOS interprets or maps to error types. Using the ReadFromQueue function, the RTOS then activates ErrorsTask to handle the error whenever the queue is not empty, freeing Task1 and Task2 to continue their work. AddToQueue and ReadFromQueue are nonreentrant, and the RTOS switches between Task1 and Task2 in the middle of the tasks' execution are guaranteed to be OK.
7.1 Message Queues, Mailboxes, and Pipes 1
Difficulties in using queues:
- Queue initialization (like semaphore initialization) must be dedicated to a separate task, to a) guarantee correct start-up values and b) avoid uncertainty about task priorities and order of execution, which might affect the queue's content.
- Queues must be tagged (to identify which queue is referenced).
- If the RTOS doesn't block the reading/writing task on an empty/full queue but returns an error code instead, code is needed to manage the queue when it is full or empty.
- The RTOS may limit the amount of information written to or read from the queue in any single call (See Fig 7.2).
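The error-logging example and the full/empty difficulties can be sketched with a small ring-buffer queue that returns an error instead of blocking. AddToQueue/ReadFromQueue reuse the names from the text, but these bodies, the IntQueue type and QUEUE_SIZE are hypothetical stand-ins for what an RTOS provides. They have no locking, so the sketch is single-threaded; an RTOS's own queue functions handle that protection internally.

```c
#include <stdbool.h>

#define QUEUE_SIZE 4   /* hypothetical capacity */

typedef struct {
    int items[QUEUE_SIZE];
    int head;          /* next slot to read */
    int count;         /* number of stored items */
} IntQueue;

/* Returns false instead of blocking when the queue is full. */
bool AddToQueue(IntQueue *q, int value)
{
    if (q->count == QUEUE_SIZE)
        return false;
    q->items[(q->head + q->count) % QUEUE_SIZE] = value;
    q->count++;
    return true;
}

/* Returns false when the queue is empty. */
bool ReadFromQueue(IntQueue *q, int *value)
{
    if (q->count == 0)
        return false;
    *value = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_SIZE;
    q->count--;
    return true;
}

static IntQueue errorQueue;       /* the one queue tagged for errors */
static int dropped_errors = 0;

/* Called by Task1/Task2: hand the error off and return at once;
 * ErrorsTask drains errorQueue with ReadFromQueue later. */
void vLogError(int error_type)
{
    if (!AddToQueue(&errorQueue, error_type))
        dropped_errors++;         /* queue full: count it, don't wait */
}
```

Counting dropped errors is one possible policy for a full queue. Blocking the caller, or overwriting the oldest entry, are others, and the right one depends on the application.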
Message Queues, Mailboxes, and Pipes: Using Pointers and Queues
The code in Fig 7.2 limits the amount of data written to or read from the queue. For tasks to communicate any amount of data, create a buffer and write a pointer to the buffer into the queue. The receiving task retrieves the data from the buffer via the pointer, and then frees the buffer space (See Fig 7.3).
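The pointer technique can be sketched as a queue whose entries are only pointers, so each entry is the same size however long the message is. PtrQueue, send_text and receive_text are hypothetical, and malloc/free stand in for whatever buffer allocation the system uses. Ownership of the buffer travels with the pointer: the sender must not touch the buffer after queuing it, or the shared data problem of Fig 7.4 returns.

```c
#include <stdlib.h>
#include <string.h>

#define PTR_QUEUE_SIZE 8   /* hypothetical capacity */

typedef struct {
    void *items[PTR_QUEUE_SIZE];
    int head, count;
} PtrQueue;

int ptr_queue_put(PtrQueue *q, void *p)
{
    if (q->count == PTR_QUEUE_SIZE) return -1;
    q->items[(q->head + q->count) % PTR_QUEUE_SIZE] = p;
    q->count++;
    return 0;
}

void *ptr_queue_get(PtrQueue *q)
{
    if (q->count == 0) return NULL;
    void *p = q->items[q->head];
    q->head = (q->head + 1) % PTR_QUEUE_SIZE;
    q->count--;
    return p;
}

/* Sending task: copy the data into a fresh buffer and queue only the
 * pointer. After a successful call the buffer belongs to the receiver. */
int send_text(PtrQueue *q, const char *text)
{
    char *buf = malloc(strlen(text) + 1);
    if (buf == NULL) return -1;
    strcpy(buf, text);
    if (ptr_queue_put(q, buf) != 0) { free(buf); return -1; }
    return 0;
}

/* Receiving task: take the pointer, use the data, then free the
 * buffer. Returns the message length, or -1 if the queue was empty. */
long receive_text(PtrQueue *q)
{
    char *buf = ptr_queue_get(q);
    if (buf == NULL) return -1;
    long len = (long)strlen(buf);   /* stands in for real processing */
    free(buf);
    return len;
}
```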
7.1 Message Queues, Mailboxes, and Pipes: Using Mailboxes
The purpose of mailboxes is similar to that of queues: both support asynchronous task communication. Typical RTOS functions for managing mailboxes are create, write, read, check-mail and destroy. RTOS implementations of mailboxes vary:
- A mailbox may hold a single message, or multiple messages (with the number of entries set at creation).
- The number of messages per mailbox could be unlimited, while the total number in the system is limited (with the possibility of shuffling/distributing messages among mailboxes).
- Mailboxes could be prioritized.
Examples (from the RTOS MultiTask!):
int sndmsg(unsigned int uMbid, void *p_vMsg, unsigned int uPriority);
void *rcvmsg(unsigned int uMbid, unsigned int uTimeout);
void *chkmsg(unsigned int uMbid);
Using Pipes:
- Pipes are implemented as (special) files, using normal file descriptors.
- An RTOS can create, read from, write to, and destroy pipes (typically each pipe has 2 ends). Details of the implementation depend on the RTOS.
- Pipes can carry messages of varying length, unlike the fixed-length messages of queues/mailboxes.
- Pipes may be byte-oriented, with each task's read/write transferring the number of bytes specified.
- In standard C, reading and writing pipes use the fread and fwrite functions, respectively.
Programming queues, mailboxes, and pipes: caution!
- Code tasks to read from or write to the intended structure; the RTOS can't help on a mismatch.
- Take care in interpreting and processing message types (see code segments on p. 182).
- Overflow of a structure could cripple the software, so its size needs to be set as large as possible.
- Passing pointers in structures provides an unwanted opportunity to create the shared data problem (See Fig 7.4).
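The pipe points above (file descriptors, variable-length messages, fread/fwrite) can be sketched with a POSIX pipe in place of an RTOS pipe. Because a byte pipe keeps no message boundaries of its own, this sketch adds a 1-byte length prefix to each message. That framing, and the names pipe_send, pipe_receive and pipe_demo, are assumptions for illustration only.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write one variable-length message: a length byte, then the bytes. */
static int pipe_send(FILE *w, const char *msg)
{
    size_t len = strlen(msg);
    if (len > 255) return -1;
    unsigned char n = (unsigned char)len;
    if (fwrite(&n, 1, 1, w) != 1 || fwrite(msg, 1, len, w) != len)
        return -1;
    return fflush(w) == 0 ? 0 : -1;   /* push it into the pipe now */
}

/* Read one message into out (at least 256 bytes); returns its length. */
static int pipe_receive(FILE *r, char *out)
{
    unsigned char n;
    if (fread(&n, 1, 1, r) != 1) return -1;
    if (fread(out, 1, n, r) != n) return -1;
    out[n] = '\0';
    return n;
}

/* Round trip through a POSIX pipe, standing in for an RTOS pipe. */
int pipe_demo(char *out)
{
    int fds[2];
    if (pipe(fds) != 0) return -1;
    FILE *r = fdopen(fds[0], "r");
    FILE *w = fdopen(fds[1], "w");
    int result = -1;
    if (r != NULL && w != NULL && pipe_send(w, "ALARM tank 5") == 0)
        result = pipe_receive(r, out);
    if (w != NULL) fclose(w); else close(fds[1]);
    if (r != NULL) fclose(r); else close(fds[0]);
    return result;
}
```

Both ends live in one process here only to keep the sketch runnable. Normally one task writes and a different task reads.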
Timer Functions
Issues: embedded systems track the passage of time, hence they need to keep time. Examples:
- To save battery life, power needs to be shut off automatically after, say, X seconds.
- A message-sending task expects an ACK within Y seconds, so it delays Y seconds and may then retransmit.
- A task is allowed a slice of time, after which it is blocked.
An RTOS provides these timing services or functions (See Fig 7.5: VxWorks RTOS support for the taskDelay(nticks) function in telephone call code).
7.2 Timer Functions
Issues: how long is a delay measured in ticks? A tick is like a single heartbeat: the period between timer interrupts (See Fig 7.6). The RTOS's knowledge of time, and the meaning of nticks or a time interval, rely on the microprocessor's hardware timer and its interrupt cycles (RTOS writers must know this!). Alternatively, RTOS writers write watchdog timers based on non-standard timer hardware, with corresponding software interrupts called each time the software timer expires. RTOS vendors provide board support packages (BSPs) of drivers for timers and other hardware.
The length of a tick is a design trade-off for the hardware timer. Accurate timing needs short tick intervals, at the cost of more frequent timer interrupts; the alternative is a dedicated timer for the purpose.
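The tick trade-off shows up as soon as a delay in milliseconds is converted to ticks. The sketch assumes a hypothetical 100 Hz tick (10 ms per tick). A delay of n ticks ends at the n-th tick interrupt after the call. Since the call lands somewhere inside the current tick period, the real delay lies between n - 1 and n tick periods.

```c
/* Hypothetical tick rate: the hardware timer interrupts 100 times a
 * second, so one tick is 10 ms. */
#define TICKS_PER_SECOND 100u

/* Convert milliseconds to ticks, rounding UP so the requested time is
 * never shortened by truncation. (ms * TICKS_PER_SECOND must not
 * overflow an unsigned.) */
unsigned ms_to_ticks(unsigned ms)
{
    return (ms * TICKS_PER_SECOND + 999u) / 1000u;
}

/* Because the first tick may arrive almost immediately, n ticks only
 * guarantee (n - 1) full tick periods. Add one tick when the requested
 * time is a hard minimum. */
unsigned ticks_for_minimum_delay(unsigned ms)
{
    return ms_to_ticks(ms) + 1u;
}
```

With a 10 ms tick, a 250 ms request becomes 25 ticks, which could expire after as little as 240 ms. Asking for 26 ticks guarantees at least 250 ms, at the price of up to 10 ms extra.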
7.2 Timer Functions
Other timing services (all based on the system tick):
- A time limit on waiting for a message or a semaphore (but not so tight that high priority tasks miss access to shared data).
- Placing calls to (or activation of) time-critical, high priority tasks inside timer interrupts, or specialized time-critical tasks inside the RTOS. Note: OS tasks have higher priority than other embedded software tasks.
- Calling a function of choice after a given number of ticks.
Example (See Fig 7.7): the timer callback function. Note how the wdStart function is passed a function (vSetFrequency or vTurnOnTxorRx), the associated nticks, and the parameter to the function. Also note how vRadioControlTask communicates with vTurnOnTxorRx and vSetFrequency using the queue queueRadio and msgQReceive/msgQSend.
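The callback idea behind wdStart (run a function with a parameter after nticks) can be sketched as a software timer driven by the system tick. SoftTimer, soft_timer_start and soft_timer_tick are hypothetical names, not the VxWorks API. In this sketch a timer started with 0 ticks never fires.

```c
#include <stddef.h>

typedef void (*TimerCallback)(int param);

typedef struct {
    unsigned ticks_left;      /* 0 = not running */
    TimerCallback callback;
    int param;
} SoftTimer;

/* Arm the timer: call cb(param) after nticks system ticks. */
void soft_timer_start(SoftTimer *t, unsigned nticks,
                      TimerCallback cb, int param)
{
    t->callback = cb;
    t->param = param;
    t->ticks_left = nticks;
}

/* Called once per system tick (e.g. from the timer interrupt). The
 * callback runs in that context, so it must be short and must not
 * block; typically it just posts a message for a task, as the radio
 * example does with queueRadio. */
void soft_timer_tick(SoftTimer *t)
{
    if (t->ticks_left == 0)
        return;
    if (--t->ticks_left == 0 && t->callback != NULL)
        t->callback(t->param);
}
```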
7.3 Events
In a standard OS, an event is typically an indication related to time. In an RTOS, an event is a boolean flag that is set and reset by tasks/routines for other tasks to wait on. The RTOS is supposed to manage several events for the waiting tasks: blocked or waiting tasks are unblocked after the event occurs, and the event is reset. E.g., pulling the trigger of a cordless bar-code scanner sets the flag for a waiting task, which turns on the laser beam for scanning, so that it starts running (See Fig 7.8 and Fig 7.9).
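An event group is essentially a bit mask: each bit is one boolean event, and a task can wait for any or all of a subset of bits. The sketch shows only the bookkeeping; a real RTOS would also block and wake the waiting tasks. EventGroup, the EVT_ bit names and the functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* A hypothetical event group: each bit is one boolean event flag. */
typedef struct { uint32_t flags; } EventGroup;

#define EVT_TRIGGER_PULLED  (1u << 0)   /* bar-code scanner events */
#define EVT_KEY_PRESSED     (1u << 1)
#define EVT_BATTERY_LOW     (1u << 2)

void event_set(EventGroup *g, uint32_t mask)   { g->flags |= mask; }
void event_reset(EventGroup *g, uint32_t mask) { g->flags &= ~mask; }

/* Would a task waiting on `mask` be unblocked now? wait_all selects
 * "all of these events" versus "any one of them". */
bool event_ready(const EventGroup *g, uint32_t mask, bool wait_all)
{
    uint32_t hit = g->flags & mask;
    return wait_all ? hit == mask : hit != 0;
}

/* Automatic reset: return the awaited events that occurred and clear
 * them, as an RTOS with auto-reset would after waking the task. */
uint32_t event_consume(EventGroup *g, uint32_t mask)
{
    uint32_t hit = g->flags & mask;
    g->flags &= ~hit;
    return hit;
}
```

A scanner task waiting on "trigger OR key" is released as soon as either bit is set. A task waiting on "trigger AND key" stays blocked until both are.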
7.3 Events 1
Features of events (and comparison with semaphores, queues, mailboxes and pipes):
- More than one task can wait on the same event; the tasks are activated by priority.
- Events can be grouped, and tasks may wait on a subset of the events in a group.
- Resetting events is done either automatically by the RTOS or by your embedded software.
- A task can wait on only one semaphore, queue, mailbox or pipe at a time, but on many events simultaneously.
- Semaphores are faster, but unlike queues, mailboxes and pipes, they carry only 1 bit of information.
- Queues, mailboxes and pipes are error prone, and message posting/retrieval is compute-intensive.
7.4 Memory Management
In general, an RTOS offers C-language equivalents of malloc and free for memory management, which are slow and unpredictable. Real-time system engineers prefer faster and more predictable alloc/free functions for fixed-size buffers. E.g., the MultiTask! RTOS allocates pools of fixed-size buffers, using:
- getbuf(), which blocks the task (with a timeout) when no buffers are available;
- reqbuf(), which does not block, and returns a NULL pointer when no buffers are available;
- relbuf(), which frees buffers in a given pool (the buffer pointer must be valid).
Note that most embedded software is integrated with the RTOS (same address space), and the embedded software starts the microprocessor. Hence your software must tell the RTOS where the memory pool is (See Fig 7.10 and Fig 7.11: high priority FormatTask and low priority OutputTask).
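Fixed-size buffer pools are fast and predictable because allocation is just popping a free list and release is pushing onto it. The sketch mirrors the non-blocking reqbuf()/relbuf() behaviour described above. The names pool_init/pool_req/pool_rel and the sizes are hypothetical, and it has no locking; in an RTOS the pool operations would run inside a critical section.

```c
#include <stddef.h>

#define BUF_SIZE   32   /* hypothetical: bytes per buffer */
#define BUF_COUNT   4   /* hypothetical: buffers in the pool */

/* While a buffer is free, its first bytes hold the free-list link. */
typedef union Buffer {
    union Buffer *next;
    unsigned char data[BUF_SIZE];
} Buffer;

typedef struct {
    Buffer storage[BUF_COUNT];   /* memory the application hands over */
    Buffer *free_list;
} BufferPool;

void pool_init(BufferPool *p)
{
    p->free_list = NULL;
    for (int i = 0; i < BUF_COUNT; i++) {
        p->storage[i].next = p->free_list;
        p->free_list = &p->storage[i];
    }
}

/* Like reqbuf(): never blocks; returns NULL when the pool is empty.
 * Constant time, unlike a general-purpose malloc. */
void *pool_req(BufferPool *p)
{
    Buffer *b = p->free_list;
    if (b != NULL)
        p->free_list = b->next;
    return b;
}

/* Like relbuf(): the pointer must have come from this pool. */
void pool_rel(BufferPool *p, void *buf)
{
    Buffer *b = buf;
    b->next = p->free_list;
    p->free_list = b;
}
```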
7.5 Interrupt Routines in an RTOS Environment
Rules that interrupt routines (IRs) must comply with, but task code need not:
Rule 1: an IR can't call an RTOS function that might cause it to block, e.g., waiting on a semaphore, reading an empty queue or mailbox, or waiting on an event. This avoids high latency or long response times, and potential deadlock (See Fig 7.12, which doesn't work, and Fig 7.13, which works by using queues).
7.5 Interrupt Routines in an RTOS Environment 1
Rule 2: an IR can't call an RTOS function that might cause the RTOS to switch to another task (other IRs excepted). Breaking this rule will cause the RTOS to switch away from the IR itself to run the task. That leaves the IR code incomplete, or delays lower priority interrupts (See Fig 7.14, the should-work case, and Fig 7.15, the what-really-happens case).
7.5 Interrupt Routines in an RTOS Environment 2
One solution to Rule 2: let the RTOS intercept all the interrupts, aided by an RTOS function that tells the RTOS where the IRs are and which interrupt hardware each one corresponds to. The RTOS then activates the calling IR or the highest priority IR. Control returns to the RTOS, and the RTOS scheduler decides which task gets the microprocessor, which allows the IR to run to completion (See Fig 7.16).
7.5 Interrupt Routines in an RTOS Environment
Second solution to Rule 2: let the IR call a function in the RTOS to inform the RTOS of an interrupt. After the IR is done, control goes back to the RTOS, where another function calls the scheduler to schedule the next task (See Fig 7.17).
Third solution to Rule 2: let the RTOS maintain a separate queue of specialized, interrupt-supporting functions, which are called by the IR on the appropriate interrupt. When these functions complete, control goes back to that IR (similar to Fig 7.17, with queues).
Interrupt Routines in an RTOS Environment: Nested Interrupts
If a running IR is interrupted by another, higher priority interrupt (a kind of interrupt stacking), the RTOS should unstack the IRs so that all IRs complete before it lets the scheduler switch to any task code (See Fig 7.18).
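One way an RTOS can follow the nesting rule, in the spirit of the second solution to Rule 2, is to count how deeply IRs are nested and defer any task switch until the outermost IR exits. μC/OS's OSIntEnter/OSIntExit work along these lines. The function and variable names below are hypothetical, and a real RTOS updates them with interrupts disabled.

```c
#include <stdbool.h>

/* Hypothetical RTOS bookkeeping for interrupt nesting. */
static int  int_nesting = 0;          /* IRs currently in progress */
static bool reschedule_pending = false;
static int  scheduler_runs = 0;       /* counts calls, for the demo */

static void run_scheduler(void) { scheduler_runs++; }

/* Every IR calls this first. */
void rtos_int_enter(void) { int_nesting++; }

/* An IR that signals a task only records that a switch may be needed;
 * it never switches tasks itself (Rule 2). */
void rtos_signal_from_isr(void) { reschedule_pending = true; }

/* Every IR calls this last. Only the outermost IR lets the scheduler
 * run, so interrupted IRs always get to finish first. */
void rtos_int_exit(void)
{
    if (--int_nesting == 0 && reschedule_pending) {
        reschedule_pending = false;
        run_scheduler();
    }
}
```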
RECOMMENDED QUESTIONS UNIT -6 Introduction to RTOS and More operating systems services
1. What are the three states of a task? Explain them with a neat block diagram.
2. Describe the use of take semaphore( ) and release semaphore( ) with an example.
3. Explain any 6 problems with semaphores.
4. Describe the use of message queues, mailboxes and pipes.
5. Explain memory management in multitasking.
6. How do interrupt routines work in an RTOS environment?
7. What are nested interrupts, and how do they work?
SOLUTIONS FOR UNIT 6
Q1. How does a microprocessor respond to a button under an RTOS?
Issues: (1) the scheduler and the tasks exchange signals, via RTOS function calls, to block and unblock tasks; (2) if all tasks are blocked, the scheduler idles forever (not desirable!); (3) two or more tasks with the same priority level may be in the Ready state at once (handled by time-slicing or FIFO order). Example: the scheduler switches from the processor-hog vLevelsTask to vButtonTask when the user presses a push-button. main() initializes the RTOS, sets the priority levels, and starts the RTOS.
Q2. With a diagram, explain sharing data among RTOS tasks.

Each task has its own private context (registers, stack, etc.) that is not shared. In addition, several tasks may share common data: the data are declared globally in one task's module and referenced with extern from another task. Shared data cause the shared-data problem unless they are protected with the solutions discussed in Chapter 4, or the functions that use them are reentrant.
Q3. What is a semaphore? How does it help in shared data access? Explain along with code.
A semaphore is a variable/lock/flag used to control access to a shared resource (to avoid shared-data problems in an RTOS). Protection at the start of a critical section is via a primitive function, called take, indexed by the semaphore. Protection at the end is via a primitive function, called release, indexed similarly. Simple binary semaphores are often adequate for shared data problems in an RTOS.
Q4. Explain the execution flowgraphs in semaphores.
Q5. What are the basic problems with semaphores?

Semaphore problems (ways of messing up with semaphores):
- The initial values of semaphores are not set properly, or are set in the wrong place.
- Takes and releases are not symmetric: each take must have a corresponding release somewhere in the embedded application.
- The wrong semaphore is taken unintentionally (an issue with multiple semaphores).
- A semaphore is held for too long, causing waiting tasks to miss their deadlines.
- Priorities are inverted, which is usually solved by priority inheritance/promotion.
- The deadly embrace problem (cycles) arises.
Q6. Describe the use of message queues.
The basic techniques for inter-task communication and data sharing are enabling/disabling interrupts and using semaphores, e.g., in the tank monitoring tasks and in the serial port and printer handling tasks. An RTOS also supports message queues, mailboxes and pipes.
Example of a message queue: Task1 and Task2 (guaranteed to be reentrant) compute separate functions. Both use the services of vLogError and ErrorsTask: vLogError enqueues errors for ErrorsTask to process. vLogError is supported by the AddToQueue function, which keeps a queue of integers that the RTOS interprets or maps to error types. Using the ReadFromQueue function, the RTOS then activates ErrorsTask to handle the error whenever the queue is not empty, freeing Task1 and Task2 to continue their work. AddToQueue and ReadFromQueue are nonreentrant, and the RTOS switches between Task1 and Task2 in the middle of the tasks' execution are guaranteed to be OK.
Q7. Write code for delaying a task with the RTOS delay function.
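As a sketch of an answer, the code below uses POSIX nanosleep in place of an RTOS delay call such as the VxWorks taskDelay(nticks) mentioned under Timer Functions. An RTOS delay would take ticks rather than milliseconds. task_delay_ms and vBlinkTask are hypothetical names, and the bounded loop only makes the sketch testable, since a real task body would loop forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Block the calling task for at least `ms` milliseconds; other tasks
 * (threads) can run meanwhile. Returns 0 on success, -1 on error. */
int task_delay_ms(long ms)
{
    struct timespec req = { .tv_sec = ms / 1000,
                            .tv_nsec = (ms % 1000) * 1000000L };
    struct timespec rem;
    while (nanosleep(&req, &rem) != 0) {
        if (errno != EINTR)
            return -1;
        req = rem;          /* interrupted by a signal: sleep the rest */
    }
    return 0;
}

/* A task body in the usual shape: do a little work, then delay. */
void vBlinkTask(void (*toggle)(void), int cycles, long period_ms)
{
    for (int i = 0; i < cycles; i++) {
        toggle();
        task_delay_ms(period_ms);
    }
}
```

While the task is delayed it is blocked and uses no processor time, which is the point of calling the RTOS delay instead of spinning in a busy loop.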
Q8. What are the features of events?

Features of events (and comparison with semaphores, queues, mailboxes and pipes):
- More than one task can wait on the same event; the tasks are activated by priority.
- Events can be grouped, and tasks may wait on a subset of the events in a group.
- Resetting events is done either automatically by the RTOS or by your embedded software.
- A task can wait on only one semaphore, queue, mailbox or pipe at a time, but on many events simultaneously.
- Semaphores are faster, but unlike queues, mailboxes and pipes, they carry only 1 bit of information.
- Queues, mailboxes and pipes are error prone, and message posting/retrieval is compute-intensive.
UNIT 7 & 8 Basic Design Using RTOS Principles- An example, Encapsulating semaphores and Queues. Hard real-time scheduling considerations Saving Memory space and power. Hardware software co-design aspects in embedded systems. 12 Hours
TEXT BOOKS: 1. Embedded System Design: A Unified Hardware/Software Introduction - Frank Vahid, Tony Givargis, John Wiley & Sons, Inc.2002 2. An Embedded software Primer - David E. Simon: Pearson Education, 1999
REFERENCE BOOKS: 1. Embedded Systems: Architecture and Programming, Raj Kamal, TMH, 2008. 2. Embedded Systems Architecture: A Comprehensive Guide for Engineers and Programmers, Tammy Noergaard, Elsevier Publication, 2005. 3. Embedded C Programming, Barnett, Cox & O'Cull, Thomson (2005).
UNIT 7 & 8 BASIC DESIGN USING RTOS

Introduction
To design an embedded system (ES), first pick a software architecture: round robin, round robin with interrupts, function-queue scheduling, or RTOS. The ES design concepts and techniques discussed in Chapter 8 assume the RTOS architecture. Key RTOS mechanisms used include tasks, task management, intertask communication mechanisms (semaphores, queues, mailboxes, pipes), and interrupts.
8.1 Overview
Prior to design, we must construct a specification of the ES that meets such requirements/properties as:
- Completeness.
- Time: timing constraints (response time, reaction time, deadlines, soft vs. hard).
- Properties of the target hardware, for effective design of the ES. E.g., a 9600-bps serial port receiving about 1200 characters per second (9600 divided by 8 bits, ignoring start and stop bits) requires an IR that handles up to 1200 interrupts each second. If characters can be written to RAM using DMA, the IR code will be different.
- Knowledge of the microprocessor speed: can the microprocessor run the IR 1200 times per second?
The design needs all the software engineering skill you have, plus such properties as structure, encapsulation, information hiding, modularity, coupling, cohesion, maintainability and testability. It also needs effective use of design tools and methodologies (RoseRT, OO, UML, YES-UML). Testing and debugging an ES requires specialized hardware tools, and specialized software tools and techniques.
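The serial-port arithmetic in the overview is worth checking, because framing bits matter. Dividing 9600 bps by 8 gives the 1200 characters per second quoted above. With the common 8N1 format, each character also carries a start bit and a stop bit (10 bits in all), giving 960 characters per second. The helper names below are hypothetical.

```c
/* Characters per second on an asynchronous serial line. bits_per_char
 * should include framing: 8 data bits + 1 start + 1 stop (8N1) = 10.
 * Passing 8 ignores framing, as a quick estimate does. */
unsigned long chars_per_second(unsigned long bps, unsigned long bits_per_char)
{
    return bps / bits_per_char;
}

/* Microseconds available between receive interrupts when the port
 * interrupts once per character (rounded down). */
unsigned long us_between_interrupts(unsigned long bps, unsigned long bits_per_char)
{
    return 1000000UL / chars_per_second(bps, bits_per_char);
}
```

At 9600 bps with 8N1 framing there are 960 receive interrupts per second, about 1041 microseconds apart. The IR, plus everything else the processor must do, has to fit in that budget.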
8.2 Principles
Design considerations: an ES is interrupt-driven, and it remains dormant until one of two things happens. Either time passes for an event to occur (a timer interrupt), or a need arises to respond to an external request/interrupt. Interrupts create a cascade of events, causing RTOS tasks to act/behave accordingly.
ES design technique: create all the needed tasks and get them into the blocked or idle state, waiting on interrupts generated by an external event, e.g., frame arrival at a network port (See Fig 8.1: network port and serial port communication via tasks that implement the DDP and ADSP protocol stack).
8.2 Principles 1
Write short IRs:
- Even the lowest priority IRs are handled before the highest priority task code, so short IRs minimize task code response time.
- IRs are error prone and hard to debug, due to their hardware-dependent software parts.
- The parts of the IR code that require an immediate/quick response should be in the IR itself. The parts that need longer processing and a less urgent response should be done in a task (signaled by the IR).
8.2 Principles 2
Consider the following specifications:
- A system responds to commands from a serial port.
- All commands end with a carriage return (CR).
- Commands arrive one at a time; the next arrives only after the preceding one has been processed.
- The serial port's buffer is 1 character long, and characters arrive quickly (at X bps).
- The system processes Y characters per second.
Three possible designs:
A. Let the IR handle everything => long response time, big IR code, hard-to-debug errors.
B. Use skeletal IR code, with a command-parsing task that queues commands (with all the attendant message/data queuing problems).
C. A better compromise: let the IR save characters in a mailbox buffer until CR; then the command-parsing task can work on the buffer (See Fig 8.2: the IR and the parsing task use different parts of the mail buffer, the tail and the head).
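Design C can be sketched in simplified form. The IR does only the urgent work (store one character, and mark the command ready on CR), and the parsing task copies the finished command out. This is a single-buffer variant, not the head/tail mail buffer of Fig 8.2. It relies on the specification's guarantee that no new command starts until the previous one is processed. All names are hypothetical, and a real RTOS would signal the task with a semaphore or mailbox rather than a polled flag.

```c
#include <stdbool.h>
#include <string.h>

#define CMD_MAX 32   /* hypothetical longest command */

/* Shared between the IR and the command-parsing task. The IR writes
 * only while a command is being received; the task reads only after
 * command_ready is set. */
static char cmd_buf[CMD_MAX + 1];
static int  cmd_len = 0;
static volatile bool command_ready = false;

/* Short IR body: store one received character; on CR, terminate the
 * string and signal the task. */
void serial_rx_isr(char c)
{
    if (command_ready)
        return;                       /* spec says this cannot happen */
    if (c == '\r') {
        cmd_buf[cmd_len] = '\0';
        command_ready = true;
    } else if (cmd_len < CMD_MAX) {
        cmd_buf[cmd_len++] = c;
    }                                 /* else: overlong command, drop char */
}

/* Command task: copy the finished command out and free the buffer. */
bool take_command(char *out, size_t out_len)
{
    if (!command_ready || out_len <= (size_t)cmd_len)
        return false;
    memcpy(out, cmd_buf, (size_t)cmd_len + 1);
    cmd_len = 0;
    command_ready = false;            /* IR may now accept the next command */
    return true;
}
```

All parsing, and any slow work the command triggers, happens in the task, so the IR stays a few instructions long.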
8.2 Principles 3
Problem decomposition into tasks: how many tasks? Considerations (the + points hold if the tasks are carefully decomposed and few, and if there's no choice):
+ More tasks offer better control of overall response time.
+ Modularity: a different task for each device or functionality.
+ Encapsulation: data and functionality can be encapsulated within the responsible task.
- More tasks means more data sharing, hence more protection worries, and long response times due to the associated overheads.
- More tasks means more intertask messaging, with overhead due to queuing, mailboxing and pipe use.
- More tasks means more space for task stacks and messages.
- More tasks means more frequent context switching (overhead) and less throughput.
- More tasks means more frequent calls to the RTOS functions (a major overhead that adds up).
8.2 Principles 3
Priorities (an advantage of using the RTOS software architecture): decomposing based on functionality and time criticality naturally separates ES components into tasks. This gives quicker response times through task prioritization: high priority for time-critical tasks, low priority for the others.
Encapsulating functionality in tasks: a dedicated task encapsulates the handling of each shared device (e.g., a printer display unit) or of a common data structure (e.g., an error log) (See Fig 8.3). For example, when part of the target hardware stores data in flash memory, a single task encapsulates the handling of permission to write to the flash, i.e., the set/reset of the flash at given times (See Fig 8.4, which uses the POSIX standard RTOS functions mq_open, mq_receive, mq_send and nanosleep).
8.2 Principles 4
Other tasks? Do we need many small, simple tasks? Then we must worry about data sharing and intertask communication. Do we need a task per stimulus? Same problems!
Recommended task structure:
- Tasks are modeled/structured as state machines.
- Tasks run in an infinite loop.
- Tasks wait on the RTOS for an event, expected in each task's own message queue.
- Tasks declare their own private data (fully encapsulated).
- Tasks block in only one place (the RTOS signal), not on any other semaphore, and share no data.
- Tasks use no microprocessor time when their queues are empty.
8.2 Principles 5
Avoid creating and destroying tasks: creating tasks takes system time. Destroying tasks could leave dangling pointers to messages, or remove a semaphore others are waiting on (blocking them forever). Rule of thumb: create all the tasks needed at start-up, and keep them, if memory is cheap!
Turn time-slicing off: time-slicing is useful in conventional OSs for fairness to user programs. In an ES, fairness is not the issue; response time is! Time-slicing causes context switching, which is time consuming and diminishes throughput. Where the RTOS offers an option to turn time-slicing off, turn it off.
8.2 Principles 6
Restrict the use of RTOS functions/features: customize the RTOS features to your needs. (Note: the RTOS and your ES get linked and located together into the same address space of ROM/RAM; see Chapter 9.) If possible, write ES functions that interface with selected RTOS features, to minimize excessive calls to many RTOS functions, since such calls increase the opportunity for errors. Develop a shell around the RTOS functions, and let your own ES tasks call the shell rather than the RTOS directly. This improves portability, since only the shell needs to be rewritten when moving from RTOS to RTOS.
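The recommended task structure (a state machine with private data, blocking only on its own message queue) can be sketched as the step a task takes each time the RTOS hands it a message. The message types, states and the button-handling behaviour are hypothetical, loosely modelled on the tank monitor's multi-button commands. The blocking receive is left as a comment, since it depends on the RTOS.

```c
/* Messages the task might receive on its own queue (hypothetical). */
typedef enum { MSG_BUTTON_PRINT, MSG_BUTTON_DIGIT, MSG_BUTTON_CANCEL } MsgType;

typedef struct { MsgType type; int value; } Message;

typedef enum { ST_IDLE, ST_WANT_TANK } ButtonState;

/* All of the task's data is private: only this task's code touches
 * it, so nothing here needs a semaphore. */
typedef struct {
    ButtonState state;
    int reports_requested;
    int last_tank;
} ButtonTaskData;

/* One iteration of the task's loop. The full task would be:
 *     for (;;) { receive m from the task's own queue (blocks);
 *                button_task_step(&d, &m); }
 * and it would block nowhere else. */
void button_task_step(ButtonTaskData *d, const Message *m)
{
    switch (d->state) {
    case ST_IDLE:
        if (m->type == MSG_BUTTON_PRINT)
            d->state = ST_WANT_TANK;        /* prompt for a tank number */
        break;
    case ST_WANT_TANK:
        if (m->type == MSG_BUTTON_DIGIT && m->value >= 1 && m->value <= 8) {
            d->last_tank = m->value;
            d->reports_requested++;         /* would queue a print request */
            d->state = ST_IDLE;
        } else if (m->type == MSG_BUTTON_CANCEL) {
            d->state = ST_IDLE;
        }                                   /* other input: keep waiting */
        break;
    }
}
```

Because the state lives in the task's private data, a command sequence can be interrupted by other work without any locking, and each task can be tested by feeding it messages directly.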
8.3 An Example
Designing an Underground Tank Monitoring ES System
Summary of Problem Specification:
- A system of 8 underground tanks.
- Measurements: the temperature of the gas (thermometer), which can be read at any time, and the float level (float hardware), read periodically under interrupt control.
- Calculate the number of gallons in each tank using both measurements.
- Set an alarm on a leaking tank (when the level falls slowly and consistently over time).
- Set an alarm on overflow (level rising slowly to near the full level).
- User interface: a 16-button control panel, an LCD, and a thermal printer.
- The system can override the user's display options and show warning messages.
- The user can request histories of levels and temperatures over time (reports 30-50 lines long) and can queue up several reports.
- Issuing a command requires 2 or 3 buttons, and the system can prompt on the display in the middle of a user command sequence.
- Buttons interrupt the microprocessor.
- One dedicated button turns the alarm off (the alarm is connected to the system), through software.
- The printer prints one line at a time and interrupts the microprocessor when done.
- The LCD shows the most recent line; it saves its display data and does not need the microprocessor to retrieve it (See Fig 8.7).
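The two alarm conditions in the specification can be sketched as simple checks. This is a minimal sketch assuming the checks work on raw float levels (the specified system works on gallons, which also depend on temperature) and on made-up thresholds LEAK_FALLS, FULL_LEVEL and OVERFLOW_MARGIN; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define LEAK_FALLS      5     /* hypothetical: consecutive drops that signal a leak */
#define FULL_LEVEL      1000  /* hypothetical full-tank level (float units) */
#define OVERFLOW_MARGIN 50    /* hypothetical "close to full" band */

/* Leak check: true if each of the last LEAK_FALLS readings is lower than
 * the one before it.  piLevels holds n readings, oldest first. */
bool bLeakSuspected(const int *piLevels, size_t n)
{
    if (n < LEAK_FALLS + 1)
        return false;                   /* not enough history yet */
    for (size_t i = n - LEAK_FALLS; i < n; ++i)
        if (piLevels[i] >= piLevels[i - 1])
            return false;               /* level held or rose: not a steady fall */
    return true;
}

/* Overflow check: the level is rising and already within the margin of full. */
bool bOverflowSuspected(int iPreviousLevel, int iLevel)
{
    return iLevel > iPreviousLevel && iLevel >= FULL_LEVEL - OVERFLOW_MARGIN;
}
```

Note that the overflow check needs only two readings and no gallons calculation, which is why the text treats overflow detection as cheaper and more urgent than leak detection.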
8.3 An Example
Issues that remain open (incomplete specification):
- What is displayed? Timing information? Print-line length?
- How often is the float level read?
- What is the required response time of the push-button user interface?
- Printer speed: how many lines per second?
- What is the microprocessor speed? Which kind (8-bit?)?
- How long does it take to set/reset the alarm?
- What is the compute time for the number of gallons? 4-5 s? (This influences the code design, the tasking, and the kind of microprocessor; if no calculation is required to set the overflow alarm, that saves time!)
- Once the number of gallons is known, what is the tolerable time interval (response time) for setting the alarm?
- Is a temperature/float-level pair read for one tank at a time?
- How does the software turn the alarm off: by writing a bit flag to memory, or by cutting power to the alarm device?
- Does the microprocessor come with a timer?
8.3 An Example
Which Architecture?
- If RTOS: meeting deadlines depends on dealing with the 4-5 s needed to calculate the number of gallons, which requires task suspension (perhaps with less use of interrupt routines); above all, the microprocessor must support an RTOS.
- If not RTOS: meeting deadlines requires several interrupts (and interrupt routines).
BASIC DESIGN OF AN EMBEDDED SOFTWARE (ES) USING RTOS
An Example
System Decomposition into Tasks
- One low-priority task that handles all the gallons calculations and also detects leaks (for all tanks, one at a time).
- A high-priority overflow-detection task (higher than the leak-detection task).
- A high-priority float-hardware task, using semaphores to make the level-calculation and overflow-detection tasks wait on it for readings (semaphores are simpler and faster than queuing requests to read levels).
- A high-priority button-handling task, which needs a state-machine model (an IR? with internal static data structures, a simple wait on a button signal, and an action that depends on the sequence of button signals), since semaphores won't work (See Fig 8.8).
- A high-priority display task to handle contention for the LCD.
- [Turning the alarm bell on/off by the level-calculation, overflow, and user-button tasks is typically non-contentious (an atomic operation), hence no separate alarm-bell task is needed.] However, a module with BellOn() and BellOff() functions is needed to encapsulate the alarm hardware.
- A low-priority task to handle report formatting (one line at a time) and the report queue (See Table 8.2).
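A minimal sketch of the button-handling task's state machine together with the bell module, assuming hypothetical button codes and a two-button command "PRINT, then a tank button" that requests that tank's report. BellOn() and BellOff() are named in the text; their void signatures, bBellIsOn(), and everything else here are assumptions. iButtonTaskHandle() handles one button message from the button IR; the real task would call it in its endless loop after waiting on its queue.

```c
#include <stdbool.h>

/* ---- bell.c: the only code that touches the alarm hardware ---- */
static bool sBellRinging = false;       /* stands in for the bell's output bit */

void BellOn(void)    { sBellRinging = true; }
void BellOff(void)   { sBellRinging = false; }
bool bBellIsOn(void) { return sBellRinging; }

/* ---- button-handling task (hypothetical button codes) ---- */
enum { BTN_ALARM_OFF = 0, BTN_PRINT = 1, BTN_TANK_FIRST = 10, BTN_TANK_LAST = 17 };

typedef enum { BS_IDLE, BS_WANT_TANK } BUTTON_STATE;
static BUTTON_STATE sButtonState = BS_IDLE;   /* private, static task data */

/* Handles one button message.  Returns the tank number (1-8) for a
 * completed "PRINT, tank" command, otherwise 0. */
int iButtonTaskHandle(int iButton)
{
    if (iButton == BTN_ALARM_OFF) {         /* dedicated button works in any state */
        BellOff();
        return 0;
    }
    switch (sButtonState) {
    case BS_IDLE:
        if (iButton == BTN_PRINT)
            sButtonState = BS_WANT_TANK;    /* real design: prompt on the LCD */
        return 0;
    case BS_WANT_TANK:
        sButtonState = BS_IDLE;
        if (iButton >= BTN_TANK_FIRST && iButton <= BTN_TANK_LAST)
            return iButton - BTN_TANK_FIRST + 1;   /* real design: queue the report */
        return 0;                           /* invalid second button: drop command */
    }
    return 0;
}
```

The state variable records how far the user is into a command, which is exactly what a plain semaphore wait cannot remember between button presses.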
8.3 An Example
Moving the System Forward: Putting It Together as Scenarios
- The system is interrupt driven: interrupt routines respond to signals and activate tasks to do their work.
- The user presses a button: the button hardware interrupts the microprocessor, and the button IR sends a message to the button-handling task to interpret the command, which activates the display task or the printer task.
- The timer interrupts: timer IR -> signal to the overflow-detection task.
Moving the System Forward: Putting It Together as Scenarios 1
- The user presses the print button; the print IR signals the print-formatting task -> which sends the first line to the printer; the printer interrupts, and the print IR sends the next line to the printer; when all lines of the report are done, the print IR signals the print-formatting task to start the next report.
- When a task needs a level reading, it starts the level-reading (float) hardware; the hardware reads the level, and its IR then signals the task to read the new float level.
Dealing with Shared Level Data:
- Three tasks need this data: the level-calculation task (for leak detection), the display task, and the print-formatting task.
- Reading the level data and processing it takes a given task a few µsec or msec.
- Use semaphores: let the level-calculation and display tasks read and process the level inside the critical section (CS), and let the formatting task copy the level data inside the CS, release the semaphore, and format outside the CS (See Fig 8.9).
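The "copy inside the CS, format outside it" rule for the print-formatting task can be sketched as follows. This is a minimal sketch in which a POSIX mutex stands in for the RTOS semaphore; the function names and the line format are hypothetical.

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_TANKS 8

/* Shared level data.  A POSIX mutex stands in for the RTOS semaphore. */
static int sTankLevels[NUM_TANKS];
static pthread_mutex_t sLevelLock = PTHREAD_MUTEX_INITIALIZER;

/* Writer side, e.g. after the float hardware delivers a new reading.
 * iTank is 0-based and assumed to be valid. */
void vStoreLevel(int iTank, int iLevel)
{
    pthread_mutex_lock(&sLevelLock);
    sTankLevels[iTank] = iLevel;
    pthread_mutex_unlock(&sLevelLock);
}

/* Print-formatting side: copy the level inside the critical section,
 * release it, then do the slower formatting outside.
 * Returns snprintf's result (characters that would be written). */
int iFormatLevelLine(int iTank, char *pszLine, size_t cbLine)
{
    int iLevel;

    pthread_mutex_lock(&sLevelLock);
    iLevel = sTankLevels[iTank];        /* the only work done while holding the lock */
    pthread_mutex_unlock(&sLevelLock);

    return snprintf(pszLine, cbLine, "Tank %d: level %d", iTank + 1, iLevel);
}
```

Holding the lock only for the copy keeps the high-priority float-hardware task from waiting behind slow string formatting.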
8.4 Encapsulating Semaphores and Queues
Encapsulating Semaphores:
- Don't assume that all tasks will use a semaphore correctly (take/release); misuse leads to errors.
- Protect semaphores and their associated data: encapsulate/hide them in a task.
- Let all tasks call a separate module (acting as an intermediary) to get to the critical section; this module/function in turn calls the task that encapsulates the semaphore (See Fig 8.10 for the correct code, and Fig 8.11 for the incorrect alternative, which bypasses the intermediate function).
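One common way to hide a semaphore is to make it and the data it protects static in one C file, so other code can reach them only through that file's functions. This is a minimal sketch assuming a time-of-day counter as the protected data and a POSIX mutex as a stand-in for the RTOS semaphore; vClockTick and lSecondsSinceMidnight are hypothetical names.

```c
#include <pthread.h>

#define SECONDS_PER_DAY (24L * 60L * 60L)

/* clock.c: the counter and its lock are static, so code in other files can
 * neither see the data nor forget to take the lock. */
static long sSecondsToday = 0;
static pthread_mutex_t sClockLock = PTHREAD_MUTEX_INITIALIZER;

/* Called once per second, e.g. by a timer task. */
void vClockTick(void)
{
    pthread_mutex_lock(&sClockLock);
    sSecondsToday = (sSecondsToday + 1) % SECONDS_PER_DAY;
    pthread_mutex_unlock(&sClockLock);
}

/* The only way other tasks can read the time of day. */
long lSecondsSinceMidnight(void)
{
    long lSeconds;

    pthread_mutex_lock(&sClockLock);
    lSeconds = sSecondsToday;
    pthread_mutex_unlock(&sClockLock);
    return lSeconds;
}
```

Callers simply use lSecondsSinceMidnight(); the take/release pairing lives in one place and cannot be bypassed.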
8.4 Encapsulating Semaphores and Queues
Encapsulating Queues:
- When writing to or reading from flash memory by enqueuing messages, the correctness of the Fig 8.4 implementation depends on every task passing the correct FLASH_MSG type.
- Could a message meant for the FLASH be enqueued elsewhere?
- The flash queue is exposed to inadvertent deletion or destruction.
- An extra data queue holds data read from the FLASH: could this auxiliary queue be referenced wrongly? Is its type compatible with the FLASH content?
Solution: encapsulate the flash queue structure inside a separate module, flash.c, with access to it only through the intermediate task vHandleFlashTask, which is supported by the auxiliary functions vReadFlash and vWriteFlash. [The handler task provides the interface through which all other tasks get to the queue.] (See Fig 8.13)
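A minimal sketch of such a flash.c module, assuming hypothetical sizes, a hypothetical FLASH_MSG layout, and a RAM array standing in for the flash chip. vReadFlash and vWriteFlash are named in the text, but their signatures here are assumptions. The text's design returns read data through an auxiliary data queue; this sketch simplifies that to a destination pointer, which is filled only after the handler has processed the request. bHandleFlashTaskOnce() is a hypothetical single pass of vHandleFlashTask's loop, so the sketch runs on a host.

```c
#include <stdbool.h>
#include <string.h>

#define FLASH_SECTORS 4   /* hypothetical sizes */
#define SECTOR_BYTES  8
#define FLASH_Q_LEN   4

typedef enum { FLASH_READ, FLASH_WRITE } FLASH_OP;

typedef struct {
    FLASH_OP       eOp;
    int            iSector;
    unsigned char  abData[SECTOR_BYTES];  /* data to write */
    unsigned char *pbDest;                /* where a read puts its data */
} FLASH_MSG;

/* Static: no other file can reach the queue or the (simulated) flash. */
static FLASH_MSG sFlashQueue[FLASH_Q_LEN];
static int sHead = 0, sCount = 0;
static unsigned char sFlash[FLASH_SECTORS][SECTOR_BYTES];

static void vEnqueue(const FLASH_MSG *pMsg)
{
    if (pMsg->iSector < 0 || pMsg->iSector >= FLASH_SECTORS || sCount == FLASH_Q_LEN)
        return;   /* sketch drops bad/overflowing requests; an RTOS queue would block or fail */
    sFlashQueue[(sHead + sCount) % FLASH_Q_LEN] = *pMsg;
    ++sCount;
}

/* The only public ways to post messages, so the message type is always right. */
void vWriteFlash(int iSector, const unsigned char *pbData)
{
    FLASH_MSG msg = { FLASH_WRITE, iSector, {0}, NULL };
    memcpy(msg.abData, pbData, SECTOR_BYTES);
    vEnqueue(&msg);
}

void vReadFlash(int iSector, unsigned char *pbDest)
{
    FLASH_MSG msg = { FLASH_READ, iSector, {0}, pbDest };
    vEnqueue(&msg);
}

/* One pass of vHandleFlashTask's loop: the real task blocks on the queue;
 * this host version returns false when the queue is empty. */
bool bHandleFlashTaskOnce(void)
{
    if (sCount == 0)
        return false;
    FLASH_MSG msg = sFlashQueue[sHead];
    sHead = (sHead + 1) % FLASH_Q_LEN;
    --sCount;
    if (msg.eOp == FLASH_WRITE)
        memcpy(sFlash[msg.iSector], msg.abData, SECTOR_BYTES);
    else
        memcpy(msg.pbDest, sFlash[msg.iSector], SECTOR_BYTES);
    return true;
}
```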
8.5 Hard Real-Time Scheduling Considerations
Guaranteeing that the system will meet hard deadlines comes from writing fast code.
Issues: fast algorithms, efficient data structures, code in assembly (if possible).
Characterizing real-time systems:
- The system is made of $n$ tasks; task $n$ executes periodically, every $T_n$ units of time.
- Each task has a worst-case execution time of $C_n$ units of time and a deadline of $D_n$.
- Assume the task-switching time is 0 and tasks do not block on semaphores.
- Each task has a priority $P_n$.
- Question: does $\sum_n C_n = \sum_n (D_n + J_n) < T_n$ hold, where $J_n$ is some variability (jitter) in the task's time?
Predicting $C_n$ is very important, and it depends on avoiding variability in the execution times of tasks and functions, in the access times of data structures/buffers, and in semaphore blocking: any operation that can't be done in the same time units on each execution/access.
8.6 Saving Memory Space
Considerations of limited memory space in embedded systems:
- Code is stored in ROM (on some systems copied into RAM for execution); data is stored in RAM (except for initialization/shadowing). The two memory types are not interchangeable.
- Trade-off: packed data saves RAM space, but the unpacking code takes ROM space.
Estimate space by:
A. Inspecting the code: tasks take stack space, so fewer tasks take less RAM. Estimate the stack bytes per task from local variables, parameters, function nesting level, and worst-case nesting of interrupt routines; take the space for the RTOS (or its selected features) from the manual.
B. Experimental runs of the code: not easy, and they won't reflect worst-case behavior.
8.6 Saving Memory Space 1
Techniques / Suggestions:
- Substitute or eliminate large functions; watch for repeated calls to large functions.
- Consider writing your own functions to replace RTOS functions; watch for RTOS functions that call several others.
- Configure or customize the RTOS functions to suit only the needs of the ES.
- Study the assembly listings from the cross-compiler, and rework your C code or write your own assembly unit/task.
- Use static variables instead of relying on stack variables (push/pop and pointer handling take space).
- Copy data structures passed to a function via a pointer into the function's local static variables, process the data, and copy it back into the structures; trade-off: the code is slower.
- For an 8-bit processor, use char instead of int variables (an int takes 2 bytes, and calculations on it take longer than on 1-byte chars).
- If ROM is really tight, experiment with coding most functions/tasks in assembly language.
Saving Power
Some embedded systems run on batteries; turning power off for some or all devices saves energy.
Generally, how do you save power?
- Look for the power-saving modes (enablers) that the manufacturers provide.
- Software can put the microprocessor into one of these modes via a special instruction or by writing a code to a special register in the processor. The software must be fast!!
- Power-saving modes: sleep, low-power, idle, standby, etc.
- Typical mode: the microprocessor stops running, along with all built-in devices and the clock circuit (but static RAM stays powered, since its wattage is very small).
- Waking the microprocessor up is done by special circuitry and software. To avoid a full restart and reset, write a special code to a RAM address before entering the mode, and let the startup software check whether this is a cold start or a restart from power-saving mode.
- Alternative: the microprocessor stops running but all devices stay alive, and the microprocessor is resumed by an interrupt (less of a hassle than stopping all devices).
- If software turns the power of devices back on, the status data needed to resume those devices must be kept in EEROM.
- Turn off built-in devices whose signals switch frequently (high-to-low, low-to-high): they are power hungry!
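The cold-start/warm-start check described above can be sketched as follows. This is a minimal sketch assuming a hypothetical magic value and function names; on a real target the marker variable must sit in RAM that stays powered during the power-saving mode and that the startup code does not zero (a toolchain-specific "no-init" section), whereas here it is an ordinary static variable so the sketch runs on a host.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical magic value.  After a cold power-on, RAM holds arbitrary
 * values, so an accidental match with a 32-bit pattern is very unlikely. */
#define WARM_START_MAGIC 0x5AA5C33Cu

/* Host sketch: ordinary static variable.  On a target, place it in a
 * powered, non-initialized RAM section. */
static uint32_t sWakeMarker;

/* Call just before executing the processor's power-saving instruction. */
void vPrepareForSleep(void)
{
    sWakeMarker = WARM_START_MAGIC;
}

/* Call first thing at reset: true means "resuming from power-saving mode",
 * false means a cold start that needs full initialization. */
bool bIsWarmStart(void)
{
    bool bWarm = (sWakeMarker == WARM_START_MAGIC);
    sWakeMarker = 0;   /* clear so the next reset is cold unless we sleep again */
    return bWarm;
}
```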
RECOMMENDED QUESTIONS UNIT 7 & 8

Basic Design Using RTOS
1. Explain the basic operation of the telegraph system as an embedded system.
2. How do you avoid creating and destroying tasks?
3. Explain the underground tank monitoring system.
4. How do you encapsulate a semaphore? Explain.
5. What are the real-time scheduling considerations?
6. How do you save memory space in embedded system design?
7. Explain the techniques to save power.
SOLUTION FOR UNIT 7 & 8
Q1. Explain the basic telegraph operation with a block diagram.
Q2. What are the advantages of using the RTOS software architecture?

Priorities (an advantage of the RTOS software architecture): decomposing the system by functionality and time criticality naturally separates ES components into tasks, and task prioritization gives quicker response: high priority for time-critical tasks, low priority for the others.
Encapsulating Functionality in Tasks
A dedicated task encapsulates the handling of each shared device (e.g., the printer or the display unit) or of a common data structure (e.g., an error log) (See Fig 8.3).
Example: parts of the target hardware store data in flash memory; a single task encapsulates the handling of permission to write to the flash (set/reset of the flash at given times) (See Fig 8.4, which uses the POSIX RTOS functions mq_open, mq_receive, mq_send, and nanosleep).
Q4. How do you avoid creating and destroying tasks?
Avoid Creating and Destroying Tasks
- Creating tasks takes considerable system time.
- Destroying tasks can leave dangling pointers to messages, or remove a semaphore that other tasks are waiting on (blocking them forever).
- Rule of thumb: create all needed tasks at startup and keep them, if memory is cheap!
Turn Time-Slicing Off
- Time-slicing is useful in conventional OSs for fairness to user programs.
- In an ES, fairness is not the issue; response time is!
- Time-slicing causes context switching, which is time consuming and diminishes throughput.
- Where the RTOS offers an option to turn time-slicing off, turn it off.
Q5. Explain the design of the underground tank monitoring system.

Designing an Underground Tank Monitoring ES System
Summary of Problem Specification:
- A system of 8 underground tanks.
- Measurements: the temperature of the gas (thermometer), which can be read at any time, and the float level (float hardware), read periodically under interrupt control.
- Calculate the number of gallons in each tank using both measurements.
- Set an alarm on a leaking tank (when the level falls slowly and consistently over time).
- Set an alarm on overflow (level rising slowly to near the full level).
- User interface: a 16-button control panel, an LCD, and a thermal printer.
- The system can override the user's display options and show warning messages.
- The user can request histories of levels and temperatures over time (reports 30-50 lines long) and can queue up several reports.
- Issuing a command requires 2 or 3 buttons, and the system can prompt on the display in the middle of a user command sequence.
- Buttons interrupt the microprocessor.
- One dedicated button turns the alarm off (the alarm is connected to the system), through software.
- The printer prints one line at a time and interrupts the microprocessor when done.
- The LCD shows the most recent line; it saves its display data and does not need the microprocessor to retrieve it (See Fig 8.7).
Q6. How do you encapsulate queues?
Encapsulating Queues:
- When writing to or reading from flash memory by enqueuing messages, the correctness of the Fig 8.4 implementation depends on every task passing the correct FLASH_MSG type.
- Could a message meant for the FLASH be enqueued elsewhere?
- The flash queue is exposed to inadvertent deletion or destruction.
- An extra data queue holds data read from the FLASH: could this auxiliary queue be referenced wrongly? Is its type compatible with the FLASH content?
Solution: encapsulate the flash queue structure inside a separate module, flash.c, with access to it only through the intermediate task vHandleFlashTask, which is supported by the auxiliary functions vReadFlash and vWriteFlash. [The handler task provides the interface through which all other tasks get to the queue.] (See Fig 8.13)
Q8. What are the techniques to save power and memory?

Techniques / Suggestions:
- Substitute or eliminate large functions; watch for repeated calls to large functions.
- Consider writing your own functions to replace RTOS functions; watch for RTOS functions that call several others.
- Configure or customize the RTOS functions to suit only the needs of the ES.
- Study the assembly listings from the cross-compiler, and rework your C code or write your own assembly unit/task.
- Use static variables instead of relying on stack variables (push/pop and pointer handling take space).
- Copy data structures passed to a function via a pointer into the function's local static variables, process the data, and copy it back into the structures; trade-off: the code is slower.
- For an 8-bit processor, use char instead of int variables (an int takes 2 bytes, and calculations on it take longer than on 1-byte chars).
- If ROM is really tight, experiment with coding most functions/tasks in assembly language.
Saving Power
Some embedded systems run on batteries; turning power off for some or all devices saves energy.
Generally, how do you save power?
- Look for the power-saving modes (enablers) that the manufacturers provide.
- Software can put the microprocessor into one of these modes via a special instruction or by writing a code to a special register in the processor. The software must be fast!!
- Power-saving modes: sleep, low-power, idle, standby, etc.
- Typical mode: the microprocessor stops running, along with all built-in devices and the clock circuit (but static RAM stays powered, since its wattage is very small).
- Waking the microprocessor up is done by special circuitry and software. To avoid a full restart and reset, write a special code to a RAM address before entering the mode, and let the startup software check whether this is a cold start or a restart from power-saving mode.
- Alternative: the microprocessor stops running but all devices stay alive, and the microprocessor is resumed by an interrupt (less of a hassle than stopping all devices).
- If software turns the power of devices back on, the status data needed to resume those devices must be kept in EEROM.
- Turn off built-in devices whose signals switch frequently (high-to-low, low-to-high): they are power hungry.