
ITEC 1000 “Introduction to Information Technology”

Lecture 3
Data Formats


1
Prof. Peter Khaiter
Lecture Template:

 Data Forms
 Data conversion and representation
 Data Formats
 Alphanumeric Data
 Image Data
 Audio Data
 Data Input
 Data Compression
 Internal Computer Data Format
2
Data Forms

 Human communication
    Includes language, images and sounds
 Computers
    Process and store all forms of data in binary format
 Conversion to computer-usable representation using data formats
    Define the different ways human data may be represented, stored and processed by a computer
3
Data conversion and representation

4
Data formats

 Proprietary formats
    Unique to a product or company
    E.g., Microsoft Word, WordPerfect
 Standards (evolve in two ways):
    Proprietary formats become de facto standards (e.g., Adobe PostScript)
    Invented by an international standards organization (e.g., Moving Picture Experts Group, MPEG)

5
Common Data Representations

Type of Data                 Standard(s)
Alphanumeric                 Unicode, ASCII, EBCDIC
Image (bitmapped)            GIF (Graphics Interchange Format), TIFF (Tagged Image File Format), PNG (Portable Network Graphics), JPEG
Image (object)               PostScript, SWF (Macromedia Flash), SVG
Outline graphics and fonts   PostScript, TrueType
Sound                        WAV, AVI, MP3, MIDI, WMA
Page description             PDF (Adobe Portable Document Format), HTML, XML
Video                        QuickTime, MPEG-2, RealVideo, WMV
6
Alphanumeric Data

 Characters (r, T), number digits (0..9), punctuation (!, ;), special-purpose characters ($, &)
 Four codes/standards to represent letters and numbers:
    BCD (Binary-Coded Decimal)
    Unicode
    ASCII (American Standard Code for Information Interchange)
    EBCDIC (Extended Binary Coded Decimal Interchange Code)

7
Standard Alphanumeric Formats

 BCD Next 2 slides
 ASCII
 EBCDIC
 Unicode

8
Binary-Coded Decimal (BCD)

 Four bits per digit

Digit   Bit pattern
0       0000
1       0001
2       0010
3       0011
4       0100
5       0101
6       0110
7       0111
8       1000
9       1001

Note: the following 6 bit patterns are not used: 1010, 1011, 1100, 1101, 1110, 1111
9
BCD: Example

 7093₁₀ = ? (in BCD)

7 0 9 3

0111 0000 1001 0011
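
The digit-by-digit conversion above can be sketched in Python. This is only an illustration: the function name `to_bcd` is an assumption introduced here, and the sketch handles non-negative integers only.

```python
def to_bcd(number: int) -> str:
    """Encode a non-negative decimal integer as BCD, four bits per digit."""
    if number < 0:
        raise ValueError("this sketch handles non-negative integers only")
    # Each decimal digit is converted on its own to a 4-bit pattern.
    return " ".join(format(int(digit), "04b") for digit in str(number))


print(to_bcd(7093))  # 0111 0000 1001 0011
```

Because every digit is encoded separately, the patterns 1010 to 1111 never appear in the output.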

10
Standard Alphanumeric Formats

 BCD
 ASCII Next 13 slides
 EBCDIC
 Unicode

11
ASCII Features

 Developed by ANSI (American National Standards Institute)
 Defined in ANSI document X3.4-1977
 7-bit code
 8th bit is unused (or used for a parity bit or to indicate an “extended” character set)
 2^7 = 128 different codes
 Two general types of codes:
    95 are “printing” codes (displayable on a console)
    33 are “control” codes (control features of the console or communications channel)
 Represents
    Latin alphabet, Arabic numerals, standard punctuation characters
    Plus a small set of accents and other European special characters (Latin-1, an 8-bit extension of ASCII)
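
As a quick check of the counts on this slide, the following Python sketch enumerates all 7-bit codes and splits them into printing and control codes. The variable names are illustrative only; the ranges used (space through ~ for printing codes, everything below space plus DEL for control codes) follow the standard ASCII layout.

```python
codes = range(2 ** 7)  # all 7-bit values, 0 to 127

# Printing codes run from space (0x20) through tilde (0x7E).
printing = [c for c in codes if 0x20 <= c <= 0x7E]

# Control codes are everything below space, plus DEL (0x7F).
control = [c for c in codes if c < 0x20 or c == 0x7F]

print(len(codes), len(printing), len(control))  # 128 95 33
```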

12
ASCII Table

       000   001   010   011   100   101   110   111
0000   NUL   DLE   SP    0     @     P     `     p
0001   SOH   DC1   !     1     A     Q     a     q
0010   STX   DC2   "     2     B     R     b     r
0011   ETX   DC3   #     3     C     S     c     s
0100   EOT   DC4   $     4     D     T     d     t
0101   ENQ   NAK   %     5     E     U     e     u
0110   ACK   SYN   &     6     F     V     f     v
0111   BEL   ETB   '     7     G     W     g     w
1000   BS    CAN   (     8     H     X     h     x
1001   HT    EM    )     9     I     Y     i     y
1010   LF    SUB   *     :     J     Z     j     z
1011   VT    ESC   +     ;     K     [     k     {
1100   FF    FS    ,     <     L     \     l     |
1101   CR    GS    -     =     M     ]     m     }
1110   SO    RS    .     >     N     ^     n     ~
1111   SI    US    /     ?     O     _     o     DEL
13
ASCII Table

Column headings give the three most significant bits; row labels give the four least significant bits.
14
ASCII Table

e.g., ‘a’ = 1100001 (column 110, row 0001)
15
ASCII Table

95 printing codes: SP (0100000) through ~ (1111110)
16
ASCII Table
33 control codes: NUL (0000000) through US (0011111), plus DEL (1111111)
17
ASCII Table
Alphabetic codes: A to Z (1000001 to 1011010) and a to z (1100001 to 1111010)
18
ASCII Table
Numeric codes: 0 to 9 (0110000 to 0111001)
19
ASCII Table
Punctuation, etc.: the remaining printing codes, e.g. ! (0100001), : (0111010), [ (1011011), { (1111011)
20
ASCII Table

Hexadecimal form: columns give the most significant digit (MSD), rows the least significant digit (LSD)
Example: ‘t’ = 74₁₆ = 111 0100

LSD   0     1     2     3     4     5     6     7
0     NUL   DLE   SP    0     @     P     `     p
1     SOH   DC1   !     1     A     Q     a     q
2     STX   DC2   "     2     B     R     b     r
3     ETX   DC3   #     3     C     S     c     s
4     EOT   DC4   $     4     D     T     d     t
5     ENQ   NAK   %     5     E     U     e     u
6     ACK   SYN   &     6     F     V     f     v
7     BEL   ETB   '     7     G     W     g     w
8     BS    CAN   (     8     H     X     h     x
9     HT    EM    )     9     I     Y     i     y
A     LF    SUB   *     :     J     Z     j     z
B     VT    ESC   +     ;     K     [     k     {
C     FF    FS    ,     <     L     \     l     |
D     CR    GS    -     =     M     ]     m     }
E     SO    RS    .     >     N     ^     n     ~
F     SI    US    /     ?     O     _     o     DEL
21
Example: “Hello, world”

Character   Binary    Hexadecimal   Decimal
H           1001000   48            72
e           1100101   65            101
l           1101100   6C            108
l           1101100   6C            108
o           1101111   6F            111
,           0101100   2C            44
(space)     0100000   20            32
w           1110111   77            119
o           1101111   6F            111
r           1110010   72            114
l           1101100   6C            108
d           1100100   64            100
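
The table above can be reproduced with a few lines of Python. This is a sketch only: `ascii_row` is a hypothetical helper name, and Python's built-in `ord` is used to obtain each character's code.

```python
def ascii_row(ch: str) -> tuple[str, str, int]:
    """Return the 7-bit binary, hexadecimal and decimal code of one character."""
    code = ord(ch)
    return format(code, "07b"), format(code, "02X"), code


for ch in "Hello, world":
    binary, hexadecimal, decimal = ascii_row(ch)
    print(f"{ch!r:5} {binary}  {hexadecimal}  {decimal}")
```

Both occurrences of "o" map to the same code (1101111, 6F, 111), since a character's code does not depend on its position in the text.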
22
Common Control Codes

Code   Hexadecimal code   Meaning
 CR   0D                 carriage return
 LF   0A                 line feed
 HT   09                 horizontal tab
 DEL  7F                 delete
 NUL  00                 null

23
ASCII Table: Common Control Codes

Positions in the table: NUL (0000000), HT (0001001), LF (0001010), CR (0001101), DEL (1111111)
24
Standard Alphanumeric Formats

 BCD
 ASCII
 EBCDIC Next 3 slides
 Unicode

25
EBCDIC

 8-bit code
 Developed by IBM
 IBM and compatible mainframes only
 Rarely used today (common in archival data)
 Character codes differ from ASCII
 Conversion software to/from ASCII available

Character   ASCII   EBCDIC
Space       20₁₆    40₁₆
A           41₁₆    C1₁₆
b           62₁₆    82₁₆
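
As an illustration of ASCII/EBCDIC conversion software, the sketch below uses Python's built-in `cp037` codec. Note the assumption: `cp037` is one of several EBCDIC code pages, chosen here only because it ships with Python; the helper name `compare_codes` is hypothetical.

```python
def compare_codes(ch: str) -> tuple[str, str]:
    """Return the (ASCII, EBCDIC) codes of a character as hexadecimal strings."""
    ascii_hex = ch.encode("ascii").hex().upper()
    ebcdic_hex = ch.encode("cp037").hex().upper()  # cp037: an EBCDIC code page
    return ascii_hex, ebcdic_hex


for ch in (" ", "A", "b"):
    print(repr(ch), *compare_codes(ch))
```

Decoding works the same way in reverse, for example `bytes([0xC1]).decode("cp037")` yields `"A"`.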

26
EBCDIC Table (1 out of 2)

27
EBCDIC Table (2 out of 2)

28
Standard Alphanumeric Formats

 BCD
 ASCII
 EBCDIC
 Unicode Next 2 slides

29
Unicode

 Most common 16-bit form represents 65,536 characters
 ASCII and Latin-1 are subsets of Unicode
    Values 0 to 255 in the Unicode table
 Multilingual: defines codes for
    Nearly every character-based alphabet
    Large set of ideographs for Chinese, Japanese and Korean
    Composite characters for vowels and syllabic clusters required by some languages
 Allows software modifications for local languages
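
A small Python sketch of these points, under the assumption that the 16-bit form is represented by the `utf-16-be` codec (big-endian, two bytes for each of the characters used here). The sample characters are arbitrary choices for illustration.

```python
def two_byte_form(ch: str) -> str:
    """Return the 16-bit (UTF-16, big-endian) encoding of a character as hex."""
    return ch.encode("utf-16-be").hex().upper()


samples = ["A", "\u00e9", "\u4e2d"]  # Latin letter, accented letter, CJK ideograph
for ch in samples:
    print(ch, ord(ch), two_byte_form(ch))

# ASCII and Latin-1 values (0 to 255) keep the same number in Unicode.
assert ord("A") == "A".encode("ascii")[0]
assert ord("\u00e9") == "\u00e9".encode("latin-1")[0]
```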

30
Two-byte Unicode Assignment Table

31
Collating Sequence

 Collating sequence: the order of the codes in the representation table
 Determines sorting and selection of the alphanumeric data
 Collating sequences are different in ASCII and EBCDIC:
    Small letters precede capitals in EBCDIC; reverse in ASCII
    Numbers collate first in ASCII; in EBCDIC, last
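
The effect on sorting can be seen by ordering the same items by their ASCII codes and by their EBCDIC codes. This is a sketch that assumes Python's `cp037` codec as a stand-in for EBCDIC; the sample list is arbitrary.

```python
items = ["b", "A", "1"]

# Sort by the byte value each character has in the given code.
ascii_order = sorted(items, key=lambda s: s.encode("ascii"))
ebcdic_order = sorted(items, key=lambda s: s.encode("cp037"))

print(ascii_order)   # ['1', 'A', 'b']
print(ebcdic_order)  # ['b', 'A', '1']
```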

32
Two Classes of Codes

 Printing characters
    Produce output on the screen or printer
 Control characters
    Control position of output on screen or printer
    Cause action to occur
    Communicate status between computer and I/O device

33
Control Code Definitions (ASCII Table)

34
Escape Sequences

 Extend the capability of the ASCII code set
 For controlling terminals and formatting output
 Defined by ANSI in documents X3.41-1974 and X3.64-1977
 The escape code is ESC = 1B₁₆
 An escape sequence begins with two codes: ESC [ = 1B₁₆ 5B₁₆
35
Escape Sequences: Examples

 Erase display: ESC [ 2 J
 Erase line: ESC [ K
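
The two example sequences can be built and inspected in Python. This is a sketch: the constant and function names are illustrative, and writing the sequences to a terminal only has an effect if that terminal honours ANSI escape sequences.

```python
ESC = "\x1b"                 # escape code, 1B hex
CSI = ESC + "["              # every sequence here starts with ESC [
ERASE_DISPLAY = CSI + "2J"
ERASE_LINE = CSI + "K"


def as_hex(sequence: str) -> str:
    """Show the ASCII codes of a sequence as space-separated hex bytes."""
    return " ".join(format(byte, "02X") for byte in sequence.encode("ascii"))


print(as_hex(ERASE_DISPLAY))  # 1B 5B 32 4A
print(as_hex(ERASE_LINE))     # 1B 5B 4B
```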

36
Alphanumeric Input: Keyboard

 Scan code
    Two different binary scan codes generated when key is struck and when key is released
    Converted to Unicode, ASCII or EBCDIC by software in terminal or PC
    Received by the host as a stream of text and other characters, i.e. in the sequence typed
 Advantage
    Easily adapted to different languages or keyboard layouts
    Separate scan codes for key press/release for multiple key combinations
    Examples: shift and control keys
37
Shift Key

 Inhibits (clears) bit 5 in the ASCII code

           ASCII code bits
Key(s)     6 5 4 3 2 1 0   Character
a          1 1 0 0 0 0 1   a
Shift a    1 0 0 0 0 0 1   A

38
Control Key

 Inhibits (clears) bits 5 & 6 in the ASCII code

           ASCII code bits
Key(s)     6 5 4 3 2 1 0   Character
c          1 1 0 0 0 1 1   c
Ctrl c     0 0 0 0 0 1 1   ETX (control code)
39
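
The bit manipulation on the Shift Key and Control Key slides can be expressed with bit masks. This is a simplified sketch of the idea for letter keys only; the names `shift` and `ctrl` are assumptions, and real keyboard software works from scan codes and lookup tables.

```python
BIT5 = 1 << 5  # 0100000
BIT6 = 1 << 6  # 1000000


def shift(code: int) -> int:
    """Clear bit 5: lowercase letter code -> uppercase letter code."""
    return code & ~BIT5


def ctrl(code: int) -> int:
    """Clear bits 5 and 6: letter code -> control code."""
    return code & ~(BIT5 | BIT6)


print(chr(shift(ord("a"))))           # A
print(format(ctrl(ord("c")), "07b"))  # 0000011 (ETX)
```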
Keyboard Input

Three letters are typed: “D”, “I”, “R”, followed by the carriage return
Four scan codes translated to ASCII binary codes: 1000100, 1001001, 1010010, 0001101
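
The final ASCII codes for this input can be checked with a short Python sketch. It skips the scan-code stage entirely and starts from the characters themselves; `to_ascii_bits` is a hypothetical helper name.

```python
def to_ascii_bits(keys: str) -> list[str]:
    """Return the 7-bit ASCII code of each typed character."""
    return [format(ord(key), "07b") for key in keys]


# "D", "I", "R", then carriage return (written "\r" in Python)
print(to_ascii_bits("DIR\r"))
# ['1000100', '1001001', '1010010', '0001101']
```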
40
OCR (optical character recognition)

 Scans text and inputs it as character data
 Special OCR software required
 Used to read specially encoded characters
    Example: magnetically printed check numbers
 Attempts to recognize handwritten input (limited, only carefully printed)
41
Bar Code Readers

 Used in applications that require fast, accurate and repetitive input with minimal employee training
 Examples: supermarket checkout counters and inventory control
 Alphanumeric data in bar code (e.g., 780471 108801 90000) read optically using a wand that converts it into electrical binary signals
 A bar code translation module converts the binary input into a sequence of number codes, one code per digit, which are then translated to Unicode or ASCII
42
Other Alphanumeric Input

 Magnetic stripe reader: alphanumeric data from credit cards
 Voice
    Digitized audio recording common, but conversion to alphanumeric data difficult
    Requires knowledge of sound patterns in a language (phonemes) plus rules for pronunciation, grammar, and syntax

43
Image Data

 Photographs, figures, icons, drawings, charts and graphs
 Two approaches:
    Bitmap or raster images of photos and paintings with continuous variation (e.g., GIF, JPEG)
    Object or vector images composed of graphical shapes like lines and curves defined geometrically
 Differences include:
    Quality of the image
    Storage space required
    Time to transmit
    Ease of modification

44
Image Input

 Image scanning (moves over the image converting dot by dot into a stream of binary numbers, pixels, representing black or white, or levels of gray, or of a colour): bitmap image
 Digital/video cameras: bitmap image
 Pointing devices (mouse, pen): object image

45
Bitmap Images

 Each individual pixel (picture element) in a graphic stored as a binary number
    Pixel: a small area with associated coordinate location
    Example: each point below represented by a 4-bit code corresponding to 1 of 16 shades of gray

46
Bitmap Display

 Monochrome: black or white
    1 bit per pixel
 Gray scale: black, white or 254 shades of gray
    1 byte per pixel
 Color graphics: 16 colors, 256 colors, or 24-bit true color (16.7 million colors)
    4, 8, and 24 bits respectively

47
Storing Bitmap Images

 Frequently large files
    Example: 600 rows of 800 pixels with 1 byte for each of 3 colors → ~1.44 MB file
 File size affected by
    Resolution (the number of pixels per inch)
        Amount of detail affecting clarity and sharpness of an image
    Levels: number of bits for displaying shades of gray or multiple colors
    Palette: color translation table that uses a code for each pixel rather than actual color value
    Data compression
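
The file-size example can be worked through in Python. This is a sketch for uncompressed images only, with hypothetical function names; it ignores file headers and any palette table.

```python
def bitmap_bytes(rows: int, columns: int, bits_per_pixel: int) -> int:
    """Uncompressed size of a bitmap in bytes (rounded up to a whole byte)."""
    total_bits = rows * columns * bits_per_pixel
    return (total_bits + 7) // 8


def color_count(bits_per_pixel: int) -> int:
    """Number of distinct values a pixel of the given depth can hold."""
    return 2 ** bits_per_pixel


print(bitmap_bytes(600, 800, 24))  # 1440000 bytes, i.e. 1.44 MB
print(color_count(4), color_count(8), color_count(24))  # 16 256 16777216
```

Halving the bit depth or the pixel count halves the uncompressed size, which is why resolution and levels appear in the list of factors above.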

48
GIF (Graphics Interchange Format)

 First developed by CompuServe in 1987
 GIF89a enabled animated images
    Allows images to be displayed sequentially at fixed time intervals
 Color limitation: 256
 Image compressed by LZW (Lempel-Ziv-Welch) algorithm
 Preferred for line drawings, clip art and pictures with large blocks of solid color
 Lossless compression
49
GIF (Graphics Interchange Format)

50
JPEG
(Joint Photographic Experts Group)

 Allows more than 16 million colors
 Suitable for highly detailed photographs and paintings
 Employs special compression algorithm that
    Discards data to decrease file size and transmission time
    May reduce image resolution, tends to distort sharp lines
51
Other Bitmap Formats

 TIFF (Tagged Image File Format): .tif (pronounced tif)
    Used in high-quality image processing, particularly in publishing
 BMP (BitMaPped): .bmp (pronounced dot bmp)
    Device-independent format for Microsoft Windows environment: pixel colors stored independent of output device
 PCX: .pcx (pronounced dot p c x)
    Windows Paintbrush software
 PNG (Portable Network Graphics): .png (pronounced ping)
    Designed to replace GIF and JPEG for Internet applications
    Patent-free
    Improved lossless compression
    No animation support
52
Object Images

 Created by drawing packages or output from spreadsheet data graphs
 Composed of lines and shapes in various colors
 Computer translates geometric formulas to create the graphic
 Storage space depends on image complexity
    Number of instructions to create lines, shapes, fill patterns
 Movies Shrek and Toy Story use object images
53
Object Images

 Based on mathematical formulas
    Easy to move, scale and rotate without losing shape and identity, as bitmap images may
 Require less storage space than bitmap images
 Cannot represent photos or paintings
 Cannot be displayed or printed directly
    Must be converted to bitmap since output devices except plotters are bitmap

54
Popular Object Graphics Software

 Most object image formats are proprietary
    File extensions include .wmf, .dxf, .mgx, and .cgm
 Macromedia Flash: low-bandwidth animation
 Micrografx Designer: technical drawings to illustrate products
 CorelDraw: vector illustration, layout, bitmap creation, image-editing, painting and animation software
 Autodesk AutoCAD: for architects, engineers, drafters, and design-related professionals
 W3C SVG (Scalable Vector Graphics) based on XML Web description language
    Not proprietary
55
PostScript

 Page description language: list of procedures and statements that describe each of the objects to be printed on a page
    Stored in ASCII or Unicode text file
    Interpreter program in computer or output device reads PostScript to generate image
    Scalable font support
    Font outline objects specified like other objects
56
PostScript Program

57
Representing Characters as Images

 Characters stored in format like Unicode or ASCII
    Text processed and stored primarily for content
 Presentation requirements like font stored with the character
    Text appearance is primary factor
    Example: screen fonts in Windows
 Glyphs: Macintosh coding scheme that includes both identification and presentation requirement for characters
58
Bitmap vs. Object Images

Bitmap (Raster)                                        Object (Vector)
Pixel map                                              Geometrically defined shapes
Photographic quality                                   Complex drawings
Paint software                                         Drawing software
Larger storage requirements                            Higher computational requirements
Enlarging images produces jagged edges                 Objects scale smoothly
Resolution of output limited by resolution of image    Resolution of output limited by output device

59
Video Images

 Require massive amount of data
    Video camera producing full-screen 640 x 480 pixel true color image at 30 frames/sec → 27.65 MB of data/sec
    1-minute film clip → about 1.66 GB of storage
 Options for reducing file size: decrease size of image, limit number of colors, reduce frame rate
 Method depends on how video is delivered to users
    Streaming video: video displayed as it is downloaded from the Web server
        Example: video conferencing
    Local data (file on DVD or downloaded onto system) for higher quality
        MPEG-2: movie-quality images with high compression require substantial processing capability
60
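
The data-rate figures on the Video Images slide follow from a simple multiplication, sketched here in Python. The function name is an assumption, true color is taken as 3 bytes per pixel, and no compression is applied.

```python
def video_bytes_per_second(width: int, height: int,
                           bytes_per_pixel: int, frames_per_second: int) -> int:
    """Uncompressed video data rate in bytes per second."""
    return width * height * bytes_per_pixel * frames_per_second


rate = video_bytes_per_second(640, 480, 3, 30)
print(rate)       # 27648000 bytes/sec, about 27.65 MB/sec
print(rate * 60)  # 1658880000 bytes for one minute, about 1.66 GB
```

The same function shows the effect of the size-reduction options: halving the frame rate or the image dimensions reduces the result proportionally.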
Audio Data

 Transmission and processing requirements less demanding than those for video
 Waveform audio: digital representation of sound
 MIDI (Musical Instrument Digital Interface): instructions to recreate or synthesize sounds
 Analog sound converted to digital values by A-to-D converter
61
Waveform Audio

Sampling rate normally 50 kHz

62
Sampling Rate

 Number of times per second that sound is measured during the recording process
    1000 samples per second = 1 kHz (kilohertz)
    Example: audio CD sampling rate = 44.1 kHz
 Height of each sample saved as:
    8-bit number for radio-quality recordings
    16-bit number for high-fidelity recordings
    2 x 16 bits for stereo
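
Combining the sampling rate with the sample size gives the data rate of uncompressed waveform audio. The sketch below uses the audio CD figures from this slide; the function name is hypothetical.

```python
def audio_bytes_per_second(samples_per_second: int,
                           bits_per_sample: int, channels: int) -> int:
    """Uncompressed waveform audio data rate in bytes per second."""
    return samples_per_second * (bits_per_sample // 8) * channels


cd_rate = audio_bytes_per_second(44_100, 16, 2)  # 44.1 kHz, 16-bit, stereo
print(cd_rate)       # 176400 bytes/sec
print(cd_rate * 60)  # 10584000 bytes, about 10.6 MB per minute
```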

63
MIDI

 Music notation system that allows computers to communicate with music synthesizers
 Instructions that MIDI instruments and MIDI sound cards use to recreate or synthesize sounds
    Do not store or recreate speaking or singing voices
    More compact than waveform
    3 minutes = 10 KB
64
Audio Formats

 MP3
    Derivative of MPEG-2 (ISO Moving Picture Experts Group)
    Uses psychoacoustic compression techniques to reduce storage requirements
    Discards sounds outside human hearing range: lossy compression
 WAV
    Developed by Microsoft as part of its multimedia specification
    General-purpose format for storing and reproducing small snippets of sound

65
.WAV Sound Format

66
Data Compression

 Compression: recoding data so that it requires fewer bytes of storage space
 Compression ratio: the amount by which a file is shrunk
 Lossless: inverse algorithm restores data to exact original form
    Examples: GIF, PCX, TIFF
 Lossy: trades off data degradation for file size and download speed
    Much higher compression ratios, often 10 to 1
    Example: JPEG
    Common in multimedia
 MPEG-2: uses both forms for ratios of 100:1
67
Compression Algorithms

 Repetition
    0 5 8 7 0 0 0 0 3 4 0 0 0 → 0 1 5 8 7 0 4 3 4 0 3
    Example: large blocks of the same color
 Pattern Substitution
    Scans data for patterns
    Substitutes new pattern, makes dictionary entry
    Example: 45 to 30 bytes plus dictionary
    Original: Peter Piper picked a peck of pickled peppers.
    Dictionary: α = Pe, β = pi, χ = ed, δ = er, φ = ck, γ = pe, η = Pi
    Encoded: αtδ ηpδ βφχ a γφ of βφlχ γppδs.
68
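
The repetition example on the Compression Algorithms slide replaces each run of zeros with a 0 followed by the run length. A minimal Python sketch of that idea follows; the function names are assumptions, and runs are limited to 9 zeros per count digit so the count always fits in one character.

```python
def encode_zero_runs(digits: str) -> str:
    """Replace each run of zeros with '0' followed by the run length (1 to 9)."""
    out = []
    i = 0
    while i < len(digits):
        if digits[i] == "0":
            run = 0
            while i < len(digits) and digits[i] == "0":
                run += 1
                i += 1
            while run > 0:  # split very long runs into chunks of at most 9
                chunk = min(run, 9)
                out.append("0" + str(chunk))
                run -= chunk
        else:
            out.append(digits[i])
            i += 1
    return "".join(out)


def decode_zero_runs(encoded: str) -> str:
    """Inverse of encode_zero_runs: expand '0' + count back into zeros."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == "0":
            out.append("0" * int(encoded[i + 1]))
            i += 2
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)


print(encode_zero_runs("0587000034000"))  # 01587043403
```

Because decoding restores the input exactly, this is a lossless scheme; it only saves space when the data contains long runs.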
Internal Computer Data Format

 All data stored as binary numbers
 Interpreted based on
    Operations computer can perform
    Data types supported by programming language used to create application
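
To illustrate that the same stored bits can be interpreted in different ways, the Python sketch below reads one arbitrary 4-byte value as text, as an integer and as a floating-point number. The byte values and variable names are chosen only for illustration.

```python
import struct

raw = bytes([0x41, 0x42, 0x43, 0x44])  # four bytes of stored binary data

as_text = raw.decode("ascii")            # interpreted as four characters
as_integer = int.from_bytes(raw, "big")  # interpreted as one 32-bit integer
as_real = struct.unpack(">f", raw)[0]    # interpreted as a 32-bit real number

print(as_text)     # ABCD
print(as_integer)  # 1094861636
print(as_real)     # a value slightly above 12.14
```

Nothing in the bytes themselves says which reading is intended; the data type used by the program decides.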

69
Five Simple Data Types

 Boolean: 2-valued variables or constants with values of true or false
 Char: variable or constant that holds an alphanumeric character
 Enumerated
    User-defined data types with possible values listed in definition
    Type DayOfWeek = Mon, Tues, Wed, Thurs, Fri, Sat, Sun
 Integer: positive or negative whole numbers
 Real
    Numbers with a decimal point
    Numbers whose magnitude, large or small, exceeds computer’s capability to store as an integer
70
Thank you!
Reading: Lecture slides and notes, Chapter 3

71
