
Data and File Format Standards

Data and file format standardisation is crucial for sharing data among multiple applications and for exchanging information between them. However, standards are rarely final; they must evolve continually to address new demands and new technology. A large number of different formats, both standard and proprietary, are in use. We will examine a few very popular and typical formats:

- Rich-text format (RTF)
- Tagged image file format (TIFF)
- Resource interchange file format (RIFF)
- Musical instrument digital interface (MIDI)
- Graphics interchange format (GIF)

COMP3600 Multimedia Systems

Wai Wong


(199811)

9. File Formats Slide 1


Rich-Text Format
Simple ASCII text is a limited form of data exchange because all formatting information is lost when text is moved from one application to another. Nevertheless, it remains one of the most popular and convenient forms of data exchange. Rich-text format (RTF) expands the range of information that can be passed between applications. This assumes that the source and destination applications share a reasonable common set of features. The key information carried in RTF files is:

- Character set: determines the characters that are supported in a particular implementation.
- Font table: lists all fonts used in the document. These fonts are then mapped to the fonts available in the receiving application for displaying the text.
- Colour table: lists the colours used in the document for highlighting text (i.e., the characters are a specific colour, not black).
- Document formatting: information on the format applying to the entire document, such as document margins.
- Section formatting: section breaks (and page breaks) are used to define the separation of groups of paragraphs. The formatting information specifies the space above and below the section.
- Paragraph formatting: the RTF specification defines control characters for specifying paragraph justification, tab positions, left, right, and first indents relative to document margins, and the spacing between paragraphs. Paragraph formatting information also includes style sheets.
- General formatting: formatting information in this group includes items such as footnotes, annotations, bookmarks, and pictures.
- Character formatting: formatting information, including bold, italic, underline (continuous, dotted, or word), strikethrough, shadow text, outline text, and hidden text, is specified using control characters.
- Special characters: special characters include hyphens, non-breaking spaces, backslashes, and so on.


TIFF File Format


The TIFF file format was originally developed by Aldus Corp. in the 1980s. The currently widely used TIFF specification is version 6.0, released in 1992. It is one of the most widely used digital image file formats, and it is very portable. The TIFF file format supports:

- colour depths from 1-bit to 24-bit
- more than one image in a file
- many different compression methods, including uncompressed, RLE, LZW, CCITT Group 3 and Group 4, and JPEG
- large image sizes (up to 2^32 - 1 bytes)
- different platforms, including DOS, Macintosh, and UNIX

The basic organisation of a TIFF file is as follows:

- it begins with an image file header (IFH)
- it contains one or more image file directories (IFD)
- it contains a number of blocks of image data. The number of image data blocks is no more than the number of IFDs.

[Figure: TIFF file layout showing the IFH, an IFD, and the image data]

What is big-endian and little-endian?

Different processor systems store the bytes of multi-byte data in memory in different orders. This is known as byte order. Two byte orders are used most often:

Big-endian: the most significant byte of a multi-byte object is stored in the memory location with the lowest address. For example, a 32-bit word 0x12345678 will be stored as:

Addr.  Value
1000   12     the most significant byte
1001   34
1002   56
1003   78     the least significant byte

This byte order is used in Motorola processor systems.

Little-endian: the least significant byte of a multi-byte object is stored in the memory location with the lowest address. For example, a 32-bit word 0x12345678 will be stored as:

Addr.  Value
1000   78     the least significant byte
1001   56
1002   34
1003   12     the most significant byte

This byte order is used in Intel processor systems.
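The two byte orders can be reproduced with a few lines of Python. This is only an illustrative sketch: the `dump` helper and the base address 1000 are assumptions chosen to mirror the example word 0x12345678, not part of any file format.

```python
import struct

word = 0x12345678

# ">" asks struct for big-endian layout, "<" for little-endian layout.
big = struct.pack(">I", word)     # most significant byte first
little = struct.pack("<I", word)  # least significant byte first


def dump(label, raw, base=1000):
    """Print each byte next to an illustrative memory address."""
    print(label)
    for i, byte in enumerate(raw):
        print(f"  {base + i}  {byte:02X}")


dump("big-endian", big)
dump("little-endian", little)

# Reading big-endian bytes as if they were little-endian gives a different number,
# which is why a file format has to state its byte order.
misread = int.from_bytes(big, "little")
print(hex(misread))  # 0x78563412
```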


TIFF File Format (Cont.)


The IFH has a fixed size of 8 bytes:

- the first word (two bytes) indicates the byte order used in the file (0x4D4D for big-endian and 0x4949 for little-endian)
- the second word indicates the TIFF version, which is always 0x002A
- the last two words form a 32-bit pointer to the first IFD

The size of an IFD is variable. Each IFD contains a number of tags. The size of each tag is fixed at 12 bytes.

- the first word of an IFD indicates the number of tags in this IFD
- this is followed by the tags
- the last double word is a pointer to the next IFD. The pointer in the last IFD is NULL.

[Figure: IFD layout: the number of tags, then tag 0 to tag n-1 (each holding a tag ID, data type, length, and either the value or a pointer to the value data), then the pointer to the next IFD]

TIFF File Tags

Each tag in a TIFF IFD is 12 bytes long. It contains four elements:

- tag ID (word): identifies the tag
- data type (word): tells what type of data this tag contains
- length (double word): the number of data items this tag contains
- value / value pointer (double word): contains the data value if the size of the data is less than or equal to 4 bytes; otherwise it is a pointer to the data block

The possible data types include the following:

Type      Code  Description
BYTE      1     8-bit, unsigned byte
ASCII     2     8-bit, NULL-terminated string
SHORT     3     16-bit, unsigned number
LONG      4     32-bit, unsigned number
RATIONAL  5     two 32-bit unsigned numbers

The TIFF format defines a large number of tags. Some of them are required while others are optional. The required tags for bi-level, gray scale, palette-colour and RGB colour images are listed in the next section.
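The IFH and IFD layout described above can be walked with Python's `struct` module. The following is a minimal sketch, not a full TIFF reader: `parse_tiff`, `TYPE_SIZES` and `INLINE_FORMATS` are hypothetical names, the sample file is built in memory with only two tags, and it relies on the TIFF convention that a value shorter than 4 bytes is left-justified in the value field.

```python
import struct

# Size in bytes of one item of each data type code (BYTE, ASCII, SHORT, LONG, RATIONAL).
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8}
# struct codes for the numeric types that can fit inside the 4-byte value field.
INLINE_FORMATS = {1: "B", 3: "H", 4: "I"}


def parse_tiff(data):
    """Return (endian, ifds); each IFD is a list of
    (tag_id, data_type, length, is_inline, value_or_offset) tuples."""
    mark = data[0:2]
    if mark == b"II":        # 0x4949: little-endian
        endian = "<"
    elif mark == b"MM":      # 0x4D4D: big-endian
        endian = ">"
    else:
        raise ValueError("unknown byte-order word")
    version, ifd_offset = struct.unpack_from(endian + "HI", data, 2)
    if version != 0x002A:
        raise ValueError("not a TIFF file")

    ifds = []
    while ifd_offset != 0:   # a NULL pointer ends the chain of IFDs
        (count,) = struct.unpack_from(endian + "H", data, ifd_offset)
        tags = []
        for i in range(count):
            pos = ifd_offset + 2 + 12 * i          # each tag is 12 bytes long
            tag_id, dtype, length = struct.unpack_from(endian + "HHI", data, pos)
            field = data[pos + 8:pos + 12]
            size = TYPE_SIZES[dtype] * length      # KeyError for types not in the table
            if size <= 4 and dtype in INLINE_FORMATS:
                fmt = endian + INLINE_FORMATS[dtype] * length
                tags.append((tag_id, dtype, length, True, struct.unpack(fmt, field[:size])))
            elif size <= 4:
                tags.append((tag_id, dtype, length, True, field[:size]))  # short ASCII
            else:
                (offset,) = struct.unpack(endian + "I", field)
                tags.append((tag_id, dtype, length, False, offset))
        (ifd_offset,) = struct.unpack_from(endian + "I", data, ifd_offset + 2 + 12 * count)
        ifds.append(tags)
    return endian, ifds


# A tiny little-endian example: an IFH followed by one IFD holding
# ImageWidth (256) = 640 and ImageLength (257) = 480, both of type SHORT.
sample = (
    b"II" + struct.pack("<HI", 0x002A, 8)
    + struct.pack("<H", 2)
    + struct.pack("<HHIHH", 256, 3, 1, 640, 0)
    + struct.pack("<HHIHH", 257, 3, 1, 480, 0)
    + struct.pack("<I", 0)
)
endian, ifds = parse_tiff(sample)
for tag in ifds[0]:
    print(tag)
```

Values larger than 4 bytes are reported only as an offset; a fuller reader would follow that offset and decode the data block according to the data type.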


The Required TIFF File Tags


Bi-level and Gray Scale

Tag ID (Dec.)  Tag ID (Hex.)  Tag Name                   Data Type
254            00FE           NewSubFileType             LONG
256            0100           ImageWidth                 SHORT/LONG
257            0101           ImageLength                SHORT/LONG
258            0102           BitsPerSample              SHORT
259            0103           Compression                SHORT
262            0106           PhotometricInterpretation  SHORT
273            0111           StripOffsets               SHORT/LONG
277            0115           SamplesPerPixel            SHORT
278            0116           RowsPerStrip               SHORT/LONG
279            0117           StripByteCounts            SHORT/LONG
282            011A           XResolution                RATIONAL
283            011B           YResolution                RATIONAL
296            0128           ResolutionUnit             SHORT

Palette-colour and RGB colour

These images require all tags listed above plus the following:

Tag ID (Dec.)  Tag ID (Hex.)  Tag Name             Data Type
284            011C           PlanarConfiguration  SHORT
320            0140           ColorMap             SHORT

YCbCr colour

These images require all tags listed above plus the following:

Tag ID (Dec.)  Tag ID (Hex.)  Tag Name             Data Type
529            0211           YCbCrCoefficients    RATIONAL
530            0212           YCbCrSubSampling     SHORT
531            0213           YCbCrPositioning     SHORT
532            0214           ReferenceBlackWhite  LONG

Fax Class

These images require all tags listed for bi-level images plus the following:

Tag ID (Dec.)  Tag ID (Hex.)  Tag Name                Data Type
326            0146           BadFaxLines             SHORT/LONG
327            0147           CleanFaxData            SHORT
328            0148           ConsecutiveBadFaxLines  SHORT/LONG

The TIFF Image Data

- Image data in a TIFF file can be in any location.
- Image data are divided into strips.
- Each strip contains one or more contiguous rows of bitmapped data.
- Three TIFF tags are used to identify the image data:
  - RowsPerStrip: indicates the number of rows of compressed, bitmapped data in each strip
  - StripOffsets: pointers to every strip
  - StripByteCounts: an array of values that indicates the size of each strip in bytes

Note: The last strip may not have the same number of rows as the others. All strips must use the same compression method and configuration.

[Figure: an IFD with RowsPerStrip (SHORT, length 1, value 10), StripOffsets (LONG, length 10) and StripByteCounts (LONG, length 10) entries, whose arrays point to Strip 1, Strip 2, ..., Strip 10]

The advantages of this organisation are that it allows smaller systems to read only part of an image, and that it allows random access.
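The strip arithmetic can be made concrete with a short sketch. The function names and the synthetic byte string below are assumptions for illustration; in a real file the offsets and byte counts come from the StripOffsets and StripByteCounts tags.

```python
def strip_layout(image_length, rows_per_strip):
    """Return (number of strips, rows in the last strip) for an image
    with image_length rows stored rows_per_strip rows at a time."""
    strips = (image_length + rows_per_strip - 1) // rows_per_strip  # round up
    rows_in_last = image_length - (strips - 1) * rows_per_strip
    return strips, rows_in_last


def read_strip(data, strip_offsets, strip_byte_counts, index):
    """Random access: fetch one strip without touching the others."""
    start = strip_offsets[index]
    return data[start:start + strip_byte_counts[index]]


# An image of 95 rows with 10 rows per strip needs 10 strips;
# the last strip holds only the remaining 5 rows.
layout = strip_layout(95, 10)
print(layout)  # (10, 5)

# Synthetic "file": 8 bytes of header followed by three strips of unequal size.
data = bytes(8) + b"AAAA" + b"BBBB" + b"CC"
offsets = [8, 12, 16]
byte_counts = [4, 4, 2]
second = read_strip(data, offsets, byte_counts, 1)
print(second)  # b'BBBB'
```

Because each strip has its own offset and size, a reader with little memory can decode one strip at a time.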

Resource Interchange File Format (RIFF)


RIFF provides a standard way to organise data in a file. RIFF files can contain different multimedia elements in a single file. RIFF files that contain different kinds of data may appear as different types of files because they use different file name extensions, e.g.,

.WAV  Waveform audio file
.AVI  Audio video interleaved file
.RMI  MIDI file
.RDI  Device independent bitmap file
.PAL  Palette file

Data in a RIFF file is divided into chunks. The RIFF specification defines three kinds of chunks:

- RIFF chunk: defines the contents of the RIFF file
- List chunk: allows embedding additional information
- Subchunk: allows adding more information to a primary chunk

Organisation of RIFF Chunks

All chunks begin with an 8-byte header: the first four bytes are the chunk ID, which is a four-character string identifying the type of the chunk, and the next four bytes indicate the size of the data in the chunk in little-endian format. The data of the chunk follow the header immediately. In the RIFF chunk, the first four bytes of the data are known as the form type, which identifies the type of data, e.g., WAVE, AVI, and so on. In a list chunk, the first four bytes of the data are the list type.

[Figure: a RIFF chunk (ID, SIZE, form type) whose data hold subchunks (ID, SIZE, data) and a list chunk (ID, SIZE, list type) that in turn holds further subchunks]
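The chunk structure lends itself to a small recursive walker. This is a sketch under stated assumptions: `make_chunk` and `walk` are hypothetical helpers, the sample payloads are dummy bytes, and the padding of odd-sized chunk data to an even length is a RIFF convention that the text above does not mention.

```python
import struct


def make_chunk(chunk_id, payload):
    """Build a chunk: 4-byte ID, 4-byte little-endian size, then the data.
    Odd-sized data is followed by one pad byte (RIFF convention)."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad


def walk(data, offset=0, end=None, depth=0):
    """Return a list of (depth, chunk ID, size, form/list type or None)."""
    if end is None:
        end = len(data)
    found = []
    while offset + 8 <= end:
        chunk_id, size = struct.unpack_from("<4sI", data, offset)
        body = offset + 8
        if chunk_id in (b"RIFF", b"LIST"):
            # The first four data bytes are the form type or list type;
            # the rest of the data is a sequence of nested chunks.
            kind = data[body:body + 4]
            found.append((depth, chunk_id, size, kind))
            found.extend(walk(data, body + 4, body + size, depth + 1))
        else:
            found.append((depth, chunk_id, size, None))
        offset = body + size + (size & 1)  # skip the pad byte, if any
    return found


# A synthetic RIFF file with form type WAVE, two subchunks and one list chunk.
riff = make_chunk(
    b"RIFF",
    b"WAVE"
    + make_chunk(b"fmt ", bytes(16))
    + make_chunk(b"data", b"\x01\x02\x03")
    + make_chunk(b"LIST", b"INFO" + make_chunk(b"ICMT", b"demo")),
)
chunks = walk(riff)
for depth, chunk_id, size, kind in chunks:
    print("  " * depth, chunk_id, size, kind)
```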


RIFF Waveform Audio File Format


- The form type of the RIFF chunk is WAVE.
- It has two mandatory subchunks and a list chunk.
- The two subchunks are:
  - fmt: the format subchunk, specifying the format of the waveform data, such as the sampling rate, resolution and number of channels
  - data: the data subchunk, containing the actual samples
- The list chunk contains information about the file, such as the date of creation, the creator, copyright, and so on.
- The new RIFF WAVE format has the following extra subchunks:
  - fact: a required subchunk in the new format. It contains file-dependent information.
  - cue: identifies a series of positions in the waveform data stream (the cue points)
  - playlist: specifies a play order for a series of cue points
  - associated data: provides a means of attaching information, such as labels, to sections of the waveform data stream
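To see the fmt and data subchunks in practice, the sketch below writes a tiny WAVE file into memory with Python's standard `wave` module and then reads the fmt subchunk back by hand. The helper names `make_wav` and `read_fmt` are hypothetical, and the field layout used for the PCM fmt subchunk (format tag, channels, sample rate, byte rate, block alignment, bits per sample) is an assumption not spelled out in the text above.

```python
import io
import struct
import wave


def make_wav(sample_rate=8000, channels=1, sample_width=2, frames=4):
    """Write a short, silent PCM WAVE file into memory and return its bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(sample_width)   # bytes per sample
        writer.setframerate(sample_rate)
        writer.writeframes(bytes(frames * channels * sample_width))
    return buf.getvalue()


def read_fmt(data):
    """Locate the fmt subchunk of a WAVE file and decode its PCM fields."""
    riff, _size, form = struct.unpack_from("<4sI4s", data, 0)
    if riff != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF WAVE file")
    offset = 12                              # first subchunk after the form type
    while offset + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, offset)
        if chunk_id == b"fmt ":
            tag, nch, rate, byte_rate, align, bits = struct.unpack_from(
                "<HHIIHH", data, offset + 8)
            return {"format_tag": tag, "channels": nch, "sample_rate": rate,
                    "byte_rate": byte_rate, "block_align": align,
                    "bits_per_sample": bits}
        offset += 8 + size + (size & 1)      # move on to the next subchunk
    raise ValueError("no fmt subchunk found")


wav_bytes = make_wav()
fmt_info = read_fmt(wav_bytes)
print(fmt_info)
```

The sampling rate, resolution and number of channels named in the fmt description appear here as `sample_rate`, `bits_per_sample` and `channels`.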

RIFF MIDI File Format


- The form type of the RIFF chunk is RMID.
- It has a single data subchunk, which is the MIDI data following the standard MIDI file format.

Standard MIDI files

- Data in a standard MIDI file are also divided into chunks.
- The first chunk is the header chunk, which contains information about the entire file: the type of the file, the number of tracks and the timing.
- There are three types of MIDI files:
  0  a single multi-channel track
  1  one or more simultaneous tracks of a sequence
  2  one or more sequentially independent tracks
- The remaining chunks are track chunks.
- Each track chunk contains a sequence of events. Each event is preceded by a delta time, which is the elapsed time between the current event and the previous event.
- There are two kinds of events:
  - MIDI events are for playing the notes and controlling the MIDI channels.
  - System events apply to the complete system. They include events such as setting the timing parameters, configuring the sequencer, and so on.
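The header chunk and the delta times can be decoded with a few lines of Python. This sketch relies on details of the standard MIDI file format that the text above does not state and that are assumed here: the header chunk ID is "MThd", its fields are big-endian 16-bit values (file type, number of tracks, timing division), and delta times are stored as variable-length quantities with 7 data bits per byte. The function names are hypothetical.

```python
import struct


def parse_midi_header(data):
    """Return (file type, number of tracks, division, offset of the next chunk)."""
    chunk_id, length = struct.unpack_from(">4sI", data, 0)   # big-endian size
    if chunk_id != b"MThd" or length < 6:
        raise ValueError("not a standard MIDI file header")
    file_type, tracks, division = struct.unpack_from(">HHH", data, 8)
    return file_type, tracks, division, 8 + length


def read_delta_time(data, pos):
    """Decode one variable-length delta time; return (value, next position).
    Each byte carries 7 bits; a set top bit means another byte follows."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


# A type 1 file with 2 tracks and a division of 96 ticks per quarter note.
header = b"MThd" + struct.pack(">IHHH", 6, 1, 2, 96)
parsed = parse_midi_header(header)
print(parsed)  # (1, 2, 96, 14)

short_delta = read_delta_time(b"\x40", 0)
long_delta = read_delta_time(b"\x81\x00", 0)
print(short_delta, long_delta)  # (64, 1) (128, 2)
```

Note that the chunk sizes in a standard MIDI file are big-endian, unlike the little-endian sizes of the enclosing RIFF chunk.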


Graphics Interchange Format (GIF)


The current version is GIF89a, which was introduced in July 1989. It is one of the most popular image file formats.

- Images in a GIF file use indexed colour with a maximum depth of 8 bits.
- The maximum resolution is 65535 × 65535 pixels, since the width and height are stored as 16-bit values.
- More than one image can be stored in a single file.
- It is a little-endian format.
- It is a stream-based format, i.e., it consists of a series of data packets, known as blocks, along with additional protocol information.

GIF file organisation

Header (mandatory)
    Signature
    Version
    Logical screen descriptor
Global colour table (optional)
Extension block (optional)
Image 1 (optional)
    Local image descriptor
    Local colour table
    Image data
Image 2 (optional)
    Local image descriptor
    Local colour table
    Image data
Extension block (optional)
Trailer (mandatory)

GIF File Data Blocks


- Header (mandatory): specifies the global information.
  - Signature: it is always the three characters GIF.
  - Version: it is either 87a or 89a.
  - Logical screen descriptor: contains information about the width and height of the screen, the global colour table, the background colour index and the pixel aspect ratio.
- Global colour table (optional): a series of three-byte triples making up the entries in the colour table. The number of entries is always a power of 2, i.e., 2, 4, 8, ..., up to a maximum of 256.
- Image data (optional): each image block can be divided into three parts:
  - Local image descriptor: contains information about this image block, such as the top-left position of the block relative to the screen, the width and height of the block, and whether there is a local colour table.
  - Local colour table: may be present to define the colours used in this image block.
  - Image data: the pixel values. The data are compressed using the LZW encoding method and stored as a series of sub-blocks. Each sub-block begins with a count byte. The image data are always stored by scan line and by pixel. The scan lines can be stored in consecutive order or interlaced. GIF uses a four-pass interlacing scheme.
- Trailer (mandatory): a single byte whose value is always 0x3B.
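Two parts of the description above can be sketched in Python: decoding the header with its logical screen descriptor, and listing the row order of the four-pass interlacing scheme. The bit layout of the packed byte (top bit as the global colour table flag, low three bits N giving 2^(N+1) table entries) and the pass start rows and steps are details of the GIF specification assumed here rather than taken from the text; the function names are hypothetical.

```python
import struct


def parse_gif_header(data):
    """Decode the signature, version and logical screen descriptor."""
    signature, version = data[0:3], data[3:6]
    if signature != b"GIF" or version not in (b"87a", b"89a"):
        raise ValueError("not a GIF file")
    # Little-endian: 16-bit width and height, then three single bytes.
    width, height, packed, background, aspect = struct.unpack_from("<HHBBB", data, 6)
    has_table = bool(packed & 0x80)
    entries = 2 ** ((packed & 0x07) + 1) if has_table else 0
    return {"version": version.decode("ascii"), "width": width, "height": height,
            "global_colour_table": has_table, "table_entries": entries,
            "background_index": background, "pixel_aspect_ratio": aspect}


def interlaced_row_order(height):
    """Return the order in which rows are stored in an interlaced image."""
    order = []
    # (first row, step) for each of the four passes.
    for start, step in ((0, 8), (4, 8), (2, 4), (1, 2)):
        order.extend(range(start, height, step))
    return order


# A synthetic header: GIF89a, 320 x 200 screen, global colour table of 256 entries.
sample = b"GIF89a" + struct.pack("<HHBBB", 320, 200, 0xF7, 0, 0)
screen = parse_gif_header(sample)
print(screen)

rows = interlaced_row_order(8)
print(rows)  # [0, 4, 2, 6, 1, 3, 5, 7]
```

Interlacing lets a viewer show a coarse version of the whole image after the first pass and refine it as the later passes arrive.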

