Lots of Bits
1.1 Big numbers
1.1.1 Just counting
In the beginning there were just scratches in the sand. Three goats III, six goats IIIIII. That way of representing numbers still turns up occasionally; look at the 2-and-2 count on the old scoreboard at Fenway Park. In the computer world it's called unary notation. Pretty early on, someone realized that it was hard to tell who had more goats if one person had IIIIIIIIIIIIIIIIIIIII and the other person had IIIIIIIIIIIIIIIIIIII. So abbreviations were devised. In the Italian peninsula they used an acute angle V for five and a cross X for ten. This made it easier to see that the goatherd with XXI goats had more goats than the one with only XX. Around the beginning of the Christian era, when chiseled inscriptions became all the rage, the Romans used the letters of the alphabet in place of the marks, so one became the letter I, five became the letter V, and ten became the letter X. Roman numerals were not the first system for representing numbers, and they were certainly not the best. In fact the system was so clumsy for arithmetic that even the Romans didn't use it for that. But it does illustrate a central point. Whoever first wrote X instead of IIIIIIIIII was performing the first act of data compression: representing the same information with a shorter string of symbols. In this case the compression method was pretty simple: use several symbols instead of just one so the representation becomes shorter.
BITS: Notes for Harvard QR48 / MIT 6.095 DRAFT March 7, 2007
Usually scientific notation is used when the exact value of a number is not known, and what is shown is really an approximation. That's another reason for not using scientific notation for a number such as 473. But one might say something like "The sun is 9.3 × 10^7 miles from the earth," because we don't know the number of miles to the sun down to the last mile. Just writing a power of ten, such as 10^11, suggests that you mean that whole number exactly. But writing 1.00 × 10^11 suggests that you are referring to some measurable quantity that is approximately that number but might actually be something like 1.002743 × 10^11. There is no mathematical difference between 10^11 and 1.00 × 10^11; it's just a subtlety of what's implied by the way these expressions are used. It gets hard pretty quickly to associate numbers as big as that with things in our everyday experience. Here is a table that may help a bit.

Number              | Count                                    | Distance (meters)                | Time (seconds)
1 = 10^0            | You                                      | Human body                       | Heartbeat
10 = 10^1           | Blocking group                           | House                            | Speak a sentence
100 = 10^2          | Dormitory                                | Football field                   | Brush your teeth
1000 = 10^3         | Freshman class                           | Across the campus                | Eat a meal
10^4                | Harvard students                         | Boston                           | Final exam
10^5                | Cambridge                                | Massachusetts                    | Sunrise to sunrise
10^6 = million      | Montana                                  | Boston to Chicago                | Spring break
10^7                | Massachusetts                            | Boston to Hawaii                 | Grow a crop
10^8                | US females                               | Twice around earth               | College
10^9 = billion      | China                                    | One moon orbit                   | Mozart's life
10^10               | People on earth                          |                                  | Harvard's life
10^11               | People who ever lived; neurons in brain  | To the sun                       | Civilization
10^12 = trillion    |                                          | To Jupiter                       | Homo sapiens sapiens
10^13               |                                          | To Pluto                         | Fire tamed
10^14               | Cells in human body                      |                                  | Hominid bipedalism
10^15 = quadrillion |                                          |                                  | Monkeys
10^16               |                                          | Distance light travels in a year | Insects
10^17               |                                          | To nearest star                  | Photosynthesis
10^18               | Insects on earth                         |                                  | Origin of universe
10^19               |                                          | Thickness of Milky Way           |
10^20               |                                          |                                  |
10^21               |                                          | Across the Milky Way             |
10^22               |                                          | To nearest major galaxy          |
10^23               |                                          |                                  |
10^24               |                                          |                                  |
10^25               | Atoms in a pound of iron                 |                                  |
10^26               |                                          | Diameter of universe             |
10^27 ... 10^31     |                                          |                                  |
10^50               | Atoms in the earth                       |                                  |
10^85               | Particles in the universe                |                                  |
Each row of the table starts with a power of ten, so each successive row represents a scale ten times that of the previous row. Put differently, each row is one order of magnitude greater than the previous row. The second column includes something whose number is around that size. Some of these numbers, especially the larger ones, are rather speculative, but the attempt has been to show a quantity whose exact size is within a factor of two of the number in the left column. For example, the official Chinese estimate of the population of that country was 1.295 billion in 2004, though some other estimates are as high as 1.5 billion. Either way, the order of magnitude is 10^9. Similarly, the third column shows distances in meters, and the fourth column shows times in seconds. Since the big bang was on the order of 10^18 seconds ago, and the diameter of the universe is on the order of 10^25 meters, the rest of those columns simply can't be filled in. There just aren't any distances or times greater than the last ones listed in those columns. Reading down the distance and time columns gives one a foreshortened sense of the universe, like the famous New Yorker cartoon showing Manhattan in the foreground and most of the world in half the page, just on the other side of the Hudson River. Going up by factors of ten results in enormous jumps of space and time in only a few steps.
The numbers of things can get very large indeed. There are a billion times more bacteria on earth than there are stars in the universe. If you were to lay down atoms at a rate of one per meter, a pound of iron would suffice to get you across the entire universe. If you had started enumerating the grains of sand on the earth's beaches on the day the universe was born and had counted steadily at a rate of one grain per second, you would by today have gotten through only a thousandth of the sand. Big as these numbers are, the biggest are dwarfed by a googol. Before Google (spelled differently) was the name of a search engine that I used to find some of these numbers, googol was a word invented to designate 10^100, a 1 followed by a hundred zeroes: 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000. That's a quadrillion times the number of particles in the universe.
big bang. The number of ways the cars have ever actually been arranged is a lot smaller: no more than the number of days in the lifetime of a car, most likely. If you are inclined to object that a parking lot with twenty cars is not a thing but an arrangement of other things, remember that your body is an arrangement of atoms that have been around for billions of years, rearranged in other ways to constitute other things. Inevitably, most of the things that could exist don't. This is convenient, because it means that the names of things don't have to be too long. If you wanted to give a serial number to every atom of the earth, the numbers would have to be only 50 digits long, because there are fewer than 10^50 atoms in all. Even if you wanted to give a different number to every particle in the universe, any one serial number would have to be only 85 digits. And you could, in theory, give a different ten-digit telephone number to every human being who has ever lived. The names of things (the strings of symbols that can be used to identify things) are a lot shorter than the number of things there are.
So the length l of the numerals needs to be at least log₁₀ n, where n is the number of things to be represented. If n is exactly a power of 10, that is the end of the story: three digits to represent 1000 things numbered 000 to 999, four to represent 10,000 things, and so on. If n is not exactly a power of 10, you still need as many digits as for the next larger power of 10. For example, if you want decimal numerals for 597 things, you need three digits, the same as would be needed if there were 1000 things, 1000 being the next power of ten larger than 597. You wouldn't have to name them sequentially, 000 through 596, but you would have to use sequences of length three, since there are only 100 strings of two decimal digits. Another way to look at it is that if log₁₀ n is not a whole number, the number of digits needed is the next whole number larger than log₁₀ n. For example, to name 597 different things with decimal numerals, we need three-digit numerals, because log₁₀ 597 = 2.77597433 and the next integer larger than that number is 3. Of course you don't actually need to calculate 2.77597433; all you need to see is that the next power of 10 greater than or equal to 597 is 1000, or 10^3. We write ⌈x⌉ for the next integer greater than or equal to x, so for example ⌈3.2⌉ = 4 and ⌈17⌉ = 17 and ⌈log₁₀ 597⌉ = ⌈2.77597433⌉ = 3. So then the number of decimal digits needed to assign a different numeral to every one of n things is exactly l = ⌈log₁₀ n⌉. Suppose we use the twenty-six letters of the Roman alphabet to name things rather than the ten decimal digits; how much shorter could the names be? Well, there are 26^l different sequences of exactly l letters: 26 letters, 26 × 26 = 676 two-letter combinations aa, ab, ac, ..., zy, zz. So to give a different name, say, to every star in the universe we'd need the length l of the strings to be long enough that 26^l ≥ 10^23. That one you can do on a scientific calculator by just multiplying 26 by itself repeatedly and counting how many times you have to do it before the answer is at least 10^23, but there is a better way. For 26^l ≥ 10^23 we need l to be an integer greater than or equal to log₂₆ 10^23. It suffices to take l as small as possible, so we should have
l = ⌈log₂₆ 10^23⌉
  = ⌈23 × log₂₆ 10⌉            (because log_b x^c = c × log_b x)
  = ⌈23 × (log 10 / log 26)⌉   (because log_b x = log x / log b)
  = ⌈23 × 0.7067⌉
  = ⌈16.254⌉
  = 17
letters per name. So seventeen letters are enough to name all the stars in the universe; that's shorter than a lot of the names people give their children!
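The star-naming calculation generalizes directly; here is a minimal sketch in code (the function name `name_length` is mine, not from the text):

```python
import math

def name_length(n, d):
    """Minimum string length over an alphabet of d symbols
    needed to give each of n things a distinct name: ceil(log_d n)."""
    return math.ceil(math.log(n) / math.log(d))

print(name_length(597, 10))     # 3 decimal digits for 597 things
print(name_length(10**23, 26))  # 17 letters for ~10^23 stars
```

The quotient of logarithms implements the change-of-base rule discussed below; any logarithm base works as long as the same one is used in numerator and denominator.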
log(m/n) = log m - log n. Really the same as the previous rule, expressed differently, since it amounts to log m = log(n × (m/n)) = log n + log(m/n).
log(n^a) = a × log n. For example, if n is a four-digit number, then n^5 will be about a 20-digit number. This rule works when a is negative too, and the rule about lengths makes sense in this context if you interpret a number of length -5 as a fraction less than 1 whose first nonzero digit is 5 places to the right of the decimal point.
log_b n = log_a n / log_a b, for any a. For example, since log₂ 10 ≈ 3.32, decimal numerals are about a third the length of the corresponding binary numerals.
Note that in a quotient such as log x / log y, it doesn't matter what the bases of the logarithms are, as long as the base is the same in the numerator as in the denominator. (Use whatever is handy on your calculator; sometimes it's base 10, sometimes it's base e.) The general rule is the important one. If you are using d different symbols to give names to n different things, you need strings of length ⌈log_d n⌉.
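The claim that binary numerals run about 3.32 times longer than decimal ones can be checked with a quick sketch:

```python
import math

# log2(10) ≈ 3.32: each decimal digit carries about 3.32 bits,
# so the binary numeral for a number is about 3.32 times as long
# as its decimal numeral.
print(math.log2(10))       # ~3.3219

n = 10**20
print(len(str(n)))         # 21 decimal digits
print(len(bin(n)) - 2)     # 67 bits, roughly 3.2 times as many
```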
n   | 2^n
20  | 1,048,576
32  | 4,294,967,296 ≈ 4.295 × 10^9
64  | ≈ 1.845 × 10^19
128 | ≈ 3.403 × 10^38
101010₂ = 2^5 + 2^3 + 2^1 = 32 + 8 + 2 = 42.
Position       | 5  | 4  | 3 | 2 | 1 | 0
Bit            | 1  | 0  | 1 | 0 | 1 | 0
Position value | 32 | 16 | 8 | 4 | 2 | 1
Bit value      | 32 | 0  | 8 | 0 | 2 | 0
To convert a number n from decimal to binary, first find the largest power of 2 less than or equal to n. There will be a 1 in the bit position corresponding to that power. Subtract that power of 2 from n and repeat the procedure starting with the remainder. For example, to convert 19 to binary, find the largest power of two less than or equal to 19, which is 16, or 2^4; subtracting that from 19 leaves 3, which is 2^1 + 2^0. So the binary numeral for 19 has 1s in positions 4, 1, and 0; in other words, 10011.
19 = 16 + 2 + 1 = 1×2^4 + 0×2^3 + 0×2^2 + 1×2^1 + 1×2^0 = 10011₂
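The subtract-the-largest-power procedure just described can be written out directly; a minimal sketch:

```python
def to_binary(n):
    """Convert a positive integer to a binary numeral by repeatedly
    subtracting the largest power of 2, as described above."""
    power = 1
    while power * 2 <= n:      # find the largest power of 2 <= n
        power *= 2
    bits = []
    while power >= 1:          # walk down through the bit positions
        if n >= power:
            bits.append('1')
            n -= power
        else:
            bits.append('0')
        power //= 2
    return ''.join(bits)

print(to_binary(19))   # 10011
print(to_binary(42))   # 101010
```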
they are positive or negative; the same algorithm works for two positive numbers, two negative numbers, or a positive and a negative number. (Try adding +9 and -9; you should get all 0s, with a carry propagating all the way off the left end of the sum.) A small oddity is that given a fixed number of bits, say n, it is possible to represent one more negative number than positive number. (Something like this has to be true, since there is an even number, 2^n, of bit patterns, one of them has to represent 0, and that leaves an odd number to be used for positive and negative numbers.) So the biggest positive number that can be represented using n bits is 2^(n-1) - 1; for example, the numbers that can be represented using 16 bits range from -32,768 to +32,767.
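The behavior described here, including the carry falling off the left end, can be sketched as follows (a two's-complement demo with 16 bits; the helper names are mine):

```python
BITS = 16

def to_pattern(x, bits=BITS):
    """The unsigned bit pattern representing x (possibly negative)."""
    return x % (1 << bits)

def add(a, b, bits=BITS):
    """Add two numbers, discarding any carry off the left end."""
    return (to_pattern(a, bits) + to_pattern(b, bits)) % (1 << bits)

print(format(to_pattern(-9), '016b'))   # 1111111111110111
print(add(9, -9))                       # 0: the carry falls off the end
print((1 << (BITS - 1)) - 1)            # 32767, the largest positive value
print(-(1 << (BITS - 1)))               # -32768, the most negative value
```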
abc = 01100001 01100010 01100011. We've left a little white space between the code words for visual clarity, but it's not needed to translate the sequence of 24 bits back into three symbols, since it is known in advance that code words are always exactly 8 bits long.
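A sketch of this fixed-width scheme (using Python's `ord` and `chr` for the ASCII code values):

```python
def encode(text):
    """Encode a string as the concatenation of 8-bit ASCII code words."""
    return ''.join(format(ord(ch), '08b') for ch in text)

def decode(bits):
    """Split a bit string into 8-bit chunks and translate each back.
    No separators are needed, because every code word is 8 bits."""
    return ''.join(chr(int(bits[i:i+8], 2)) for i in range(0, len(bits), 8))

coded = encode('abc')
print(coded)           # 011000010110001001100011
print(decode(coded))   # abc
```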
1.4.2 Bytes
A chunk of eight bits is a convenient unit of information, in part because of its correspondence with the length of the code words for characters. A unit of eight bits is called a byte. Reportedly the person who coined that term chose that spelling because he feared that if he called it a bite, people typing it might think he meant bit and change the meaning by mistakenly repairing the spelling.
1.4.3 Hexadecimal
Strings of more than a few bits are hard to read and copy. A convenient notation uses sixteen different symbols for the sixteen different patterns of four bits. The first ten symbols used are the same as the ten decimal digits, in order. That leaves the six patterns whose decimal values would be 10 through 15; the first six letters of the Roman alphabet are used for those patterns. This is a base-sixteen or hexadecimal notation, with the sixteen hexadecimal digits being 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F.

Bit pattern | Hexadecimal digit
0000        | 0
0001        | 1
0010        | 2
0011        | 3
0100        | 4
0101        | 5
0110        | 6
0111        | 7
1000        | 8
1001        | 9
1010        | A
1011        | B
1100        | C
1101        | D
1110        | E
1111        | F
We won't be doing hexadecimal arithmetic (though there are people who can do that in their sleep!). All we need to be able to do is transcribe a string of bits into hexadecimal digits by breaking the bit string into chunks of length four and translating the chunks into hex digits. For example, the ASCII code for the string abc mentioned above would be rendered in hex as follows.
011000010110001001100011 = 616263₁₆
(0110 → 6, 0001 → 1, 0110 → 6, 0010 → 2, 0110 → 6, 0011 → 3)

If a string of eight bits is a byte, then a string of four bits, or half a byte, has to be a nibble. Sorry!
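The chunk-by-chunk transcription takes only a few lines; a minimal sketch:

```python
def bits_to_hex(bits):
    """Translate a bit string into hex, one hex digit per 4-bit chunk."""
    assert len(bits) % 4 == 0, "bit string must break evenly into nibbles"
    return ''.join(format(int(bits[i:i+4], 2), 'X')
                   for i in range(0, len(bits), 4))

print(bits_to_hex('011000010110001001100011'))   # 616263
```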
computer is expecting to see rightmost-bit-first, what it thinks it has received will be not a but 10000110, which is the code for a different symbol. For a time the question of which bit should travel down the wire first had all the earmarks of a holy war. Computer scientist Danny Cohen, in a hilarious reference to Gulliver's Travels, called the warring viewpoints the big-endian and little-endian conventions. Cohen's paper, "On Holy Wars and a Plea for Peace," is worth reading. It doesn't require knowing anything you haven't already learned, and it plays the holy-war metaphor, and others such as the differing orthographic conventions of English, Hebrew, and Chinese, for all they are worth.1
http://khavrinen.lcs.mit.edu/wollman/ien-137.txt
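The byte-order version of this question survives in today's machines and is easy to demonstrate (a sketch using Python's struct module):

```python
import struct

# The same 32-bit integer laid out in the two byte orders.
n = 0x0A0B0C0D
print(struct.pack('>I', n).hex())   # 0a0b0c0d: big-endian, most significant byte first
print(struct.pack('<I', n).hex())   # 0d0c0b0a: little-endian, least significant byte first

# And the bit-order confusion described above: 'a' = 01100001,
# but received rightmost-bit-first it reads as 10000110.
print(format(ord('a'), '08b')[::-1])   # 10000110
```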
address that a computer can easily handle. If the number of bits in an address on a particular computer is 23, say, then the total number of memory cells that are directly addressable is 2^23. For this reason physical computer memory is manufactured and sold in units that are powers of two in size. You could not go into a computer store and buy a million bytes of memory. You'd have to buy memory in the next larger power of two, which turns out to be 1,048,576. That quantity of memory is called a megabyte, and though that is sometimes called a million bytes, in the computer world a megabyte of data or memory almost always means 2^20 bytes, which is about 5% more than a million.
Unit                                   | Abbreviation | Size                        | Example
Gigabyte (with a hard G, as in "girl") | GB           | 2^30 = 1024^3 ≈ 1.07 × 10^9 | A few bookshelves
Terabyte                               | TB           | 2^40 = 1024^4               |
Petabyte                               | PB           | 2^50 = 1024^5               |
Exabyte                                | EB           | 2^60 = 1024^6               |
Each line of this table is about a thousand times larger than the previous line (to be precise, 1024 times as large). The entire system of nomenclature laid out here arises from the accident that 2^10 happens to be close to a power of 10. The personal computer you buy today probably has at least 50GB, and you can buy a TB for less than $750 (see http://www.pricewatch.com ). Such storage capacities were unthinkable only a few years ago. Miniaturization has drastically reduced the cost and increased the capacity of computer storage. A decade ago one could not have bought a petabyte for all the money in the world, nor could one have owned enough warehouses to store it, even if it could be purchased. Be careful if someone tries to sell you 1Gb of memory. That small b may not be an inconsequential typographical variant of B. Small b means bit and big B means byte, so 1Gb would be a gigabit, only one-eighth as much memory as 1GB. In truth, memory is not sold by the bit, but other things, for example the transmission speeds of data lines, are sometimes measured in bits per second (b/sec or bps) and sometimes in bytes per second, so be alert.
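The near-coincidence of 2^10 and 10^3 that the whole naming system rests on is easy to tabulate (a quick sketch):

```python
# How far each binary unit drifts above the nearby power of 10.
for name, power in [('KB', 10), ('MB', 20), ('GB', 30), ('TB', 40), ('PB', 50)]:
    binary = 2 ** power
    decimal = 10 ** (3 * power // 10)
    print(f'{name}: 2^{power} = {binary}, which is {binary / decimal:.1%} of {decimal}')

# And bits versus bytes: 1 Gb is only one-eighth of 1 GB.
print(2**30 // 8)   # bytes in a gigabit
```

Note that the drift compounds: a kilobyte is 2.4% more than a thousand bytes, but a petabyte is nearly 13% more than a quadrillion bytes.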
the 103.3 MHz frequency, and truly means 103.3 × 10^6, not 103.3 × 2^20. (A Hz, or Hertz, is a unit of frequency, one cycle per second. We'll get back to that later.) An in-between case is disk sizes. If my computer can handle 30-bit addresses, it has no trouble with a gigabyte of memory, but 2^30 + 1 bytes would be a nuisance, since addresses would then need to be 31 bits long. But the addressing logic that ties internal memory sizes to powers of two does not apply when the storage is external. There is nothing magic about any particular disk size. It may make sense to organize the data on a disk so that it is read and written in chunks of some convenient size, a number of bytes exactly equal to some power of 2, for example, so that when it moves into or out of internal memory it fits in a slot naturally defined by the computer's addressing logic. Such chunks of disk data are called pages. But there is no physical reason for the number of pages on a disk to be a power of 2 or any other round number. Now: if I have 1 GB of internal memory on my computer and I buy a 1K GB disk, does that mean that I could store 1024 copies of my internal memory on the disk, 1000 copies, or perhaps even less? Maybe the disk truly holds only 10^12 bytes, which would suffice to hold only 931 copies of the 1GB internal memory of my machine! Caveat emptor: the conventions are not universal, and while none of these numbers varies from the others by more than 10%, that's enough to have generated some lawsuits by consumers who were thinking K as in kilobytes meant 1024 against manufacturers who were thinking K as in kilometers meant 1000.2
http://austin.bizjournals.com/austin/stories/2003/09/15/daily39.html
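The competing readings of "1K GB" work out like this (a sketch; 1 GB of internal memory is taken as 2^30 bytes throughout):

```python
memory = 2 ** 30   # 1 GB of internal memory, in bytes

# Three conventions a disk manufacturer might have had in mind.
for label, disk_bytes in [('K = 1024, GB = 2^30', 1024 * 2**30),
                          ('K = 1000, GB = 2^30', 1000 * 2**30),
                          ('a plain 10^12 bytes', 10**12)]:
    print(label, '->', disk_bytes // memory, 'copies of memory')
```

The three readings give 1024, 1000, and 931 copies respectively, which is the roughly-10% spread the lawsuits were about.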
If a wire was put through a core and an electric current was pulsed through the wire, a magnetic field was created strong enough to change the direction in which the core was magnetized. The cores were actually strung on a grid of wires. A single core could be picked out by pulsing two wires, each with half as much current as was needed to flip the bit.3 The other cores on the same wires would be unaffected. Magnetic core technology replaced a 1950s vacuum-tube technology, which itself supplanted a 1940s technology based on electromechanical dials and gears. Harvard was at the forefront of the computer world in those days; you can see the Aiken-IBM Mark I computer, with its rows of gears and dials, in the middle of the first floor of the Science Center. Crucial aspects of the technology needed to store and retrieve bits using magnetic cores were also developed at Harvard, by An Wang, then a graduate student of Howard Aiken's. In one of a long series of missed boats in the applied sciences, Harvard took no interest in the invention. But MIT did, and the combination of Wang's invention with inventions of MIT's Jay Forrester made commercial core memories possible. [Figure: the Aiken Mark I] The manufacture of core memories was never automated. Cores were strung manually onto wires, almost entirely by poorly paid garment workers in the Far East, working with microscopes. By the time the manufacturing process was mature, the cores were barely a millimeter in diameter and the cost of core memory was approaching a penny a bit. But that still resulted in enormous costs for what we would today consider a tiny amount of memory.
[Images: a single core, from www.hpmuseum.org/tech9100.htm ; a 1957 core-memory plane, from Siemens Corp., w4.siemens.de/archiv/images/preview/1954_dv.jpg ]
The mechanical and vacuum-tube computers had the equivalent of a few hundreds to a few thousands of bits of memory; the ENIAC, an important vacuum-tube computer, had 18,000 vacuum tubes. While vacuum-tube machines were much faster than mechanical computers (performing a few thousand operations per second rather than three or four operations per second), the tubes were hot, so vacuum-tube machines required lots of electric power, and the tubes tended to burn out quickly. Core memory made possible the construction of machines with memories in the millions of bits, essential for complex calculations but out of the question for vacuum-tube machines.
tubes had had in the earliest electronic computers. Happily, silicon is dirt cheap (or sand cheap, to be precise, since sand is the raw material used to make silicon chips). The first of these integrated circuits, consisting of just a few transistors, became commercially available in 1961 from Texas Instruments and Fairchild Semiconductor. As the manufacturing process improved and researchers learned more about the electrical properties of silicon, the size of the individual transistors shrank and it became possible to squeeze more onto a single silicon chip. At first many chips were defective, because specks of dust in the air during the manufacturing process could ruin one of the tiny features of the chip, a wire or a transistor. As a result chips could not be too large: the bigger the surface area, the more likely that the chip would not work because of some defect in manufacturing. As the manufacturing environment became cleaner, the physical dimensions of chips could increase without an unacceptably large increase in the number of defective chips.
ftp://download.intel.com/research/silicon/moorespaper.pdf
memory chips. Except for peculiarly specialized applications, all computers made since the mid-70s have used silicon chips for memory. As of 2004, 39 years have passed since Moore's law was propounded. That is 39/1.5 = 26 eighteen-month periods, so if the law has held true, the number of components on a chip should have increased by a factor of 2^(39/1.5) = 2^26 = 67,108,864. The law had been proposed in 1965, when the biggest chip that could be manufactured had 64 = 2^6 components. An increase by a factor of 2^26 starting from a size of 2^6 would be a chip of size 2^32, or 4,294,967,296. It is, of course, comical to carry out these calculations to that level of precision. But incredibly, the biggest memory chips being manufactured in 2004 are 4 gigabit chips.5 And projections for the future of semiconductor technologies suggest that doubling in size at the same rate will continue for at least two or three more cycles. Some have attributed the success of Moore's law to its existence, suggesting that manufacturers repeatedly rise to the specific challenge of meeting the expectations that the law itself has created. I doubt that technological advances can over such a long period be shaped by such artificial pronouncements about the future. Certainly many less amazing predictions about the future of technology have not come true. But what is true is that the phenomenon described by Moore's law is absolutely extraordinary. Our capacity to store information has increased by a factor of four billion over four decades. With 4 gigabit chips costing $114, the cost per bit has dropped to about 3 × 10^-8 dollars per bit. There is nothing else that humankind can do even a million times faster, bigger, cheaper, or otherwise better than it could do in the Stone Age.
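The Moore's-law arithmetic above can be checked in a few lines (the $114 price is the figure quoted in the text):

```python
periods = 39 / 1.5
print(periods)               # 26.0 eighteen-month doubling periods
print(2 ** 26)               # 67108864, the predicted growth factor
print(64 * 2 ** 26)          # 4294967296 = 2^32 components per chip
print(114 / (4 * 2 ** 30))   # dollars per bit for a $114 4-gigabit chip
```

The last line works out to about 2.7 × 10^-8 dollars per bit.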
The fastest airplanes fly at around 1000 m/sec; the fastest humans can run short distances at a rate of around 10 m/sec, so the speed improvement is by a factor of only a hundred, or perhaps a thousand if you adopt the more realistic rate of 1 m/sec for Stone Age people walking long distances. The change in our capacity to store and communicate information is like nothing else that is or ever was. Not even close. And that is what makes the information revolution possible.
http://www.infoworld.com/article/04/04/06/HNtoshsandisk_1.html
1.7 Images
We have been estimating the size of texts, as though the only thing of interest is the letters and symbols of which they are composed. Of course that is not true. Books contain diagrams and photographs, and they are set in multiple fonts and typefaces. How many bytes does it take to store a real page so that it can be reproduced perfectly from the stored version? Even getting the question to be precisely meaningful will take a bit of effort, but we don't need to explain everything at once. Let's agree that "perfectly" just means good enough so that the human eye can't see the imperfections, though they might be visible with a microscope. When a page is printed digitally, the surface is divided into tiny square dots, each of which can be black or white. The dots are called pixels, short for picture elements. The number of dots per inch, horizontally or vertically, is called the resolution of the printer. (Actually, the resolution does not have to be the same in both directions, so the squares could actually be rectangles.) The lowest-resolution printers are in fax machines; they print around 100 dots per inch, and it is easy to see with the naked eye how jagged the letterforms are. A very high resolution printer might print 2400 dots per inch; at that resolution the naked eye could not tell that, for example, a diagonal line is actually a jagged staircase of tiny steps. Let's imagine a page printed at 2400 dpi. Suppose that the printed area is 7 by 10 inches. How many dots is that? 7 × 10 × 2400 × 2400 ≈ 4 × 10^8 dots. Since each dot can be either black or white, each dot records one bit of information, so the page has around 50 megabytes (4 × 10^8 bits / (8 bits/byte) = 0.5 × 10^8 bytes = 50 × 10^6 bytes). So a single page digitized at 2400 dpi consists of as many bits as an entire shelf full of books would take if only the text were preserved! And that's just for black and white. What if each dot needs a color?
A common scheme for representing color values uses 24 bits per dot (eight bits of brightness value, 0 to 255, for each of red, green, and blue). Now we are up to around 1.2 × 10^9 bytes per page, more than a gigabyte. At that rate a 50GB disk would be filled by just 40 pages of images.
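The page-size arithmetic above, collected in one place (a sketch):

```python
# Storage for one 7 x 10 inch page scanned at 2400 dots per inch.
dpi = 2400
dots = 7 * 10 * dpi * dpi
print(dots)                        # 403200000, about 4 x 10^8 dots

print(dots / 8 / 10**6)            # ~50 megabytes in black and white (1 bit/dot)

color_bytes = dots * 3             # 24 bits = 3 bytes of color per dot
print(color_bytes / 10**9)         # ~1.2 gigabytes in 24-bit color
print(50 * 10**9 // color_bytes)   # ~41 such pages fill a 50 GB disk
```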
That can't be right; photographs and documents do not require that much storage in practice. How is it possible to store complex documents, photographs, and images in much smaller amounts of disk space? The answer is data compression, and to understand it we need to explore the difference between mere bits and information.