
12/4/2009 - the remaining 3 languages; cs107 wrapup

1. what is java?
   a. Your program is like a mouse running in a field.
   b. Compile-time typing sets some boundaries. And it's at compile time, which is before runtime! You are constraining the future!
   c. C has a full compile-time type system and nothing at runtime.
   d. Python has full runtime typing and nothing at compile time (no compile step).
   e. Java has both: compile-time and runtime typing. Every heap object is still tagged.
      i. Having runtime typing prevents writing (certain kinds of) malware in Java.
      ii. You can't just lie and say that the thing you're overwriting is a string. Java checks.
   f. Java compiles to bytecode.
      i. Not ELF (Julie's well-known fixation with the ELF format).
      ii. It's portable between OSes, which is why MS hates it so much.
      iii. Old: an interpreter runs the bytecode. It has a big while loop and switch statement. 5-10x slowdown.
      iv. Modern: a just-in-time compiler (JIT) translates bytecode to native code. HotSpot (GPL, open source from Sun) does its optimizations at runtime: it usually starts in interpreter mode, observes what runs the most, optimizes that function, and swaps the compiled version in. This is the future of code optimization.
   g. Firefox has a JIT for JavaScript too. But JIT compiling is a lot easier in a compile-time-checked language.
   h. Startup time is bad -- it has to start everything up, and tons of memory is used (HotSpot keeps a couple copies of the code). But later on, it'll run fairly fast.
2. pros/cons of compile-time typing
   a. advantages
      i. detects errors early
      ii. better performance, because decisions aren't deferred to runtime (e.g., a+b compiles differently if a and b are ints vs. strings)
      iii. better tool support for refactoring and auto-complete
      iv. +/- readable vs. verbose
         1. but Python can be beautiful because there isn't a lot of extra type stuff distracting you from the code
      v. allows better JIT compiling
   b. disadvantages
      i. extra stuff to key in; more verbose
      ii. it may be hard to express some ideas within the type system, even though they would actually work at runtime. Maintaining type info can get in the way.
   c. Demo: javabat.com
3. pros/cons of dynamic typing
   a. advantages
      i. less to type in / less to get in the way
      ii. the language isn't limited to what can be expressed within static typing
         1. Python has lots of features
         2. code is short; "defer to runtime" is simple to implement
   b. disadvantages
      i. hard to read, because the type info isn't there
         1. you might find yourself adding it back in the variable names
         2. type info can be useful to the reader
      ii. worse performance, because decisions are deferred
      iii. worse compile-time error detection
         1. compensate with unit tests
      iv. worse compile-time tooling for refactoring and autocomplete
4. language choice precepts
   a. working source code is high mass, high cost.

2.

3.

4.

5.

6.

7.

   b. legacy: Google does a ton of things in C++ because they did it in C++ originally, and it would be a big pain to change.
   c. Therefore, avoid building your system on top of locked-in, proprietary infrastructure.
      i. Once your system develops high mass, you are screwed.
   d. Precept 2: engineers can get heated about language choice.
      i. You know Stockholm syndrome? "This explains C to me."
      ii. Important meme: the bikeshed painting principle.
         1. if you talk about geopolitics, people won't talk.
         2. if you project a picture of a bikeshed -- because it's trivial -- everyone can form an opinion.
      iii. Team idea: "shutting up" skills. Consider remaining silent. Whoever's implementing should do it however they want. Only intervene if their choice is soooo bad.
5. three language choices
   a. Language tools today are fantastic. features = language features + libraries
   b. C/C++
      i. fast, small memory use, low dependency -- if you're programming for a $12 chip, you don't want to have to install Java on it.
      ii. legacy
      iii. features are pretty weak
      iv. fit: small stuff, good; performance-sensitive stuff. Not big, complex stuff with lots of people.
      v. the C++0x project is making C++ catch up
   c. Java
      i. Static typing, good performance because of the JIT, lots of features. The language features are small; the libraries are probably the best of any language. (C++ has the horrible .h files.)
      ii. Fit: large or complex projects, team projects, where you can tolerate verbose code.
   d. Python
      i. dynamic typing, worst performance, large feature set, flexibility, and short code
      ii. Fit: small projects, where simplicity shines. If you're just doing 2 or 3 pages at once, you don't need to add type info. Feels quick.
      iii. Bad if there are lots of files and classes and people on one project.
6. things worthwhile in 107
   a. gdb / the binary bomb
   b. vim
   c. remote programming / repos
      i. "I'm going to data-mine the repos. There are interesting commit messages: 'adams are done.' 'No, now I'm done.' 'No, now I'm really done.'"
   d. labs pointed out good pitfalls
   e. testing
   f. data type representations in memory / bit representations
   g. making my code faster
   h. stack / heap; digging around and getting the args
   i. computational thinking ("how did the prog make you feel?")
   j. lab teams
   k. optimization
   l. pointer arithmetic and using memory blobs
   m. lots of diagrams in lectures
   n. detail given to assignment descriptions
   o. buggy C code / code that works even despite bugs
   p. linux
7. advice to next year's 107 class
   a. code early; code often
   b. read the assignment descriptions and the header files and such
   c. Don't assume compiling means working. Compiling is not much of a milestone.
   d. Be OK with deleting code.
   e. Hg is your friend.
   f. Learn unix.


   g. Write your own tests.
   h. Do the reading in B&O.
   i. Know the pointers. They're important.
   j. Talk with each other.
   k. Learn your editor.
   l. Don't be shy about using the cs107 email.
   m. Valgrind -- teach optimization early.
   n. Code before thinking.
8. logistics
   a. assn7 due tonight. You can use a late day or two, though.
   b. grading, argh: 4 and 5 done, half of 6 done, all of the assn1/2 reduxes done.
   c. Final is in this room on Friday at 8:30am.
   d. The sample final is good. We'll add a light level of Python.
9. classes following cs107
   a. cs110: the obvious choice. Next systems core class. More efficiency, memory hierarchy, caching, performance, scalability. If you are taking cs110 eventually, take it ASAP, because it follows from cs107 -- otherwise you'll have to relearn.
   b. cs108: awesome elective. Intense programming, higher level: design issues, app design, GUI dev, software design patterns, group project. Java. "You can't make Java hard enough."
   c. cs103, cs109: they're good to take.
   d. The new major is flexible, so you can find a track that lets you avoid what you dislike.
   e. Section leading! Very good for debugging other people's code (which is a very different skill from debugging your own code). Starts the 6th week of any quarter.
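The static-vs-dynamic typing trade-off above fits in a few lines of Python (a sketch; the function names are made up for illustration): the same `+` works on ints and strings, and the type error in `broken` isn't caught until the line actually runs -- which is exactly why the notes say to compensate with unit tests.

```python
# In Python, types are checked at runtime, per operation.
def add(a, b):
    return a + b          # works for any types that support +

print(add(1, 2))          # int addition -> 3
print(add("foo", "bar"))  # string concatenation -> "foobar"

# Nothing complains about this function until it actually executes:
def broken():
    return "length: " + 42   # TypeError, but only when called

try:
    broken()
except TypeError as e:
    print("caught at runtime:", e)
```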

11/30/2009 - python!
1. intro
   a. guest lecture: Nick Parlante.
      Nick: "Am I louder than you, Julie?" Julie: "Yes. And probably more charismatic, too."
   b. Lecture 18: the Python notes on Courseware (http://www.stanford.edu/class/cs107/other/nick_python/python-introduction.html) are fancy.
2. how many programming languages are there? 3!
   a. C / C++: everything must be resolved at compile time. The other languages are built on C.
   b. The Python space: JavaScript, Perl, Ruby, Scheme. Dynamic typing; everything is deferred until the last moment.
   c. Java: it has a dynamic typing system, but it tries to resolve stuff at compile time for debugging.
3. how do scripting languages (dynamic languages) evaluate expressions?
   a. Every variable is a pointer that points to a tagged value.
   b. Evaluating is kind of like a big switch statement: Are these two ints? Then I'll do int addition. Are they two strings? Then I'll concatenate. Thus, each line of code can be used for multiple things.
4. python rocks
   a. misc
      i. FOSS
      ii. MS hates it.
      iii. Boilerplate at the bottom:
         1. if __name__ == '__main__': main()
      iv. Can you overload stuff? Sure -- Python is customizable. Just don't do it. C++ has shown us how horrible that is.
      v. help(functionName) gives you info on it
      vi. for line in f:  # reads lines from a file
      vii. text = f.read()  # reads the whole file into a string
   b. interpreter fun


      i. To quit, you type "quit". Then it tells you to use quit() or ctrl-D.
      ii. You can type in a line of code to figure out what it does.
      iii. The interpreter is a read-eval-print loop.
   c. Variables, types, etc.
      i. Vars can be retyped. Everything is just a pointer.
      ii. Functions, along with vars, are all in the same namespace. Just pointers.
   d. import -- lets you use a module, i.e., library code.
      i. import sys, then sys.bla()
      ii. searches for modules in your path
      iii. open source --> there's a module for everything. There are language features that make it easier to share code.
   e. Boolean expressions
      i. == works. For everything. For lists. For strings.
      ii. It uses "or" and "and" and "not" rather than the really intuitive || or && or !.
   f. Python uses indentation, not curly braces. Made by Guido van Rossum (at Google now).
      i. Argument: good programmers would keep curly braces and indentation consistent anyway. Having two things that programmers have to keep consistent manually is stupid.
      ii. "Like, does anyone update their function in a .c and then jump for joy when they think 'Oh, now I have to update my .h!'? OK, I'll stop bagging on C now."
         1. but he never did stop.
      iii. Indentation is more visible. Let's just use that.
      iv. To span multiple lines, use gratuitous parens.
         1. (foo(), bar())
         2. works
   g. Three most common Python errors by new folks:
      i. Forgetting the colon after a code-block header.
      ii. Indentation that's off by one.
      iii. Using parens on if statements. They're not necessary. It'll work, but Python people will make fun of you. Watch out for that.
   h. Drawbacks to Python
      i. There's no good tab completion for Python, because the tools have no clue what type an object is.
      ii. Errors aren't caught until a line of code executes. Nothing in the code is predictive, so it can't know what code is bad until it runs it. This means you need good test coverage in Python code to avoid these errors.
   i. tips
      i. In Python, there is nothing to tell you "filename" versus "content of file," or queue or string or whatnot. So you really need good variable names. Is it WORD or WORDS?
      ii. Test each few lines of code as you go. Print your data structures. Call sys.exit(0) to exit. Keep iterating through your code and printing your data structures to see if it works.
5. strings
   a. A string is single quotes. Or you can use double quotes, for the sake of putting single quotes inside.
   b. str is the name of the string class, so str(2) makes 2 into '2'.
   c. len -- length. You can get the length of strings or arrays or anything.
   d. Square-bracket array notation works for just about everything.
   e. Strings are immutable, like in Java. You can make new strings if you want to; anything that "changes" a string just returns a new string.
   f. s.upper()
   g. s.isalpha() -- tests every character in the string.
   h. Three quotes (""") means the string can span lines.
6. lists
   a. a = [1, 2, 3]
   b. len works on lists too.
   c. So does square-bracket notation.
   d. Lists can contain anything: lists of strings and such.
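The string facts above, in interpreter form (shown with Python 3 print syntax, while the lecture itself used Python 2):

```python
s = "can't stop"          # double quotes let you embed single quotes
print(str(2))             # '2' -- str() converts to string
print(len(s))             # 10
print(s[0])               # 'c' -- square-bracket indexing
print(s.upper())          # "CAN'T STOP"
print("abc".isalpha())    # True  -- every character is a letter
print("ab1".isalpha())    # False -- '1' is not

# Strings are immutable: "changing" one just returns a new string.
t = s.upper()
print(s)                  # original s is unchanged

multi = """spans
lines"""                  # triple quotes span lines
print(multi)
```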


7. loops
   a. Python does have while and for loops and such. But the only one you'll use is the foreach loop:
   b. for VAR in LIST:
   c. a = ['x', 'y', 'z']
   d. for letter in a: print letter
   e. This works for hash tables, lists, everything. Don't bother with the index.
8. functions
   a. def main():
      i. args = sys.argv[1:]
9. hashmap (dict)
   a. curly-brace delimited
   b. d = {}
   c. d['a'] = 'alpha'
   d. d = {'a': 'alpha', 'g': 'gamma', 'o': 'omega'}
   e. you can always print your data structure using print d
   f. d['a'] returns 'alpha'.
   g. if x in d
   h. d.keys()
   i. for k in sorted(d.keys()): print k, '-->', d[k]
   j. d.values()
   k. d.items() pulls out both keys and values and puts them in a list of 2-tuples. It does this for better time, because it doesn't have to hit the list twice.
      i. Tuples are like little lists: they have a length and you can pull items out of them using square brackets.
   l. Deleting stuff: del d['a']
10. custom sorting
   a. The default sort does alphabetical or numerical ordering if it's strings or ints.
   b. To customize a sort, you give a function of one argument. It gives each element a proxy value; then the sort compares using the proxy values.
   c. Python functions can have optional named args.
      i. sorted(['aa', 'a'], key=len)
         1. pass in the len function as a function pointer. Now you get custom sorting by length.
   d. def second(s): return s[1]
   e. sorted(['zb', 'az'], key=second)
   f. to sort in descending order: sorted(bla, key=second, reverse=True)
11. questions
   a. Do chars exist, or is everything strings?
      i. Chars don't exist. Just strings of length 1.
   b. What's the difference between # comments and ## comments and """ comments?
      i. ## comments don't exist; it's just two #s. # is the single-line comment.
      ii. """ allows multiline, and it's the convention for the javadoc-style docstrings that Python does.
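The key= sorting described above, run end to end (Python 3 syntax; `second` is the same helper the lecture defined):

```python
# Custom sorting with a key function: each element gets a "proxy"
# value, and the sort compares proxies instead of the elements.
words = ['aa', 'a', 'aaa']
print(sorted(words, key=len))            # ['a', 'aa', 'aaa'] -- by length

def second(s):
    return s[1]                          # proxy = second character

print(sorted(['zb', 'az'], key=second))                # ['zb', 'az']
print(sorted(['zb', 'az'], key=second, reverse=True))  # ['az', 'zb']

# Dicts iterate nicely too; sorted() orders the keys.
d = {'a': 'alpha', 'g': 'gamma', 'o': 'omega'}
for k in sorted(d.keys()):
    print(k, '-->', d[k])
```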

(lists, continued from the list section above)
   e. Lists don't need to be uniformly typed.
   f. "in" tests whether an item is in a list; returns True.
   g. slices
      i. a[1:] gets the list starting at index 1
      ii. a[:2] goes up to (but not including) that index
      iii. a[-1] is the rightmost element; a[-2] is the next one in.
   h. reverse() method of list
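The slice and membership rules just listed, demonstrated (Python 3 print syntax):

```python
a = [1, 'two', 3.0]        # lists need not be uniformly typed
print('two' in a)          # True -- "in" is the membership test

b = [10, 20, 30, 40]
print(b[1:])               # [20, 30, 40]  -- from index 1 to the end
print(b[:2])               # [10, 20]      -- up to, not including, index 2
print(b[-1], b[-2])        # 40 30         -- negative indices count from the right

b.reverse()                # reverses in place (returns None)
print(b)                   # [40, 30, 20, 10]
```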

11/20/2009 - notes from Zahan, from the 11/18/2009 section


Memory layout:
Globals are in the data segment, below the heap.
Static locals can only be used in their function, but they persist; you don't get more than one copy.
String constants are in the data segment too, above the other stuff. If you write to a string constant: crash.
Code is in the text segment, below the data segment.
Library functions live between the heap and the stack, with symbols and links to dynamic libraries. Functions are at different places in different runs. Writing to functions crashes.
The stack starts at the top.
Where does your heap start? Make one call to malloc. How much can your heap allocate? As much as your OS gives to you.
Malloc:
If you underrun a block, usually not much happens.
If you overrun, you pwn your own memory.
Free a non-heap pointer: segfault.
Free a stack pointer: "invalid pointer" check. Realloc a non-heap pointer: same.
Free twice: "double free" error message.
Free a pointer into the middle of a heap block: "invalid pointer."
Access after free: free zeros the first four bytes, so most of the memory is still there except the first 4 bytes.

11/16/2009 - Memory Optimization


1. intro
   a. segbus error?
      i. You can fake any signal you want with kill.
      ii. SIGBUS is hard to get, because segfault is used for addresses that appear to be valid but are outside your segment.
      iii. Bus errors usually come from misaligned access (an odd address), but IA-32 lets you make those accesses.
      iv. So you won't ever actually get a real SIGBUS; just emulate it with kill.
   b. Use a vector?
      i. Sure. But it might not help you, because cvector is a pain to use here. It's not too many lines of code, but you need to know what happens at compile time, what's available at runtime, what the stack looks like, etc.
   c. Can we call malloc before the program crashes?
      i. You want to do minimal stuff once the program has crashed. Calls might fail because of the crash.
      ii. So you should already know symbols, names, info, etc.
      iii. Just look through the backtrace and be done.
      iv. Do as much as possible in the init phase. The only thing you can't do ahead of time is figure out what symbols are on the stack.


      v. You don't even want the handler function to complete normally. You want to troll the stack and then exit.
   e. What happens if someone wrote on top of our heap data for the crash reporter? Do we have to account for that?
      i. No -- you can't account for that.
   f. Can we use the stack after a crash?
      i. Light use is cool.
      ii. Don't use big system library stuff that touches a lot of state, like malloc and free.
      iii. We hope the corruption has not touched the stack.
      iv. But printing a line is fine. Decomposition is fine.
   g. How do we examine the registers?
      i. The signal handler shows you how to get to eip, which is the one you need.
2. Memory + the Memory Hierarchy
   a. Most programs are not CPU-bound -- that would mean tons of numerical analysis without needing much data. Outside that case, there's probably a lot of downtime just waiting for memory.
   b. The CPU registers connect, through the bus, to RAM. Bus traffic has a profound impact on performance, because the CPU runs at ~3GHz while memory runs at ~800MHz.
   c. The memory hierarchy:
      i. Registers.
      ii. On-chip L1 cache (SRAM). Holds cache lines retrieved from the L2 cache. 1-2 cycles away. Not shared between processors. Usually write-through.
      iii. Off-chip L2 cache (SRAM). Holds cache lines retrieved from memory. Maybe 10 cycles away. Usually shared between processors. Usually write-back.
      iv. Main memory (DRAM). Fairly slow -- about 100 cycles to get memory across the bus and to the chip. But it's cheap in dollars.
      v. Local secondary storage (hard disks). Holds disk blocks retrieved from local disks.
      vi. Remote secondary storage: distributed file systems, web servers.
   d. L1 and L2 caches (some machines have 3 levels) were introduced because of the growing gap between CPU and bus speeds; they used to be about the same speed.
   e. "core memory" -- from the magnetic-core storage of big room-sized computers; hence "core dump." Now there is no magnetic core. Awwww.
3. caching
   a. two forms of locality in the memory access patterns of most programs:
      i. temporal locality: you use one variable a lot in one place, and you don't use it much in other places. E.g., local vars in a function.
      ii. spatial locality: you're likely to use memory that's near one piece of memory at around the same time. E.g., looking at a string, you'll probably look at the entire string at once.
         1. this means the heap is slower, because it can't be cached as well. Cf. arrays versus linked lists.
         2. so sometimes, in big programs, you'll ask malloc for one big block of memory and then manage it yourself, so you can force the locality.
   b. the cache is like your desk: you can only have certain things on it at once. If you go over to the bookshelf to get another textbook, you have to clear something else off.
   c. cache hit = what you want is already in the cache.
   d. cache miss = you need to grab the data from RAM.
   e. If we have a 97% hit rate (1 cycle per hit) and misses cost 100 cycles, then 0.97*1 + 0.03*100 means access time is about 4 cycles on average. If we improve the hit rate to 99%, average access is about 2 cycles. Typically, cache stats are discussed in terms of miss rate.
   f. Figuring out which part of memory goes into which cache block: a very simple mod relation. If we have 4 blocks of cache, every memory block with RAMBLOCK % 4 == 0 maps to cache block 0.
      i. easy to implement
      ii. tends to work fairly well with the access patterns of most programs
      iii. there are fancier forms, but eh.
   g. Write policy? When you look for memory, you'll always check its corresponding cache block first.


      i. When you write to the cache, you could write through (a write-through cache), meaning you write to RAM as you update the cache. This can be efficient because you don't need the RAM copy immediately, so it's fine that it's slow.
      ii. Or you could have a write-back cache, where the cache line is copied to RAM only when it's flushed.
4. virtual memory
   a. you have 4GB addressable, but (usually) not 4GB of RAM. Basically, RAM is a cache for the disk.
   b. Each process has its own address space -- its own virtual addresses.
   c. Other processes have similar (or exactly the same) virtual addresses.
   d. A map, called the page table, maps virtual addresses to physical addresses (which correspond to places on the DRAM chips).
   e. Things move on and off the DRAM chips in pages.
   f. If memory isn't resident -- if you're not using it -- it lives on disk (in the swapfile) rather than in RAM.
   g. When swapping gets too high: thrashing. Disk is very slow -- thousands or millions of cycles to bring a page in.
   h. To avoid this: have more RAM, use less RAM, keep things together on one page, prefetch.
5. numbers everyone should know, according to Jeff Dean, king of large distributed systems at Google
   a. L1 cache reference: 0.5 ns
   b. L2 cache reference: 7 ns
   c. mutex lock/unlock: 25 ns
   d. main memory reference: 100 ns
   e. compress 1K bytes with Zippy: 3,000 ns
   f. send 2K bytes over a 1 Gbps network: 20,000 ns
   g. read 1 MB sequentially from memory: 250,000 ns
   h. round trip within the same datacenter: 500,000 ns
   i. disk seek: 10,000,000 ns
   j. read 1 MB sequentially from disk: 20,000,000 ns
   k. send a packet CA -> Netherlands -> CA: 150,000,000 ns
6. etc.
   a. storing vs. caching
      i. if I can recreate the data faster than I can write it down somewhere else, I won't bother storing it.
      ii. compressing data: compressed data can be read from memory more quickly, even counting the time to uncompress.
   b. valgrind
      i. valgrind --tool=callgrind --simulate-cache=yes
         1. simulates the L1 cache, not L2. It's a lie when it says it knows about L2.
         2. counts cache hits and cache misses and reports them.
   c. old programmers optimize for CPU, not memory, but memory is the killer. Plus, a lot of the tools tell you about CPU, not memory.
      i. Link ordering makes a big difference.
7. Friday
   a. software-level concurrency.
8. questions
   a. How do SSDs compare to RAM?
      i. Orders of magnitude faster than HDD; still orders of magnitude slower than RAM.
   b. Is there a structural limit to the size of each level of cache, or would it just be really expensive to get a ton of cache?
      i. Space limit.
      ii. You have to search the cache, and being small lets it be faster.
      iii. So consumer cache isn't too much different from high-end cache.
   c. Why hasn't RAM / the bus gotten as much faster as the CPU has? Is it just an industry effect, or is it harder to optimize RAM?
      i. Dunno?
   d. A Windows page file that's greater than current RAM usage: does that mean a page is in use but the program is reporting "I'm not using that entire page"?


      i. Probably the difference between virtual and physical memory. There's always more virtual than physical memory.
   e. When you say 100 cycles to pull from memory, how are those cycles optimized?
      i. It can pipeline, but 100 is the response time.
   f. Why are virtual addresses 0x80 if they could be any arbitrary number?
      i. Completely arbitrary.
   g. What does it look like when the CPU is waiting for memory? Because I often see programs pwn the CPU to 100%.
   h. Does the compiler need to know about the cache, or does the processor automatically deal with that?
   i. What kind of miss rate is typical in a cache?
      i. Less than 0.1% is ideal.
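The average-access-time arithmetic and the mod-mapping rule from the caching section can be sanity-checked in a few lines of plain Python (nothing course-specific; the function names are made up):

```python
# Average access time = hit_rate * hit_cost + miss_rate * miss_cost.
def avg_access_time(hit_rate, hit_cycles=1, miss_cycles=100):
    return hit_rate * hit_cycles + (1 - hit_rate) * miss_cycles

print(avg_access_time(0.97))   # 3.97 cycles -- "about 4"
print(avg_access_time(0.99))   # 1.99 cycles -- "about 2"

# Direct-mapped placement from the notes: RAM block N lands in
# cache block N % num_cache_blocks.
def cache_block(ram_block, num_cache_blocks=4):
    return ram_block % num_cache_blocks

print([cache_block(n) for n in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
```

Note how sensitive the average is to the miss rate: cutting misses from 3% to 1% halves the average access time, which is why cache stats are usually quoted as miss rates.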

11/13/2009 - Optimization, GCC, Fancy Processors


1. intro
   a. use email -- ask us questions! Use the forum if you want answers in public, but we want you to ask questions. Last spring we got about 200 questions per week; this quarter we have about 400-500 questions total.
2. optimization -- don't
   a. "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." -- Don Knuth
      i. Do it more simply first; then figure out where you need to optimize.
   b. "More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason -- including blind stupidity." -- W.A. Wulf
   c. "Bottlenecks occur in surprising places, so don't try to second-guess and put in a speed hack until you have proven that's where the bottleneck is." -- Rob Pike
      i. Don't guess. Measure.
3. algorithms + data structures matter most
   a. Big O: no tool will turn an n^2 algorithm into an n log n algorithm/data structure. Optimizations will lower the coefficients, but you have to make sure to get the best big O first.
4. how does GCC work?
   a. optimizing compiler flags
      i. gcc -O0: no optimizations
      ii. -O1: moderate things that are known to work
      iii. -O2: aggressive, and well documented to behave well in most cases
      iv. -O3: things GCC is experimenting with that might help, but might not, and might hurt
      v. More optimization --> more compile time, since GCC is trying harder to understand the code.
      vi. Sometimes the code gets bigger: GCC may split it into different cases and optimize one case, using multiple blocks of code.
   b. specific optimization flags
      i. these might not be generally useful; in specific cases, they will be.
      ii. -fomit-frame-pointer: doesn't use ebp; addresses everything off esp, so ebp can be a general-purpose register. Also makes backtraces harder.
      iii. -funroll-loops
      iv. Lots of other optimizations.
   c. GCC knows the processor
      i. Superscalar: the chip can do multiple things in each cycle.
   d. GCC can fold constants
      i. 4*8*a versus 32*a. These constants may come out of a derived expression -- e.g., computing the offset of a struct field -- or from named constants.


   e. Code motion: if some values won't change within a loop, it won't have to recalculate them each time. Likewise common subexpressions.
   f. It looks locally for low-hanging fruit; it won't be able to look at the whole program.
   g. It will be conservative: it cannot change the behavior of the code. (For buggy code, the behavior might change, but correct code should keep working.)
   h. Strength reduction.
      i. Change a divide to a multiply.
      ii. Or a multiply to an add -- if you keep adding 4, that's easier than re-multiplying by 4 each time.
   i. If you write your code normally, GCC will be able to recognize it as an idiomatic pattern. If you write esoteric code, GCC will leave it alone.
   j. Timing (there's an Intel chip cycle counter if you want fine-grained timing). How many machine cycles did it take, unoptimized vs. optimized? These are each in millions of cycles:
      i. matrix multiply: 2M -> 0.38M, about 6x faster (strength reduction)
      ii. quicksort: 2M -> 2M (the library qsort was already compiled optimized)
      iii. selection sort: 973M -> 557M, about a 40% reduction
      iv. recursive factorial: 1.6M -> 0.3M (tail recursion -- a constant factor away)
      v. iterative factorial: 0.8M -> 0.3M
      vi. reassemble: 776M -> 716M, about 10% (for programs with mixed character, there's not a ton of leverage from optimizations)
   k. GCC won't optimize out function calls -- e.g., it won't hoist a strlen(s) out of a loop, because it doesn't know that s won't change. The function call might change it! Or might change a global variable. C doesn't have a way to say that, given the same input, a function will always give the same output. E.g., rand takes no parameters and gives a new output every time.
5. how to test
   a. you'll have code for the cycle counter when you implement malloc
   b. time: a very crude measure
   c. valgrind -- it doesn't just do memory checking:
   d. valgrind --tool=callgrind
      i. maps counts back to source code, so you can see how many cycles were spent on each line
      ii. writes callgrind.out.(process number)
      iii. callgrind_annotate --auto=yes
   e. another tool: gprof
6. processors: superscalar, ICU + EU
   a. At the hardware level, code isn't linear.
   b. The chip can do more than one instruction at a time.
   c. The execution unit has 6 channels: a unit dedicated to loading, one to storing, two for floating point, two int units.
   d. In one cycle, the ICU can delegate 6 tasks to get started.
   e. Not all of these take the same number of cycles to finish. While it's still doing a floating-point divide, it can do other things.
   f. Pipelining:
      i. The part of the machinery that does floating-point add, which takes 3 cycles, can always be working.
      ii. The first cycle handles the exponent, the second handles the sign, the third rounds.
      iii. After you finish one number's exponent handling, you can start on the next even though you still haven't finished adding the first.
      iv. Latency = time to complete (e.g., 5-cycle latency to load or store, with issue time of 1 cycle).
      v. Issue time = # cycles before you can start the next op. If issue < latency, it's pipelined.
      vi. Divide: no pipelining -- 18 latency, 18 issue.
      vii. Out-of-order scheduling.
   g. You don't need to know about this; the compiler and the chip worry about it.
   h. instruction-level parallelism


      i. Sometimes it has to speculatively execute code. It will sometimes do extra work because it might need the result -- if the units would otherwise be idle and unscheduled, there's no reason not to.
7. how to write code that takes advantage of instruction-level parallelism
   a. don't make each instruction depend on the previous instruction. You need to have both operands ready to pipeline.
   b. particularly useful when doing a massive number of calculations.
   c. the compiler can exploit associativity for integers, but it CAN'T do that for floating point, because floating-point arithmetic isn't associative.
8. how might buggy code change under optimization?
   a. uninitialized variables
   b. dangling pointers
9. you have the whole processor?
   a. In the model of abstraction, you do.
10. Atom and ARM processors are in-order-execution chips, though.
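Strength reduction and code motion from the GCC section above can be illustrated in Python -- a sketch of the *transformation*, not of anything GCC literally emits, and the function names are made up:

```python
# Naive: recompute i*4 on every iteration (a multiply per element).
def offsets_multiply(n):
    return [i * 4 for i in range(n)]

# Strength-reduced: replace the multiply with a running add -- the
# same transformation a compiler applies to array-index arithmetic.
def offsets_add(n):
    out, acc = [], 0
    for _ in range(n):
        out.append(acc)
        acc += 4          # add instead of multiply
    return out

assert offsets_multiply(6) == offsets_add(6) == [0, 4, 8, 12, 16, 20]

# Code motion: hoist loop-invariant work out of the loop.
def scale_naive(values, factor):
    return [v * (factor * 8) for v in values]   # factor*8 redone every pass

def scale_hoisted(values, factor):
    k = factor * 8                              # computed once, outside the loop
    return [v * k for v in values]

assert scale_naive([1, 2, 3], 2) == scale_hoisted([1, 2, 3], 2) == [16, 32, 48]
```

Note that both transformations preserve the result exactly -- which is the compiler's constraint ("it cannot change the behavior of correct code").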


11/11/2009 - section
Gcc:
It won't search the current directory if you write the #include in angle brackets.
NULL: #define NULL ((void*)0)
Macros suck: you have to parenthesize everything, you can't safely pass in things like ++x, and they can repeat function calls.
The linker will look through a library's symbol table even if you don't explicitly include the header file. So you can still use qsort even if you don't #include stdlib.h! Versus assert, which is a macro and must be handled in preprocessing, because there is no symbol for it in the library symbol table.
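The "parenthesize everything" warning can be demonstrated by simulating #define's blind text substitution in Python (a toy model -- the real C preprocessor tokenizes, but the precedence bug is the same):

```python
# #define SQUARE(x) x*x  -- naive textual substitution, like cpp does.
def expand_square(arg_text):
    return "x*x".replace("x", arg_text)

good = expand_square("3")        # "3*3"
bad  = expand_square("1+2")      # "1+2*1+2"  -- precedence bug!

print(eval(good))   # 9, as expected
print(eval(bad))    # 5, not the 9 you wanted

# The fix: #define SQUARE(x) ((x)*(x))
fixed = "((x)*(x))".replace("x", "1+2")
print(eval(fixed))  # 9
```

The same textual-substitution model also explains why passing ++x into a macro is dangerous: the expansion repeats the argument text, so the side effect runs twice.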

11/9/2009 - The Heap


1. intro
   a. next classes:
      i. take 110! It's lower-level and systemsy. It's required.
      ii. It wouldn't be terrible to take 108. It's an optional mixin, not required; more higher-level.
   b. assignment regrade
      i. no late days
      ii. it's posted
      iii. due the Monday of Thanksgiving break
      iv. get back 75% of the points you lost
   c. binary bomb
      i. woot woot
      ii. GDB r0x0rz. You can use it rather than printfs now.
2. the heap is like laundry
   a. You throw things into a pile. It's unordered. You just have to search through and grab something from it.
   b. malloc and free
   c. other stuff:
      i. realloc
      ii. calloc
   d. It's C code. The stack is just one assembly instruction to move the stack pointer; malloc and free are C library code.
   e. It relies on a low-level OS allocator to get big chunks of memory -- one page is 4K or 8K. That allocator is not appropriate for individual calls; malloc gets big chunks of memory and divvies them up for you.
   f. Pages don't even have to be contiguous.
   g. An in-use list and a free list.


   h. sizeof is a compile-time operator. But malloc knows how much space you're using on the heap! malloc_usable_size() goes into malloc and figures out the usable size you were allocated -- but it isn't standard C.
3. data structures
   a. How to track what's in use and what's free?
      i. The in-use list could be sorted by address, so you can do a log(n) binary search. You could also use a hash.
      ii. The free list needs to be quickly accessed, sorted by size.
   b. The more likely way it's tracked: embed the housekeeping into the heap itself.
      i. Every node in the heap has info stored just left of the pointer itself.
      ii. The size of the node and its free status are embedded there.
      iii. On free, we can just take the pointer and go back 4 bytes.
      iv. But finding a new node is O(number of nodes).
   c. The user's storage: the payload.
   d. The pre-node header: size of the node and freed status.
   e. The implicit free list functions as an implicit linked list. When you have a free node, you can make it point to the next free node -- probably a doubly linked list. Plus, you know each node is at least 8 bytes.
4. optimizing
   a. Our malloc probably gives you extra memory.
      i. If you ask for less than 8 bytes, it gives you 8. If you ask for more, it might round up to a number divisible by 4 or 8.
      ii. If you ask for 16, it might give you 20 if it has a 20-byte block of memory, because it can't use the extra 4 bytes anyway.
   b. Two competing interests:
      i. throughput -- runtime performance
      ii. density -- low fragmentation. Cluster nodes with little space between them; holes mean we use extra memory.
   c. Criteria for choosing a free node:
      i. It must fit.
      ii. Do you want the first fit? If you have a bunch of small nodes at the start, you'll end up re-searching them a lot.
      iii. Next fit? Start where you left off after the last allocation.
      iv. Best fit? A node that's exactly the size requested -- least extra overhead.
      v. Do you want to split what's left off the end?
   d. Rejoining blocks? Coalescing?
      i. When you call free, you might track fragmentation (number of nodes and number of pages). When that gets too big, coalesce adjacent free nodes together.
   e. How does the OS know a region is valid? It asks whether the address is in a mapped page.
   f. Knuth's idea: make it a doubly linked list.
      i. Put a header at the end of the node as well. That makes it a doubly linked list, because you can back up.
5. malloc16 and corrupting the heap
   a. Some guy called tech support saying that malloc didn't work for small sizes, so he wrote a malloc16 function that mallocs 16 extra bytes. It turns out he was concatenating a string onto a malloc'ed region.
   b. If you use more space than you have in a region, you're likely to corrupt the heap, because you'll clobber the next header.
   c. If you free something that isn't a pointer returned by malloc, it will corrupt the heap, because free will interpret the previous 4 bytes as a header -- which means it might say "I have 2 million bytes free!"
6. memory leaks
   a. gcc has tons of leaks, because it knows it runs once and exits, so leaks don't matter.
   b. Apache can't have any leaks, because it gets millions of requests and never reboots.
   c. Valgrind has its own version of the heap; it does a lot more work.
7. realloc
   a. Is there already extra room at the end of the block? Cool, grow in place.


b. Is there an adjacent free node? Cool. Detach it from the free list and eat it up.
c. Anything else? Malloc a new region, copy over the memory, free the old block.
malloc overhead?
a. A few global variables.
b. Maybe a free list or two (ie, a segregated-fit free list - here are the 12B free nodes, here are the 16-32B nodes, ...)
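A minimal sketch in C of the header-and-size-class ideas above. The names (BlockHeader, roundup8, bin_index) are mine, not from lecture, and a real malloc packs this info more tightly:

```c
#include <stddef.h>

/* Hypothetical pre-node header: size and free status embedded just
   left of the pointer malloc hands back (illustrative layout only). */
typedef struct {
    size_t payload_size;   /* usable bytes in the payload that follows */
    int    is_free;        /* free status, embedded in the heap */
} BlockHeader;

/* Requests get rounded up, e.g. to a multiple of 8. */
size_t roundup8(size_t n) {
    return (n + 7) & ~(size_t)7;
}

/* free(p) can back up from the payload to find its header. */
BlockHeader *header_for(void *payload) {
    return (BlockHeader *)((char *)payload - sizeof(BlockHeader));
}

/* Segregated fit: pick a size-class list (8B, 16B, 32B, ...). */
int bin_index(size_t size) {
    int idx = 0;
    for (size_t cap = 8; cap < size; cap <<= 1)
        idx++;
    return idx;
}
```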

11/6/2009 - Assembly wrapup, Make, Preprocessor, Linker


1. intro
a. well regrade one of your assignments 1, 2, 3. You can rework it, and youll get back 75% of the points.
b. Well try to get assn3 back this week
2. register saving
a. caller-saved: eax, ecx, edx
b. callee-saved: esi, edi, ebx
3. what about context switching within the OS?
a. Freeze-dry the registers and restore them.
b. Every core has its own registers
4. how code REALLY gets compiled
a. Makefile b. preprocessor c. Compiling d. Linking
5. make
a. not something that only works for c. I used make to assemble my websites!
b. Idea: there are dependencies
c. On the left: the target
d. On the right of the colon: the files it depends on
e. By default, it knows that .o files probably come from associated .c files.
f. If it lacks a dependency, it will look for how to create that dependency
g. $ = variable
h. $@: the name of the target
i. No one ever makes a new makefile. They just copy an old one and slightly edit it.
6. preprocessor
a. what is preprocessed? i. #define (constants / macros) ii. #include iii. #ifdef iv. #ifndef v. Removing comments vi. String literal concatenation: "abc" "def" --> "abcdef" vii. __LINE__ (and string versions of names) viii. Whitespace rearrangement.
b. We tend to use all uppercase for #define names.
c. #define is totally text find-and-replace. You give a token -- whitespace delimited -- and it replaces it with the rest of the line.
d. gcc -E i. Run the preprocessor, print the output, and stop ii. It has to add line numbers so that it can give the correct error messages
e. Make sure you dont have errors in your #defines. Like having semicolons at the end
f. Macros i. Faster / more efficient than functions. We dont need to save state, move around, shuffle registers


ii. And since its just find-and-replace, you dont need to deal with type rigidness. Functions dont work for multiple types, but macros do.
iii. But function call overhead is not that big.
iv. You can also do this with inline functions
v. But its also really easy to make bugs there. And its easy to recalculate stuff a lot
vi. When writing macros, you need to put parens around every instance of x so that longer expressions dont completely fail.
vii. The whole macro needs to be parenthesized too.
viii. Ie,
1. #define ABS(x) ((x) > 0 ? (x) : -(x))
ix. But theres no way to get around the re-evaluation of x.
g. #ifndef
i. We dont want something to be defined twice. Which would be a problem bc everything includes stdio
ii. C doesnt care about redefinition of prototypes
iii. C DOES care about redefinition of types, which is a problem.
iv. C DOES care if you redeclare a variable, which is a problem.
v. _DEMO_H is the general convention for guard names
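The parenthesizing and re-evaluation points above, as a runnable sketch. BAD_ABS and next_val are made-up names for illustration:

```c
/* Fully parenthesized, per the convention above. */
#define ABS(x) ((x) > 0 ? (x) : -(x))

/* Deliberately unparenthesized, to show why the rule exists. */
#define BAD_ABS(x) x > 0 ? x : -x

/* A side-effecting argument exposes the re-evaluation problem. */
static int calls = 0;                  /* counts evaluations */
static int next_val(void) { calls++; return -3; }
```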


7. modules
a. The preprocessor runs on a per-module basis.
b. There is independent module compilation
c. It compiles each module individually
d. That way, only the ones that change have to be recompiled.
e. Any dependent modules might also need recompiling.
f. In the .h file is everything that is part of the public interface.
g. Extern: global. Its the default, so people dont write it. Functions are also extern by default. Structures too.
h. Static: private to this module. TOTALLY DIFFERENT from c++ static. You want to declare stuff static unless you know you want it to be used everywhere.
i. Theres nothing aside from extern and static. Either global or private.
j. Dont pollute the namespace.
k. nm: shows the symbol table.
l. size demo.o tells you how big each segment is.
8. linking
a. taking two object files and putting them together
b. just tidies up references between different object files
c. undefined references or multiply-defined references are the only errors the linker can throw.
d. The linker doesnt take the whole of the system library code and duplicate it in every executable. It leaves behind breadcrumbs saying where it needs to come from. Every process that runs concurrently using the same system library functions can share those instructions and data. Ie, printf@@GLIBC_2.0
e. Compiling as static forces this redundancy
f. ld is the link editor. collect2 is a part of ld's invocation. And this will tell you about multiply-defined or undefined references.
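A minimal Makefile of the kind described above, showing the target/dependency syntax and $@. The module names (demo.c, vector.c) are hypothetical:

```make
CC = gcc                 # $(CC) and $(CFLAGS) are variables
CFLAGS = -g -Wall

# target: the files it depends on
demo: demo.o vector.o
	$(CC) $(CFLAGS) -o $@ demo.o vector.o   # $@ = the target name, "demo"

# .o files come from the matching .c by default; listing vector.h
# means a header change recompiles just the modules that include it
demo.o: demo.c vector.h
vector.o: vector.c vector.h
```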

11/4/09 - section
x/10wx $esp. At a breakpoint, you can say up and go up a stack level. Stack smashing protection puts arrays far away from the stored ebp, so an off-by-one error wont mess up your stack by quite as much. Gdb: watch. Gdb: display.

leave moves the stack pointer up to the saved ebp and then pops it; the pop just pops the saved value back into ebp. limit

11/2/2009 - The Stack


1. intro
a. were sorry that were bad at getting assignments back to you. Well try!
b. Midterms graded.
c. I wanted to give you all a big hug! I hate giving exams
d. I learned that you all dont know IA32.
e. lab problems are like my children. I love them. And its hard for me to cut them.
f. I really just need to find my least favorite
g. turns out there is a big correlation between people who got the floating point question in lab and the people who got the floating point question on the midterm. For the others, it was a matter of I didnt get this in lab. I dont get it any more now than I didnt get it in lab
h. Extra space in output shouldnt be like MINUS 40: if were going to grade 140 assignments in a timely manner -- which, apparently, we arent, but if we were -- we need autograders. And inserting extra space can screw that up. Thus, the sanity check.
i. now, Im going to let you read about the switch table
2. stack
a. grows top-down, starting at about the middle of your ram
b. deep recursion lets the stack grow deep.
c. Contains parameters, local vars, and housekeeping stuff related to knowing where control is going. i. Backtrace ii. Other scratch space -- ie, computing temporary results
d. Fast place to use memory. push just adds something to the stack and adjusts the stack pointer. Push and pop just take one arg; they adjust the stack pointer for you.
e. Convention about where it puts parameters and how to return after a fn call (which means reinstating info, getting back the stack context so that I know where everything is)
f. Parameters are pushed right to left. Binky(3, 8) pushes 8, then 3. This is important for printf, because that way it knows where the first parameter is. i. Either push $8, push $3 ii. Or subtract 8 from the stack ptr and mov the values in iii. Then call Binky
g. Call i. Changes eip to point into binky ii. Saves the current value of eip -- the instruction right after the call -- by pushing it on the stack. iii. Aka the return address
h. Ebp points into the middle of the stack frame.
Beneath the parameters and before the local vars. The sp changes a lot as we use the stack for scratch; the bp doesnt change as much. We keep both bc otherwise we would need to keep track of how much was on the stack
i. Binky saves the old value of ebp (push %ebp -- mains base pointer)
j. Sets ebp to be the current value of sp. So, the first parameter (from left) is at ebp + 8, the next parameter at ebp + 12, the saved bp is at ebp, and the return address is at ebp + 4.
k. When done i. Unmake the space on the stack ii. Set ebp to what its pointing to -- the saved ebp: mov (%ebp), %ebp iii. Pop things off of the stack / add the offset back to the stack pointer
l. Nothing will be below esp bc thats below the stack. Unless you do screwy stuff
m. Something similar happens for main, too!!


The compiler might make one big chunk of stack space up front rather than adjusting a bunch of times as it goes
o. Parameters have to be consistent bc caller and callee have to agree on what theyre doing. But locals have more variation bc only the inside function needs to know. This means parameter padding has to be consistent
p. bt in gdb does a backtrace
q. frame 5, info frame
r. It doesnt record the old stack pointer bc where the ebp points is where the stack pointer was when I got here.
s. Cant return the address of a stack var, bc the space is deallocated. But its deallocated just by changing the pointer, so the contents might still be there if no one else writes over them.
i. At NeXT, we had a bug like this for years: a big char buffer that was declared locally, and it all worked bc the stack grows down, so youll only use the lower addresses of the buffer.
ii. Changing to the HPPA architecture made the stack grow up. Which broke it. Bc the hot activity is at the lower indices.
iii. Valgrind isnt very good at the stack, so it might not catch this either. The stack is harder to track bc everything is contiguous. Accessing past the end of an array -- its hard to tell if the access is unintentional.
3. questions
a. how does using registers rather than memory work with state saving and such?
i. Register passing rather than memory passing might be a bit faster, but there needs to be agreement, and libraries arent compiled that way, so it wouldnt work very well.
b. How do registers work when you call a new function? Ie, what if youre already using eax for something?
i. 3 registers are free for the function that gets called to use. That means you need to back them up before you call a function if you want them to stay the same.
ii. 3 registers are reserved for the function that makes calls. That means you need to back them up if you use them within a function, and restore them before you return.
c. How does the stack work considering that other programs are using ram too? You dont know that the stack is completely yours, do you?
i. Multiple stacks. One for each thread. ii. Each stack has a maximum amount of space that it can use. In other words, a lot of space is allocated for it ahead of time.


10/28/2009 - section
Pipe: output of one command --> input of another. Output of strings --> input of grep: strings ... | grep Warn --count

10/26/2009 - Assembly + Control


1. intro
a. midterm Friday. Still no room. Ill post to the website and email you.
b. Assn 4 due tomorrow night
c. No man page for open? Need to use man 2 open.
2. Working in assembly
a. ptr = &arr[3]
i. leal -12(%ebp), %eax -- conceptually -24(%ebp) + 3*4; with optimization it collapses the math down to -12, but not without it
ii. movl %eax, -8(%ebp)


1. load the address of arr[3] and store it in ptr, which is at -8(%ebp)
b. *ptr


i. movl -8(%ebp), %eax
ii. movl (%eax), %eax
iii. (only allowed to do one memory read per instruction, so you cant do a double-dereference in one instr)
c. *ptr + 31
i. addl $0x1f, %eax
ii. ($ is an immediate constant)
d. num
i. movl -4(%ebp), %ecx
e. arr[num] = *ptr + 31
i. movl %eax, -24(%ebp, %ecx, 4)
f. What happens when arr[num] segfaults?
i. the OS says heeeyyy -- no memory for you!
ii. Calculating a bad address is fine. Reading/writing it causes the error.
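The same example in C, for reference -- this is roughly what the leal/movl sequence above implements (the function name and concrete values are mine):

```c
/* C equivalent of the assembly walkthrough: each statement maps to
   the lea/mov/add instructions discussed above. */
int demo(void) {
    int arr[10];                /* lives at negative ebp offsets */
    int num = 2;
    arr[3] = 11;
    int *ptr = &arr[3];         /* leal: compute the address, no dereference */
    arr[num] = *ptr + 31;       /* movl (%eax),%eax; addl $0x1f,%eax; movl */
    return arr[num];
}
```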


3. Typecast
a. Rather than movl (%eax), %eax to dereference, it would movb (%eax), %al and would change the offset scaling to 1.
b. If it needs to do a datatype conversion, thats just one more assembly instr
4. control structure
a. label (outdented; a word followed by a colon)
i. Loop:
ii. incl %eax //increments eax
iii. jmp Loop
b. Conditional jumps: first do an operation (usually a compare -- a cmp). There are condition flags: the carry flag, overflow flag, zero flag, etc. cmp records its result in those flags; then the j instructions use them. The flags are kind of like a register (not implemented the same)
i. cmp %eax, %edx //subtracts eax from edx
ii. jl //jump if edx is less than eax
iii. je //jump if equal
iv. jne //jump if not equal
v. You jump to a label/target.
vi. jns (no sign bit set)
vii. There are tons of them.
viii. jz -- result is zero
c. Ie
i. cmp a, b
ii. If a != b, jump over the instr
d. Loops
i. Usually go through the body and jump around a lot. ii. Jump down to the test iii. Do stuff iv. Jump up to the instructions v. Jump up to the top vi. --> one jump per loop iteration and one outside. vii. Valgrind err: conditional jump on an uninit value (if you cmp an uninit val)
e. Switch
i. Series of cascading if/elses and lots of unconditional jumps wherever theres a break statement.
5. other: switch next time. Makes a switch table. If many options are close together, you can sort of treat the value like an index and imagine an array of jump targets.
6. gdb
a. disp
b. disass
c. p $eax
d. info reg


7. questions
a. how do the registers work, given multitasking?

10/23/2009 - assembly data layout, assembly operations, alu basics


1. intro
a. assn1 grading i. email us if you have a question ii. come to office hrs if you have lots of questions iii. we can show you sample code.
b. Midterm i. Practice midterm (last springs midterm) is a handout now. ii. We might also do an ia32 question iii. Dont take late days for assn4 because the midterm matters more. You may, though.
c. This week and next, you should read the text. Because its dense.
2. data layout
a. ie,
i. disp(base, index, scale)
1. *(base + disp + index*scale)
2. -16(%ebp, 2, 4) == -8(%ebp)
3. Base -- the start of the stack frame
4. Disp -- where the variable lives on the stack
5. Index*scale -- going into the variable.
b. void binky(int a) i. int b, c;
c. Assembly of binky i. Ebp is a pointer into the functions frame. Base pointer. ii. Parameters are stored at positive offsets: parameter 1 at +8, param 2 at +12 (the return address is at +4). Stack variables are at -4, -8, etc. iii. Parens in assembly are like * -- dereference.
d. void ArrayPtr() { i. int *ptr, num, arr[10]; num = 8;
e. ptr at -4, num at -8, arr[0] at -48, arr[1] at -44, etc., because arrays are always stored with the lowest index at the lowest spot in memory. That way, you can always add to an arrays address to get subsequent indexes.
v. Often, padding to the wordsize of the machine. So char, int, char will have the char at -4, the int at -8, and the other char at -12 even though theres wasted space.
f. Generating assembly
i. gcc -m32 -S demo.c
1. Capital S.
2. Produces demo.s, which is the assembly emission.
ii. objdump (it has a man page). Can use objdump on an executable to see the disassembly. If compiled with debug info in, you can extract assembly right next to the source code
iii. objdump -S -d
g. Looking at assembly
i. 0: push %ebp
ii. The numbers are byte offsets
h. Lea


f.

g. h.


i. Load effective address
ii. Like a mov without the indirection.
iii. Base + disp + index*scale: dont load from that address, just put the address itself in the register. Store it in eax, for instance. For dealing with pointers.
iv. leal offset(base, index, scale), destination register
v. Sometimes used to compute simple polynomials. Integer arithmetic -- faster than the arithmetic logic unit for simple stuff.
i. Movl
i. movl value, offset(base, index, scale)
j. Shl
i. Does a leftshift. Bitshift left. Same as multiplying by 2. shll $2, %eax multiplies eax by 4.
ii. imull multiplies
k. Assembly arithmetic always operates in place. Stores into the last arg. I think.
l. Scale must be 1, 2, 4, or 8.
m. Structs
i. struct binky {
1. int num;
2. char letter;
3. int *ptr;
ii. void Structs()
1. struct binky *ptr, b;
iii. You can turn off padding in gcc, but if you dont, binky will be 12 bytes: 4B for num, 4B for letter even though only 1B is used, and 4B for ptr.
iv. Assembly
1. movb moves a single byte.
n. movl v mov
i. mov infers from the operands how much youll move.
ii. movl is move long
o. you do know how to hotwire your car, but you should still use the key
i. Just because you know exactly how far into the struct the data is doesnt mean you should hardcode it. Use c, not assembly-style tricks.
ii. Cs110 told us that we fail. Because the old cs107 paradigm was have void*, will travel. My reputation depends on your void* usage.
p. Gcc convention: eax is the return value
q. Eax is return / scratch
r. Ecx is scratch
s. Ebp is the base pointer
t. Esp is the stack pointer
u. Others are less used. If you need more scratch space, you might use the others, but you have to save their contents, so its less convenient.
v. Accessing the cache? i. Totally opaque to the programmer. You just ask for the memory. ii. The chip knows.
3. alu - arithmetic logic unit
a. imull (integer multiply) a, b, dest
b. To multiply: load into a register, multiply, store.
c. Sometimes it wont even write back to the stack; it will just leave the value in a register bc it doesnt need to take it out.
d. add a, dest //adds a to dest
e. sub a, dest //subtracts a from dest (operand can be a memory address or constant)
f. subl $1, dest //you can use a memory operand, like subl $1, -8(%ebp) for b--
g. shll -0x4(%ebp) //thing at ebp-4 *= 2
h. not dest
i. and src, dest
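The struct binky padding claim from the Structs item above can be checked with sizeof and offsetof. The exact total size assumes a typical ABI (12 bytes on a 32-bit build; bigger where pointers are 8 bytes), but the offsets below hold either way:

```c
#include <stddef.h>

/* The struct from lecture: 4B num, 1B letter padded out to the next
   alignment boundary, then ptr -- so letter is at offset 4, ptr at 8. */
struct binky {
    int num;
    char letter;
    int *ptr;
};
```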


4. control structures
5. function call protocol
(both later)

10/21/2009 - section
Chars are always converted to ints when you do math.
Float:
Sign bit (1 if negative)
8 bits of exponent (stored value x means exponent x - 127)
23-bit mantissa/significand: 1.xxxxxxx = 1 + 1/2 + 1/4 + 1/8 + ... + 1/2^23
value = (-1)^sign * (mantissa) * 2^exp
Bit extraction: unsigned int bits = *(unsigned int*)&f
SIGN_MASK = 1 << 31 (a 1 followed by 31 0s)
EXPONENT_MASK = 0xff << 23
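Those masks in runnable form. float_bits is my own helper name; it uses memcpy instead of the pointer cast from section to sidestep aliasing issues, but the bits come out the same:

```c
#include <string.h>

#define SIGN_MASK      (1u << 31)
#define EXPONENT_MASK  (0xffu << 23)
#define MANTISSA_MASK  ((1u << 23) - 1)

/* Grab the raw 32 bits of a float. */
unsigned int float_bits(float f) {
    unsigned int bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```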

10/19/2009 - Assembly Basics


1. intro
a. what if you want to zero-fill the memory? i. Use calloc rather than malloc. Calloc interface: nElems, elemSize. Rather than just giving the size, you give 2 parameters. It works the same otherwise, but zero-fills.
b. on the forum i. put in the green checkmark when your question is answered!
c. assn4 will be smaller.
d. Youll get a breather for assn5.
e. midterm will be 1 wk from Friday i. open book, open notes
2. Assembly! Background.
a. The x86 chip only understands primitive instructions. Add numbers, move data from here to there, if equal do this.
b. C source is analyzed by the compiler (parens, syntax) and turned into a sequence of x86 instructions.
c. Compiled for one particular architecture. Once compiled, you cant move it from machine to machine. The ISA (instruction set architecture) must be the same.
d. Instructions live in the address space. x/4bt main in gdb will show the binary instruction representation of your main function.
e. x/2i main shows instructions. i = instructions.
f. In ia32, each instr takes between 1-7 bytes to be encoded(?)
g. Assembly and machine instructions are one-to-one. Assembly is just more readable for humans.
h. Assembly --> machine code is done by the assembler.
i. disassemble functionName disassembles it in gdb
j. on mac OS 9, there was no source code debugger. You needed to use the assembly debugger.
k. Now, youll (almost) never have to deal with assembly.
l. Assembly helps you think about your code i. Whats fast? Whats not fast? Whats the symptom of a bug? How can you avoid it?
3. ISA
a. Each chip has a comprehensive document. Op codes, instructions, args, datapath, limitations, scheduling ops, superscalar instructions to do things in parallel, ...
b. 3k pages for ia32


c. Tradeoffs in design
i. Reduced instruction set v complex instruction set: risc v cisc.
ii. Risc: you can do a few things very well and very quickly.
iii. Cisc: special optimized instructions. Advance pointer, test against null is something done very commonly, so lets make one instruction for it! Theres tons of feature creep.
iv. Our university president was involved in the design of risc.
v. Every instruction is the same size in risc. That means you can always figure out where each instr starts. Versus variable 1-7 byte instructions.
vi. Its possible to only use the arrow keys in emacs. Or, you can use esc+< to get to the top. If you learn all of those, youre cisc. Youre slower bc you have to think about what you do. Do you just forget whats in there?
vii. Gcc doesnt emit the fancy instructions added after the 80s unless you tell it to.
d. Massive inertia in an ISA. Each subsequent iteration needs to maintain backwards compatibility. You cant get rid of things. Intel, going 32-->64 bit, saw a chance to make a new instr set, so they moved to a risc model. AMD kept the old style and beat Intel to the marketplace, so intels plan sort of failed.
4. what does assembly look like?
a. Storage on the chip itself: registers. In IA32, there are 8. They are very fast.
i. EIP: instruction pointer. A dedicated register pointing at the instruction. Fetches the instruction and then executes it.
b. On the cpu: condition codes. Did the last operation end in 0? Lots of info about the last op.
c. Cache: faster than memory.
d. In memory: i. object code ii. program data iii. runtime stack
e. The 8 registers
i. %eSOMETHING
1. the e is a legacy thing -- e stands for extended, bc registers changed from 2 bytes to 4.
ii. Esp: stack pointer
iii. Ebp: base pointer
iv. Esi: source index
v. Edi: destination index
vi. Eax: accumulator register. Used for arithmetic. Used for the return val of a function
vii. Ecx viii. Edx ix. Ebx
x. They might have come from specific purposes, but theyre fairly general now.
xi. The number of registers is very cramped.
5. turning c into object code
a. make --> gcc (which isnt a compiler: its a compiler driver. It invokes other things, like cc, the c compiler). (cvector.c)
b. C source --> asm source using the compiler. (cvector.s)
c. Asm --> object code using the assembler (cvector.o)
d. Object --> executable using the linker (cvector-test)
e. Ie, the sum function listed on the slides.
6. assembly characteristics
a. minimal datatypes. Very little evidence of datatype
i. integer data of 1, 2, or 4 bytes
ii. pointers are unsigned ints
iii. floating point data of 4, 8, or 10 bytes
iv. no explicit aggregate types such as arrays or structures. Just constructed from primitives laid out in sequence.
b. Primitive ops
i. Perform an arithmetic function on register or memory data

c.

ii. Transfer data between memory and a register
1. load data from mem into reg
2. store reg data into mem
iii. transfer control
1. unconditional jumps to/from procedures
2. conditional branches
7. moving data
a. b = 1 byte, w = 2 bytes, l = 4 bytes
i. word = 2 bytes is just a legacy thing, because the words we use now are 4 bytes.
ii. l = long
iii. b = byte
b. the mov instruction - one of the most common ones.
c. General form: movx src, dst
i. The Intel syntax reverses dst and src.
ii. Different conventions on brackets and parens.
d. movb $65, %al
i. $ = its a constant. Thats encoded as part of the instruction itself.
ii. Take the byte 65 and put it in the lower byte of %eax
e. Immediate
i. Constant data prefixed with $
ii. A number not prefixed with $ is interpreted as a fixed memory address.
f. Register
i. register name prefixed with %
g. memory
i. a register enclosed in parens, or a fixed address. Like dereferencing a pointer.
h. cant move from memory to memory. Can move from immediate to register or memory, from register to register or memory, from memory to register.
8. addressing modes
a. (cisc nature)
b. Direct: fixed memory address. Global/static.
c. Indirect: register holds the memory address. Pointer.
d. Base + displacement
i. Constant value or fixed address
ii. Used to access data at an offset from a base: struct field, locals/parameters (expressed as an offset within the stack), some array access and pointer arithmetic.
iii. Ie, 8(%ebp)
e. Base + scaled index + displacement
i. -8(%ebp, %esi, 4)
1. displacement(base, index, scale)
2. like an array: base + index * 4
ii. displacement can be a constant or fixed address
iii. base and index must be registers
iv. scale can only be 1, 2, 4, 8: units of char, short, long, double, and nothing else.
f. Special cases
i. -8(%ebp, %esi)
1. no scale
2. used for char array elems, struct fields
ii. (%ebp, %esi, 4)
1. used for array elems, pointer arith.
Friday: manual translation.


10/16/2009 Zahans Notes


- Unsigned is used wrongly by programmers to eke out a larger range from a byte
  - don't - it acts weird in extreme cases
  - correctly used for bit masks, individual bit manipulation
- short, int, long - signed by default
- char - no default signedness
- when converting from a smaller to a larger type, C tries to preserve the sign of the value
  - -1 in two's complement: 11111111 - convert to a larger type - it replicates the sign bit - 11111111 11111111 -> still -1 in two's complement
- converting a larger type to a smaller one truncates: 255 as an int - 00000000 11111111 - truncated to a signed byte - 11111111 -> -1
- #define EOF -1
- when will ((ch = getc(fp)) != EOF) not work?
  - if ch is a char, we don't know whether it will be signed or unsigned
  - the value returned by getc is truncated to char and then promoted back to int for the comparison
  - if ch is unsigned - the rest of ch is zero-filled during promotion: 00000000 11111111 - never hits EOF
  - if ch is signed - it "works" - the rest of ch is one-filled - preserves sign
  - the fix is not to hope that the machine makes char signed by default - just use int ch
- Fractional representation
  - fixed point - use bits for negative powers of two - xxxx.yyyy - 8 4 2 1 . .5 .25 .125 .0625
    - 1111.1111 = 15 15/16 - no negative numbers - cannot represent all real numbers exactly - 1/3?
  - floating point - use bits for powers again - represent the significant digits and the power separately
    - 32-bit float - MSB is the sign bit - 8 bits of exponent (bias 127, usable range about -126 to +127)
    - exp bits all zeros - denormalised numbers - exp bits all ones - special: inf, NaN, div by zero
    - 23 remaining bits used for significant digits
    - to represent 5 - 101 is normalised down to 1.01 * 2^2 - the stored exponent is 2 plus the bias - the field shows up as .01 - the leading 1. is implicit
    - denormalised numbers - where the exponent field is zero and the implicit leading 1. is dropped
- Next week - Machine instructions, IA32
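The getc/EOF pitfall above, simulated with plain casts. The helper names are mine; -1 stands in for EOF so the sketch is self-contained:

```c
/* Simulate `ch = getc(fp)` where the byte read is 0xFF, followed by
   the comparison against EOF (-1) after promotion back to int. */
int looks_like_eof_when_signed(unsigned char byte) {
    signed char ch = (signed char)byte;   /* sign bit replicated on promotion */
    return ch == -1;                      /* byte 0xFF falsely matches EOF */
}

int looks_like_eof_when_unsigned(unsigned char byte) {
    unsigned char ch = byte;              /* zero-filled on promotion */
    return ch == -1;                      /* 0..255 can never equal -1 */
}
```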

10/14/2009 section
memcpy is faster, but if theres overlap it can fail. memmove wont fail with overlap. Casting primitives converts the value (ie, int to float). Casting pointers just reinterprets the raw bits.

10/12/2009 - binary, memory structure, data representations, bitwise operators, conversions


1. binary + memory structure
a. why dont we use base 10? Hard to distinguish between 10 signal levels, esp. with noise and vacuum tubes. Instead, we just have to distinguish between 2.
b. Bit: binary digit
c. Cant get far with one bit, so we use sequences. 8 bits = byte. Smallest addressable unit of memory.
d. Memory goes from 0 to 4GB. Each byte of memory has a unique address.
e. Different parts of your program go to completely different parts of your memory. Once you get familiar with addresses, youll learn what looks right for heap v stack.
i. Your code, in binary, is loaded at fairly low addresses. Global variables (read-only and read-write sections) are just a little bit above that.
ii. Halfway through the memory will be your stack. The stack typically grows downward (on intel architecture)
iii. The heap starts low (above your globals) and grows up. Not very ordered, though.
iv. Big space between stack and heap.
v. Large section above the stack thats unmapped. The heap can jump over the stack and go up to the top. And some lower-level allocator functions other than malloc can specifically use that space.
f. There is no runtime tagging of addresses. If some bits store an int and you read them as a double, it wont be what you want.
g. Pages (typically 4k) of allocated memory.
h. Your code is the TEXT segment. Global data is the DATA segment. The stack is the STACK segment. If your access is within a segment: random data. Else, seg fault.
2. Char
a. Chars are 1B. 256 possible patterns.
b. Written most to least significant bit
c. 0-255
d. Ascii is the one (there used to be another: EBCDIC). Lower ascii is totally standard (only using the last 7 bits - 0-127). Upper ascii - extended - is nonstandard. Stuff with accents. There was no widespread agreement; thus the standard is that you say what character set you use. Line Feed (10). Carriage Return (13). We dont use teletype terminals anymore. Unix puts in a 10. Old (pre-OS X) macs used 13; then, with the move to unix, they transitioned to 10. PC uses both: CRLF.
Lots of programs will try to cope with nonstandard line endings.
3. Char (1), Short (2), int (4), long (4-8), long long. The only ANSI requirement is the ordering between these -- that ints are at least as many bytes as shorts, for instance.
a. Endian: which end holds the most significant byte? The least significant bit within each byte is written on the right.
b. Intel is little endian. That means the lowest address holds the smallest part of the number.
c. Network order is big endian.
d. Printing a number will always print it in big-endian style even though the bytes in memory might not be in that order.
4. Decimal
a. Decimal doesnt have a nice mapping onto binary.


b. Hex does: 0-F covers half a byte (4 bits). 0x11 = 16+1 = 17.


5. Gdb
a. list - lists code
b. p ch - if its typed, it knows how to print it. Prints the ascii val and the character
c. p/x ch - prints in hex
d. p/t ch - prints in binary
e. x &i - examine. Prints the actual memory rather than the interpreted value
f. x/4bt &i - examine 4 bytes. Display them individually. Dont reorder them.
g. It uses a ? whenever it sees a high-order ascii char.
h. x/10i main - prints 10 instructions starting at main. Push, move, pop, call
6. bitwise ops
a. &, |, ^ (XOR), ~ (inverse)
b. Only work on integer types (char, short, int, long).
c. << pushes the bits to the left by some number of spaces. <<2 pushes the bits 2 left. Adds 0s on the right and loses data when something goes off the left edge.
d. >> right shift.
e. Left shift and right shift are multiplying by 2 and dividing by 2.
f. In many architectures, this is faster (particularly vs. division), so compilers turn multiplies/divides by powers of 2 into shifts.
g. Bit masking: when trying to pack a data structure tightly, you store lots of different stuff in one byte. Storing lots of bools.
7. word size
a. int, pointer, float are word size.
b. The size that its convenient for the machine to extract. 32-bit machines are good at extracting 32 bits at once even if they only need 8 bits.
8. arithmetic
a. addition
i. add and carry! Same thing you do in grade school.
b. Multiply
i. Multiply each digit, shift for the 10s (well, 2s) place, add each partial result -- just like you multiply by hand.
9. Signedness
a. The most significant bit is typically the sign bit.
b. Numbers wrap around. So, with a 4-bit integer: 0000 is 0, 0001 is 1, 1111 is -1. That way, adding a negative number is the same as adding in general. Negating means flipping all the bits and then adding 1.
10. limitations of the number system
a. you can wrap around.
b. There is one more negative number than positive.
11. conversions
a. smaller to larger (int i = ch): preserves the value, sign-extending if signed.
b. larger to smaller (ch = i): takes the least significant byte and assigns that. Can cause info loss.
12. Code
13. Floating point: float, double
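A small sketch of the masking and shifting items above -- packing two 4-bit values into one byte and pulling them back out. The helper names are made up:

```c
/* Pack two 4-bit values into one byte -- the "lots of bools in one
   byte" idea -- then extract them with masks and shifts. */
unsigned char pack_nibbles(unsigned char hi, unsigned char lo) {
    return (unsigned char)((hi << 4) | (lo & 0xF));
}

unsigned char low_nibble(unsigned char b)  { return b & 0xF; }
unsigned char high_nibble(unsigned char b) { return b >> 4; }
```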

10/9/2009 - function pointers, bit operations, data


1. intro
a. use the discussion forum!
b. Change in office hrs schedule.
2. function pointers
a. allows generic functions.


b. void MapArray(void *arr, int size, int n, void (*fn)(void *)) { //left of the name: return type. Right: parameter types. The return type and the parameter types must exactly match what we want
i. for (int i = 0; i < n; i++) {
1. void *nth = (char *)arr + size*i; //we cant just index, because there is no reasonable interpretation of arithmetic on a void*. We cant do arr[i] because array indexing uses the size of the element. Thus, the typecast is necessary so that we can add to it without scaling
2. fn(nth);
c. *(int*)myVoidIntPtr
d. Client data pointer / auxiliary data pointer
i. void MapArray(void *arr, int sz, int n, void (*fn)(void *elem, void *data), void *data)
e. PrintInt with parens invokes it. PrintInt without parens is a pointer to it. You can also say &PrintInt. And you can (*PrintInt)() to call it when its passed as a function pointer. You dont have to, though, and its cleaner without.
f. How to use it if you dont need client data? Pass in a null pointer, have a void* in the prototype, and just ignore it.
g. void* can be assigned back and forth with other pointer types. Ie, for void *ptr: char *str = ptr
h. The callback gives you a pointer to an element in the array. So, if you have char*s, the pointer would be a char**.
3. commandments
a. be wise in the ways of pointers/memory
b. prefer array notation to pointer arithmetic where possible
i. arr[index] is easier than ii. *(arr + index) iii. or *((char *)arr + index*sizeof(*arr))
c. use void* only when you must
i. if you know the pointer type, dont keep it secret!
d. use memcpy only when you must
i. if you know the type being copied, use assignment
e. prefer stack to heap allocation where possible
i. cheaper, more readable, less potential for error
f. dont declare/pass vars with unnecessary levels of indirection
i. use an extra layer of pointer when needed, not just because you can
g. use pointer typecasts exactly and only when required
i. dont ignore warnings about pointer mismatch. Dont cast indiscriminately.
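A runnable version of the MapArray sketch above; the Increment callback is a made-up example client:

```c
/* Generic map: walks any array given the element size, handing the
   callback a pointer to each element. The char* cast makes the
   pointer arithmetic unscaled, as discussed above. */
void MapArray(void *arr, int size, int n, void (*fn)(void *)) {
    for (int i = 0; i < n; i++) {
        void *nth = (char *)arr + size * i;
        fn(nth);
    }
}

/* Example callback: it knows the real element type and casts back. */
void Increment(void *elem) {
    (*(int *)elem)++;
}
```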


10/7/2009 - Lab2
&array for an array on the stack is the same value as array, because the array name is just a virtual address the compiler knows -- theres no separate pointer stored. When you pass an array to a function, what actually gets passed is a pointer value.

10/5/2009 - void*, c generics, function pointers


1. intro
   a. courseware is still funky. It probably won't completely fail again, though.
2. string constants
   a. char buf[10]; // allocated on the stack
   b. char *p = malloc(10); // allocated in the heap
   c. char *lit = "cs107"; // the string lives in the area for global data, marked read-only. Handled at compile time.
   d. lit[0] = 'x'; // will likely fail: writing to read-only memory

   e. char s[] = "Stanford"; just makes a 9-char array on the stack, just like making int a[] = {3, 4, 5}.
   f. char *s = "Stanford"; does use a string constant.
   g. char *strs[] = {"Stanford", "university"}; // an array of char*
3. string concatenation
   a. printf("first char is " + lit[0]);
      i. "first char is " is a char*. lit[0] is (promoted to) an int.
      ii. "first char is " + lit[0] adds an int to a pointer, so it moves the pointer lit[0] characters to the right.
      iii. DON'T use + for string concatenation in C.
4. void*
   a. pointers are all the same size as each other because they all hold memory addresses, so you can pass one pointer type where another is expected and it will still compile. It can mess up, though: a swap written for ints will just swap 4 bytes at whatever address was passed in.
   b. the goal: a general function that works with different data types.
   c. rather than using int*, use void*. Also need to pass in sizeof(int). size_t is a data type that behaves like an (unsigned) int.
   d. char tmp[sz]; // you can't have a void array. You use a char array because chars are 1 byte -- sizeof(char) is defined to be 1. Could also use void *tmp = malloc(sz), but the stack is less error-prone, cheaper, and faster, so don't.
   e. it is not legal to dereference a void*. Dereferencing requires knowing how many bytes to read past the start of the pointer. Need to do a manual copy: memcpy(destination, source, size); // memcpy is declared in string.h
   f. what happens if you do funky stuff? void* gets rid of compiler error checking.
      i. Pointer mismatch: int*, float*, 4
         1. no warning; it exchanges the bytes, but those bit patterns are reinterpreted incorrectly once the caller treats them as their original types.
      ii. Size mismatch: int*, int*, 2
         1. swaps only the first 2 bytes of each number.
      iii. Not a pointer: int, int, sizeof(int)
         1. compiler warning.
         2. wouldn't compile if you used something less pointer-like than an int -- e.g., a struct or a float.
         3. at runtime it accesses memory it can't access; the int value probably isn't a reachable address.
         4. "segmentation fault" is CS's way of not having to say "I'm sorry"
      iv. Wrong level of indirection: char*, char*, sizeof(char*)
         1. need to pass char**, char**, sizeof(char*) if you want to make the pointers point to each other's data.
   g. Generics have to be done through void* since C has no templates.
   h. Arrays are not pointers! An array name gives you &array[0], but there is no pointer variable behind it, so you can't reassign it, and you can't take &(&array[0]).
5. function pointers preview
   a. the client of the generic can provide the function for your generic to call.
6. q+a
   a. strdup is not ANSI standard; it's provided as a GNU extension. And because it allocates memory the caller must free, people don't like it.
   b. Bus error vs. seg fault?
   c. char *s = &arr[0]; // does this work? Why did C's designers decide to make it different?
   d. Unicode?

10/2/2009 - pointers, malloc / free, strings


1. intro
   a. yes, every malloc call needs a free call

   b. no, there are no conveniences for making space or anything. You can put a null character at the end to shorten a string. You can use str+3 to ignore the first 3 characters.
2. pointers, passing pointers, stack, heap
   a. main:
      i. int nums[4];
      ii. int *ptr;
      iii. ptr = &nums[1];
      iv. ptr++;
      v. Fill(nums, 4);
   b. void Fill(int *arr, int n) {
   c. you can say int arr[]; you can say int arr[4]; a number in the brackets of a parameter is ignored, so cut it out. The one piece of info that is passed is the memory address. The difference in notation is only for communication: int* probably means one int pointed to, whereas int arr[] suggests an array.
   d. There is no automatic mechanism to know the size of the array. You have to pass the size as a separate parameter.
   e. what if you want to copy an array? Do it manually! The one way you can force it to happen automatically: structs are copied on assignment, so a struct with an array in it will be copied. But that's clunky.
   f. sizeof(nums)/sizeof(nums[0]) will get the number of elements ALLOCATED to the nums array.
      i. However, when you pass the array into a new function, the compiler doesn't know that. So sizeof a pointer parameter will be 4 -- it's just one pointer.
      ii. sizeof(nums) is the one situation where nums does not behave as &nums[0]
      iii. You can't use sizeof with anything dynamically allocated.
   g. C99 lets you size a stack array at runtime: you can say int arr[n]. It is super handy. You won't see it much because everyone else is a dino.
   h. If you allocate an array on the stack and return the pointer, BAD. The array went out of scope.
   i. Pointers are passed by copy. Need to use a double pointer (with &ptr at the call site) if you want to change where the original pointer points. If, within a function, there is an unadorned reassignment of a parameter, it's probably wrong.
   j. Reading and writing null pointers will seg fault.
   k. Shorthand for a stack array: int nums[4] = {2,4,6,8}; // you're allowed to drop the size too: int nums[] = {2,4,6,8}; and the compiler figures out how many elements are in it.
      i. If you have a mismatch (e.g., int nums[3] = {2,4,6,8};), the compiler will give you an "excess initializers" warning.
      ii. If you leave some elements uninitialized, that's fine.
      iii. You'll often see int n = sizeof(nums) / sizeof(*nums); right after declaring an array with curly braces, so that you know how many elements you put in it.
3. malloc and free
   a. int *arr = malloc(sizeof(*arr) * n);
      i. you give malloc one arg: the number of bytes you want.
      ii. malloc's return type is void*, which is assignment-compatible with other pointer types, so you don't need to typecast it. In C++, the cast is required (but Julie thinks that's a mistake in C++: you should typecast only when required; putting casts in when the code is clean suppresses the compiler's type system, which protects you).
      iii. malloc returns NULL if it can't allocate. Checking is a good habit because it sometimes happens, but it's nonessential.
   b. free(ptr)
      i. you free a piece of memory that was malloc'ed.
      ii. What if you free an address that isn't the start of an allocation? What if you free something that was already freed? What if you try to free a stack array? All bets are off.
      iii. After you free ptr, null it out so that you don't accidentally use it.
   c. there is exactly one free call per malloc call.
   d. realloc lets you resize heap memory.
4. debugging string code
   a. buggy version:
      void mystrcpy(char *dst, char *src) {
         i. dst = malloc(strlen(src));
         ii. while (*dst++ = *src++) {;}
      }
   b. Debugged:
      void mystrcpy(char *dst, char *src) {
         i. dst = malloc(strlen(src) + 1); // but delete this entire line
         ii. while (*dst++ = *src++) {;}
      }
   c. dense stuff:
      i. strlen does not count the null character at the end.
      ii. but the malloc'ed pointer never gets back to the caller anyway, so the function shouldn't malloc at all.
      iii. as with the real strcpy, the caller allocates the memory; strcpy doesn't deal with it. We need to allocate the memory first.
      iv. the ++ binds tighter than the *, so the loop works exactly as desired.
      v. When the copied char is nonzero, the loop keeps going; once the null is copied, the loop stops. Because it stops only after copying the null, it works.

9/30/2009 - Lab 1
1. GCC
   gcc hello.c -o hello // names the output hello rather than a.out
   gcc -g hello.c -o hello // adds debugging info to the code so you can step through it
2. Make
   make // runs the Makefile in the current directory. Each rule is: name of target: files it needs, then the command used to build it:
      hello: hello.c
          gcc -g hello.c -o hello
      clean:
          rm -f hello
   make clean // runs the clean target. If you don't provide a target, make runs the first target in your Makefile.
   Makefiles can have variables, defined before the first rule:
      CC = gcc
      hello:
          $(CC) -g hello.c -o hello
   Example makefile: http://www.stanford.edu/class/cs107/other/Makefile
   GNU project page on make: http://www.gnu.org/software/make
3. Mercurial
   Master copy of the assignment; clones. You can check code out and check it back in as needed.
   Guide: https://courseware.stanford.edu/pg/pages/view/9826
   Larger tutorial: http://www.selenic.com/mercurial/wiki/index.cgi/Tutorial
4. gdb
   gdb file
   in gdb:
      run
      b 15 // break on line 15
      n // move on to the next line
      p fp // print the value of fp
      backtrace // if it crashes, this prints the stack frames that led to the segfault
   gdb basic commands: http://www.cs.mcgill.ca/~consult/info/gdb.html
   cs107 gdb reference: http://www.stanford.edu/class/cs107/other/gdbrefcard.pdf

gnu gcc reference: http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/
gnu gdb reference: http://sourceware.org/gdb/current/onlinedocs/gdb_toc.html
Julie's articles on breakpoints (http://www.stanford.edu/class/cs107/other/1637.pdf) and gdb's greatest hits (http://www.stanford.edu/class/cs107/other/gdb_jz.pdf)

9/28/2009 Intro to C Pointers


1. intro
   a. updated office hrs
   b. assn1 going out soon
2. c pointers are king
   a. c++ and java have limited pointers
   b. pass by reference must be handled manually
   c. a C string is a char*
   d. arrays decay to pointers
   e. c allows interesting pointer arithmetic
   f. function pointers
   g. you can take the address of any l-value and manipulate it through pointers
3. why are pointers cool?
   a. They're smaller than the data they point to.
   b. Shared copy: so that functions can manipulate values passed in.
4. using c pointers
   a. [type of pointee] *varName;
      i. The * can go after the type or before the variable, but it's good to put it before the variable:
      ii. int* a, b;
      iii. int *a, b;
      iv. Each of the above declares a to be a pointer and b to be a plain int.
      v. Needs to be int *a, *b;. Having the * before the var makes you remember that.
   b. Dereference: * to the left of the pointer
   c. Address-of: &varName
   d. int *p; int num; printf("%d", *p); p = &num; // the printf reads through an uninitialized pointer -- the assignment comes too late
   e. using uninitialized pointers can cause the runtime environment to crash. Segmentation fault: you attempted to reach an address outside of your defined segment.
   f. Typesafe: you can only take the address of an l-value. 3 or num + 3 can't take assignments. It has to be a variable, and the pointer has to be the correct type.
   g. c forces you to manually pass by reference
      i. you pass &varName when calling a function and declare type *varName in the function's parameter list. This increases code readability: you know when you're passing by reference.
5. named constants
   a. #define SENTINEL -1 // no semicolon
   b. The preprocessor runs through the code in a pre-pass, replacing the defined constant.
6. input
   a. printf is output
   b. scanf is input:
      i. scanf("%d", &num); // note: address-of, so scanf can fill in num
      ii. scanf returns the number of successful conversions made.
7. arrays
   a. int main() {
         int a, b;

         int arr[3];
         int *p, *q;
         // imagine we assigned a, b, arr[0]..arr[2] to be 1,2,3,4,5 respectively
         p = &a;
         q = &arr[1];
         *p = *q;   // a now stores 4
         p = q;     // now p points to arr[1]
         p = NULL;  // used as a sentinel value. Points to address 0; the pointer isn't usable.
8. pointer math
   a. if (p == q) // they point to the same location
   b. if (*p == *q) // the values they point to are the same
   c. if (p < q) // p is at a lower address in memory. Not commonly used.
   d. say p and q both point to address 1012. a = *(p+1); // in pointer arithmetic, p+1 advances one pointee-sized unit: it adds sizeof the pointee type. An int is 4 bytes, so (p+1) == 1016.
   e. You can add too much to a pointer; if you do, it might overwrite something you don't want overwritten. Same with arrays. (p + 10) or arr[10] would both just go beyond the bounds of the array.
   f. You should only use pointer arithmetic when dealing with self-managed arrays. Else, it will go off into random memory.
9. arrays and pointers
   a. p = &arr[0];
   b. p = arr; // exactly the same as the above line. Means you can also apply * to arr.
   c. a = *arr;
   d. a = *(arr + 2);
   e. arr = p; // FAILS. arr is not an l-value. arr == &arr[0]
   f. a = p[0]; // you can use array syntax with pointers: p[1]++;
   g. ptr[num] === *(ptr + num)
   h. you can take the address of any element: &a[4]
   i. ADDITION IS COMMUTATIVE: 3[arr] === arr[3]
   j. You can use negative indices: p[-2]
   k. You cannot multiply pointers.
   l. You can subtract pointers from each other: p - q; // gives the number of elements between them: the address difference divided by sizeof their type.
10. misc
   a. while(1) rather than while(true), though c99 supports true
   b. sizeof (type or expression)
      i. sizeof(int)
      ii. sizeof(a)

The C Book
1. %[width]d: print an integer right-aligned in a field of [width] characters. E.g., printf("%2d", 5) prints " 5" (a space, then the 5).

9/25/2009 Intro to C
1. C background
   a. "quirky, flawed, and an enormous success" -- Ritchie
   b. Low-level: OS work, fast, direct access to hardware.
   c. "c combines all the elegance and power of assembly language with all the readability and maintainability of assembly language"
   d. Small language footprint: you can learn almost everything about the language in 200 pages. It's small, and it's direct. You can't be an expert on c++ or java -- too big.
   e. Language inertia: you would have to rewrite something (with reduced functionality and new bugs) to get it out of c.
   f. C was written by two guys who needed to write an OS. They included what they needed. C++ was written by a massive committee. They put in everything that someone might need. C++ is a superset of c, so everything that compiles in c compiles in c++. Thus, it didn't take away the hard parts of c. java adopts c/c++ syntax, but not its programming philosophy.
      i. Java is the loving grandmother. C++ is the modern, concerned but empowering/independence-granting parent.
      ii. C is the crack mom.
   g. There are trivial differences between c and c++ (printf vs. cout; structs are slightly different), but I won't focus on them.
2. minor differences
   a. printf
   b. struct, enum tags
   c. global constants: there is no (real) const keyword. You use #define.
   d. for (int i = 0;) // in old c you have to declare the variable outside of the loop. C99 allows this form, but some other compilers don't.
   e. bool doesn't exist in old c. In c99, it does.
3. moderate differences
   a. no references. No pass by reference. Use pointers!
   b. No templates. No generic programming. Trying to write one sort that works for different types of data is hard.
   c. No operator overloading.
   d. No fancy libraries.
4. major differences
   a. no real string facility. So you have to deal with the terminating character, and with strings as arrays.
   b. no classes. No objects. No provided containers like vector or map.
   c. Memory and pointers are ubiquitous: malloc and free rather than new and delete.
5. some Julie code
   a. #include <stdio.h>
      int main() {
         printf("Hello CS107!\n");
         return 0;
      }
6. compiling
   a. gcc FILENAME if you have only one file
   b. The compiler will produce a.out as the executable's name.
   c. A Makefile (they'll provide it for us for now) lets you manage multiple files. Then, you just call make.
7. mercurial
   a. revision control system. Others: cvs, svn, git.
   b. I.e., keep a working version and potentially go back to old edits.
   c. Critical for working with multiple people.
   d. hg commit -- then add a comment about what the change does.
   e. hg log
   f. hg diff -rNUMBER -rNUMBER
   g. hg up NUMBER -- updates your working copy to revision NUMBER.
8. args
   a. int main(int argc, const char *argv[]) {
         for (int i = 0; i < argc; i++) {
            printf("%d %s\n", i, argv[i]);
         }

         return 0;
      }
   b. argc is the number of args
   c. %d says to substitute a decimal number there
   d. %
9. buggy Julie code
   a. atoi: ascii to integer
   b. total is never initialized
   c. index 0 of argv is the name of the program
   d. argc is off by one (it counts the program name)
   e. avg isn't typecast
   f. risk of divide by 0
   g. exit(-1); lets you exit right out.
   h. Typecast: put the type in parens in front of the expression.
   i. %d is explicitly an int. %g is a floating-point number.

9/23/2009 Unix Lab


Terminal woot.
whoami -- prints your username
time [command] -- times a command
fortune
echo stringtoecho
echo -n "hello world" -- prints without the trailing newline
~ = home
. = current dir
.. = one level up
pwd -- print working directory
ls -- print what's in the dir
ls -l -- detailed list
ls -a -- show all, incl. hidden files
cd -- change dir
tab completion
man -- manual. man whoami gives documentation on whoami. And on all the c library functions!
top -- other processes being run, and who's running them
cat /dev/zero pwns a machine
cat -- prints a file to the terminal
ctrl+c -- kill the foreground process
mkdir directoryName -- makes a new directory
cp -- copies a file
cp -r -- does it recursively: it goes into a directory and copies everything inside
rm -- delete/remove
rm -f -- doesn't ask you if you want to delete
rm -r -- recursively, for directories
mv [startloc] [endloc] -- renaming is the same thing as moving
* -- wildcard; you can use patterns too
& -- putting & at the end of a command runs the program in the background
killall name -- if you need to quit something (like vim if you can't do anything)
kill -9 -- priority kill
-nw -- no window; makes emacs not graphical

gedit, emacs, vi

ssh -X nameofcomp -- -Y works the same. Starts X forwarding.
putty: just lets you ssh in
xming lets you use X on windows
x windows (x11, xserver): graphical display. X is client/server: the applications live on the unix server, but the display appears on your machine.
can't use guis when sshed in without X
launch programs, e.g., evince (pdf reader)
.cshrc -- c shell config. There are also other shells that you can use.
alias [commandThatYouType] "outputInQuotes"
set path = (newPathVar)
to run an executable in the current dir: ./progName -- otherwise, the shell will only look in the path
emacs:
ctrl+x ctrl+s -- save
ctrl+x ctrl+c -- close

9/21/2009 - Intro
1. logistics
   a. Lectures Mon + Fri (up to 75m; Julie will try to end at noonish)
   b. No lecture Wednesday. Instead, the classroom across the hall (gates b08) is lab (110m). Sign up on the website; first come first served on cs107.stanford.edu
   c. New fancy website course management system
   d. Textbook: Computer Systems -- very good. Plus a C language text and resource: The C Programming Language.
   e. Programming done on the unix machines
   f. Midterm in class. Final in finals week.
   g. Skills for success:
      i. cs106 experience
      ii. Curiosity, perseverance, hard work, knowing when to get help
   h. Getting help: forum, office hours, email
   i. Don't plagiarize
   j. Unix user session during this week of labs. Go to any lab.
2. learning goals
   a. mastery
      i. can write and debug c code with complex use of memory/pointers
      ii. have an accurate model of the address space and runtime behavior of a program
         1. what the compiler does for you
         2. what happens when it's loaded into the OS
   b. competency
      i. can translate c code to/from its assembly language equivalent
      ii. can write c code that respects the limits of computer arithmetic
      iii. can identify bottlenecks and improve the runtime performance of c code
      iv. can write code that correctly ports to other architectures
      v. can work effectively in a unix dev environment
   c. exposure
      i. have a working understanding of computer architecture
      ii. can trace the actions of a sequential / pipelined cpu
      iii. can write concurrent programs with appropriate use of synchronization
3. celebrate the programmer!
   a. Most systems courses are implementation-centric:
      i. Building a compiler, os, database, microprocessor, etc.
   b. Cs107 is programmer-centric:
      i. Building you into a master programmer
      ii. Your code will be more robust, efficient, portable, reliable. Understanding the system: everything between the programmer and the hardware.
      iii. Finds the hidden hacker within!
      iv. What does it mean to be a good systems programmer?
4. past experiences
   a. not as friendly as cs106 -- there are fewer teaching staff
   b. interview skills + confidence
   c. this is a war! turns the boys into men; puts the hair on your chest
   d. Educ psych: break it up, get students engaged and doing something rather than just taking notes
5. shocking truths about comp systems
   a. computer arithmetic != pure math
      i. signed numbers can wrap around
      ii. floating point doesn't have infinite precision. Thus, really big number + 1 + really negative number != really big + really negative + 1, because the precision will die.
   b. Working knowledge of assembly != passe; you still need it
   c. Memory matters
      i. Try to do cheap memory stuff.
      ii. Cache versus reprocessing values. What are the tradeoffs?
   d. Performance is more than Big-Oh. The constant terms do matter. Optimize for how memory works.
   e. Von Neumann is dead. Long live von Neumann!
      i. The sequential fetch-execute paradigm. Computers have inherent parallelism. Chips can do multiple things in each cycle. Some machines have multiple cores or multiple processors.
6. your to-do list
   a. enroll with the course website cs107.stanford.edu. Join course and lab
   b. unix help sessions this week
   c. assign0: learning style survey
   d. do the reading before each lecture
