Edgar Gabriel
Overflow handling
Generate an overflow signal each time threshold events have been counted
Each counter has to be registered separately
The value of each registered hardware counter is maintained separately
(LONG_)LONG_MAX:
  32 bit: 2,147,483,647
  64 bit: 9,223,372,036,854,775,807
overflow_vector: a bit-array that can be processed to determine which event(s) caused the overflow
e.g. using PAPI_get_overflow_event_index()
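To make the bit-array idea concrete, here is a minimal plain-C sketch, without PAPI, of what "processing" an overflow_vector could look like. It assumes, purely for illustration, that bit i of the vector stands for counter i; the helper name decode_overflow_vector is hypothetical and not part of PAPI. In real code the mapping from bits to events of the event set is the library's business, which is why PAPI_get_overflow_event_index() should be used instead.

```c
#include <stddef.h>

/* Hypothetical helper (not part of PAPI): write the positions of all set
 * bits of overflow_vector into index_out and return how many were written
 * (at most max_out). Assumption: bit i stands for counter i. */
int decode_overflow_vector(long long overflow_vector, int *index_out, int max_out)
{
    /* Work on an unsigned copy so that right shifts are well defined. */
    unsigned long long bits = (unsigned long long)overflow_vector;
    int count = 0;

    for (int bit = 0; bits != 0 && count < max_out; ++bit, bits >>= 1) {
        if (bits & 1ULL) {
            index_out[count++] = bit;   /* counter 'bit' overflowed */
        }
    }
    return count;
}
```

With this sketch, a vector with bits 0 and 2 set would report two overflowed counters, numbers 0 and 2.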
1st Assignment
Rules
Documentation (.pdf, .doc, .tex or .txt file)
Deliver electronically to gabriel@cs.uh.edu
Expected by Friday, March 9, 11:59 pm
In case of questions: ask the TA first; if he doesn't know the answer, he will ask me
Ask early, not the day before the submission is due
Part 1: Instrument the code to use hardware performance counters to determine the behavior of the trivial and of the blocked implementation for different block sizes
The goal is to see how the counter values change with the block size
You will have to provide measurements for matrices of size 512 and 1024
Note that for development purposes you can of course run the code with much smaller matrices, e.g. 64
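The slides do not show the code to be instrumented. As a rough sketch only, assuming the kernel is a square matrix-matrix multiplication on row-major arrays, the two variants could look like the following. The function names matmul_trivial and matmul_blocked and the parameter bs (block size) are hypothetical; the code handed out with the assignment may differ.

```c
#include <stddef.h>

/* Trivial triple loop: c += a * b for n x n row-major matrices. */
void matmul_trivial(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = c[i * n + j];
            for (size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

static size_t min_size(size_t x, size_t y) { return x < y ? x : y; }

/* Blocked variant: the same arithmetic, performed tile by tile so that a
 * bs x bs tile of each matrix is reused while it is likely still cached.
 * bs must be >= 1; it does not have to divide n. */
void matmul_blocked(size_t n, size_t bs, const double *a, const double *b, double *c)
{
    for (size_t ii = 0; ii < n; ii += bs) {
        for (size_t kk = 0; kk < n; kk += bs) {
            for (size_t jj = 0; jj < n; jj += bs) {
                for (size_t i = ii; i < min_size(ii + bs, n); ++i) {
                    for (size_t k = kk; k < min_size(kk + bs, n); ++k) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < min_size(jj + bs, n); ++j) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

Under this assumption, the counter calls would be placed around each of the two functions, and matmul_blocked would be run once per block size to be measured.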
The hardware performance counters should be accessed through the PAPI library, and you could monitor the following values:
  L1 and L2 cache misses and/or cache miss rates
  Translation lookaside buffer (TLB) misses
  Stall cycles waiting for various events
  Mispredicted conditional branch instructions
Whether you can access these values depends on the processor you are actually using!
You will have to add code to handle counter overflow, or convince me that overflow does not occur. If you just ignore this item, you will lose points.
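One possible way to argue that overflow cannot occur is a worst-case bound. The sketch below assumes (this is not stated on the slides) that the kernel executes n*n*n inner-loop iterations and that a given event fires at most events_per_iter times per iteration; both helper names are hypothetical. For n = 1024 and 3 events per iteration the bound is 3,221,225,472, which exceeds the 32-bit maximum of 2,147,483,647 but is far below the 64-bit maximum. Whether such a per-iteration bound holds for a particular event, e.g. stall cycles, would have to be justified separately.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if n*n*n*events_per_iter <= limit, i.e. a
 * counter whose maximum is 'limit' cannot overflow under the stated bound,
 * and 0 otherwise. Uses division so the check itself cannot overflow. */
int bound_fits(unsigned long long n, unsigned long long events_per_iter,
               unsigned long long limit)
{
    if (n == 0 || events_per_iter == 0) {
        return 1;                       /* nothing is ever counted */
    }
    unsigned long long q = limit;
    q /= n;
    q /= n;
    q /= n;                             /* q == floor(limit / n^3) */
    return events_per_iter <= q;
}

/* Convenience wrapper for a 64-bit (long long) counter value. */
int fits_in_long_long(unsigned long long n, unsigned long long events_per_iter)
{
    return bound_fits(n, events_per_iter, (unsigned long long)LLONG_MAX);
}
```

Calling bound_fits(1024, 3, INT_MAX) and fits_in_long_long(1024, 3) reproduces the two comparisons from the paragraph above.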
COSC 6385 Computer Architecture Edgar Gabriel
Part 2: Run the modified code on the shark cluster.
Generate graphs for 3-5 PAPI hardware counters showing the values for each block size identified in Part 1, separately for the matrix sizes 512 and 1024.
Comment on your findings: how do the counter values change with the block size for each matrix size?
Make sure you run your tests multiple times, and document how many times you ran them and whether you show the average, minimum, maximum etc.
Please document (you can use PAPI to determine many of these things!):
  Processor type, frequency
  Operating system (as precisely as possible)
  Cache hierarchies and sizes
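For the repeated runs, a small helper can reduce the readings of one counter to the numbers that end up in a graph. This is only a sketch: the type run_stats and the function summarize_runs are hypothetical names, and the readings are assumed to be long long values, one per run.

```c
#include <stddef.h>

typedef struct {
    long long min;   /* smallest reading over all runs */
    long long max;   /* largest reading over all runs  */
    double mean;     /* arithmetic mean of the readings */
} run_stats;

/* Hypothetical helper: summarize nruns readings of one counter.
 * Requires nruns >= 1. */
run_stats summarize_runs(const long long *values, size_t nruns)
{
    run_stats s = { values[0], values[0], 0.0 };
    double sum = 0.0;

    for (size_t i = 0; i < nruns; ++i) {
        if (values[i] < s.min) s.min = values[i];
        if (values[i] > s.max) s.max = values[i];
        sum += (double)values[i];
    }
    s.mean = sum / (double)nruns;
    return s;
}
```

Whichever of the three values is plotted, the report should say which one it is and how many runs it is based on.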
Notes
The PAPI version installed on shark is 4.2.0
On the front-end node you can find many examples in C and Fortran on how to use PAPI in /opt/papi/4.2.0/share/examples/ctests, e.g.
  all_events.c -> how to check whether a counter is available on a processor
  low-level.c -> how to use the low-level API of PAPI
  memory.c -> how to extract information about the memory subsystem (e.g. cache sizes)
  overflow_index.c -> how to handle overflow correctly
1st Assignment
The Documentation should contain
(Brief) Problem description
Solution strategy
Results section
  Description of resources used
  Description of measurements performed
  Results (graphs + findings)
1st Assignment
The document should not contain
Replication of the entire source code: that's why you have to deliver the sources
Description of your laptop, ssh implementation used etc. Only items contributing towards the results matter!
Screen shots of every single measurement you made. Actually, no screen shots at all.
The slurm output files
Notes
PAPI Documentation: http://icl.cs.utk.edu/projects/papi/wiki/Main_Page
If you need hints on how to use a UNIX/Linux machine through ssh:
http://www.cs.uh.edu/~gabriel/courses/cosc4397_s06/ParCo_08_IntroductionUNIX.pdf