
Pablo and Autopilot: Performance Tuning in Distributed Computing Environments

Ruth Aydt
Pablo Research Group
Department of Computer Science
University of Illinois at Urbana-Champaign
http://www-pablo.cs.uiuc.edu
Pablo Research Group - Department of Computer Science - UIUC

Presentation Outline
Requirements for successful performance tuning
Pablo toolkit components - how we got here
Autopilot
    Basic concepts
    Component interactions
    Fuzzy Logic decision infrastructure
Pablo-provided monitor/control programs
    Autodriver
    Virtue
Case study of Parallel Rocket Simulation Code
Current Work


Requirements for Successful Performance Tuning in a Distributed Environment: top to bottom and end to end

real-time performance data capture
appropriate performance data detail and granularity: just enough but not too much!
tools to help correlate and interpret captured data
dynamic policy selection in response to current resource availability and application demands

Pablo Toolkit Components: a Decade of Performance Monitoring and Analysis Tools


Pablo Trace Library and Extensions

Libraries linked with application to trace generic events and also loops, message passing, procedure calls, Unix I/O, MPI I/O, HDF routines
Standard function names (e.g. read) replaced with tracing version (e.g. traceREAD) by preprocessor for C codes
For Fortran, calls bracketed by traceReadBegin / traceReadEnd manually
Timestamped event data written to buffer and flushed periodically to per-processor files
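
The wrapping and buffering idea can be sketched in a few lines of Python. This is only an illustration of the technique, not the Pablo trace library: the names TraceBuffer and traced_read, the JSON-lines file layout, and the flush-when-full policy are all assumptions made for the example.

```python
import json
import os
import time


class TraceBuffer:
    """Hypothetical event buffer that is flushed to a per-processor file."""

    def __init__(self, directory, processor, capacity=4):
        # One output file per processor, named by processor number.
        self.path = os.path.join(directory, f"trace.{processor}.jsonl")
        self.processor = processor
        self.capacity = capacity
        self.events = []

    def record(self, event, start, duration, **fields):
        """Append one timestamped event; flush when the buffer is full."""
        self.events.append({"event": event, "processor": self.processor,
                            "timestamp": start, "duration": duration, **fields})
        if len(self.events) >= self.capacity:
            self.flush()

    def flush(self):
        """Write buffered events to the per-processor file and empty the buffer."""
        with open(self.path, "a") as out:
            for ev in self.events:
                out.write(json.dumps(ev) + "\n")
        self.events.clear()


def traced_read(buffer, fileobj, nbytes):
    """Tracing stand-in for read, in the spirit of traceREAD (assumed name)."""
    start = time.time()
    data = fileobj.read(nbytes)
    buffer.record("read", start, time.time() - start, bytes=len(data))
    return data
```

An application would call traced_read wherever it previously called read, and call flush once more at exit so that a partly filled buffer is not lost.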

Pablo I/O, MPI I/O, HDF Analysis



Produce reports from I/O event data
Sample MPI-IO summary report shown (figure not reproduced)


Pablo Self-Defining Data Format



A performance data metaformat that specifies both data record structures and data record instances
Unlimited set of event types supported depending on the interesting performance data
SDDF library provides classes to read and write files in SDDF format
General-purpose tools can be written using the library and the Record/Field names in the SDDF files

Sample SDDF File showing Data Structure and Data Instance


SDDFA
#337:
// "description" "IO Read"
"Read" {
    // "Seconds" "Timestamp"
    double "Seconds";
    // "Event ID" "Corresponding event"
    // "700009" "read"
    // "700011" "fread"
    int "Event Identifier";
    // "Node" "Processor number"
    int "Processor Number";
    // "Duration" "Event duration in seconds"
    double "Duration";
    // "File ID" "Unique file identifier"
    int "File ID";
    // "Number Bytes" "Number of bytes read"
    int "Number Bytes";
};;

"Read" { 0.019991, 700011, 0, 0.000203, 3, 3 };;
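
To show how the record instance lines up with the record structure, here is a minimal Python sketch that reads the sample "Read" instance into named fields. It does not use the SDDF library; the regular expression, the parse_instance function, and the hard-coded field and type lists are assumptions that cover only this scalar-only sample.

```python
import re

# Field names and types copied by hand from the "Read" record structure above.
READ_FIELDS = ["Seconds", "Event Identifier", "Processor Number",
               "Duration", "File ID", "Number Bytes"]
READ_TYPES = [float, int, int, float, int, int]

# Matches one instance of the form: "Name" { v1, v2, ... };;
RECORD_RE = re.compile(r'"(?P<name>[^"]+)"\s*\{(?P<values>[^}]*)\}\s*;;')


def parse_instance(text):
    """Parse one scalar-only data record instance into (name, field dict)."""
    match = RECORD_RE.search(text)
    if match is None:
        raise ValueError("no record instance found")
    raw = [v.strip() for v in match.group("values").split(",")]
    if len(raw) != len(READ_FIELDS):
        raise ValueError("field count does not match the record structure")
    values = {field: cast(value)
              for field, cast, value in zip(READ_FIELDS, READ_TYPES, raw)}
    return match.group("name"), values


name, record = parse_instance('"Read" { 0.019991, 700011, 0, 0.000203, 3, 3 };;')
print(name, record["Event Identifier"], record["Number Bytes"])
```

A general-purpose tool would build the field list from the structure definition in the file instead of hard-coding it, which is what the metaformat makes possible.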

SDDFStatistics Analysis Program for SDDF Files


SvPablo

A graphical source code browser and performance capture/correlation tool
Allows user to select loops and procedures to instrument in C, F77, F90 code
Automatic instrumentation for HPF via PGI performance interface
Collects performance data and later displays it relative to source code line
Option for real-time data transmission via Autopilot tagged sensors (more later)

SvPablo GUI


Virtue

A collaborative virtual environment for direct software manipulation
    Hierarchical graph representations that show software structure, dynamics, and performance
    Manipulation tools for augmented interactions with the virtual environment
    Annotation tools for distributed, collaborative exploration and recording
Uses OpenGL and EVL CAVE library for 3-d effects in CAVE, ImmersaDesk, and desktop environments

Autopilot: Performance Tuning in Distributed Computing Environments


Autopilot Toolkit
Provides a framework for the capture and analysis of real-time application and infrastructure data in a multi-threaded distributed environment
Offers the ability to control volume of performance data through
    selective registration and property matching
    analysis and data reduction at point of collection
    constant, periodic, or on-demand transmission of data
    ability to dynamically enable/disable data collection
Includes a control interface to allow steering of infrastructure policies and applications, either interactively or via automated decision procedures

Basic Autopilot Concepts

Sensors: provide data to remote processes, allowing real-time monitoring

    intrinsic (procedural - push)
    extrinsic (threaded - push)
    transfer data when requested by remote process (pull)

Sensor Attached Functions: transform sensed data via user-defined functions before it is recorded by the sensor, providing an important data-reduction technique
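
A small Python sketch may make the data-reduction role of an attached function concrete. The Sensor class and window_mean helper below are hypothetical names invented for this illustration and are not the Autopilot API; the convention that an attached function returns None to suppress recording is also an assumption.

```python
from collections import deque


class Sensor:
    """Hypothetical sensor that optionally transforms values before recording."""

    def __init__(self, attached_function=None):
        self.attached_function = attached_function
        self.recorded = []

    def sense(self, value):
        """Apply the attached function, if any, then record the result."""
        if self.attached_function is not None:
            value = self.attached_function(value)
        if value is not None:          # None means "nothing to record yet"
            self.recorded.append(value)


def window_mean(size):
    """Build an attached function that emits one mean per `size` samples."""
    window = deque()

    def reduce(value):
        window.append(value)
        if len(window) < size:
            return None
        mean = sum(window) / size
        window.clear()
        return mean

    return reduce


sensor = Sensor(attached_function=window_mean(4))
for sample in [10, 20, 30, 40, 50, 60, 70, 80]:
    sensor.sense(sample)
print(sensor.recorded)   # [25.0, 65.0]
```

Eight raw samples become two recorded values, so a remote client receives a quarter of the data while still seeing the trend.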

Basic Autopilot Concepts

Actuators: provide remote processes the ability to invoke local functions or update data, allowing remote steering

    synchronous (application controls when updates are made; requests may be held in pending buffer)
    asynchronous (updates are made when request received from external agent)

Properties: key-value pairs that are associated with and used to identify a sensor or actuator, allowing remote processes to be selective about the sensors and actuators they connect to
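
The difference between the two actuator modes can be sketched as follows. The Actuator class is a hypothetical stand-in rather than the Autopilot class of that name, and the rule that the most recent pending request wins is an assumption made to keep the example short.

```python
class Actuator:
    """Hypothetical actuator guarding one application variable."""

    def __init__(self, initial, synchronous):
        self.value = initial
        self.synchronous = synchronous
        self.pending = []          # requests held until the application is ready

    def request(self, new_value):
        """Called on behalf of a remote actuator client."""
        if self.synchronous:
            self.pending.append(new_value)
        else:
            self.value = new_value  # asynchronous: applied on receipt

    def apply_pending(self):
        """Called by the application at a point where updates are safe."""
        if self.pending:
            self.value = self.pending[-1]   # assumption: last request wins
            self.pending.clear()


iterations = Actuator(10, synchronous=True)
iterations.request(8)
iterations.request(6)
print(iterations.value)     # 10: requests are still pending
iterations.apply_pending()
print(iterations.value)     # 6

policy = Actuator("LRU", synchronous=False)
policy.request("MRU")
print(policy.value)         # MRU
```

The synchronous form suits values that must not change in the middle of a computation step, while the asynchronous form suits settings that can safely change at any time.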

Basic Autopilot Concepts

Sensor Client: a process that connects to one or more sensors with matching properties and receives data from those sensors
Actuator Client: a process that connects to one or more actuators with matching properties and sends data to those actuators, causing application variables controlled by the actuators to be updated or functions to be invoked

Basic Autopilot Concepts

Autopilot Manager: a daemon process that is responsible for handling registration requests from sensors and actuators, and matching sensor client and actuator client requests to registered sensors and actuators.

* AutopilotManager daemons may be run on multiple hosts throughout the computational grid, allowing sensors, actuators, and clients to tailor data transfer volumes to appropriate levels for local and distant tasks.
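
Registration and property matching can be pictured with a small in-process registry. This is a sketch under stated assumptions: the class AutopilotManagerSketch, its method names, the string handles, and the rule that a client matches when every requested key-value pair is present are all invented for illustration, and the real manager is a separate daemon rather than a Python object.

```python
class AutopilotManagerSketch:
    """Hypothetical registry standing in for the manager daemon."""

    def __init__(self):
        self.registered = []   # list of (kind, properties, handle)

    def register(self, kind, properties, handle):
        """Record a sensor or actuator together with its properties."""
        self.registered.append((kind, dict(properties), handle))

    def find(self, kind, wanted):
        """Return handles whose properties include every wanted key-value pair."""
        return [handle for k, props, handle in self.registered
                if k == kind
                and all(props.get(key) == val for key, val in wanted.items())]


manager = AutopilotManagerSketch()
manager.register("sensor", {"host": "nodeA", "metric": "cpu"}, "sensor-1")
manager.register("sensor", {"host": "nodeB", "metric": "cpu"}, "sensor-2")
manager.register("actuator", {"host": "nodeA", "name": "period"}, "actuator-1")

print(manager.find("sensor", {"metric": "cpu"}))     # ['sensor-1', 'sensor-2']
print(manager.find("actuator", {"host": "nodeA"}))   # ['actuator-1']
```

A client that asks for fewer properties matches more sensors, which is how a monitor can choose between a broad view and a narrow one.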

Tagged Sensors, Actuators, Clients

Information about the structure of the data is forwarded when a client first connects to a matching sensor or actuator, allowing the client to perform verification checks and ignore unwanted data.
Tagged data sets map naturally into what we normally think of as event trace records.
Sometimes called SDDF-enabled because the buffer contents can easily be translated to SDDF.


Autopilot and Nexus/Globus



Autopilot uses the Nexus component of the Globus toolkit (http://www-globus.org) to provide communication substrate & multithreading capabilities

Nexus creates a global address space that encompasses all processes executing on a distributed network
Nexus Remote Service Requests (RSRs) used by Autopilot classes to transmit messages, ensuring optimal underlying transfer protocol
Nexus multi-threaded handlers used by Autopilot classes to process RSRs
Most Nexus details hidden by Autopilot classes

Autopilot Component Interactions


Diagram (not reproduced) connecting the Autopilot Manager, an Instrumented Task, and a Monitor/Control Task:
1. sensors and actuators register with their properties
2. clients request matching sensors and actuators
3. global pointers returned for matches
4. sensor and actuator controls and actuator data
5. sensor data


Instrumented Tasks

May contain multiple sensors and/or actuators
Many instrumented tasks may be active at any given time
May register sensors and actuators with multiple Autopilot Managers running on different hosts
May be application code or infrastructure resource monitor (lmon)


Monitor/Control Tasks

May contain multiple sensor clients and/or actuator clients
Many monitor/control tasks may be active at any given time
May query multiple Autopilot Managers running on different hosts
May implement human in the loop (Autodriver, Virtue) or automated fuzzy logic decision server (PPFS II)
May be monitor only, writing collected data to a file or displaying it


Fuzzy Logic Decision Infrastructure


Diagram (not reproduced). Labels shown:
Monitor/Control Task(s): Fuzzy Logic Decision Process with Fuzzifier, Fuzzy Logic Rule Base, and Defuzzifier; Knowledge Repository
System: Instrumented Task(s) with Sensors and Actuators
Inputs (from Sensors) and Outputs (to Actuators)

Sample Fuzzy Logic Rule Base for Temperature Control


rulebase FurnaceRules;

// decide what to do based on roomtemp which falls into 3 ranges
var roomtemp(0,100) {
    set trapez cold  ( 0, 50, 0, 20 );
    set trapez medium( 50, 70, 10, 10 );
    set trapez hot   ( 80, 100, 20, 0 );
};

Plot (not reproduced): roomtemp truth values, from 0 to 1, for the cold, medium, and hot sets over the range 0 to 100

Sample Fuzzy Logic Rule Base for Temperature Control (continued)


// control the furnace value in a range of 0-1, with 0 = off
var furnace(0,1) {
    set triangle off ( 0, 0, 0.1 );
    set triangle half( 0.5, 0.1, 0.1 );
    set triangle full( 1, 0.1, 0 );
};

// the rules
if ( roomtemp == cold ) { furnace = full; }
if ( roomtemp == medium ) { furnace = half; }
if ( roomtemp == hot ) { furnace = off; }


Fuzzy Logic Decision Infrastructure


Autopilot sensors provide a stream of room temperature readings. After fuzzification, this stream defines the value of the roomtemp fuzzy variable.
Rules whose conditions are non-zero all contribute to determining the value of the output fuzzy variable furnace.
After defuzzification, the value of furnace defines the action taken by the Autopilot actuator.
Fuzzy logic handles noisy data and conflicting goals.
Fuzzy logic separates data sets (definition of fuzzy variables) and rules (assertions and consequents), allowing each to be independently adjusted for a particular computing environment without re-coding the decision procedure.
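
The fuzzify, fire rules, defuzzify cycle for the furnace example can be sketched in Python. Several things here are assumptions rather than facts from the slides: trapez(a, b, left, right) is read as full membership on [a, b] with linear ramps of the given widths on either side, each furnace triangle is represented only by its peak position, and defuzzification uses a weighted average of those peaks rather than whatever method the Autopilot decision procedure applies.

```python
def trapez(a, b, left, right):
    """Membership 1 on [a, b], with linear ramps of width `left` and `right`."""
    def mu(x):
        if a <= x <= b:
            return 1.0
        if left > 0 and a - left < x < a:
            return (x - (a - left)) / left
        if right > 0 and b < x < b + right:
            return ((b + right) - x) / right
        return 0.0
    return mu


# Fuzzy sets for roomtemp, with parameters taken from the sample rule base.
roomtemp = {
    "cold": trapez(0, 50, 0, 20),
    "medium": trapez(50, 70, 10, 10),
    "hot": trapez(80, 100, 20, 0),
}

# Peak positions of the furnace triangles off, half, and full.
furnace_peak = {"off": 0.0, "half": 0.5, "full": 1.0}

# Each rule pairs a roomtemp set with the furnace set it selects.
rules = [("cold", "full"), ("medium", "half"), ("hot", "off")]


def furnace_setting(temperature):
    """Fuzzify, fire every rule, then defuzzify by weighted average of peaks."""
    weights = [(roomtemp[cond](temperature), furnace_peak[out])
               for cond, out in rules]
    total = sum(w for w, _ in weights)
    if total == 0:
        raise ValueError("temperature outside every fuzzy set")
    return sum(w * peak for w, peak in weights) / total


print(furnace_setting(30), furnace_setting(65), furnace_setting(90))   # 1.0 0.5 0.0
```

At 65 degrees all three rules fire (cold and hot at 0.25, medium at 1.0), and the output settles at half power. Changing the set boundaries changes the result without touching the rules, which is the separation the slide describes.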

Autodriver Monitor and Control Architecture


Diagram (not reproduced). Components shown:
Autodriver Java GUI
Java Remote Method Invocation
Autodriver Autopilot Adapter Task
Instrumented Task
Autopilot Manager


Autodriver Startup

User specifies hosts for Autopilot Manager and, if remote, Adapter
Main window displays currently registered sensors and actuators
User selects sensors and/or actuators they are interested in

Autodriver Field Selection


When a tagged sensor is selected, a new window showing the list of fields in that sensor is displayed

The user selects the field(s) they want to view


Autodriver Numeric Display


Data can be displayed as numeric values
The user can choose to save the data values to a file for later analysis


Autodriver Plot Display


Using ptplot package from Berkeley, values can be plotted as connected or unconnected points

Multiple fields can be plotted to a single window
User can control number of points to display in window and zoom in on area of graph

Autodriver Actuator Interaction


User may enter value for selected actuator and transmit it to the remote process
Interface may be customized for nonnumeric data entry, such as pull-down menu choice of LRU or MRU for actuator controlling cache replacement policy

Virtue Monitor and Control Architecture

Diagram (not reproduced). Components shown:
Virtue
Tagged Sensor data
Actuator controls
Instrumented Task
Autopilot Manager


Virtue Display and Control


Each sphere in the ring represents a workstation

lmon collects processor utilization data and makes it available via sensors
Virtue maps the data to the display
Data transmission frequency can be adjusted via slider connected to lmon actuator

Case study: Rocket Simulation Code


Code developed by DOE ASCI Center for Simulation of Advanced Rockets (CSAR) at UIUC
40,000 lines of Fortran, MPI for communication between processes, runs on SGI Origin
200+ hours on 128 PEs to simulate 1/2 second of burn
Ultimately want to model 2 minutes for complete booster burn-off

Flowchart (not reproduced). Steps and annotations shown:
Init
Fluids Code (10 fluid iterations)
    * Could Modify Iterations with Actuator
Interpolation
Solids Code
    Do 3:1 Multigrid Solution for Each of the Meshes
    * 3 for coarse grain mesh; 1 for fine grain
Convergence Test
    * Check Against a Residual
    * Best Case, Converge on First Try
Y: Output
    * Saves Data
    * Advances Time Step


Execution Environment
Virtue: running on SGI Octane and ImmersaDesk in Pablo group
Lmons on systems across the country (lmon gathering network data): running on systems around the country
CSAR code instrumented via SvPablo: running on SGI Origin at NCSA
Autopilot Manager: running on SPARC in Pablo group

Wide-area Network Performance Data


Network latency statistics gathered via modified traceroute and made available via Autopilot sensors
Edge color represents latency -- warm colors for high latency
Cutting plane shows max value of intersected edges

Time Tunnel in Display Hierarchy


Time tunnel is second level in Virtue display hierarchy, showing application behavior on a single parallel system
Notice long delays for some MPI allreduce calls (shown in white)


Application Phases and Communication Patterns


View from inside Time Tunnel


User can fly around within the virtual environment to get different views
MPI profiling wrappers provide MPI call information via Autopilot Sensors
SvPablo provides code region information via Autopilot Sensors

Call Graph in Display Hierarchy


For each processor in the time tunnel, you can drill down to the procedure call graph
SvPablo provides call graph layout and dynamic updates via Autopilot sensors


Call Graph Close-Up View


Color mapped to inclusive procedure execution time
Size mapped to number of times procedure called
Magic lens exposes the procedure names


Source Code Text Billboard


The user can select a procedure in the call graph display and drill down to the final level, which is the source code for the procedure


Current Efforts

SvPablo: version with output via Autopilot sensors generally available
Virtue: new displays and controls for interacting with Autopilot sensors and actuators
Autodriver: integrated event definition, recognition, adaptation, and notification
Trace Library and Extensions: rework to use Autopilot as infrastructure, providing automatic instrumentation of I/O, MPI I/O, and HDF calls with corresponding well-defined sensor data structures

Current Efforts

Integrate sensors and actuators into Globus infrastructure
Provide translators from
    (appropriate) tagged sensor data to NetLogger format
    NetLogger format to SDDF
    SDDF to XML
    XML to SDDF
Continue to explore analysis, visualization, and control techniques in dynamic, distributed environments

Pablo Group Participants


Professor Dan Reed, Pablo Project Director
Randy Ribler*, Huseyin Simitci, Jim Oly, Nancy Tran, Guoyi Wang, Don Schmidt, Jeff Vetter*, Luiz DeRose*, Ying Zhang, Mario Pantano*, Eric Shaffer, Shannon Whitmore, Ben Schaeffer, Dan Wells, Deb Israel, and lots of others who have been part of the Pablo group over the years

* postdocs previously with the Pablo group

