
Large Scale Complex IT Systems

LSCITS

Proceedings of the Third Large Scale Complex IT Systems Postgraduate Workshop

Goodenough Club, London, UK, 31st October 2011

Editor: Jason S. Reich
Co-chairs: Yasmin Rafiq and Jason S. Reich

Copyright 2011 by the authors.

Preface

The Large Scale Complex IT Systems (LSCITS) initiative was established to improve existing technical approaches to complex systems engineering and to develop new socio-technical approaches that help us understand the complex interactions between organisations, processes and systems. We hold an annual postgraduate workshop to collect experience and results from the community of LSCITS researchers, both within the initiative and beyond. The topics we seek to cover include: complexity in organisations, socio-technical systems engineering, high-integrity systems engineering, predictable software systems, novel computational methods, and mathematical foundations of complexity.

This year, we are fortunate to present a programme that touches the full range of LSCITS interests. Researchers based in both academia and industry are participating in the talk and poster presentation strands, a testament to the reach and activity of the field. The chairs would like to express their gratitude to Professor Dave Cliff for his overall direction and advice, to Mrs Clare Williams for organising our wonderful venue and accommodation, and to Professor Richard Paige for helping with the selection of an invited speaker. Finally, we would like to thank the invited speaker and participants for their support of the workshop, and the programme committee for providing invaluable feedback to developing researchers.

Yasmin Rafiq and Jason S. Reich
Co-chairs, LSCITS PGR Workshop 2011
October 2011

Programme Committee
Leon Atkins (Bristol)
Alex Fargus (York, Cybula)
David Greenwood (St. Andrews)
Ken Johnson (Aston)
Christopher Poskitt (York)
Yasmin Rafiq (Aston)
Asia Ramzan (Aston)
Jason S. Reich (York)
Owen Rogers (Bristol)
James Williams (York)

Contents

Part I: Talk Abstracts

Invited talk: Managing Complexity in Large Scale Organizations (Simon Smith)
CloudMonitor: Software Energy Monitoring for Cloud (James Smith)
Socio-technical Simulations (Matthew Hudson)
Multiple Time Series Analysis Through Dimensionality Reduction and Subspace Tracking (Chris Musselle)
Making Smart Devices more Reliable (Lu Feng)
Lazy generation of canonical programs (Jason S. Reich, Matthew Naylor and Colin Runciman)
Optimising the Speed Limit: Improving Speed Scaling by Limiting Maximum Processor Speed (Leon Atkins)

Part II: Poster Abstracts

Securing offline integrity and provenance of virtual machines (Mustafa Aydin)
Sleep Stage Classification in Rodents (Alex Fargus)
Cloud Objects: Programming the Cloud with Data-Parallel, Multicast Objects (Julian Friedman)
Control-theoretical approach for workflow management (Hashem Ali Ghazzawi)
Socio-technical Analysis of Enterprise-scale Systems (David Greenwood and Ian Sommerville)
The Cloud Adoption Toolkit (Ali Khajeh-Hosseini)
Stack usage analysis for large complex embedded real-time systems (William Lunniss)
Examining the Relative Agreement model with Clustered Populations (Michael Meadows)
Techniques for Learning the Parameters of QoS Engineering Models (Yasmin Rafiq)
Dynamic Energy Footprinting of Online Media Services (Daniel Schien, Chris Preist, Paul Shabajee and Mike Yearworth)
Continuous dynamics of automated algorithmic trading (Charlotte Szostek)

Part I

Talk Abstracts

Invited talk: Managing Complexity in Large Scale Organizations


Simon Smith
MooD International simon.smith@moodinternational.com

Biography
Simon sets the technology strategy at MooD International. He works with MooD International's customers and partners to progress the company's vision for how architecture-based technologies will be exploited in the future, and manages the company's software delivery organization to realize this strategy. Prior to joining MooD International, Simon lectured in software engineering at the University of Durham, UK, managing research and development in corporate knowledge and evolutionary software architectures. This work was funded equally by government science councils and major UK industrial partners. Simon began his career as a Research Associate at the University of York, UK, where, in conjunction with researchers from Logica, he developed his interests in applying enterprise architecture methods and formal logics to achieving better business outcomes.

CloudMonitor: Software Energy Monitoring for Cloud


James Smith
School of Computer Science, University of St Andrews james.w.smith@st-andrews.ac.uk

Cloud Computing and Energy

Cloud Computing has the potential to have a massive impact on the future carbon footprint of the ICT sector. It may lead to an increasing demand for energy to power large datacentres, or those very datacentres may shift the location of traditional computer resources from relatively inefficient local machine rooms to the extremely efficient plants of computation that are now being developed [1]. Facebook, Inc. announced in April 2011 that their new data center in Oregon, USA has a Power Usage Effectiveness (PUE) rating of 1.07 [2], meaning that 1 in every 1.07 Watts of electrical power entering their plant goes directly to powering servers doing useful work. The margins for improving power consumption by decreasing the amount spent on auxiliary work and other mechanical areas (such as cooling) are increasingly slim. The largest remaining potential for reducing power costs is increasing the efficiency of the IT systems themselves.

Machine Utilisation

The key issue in the field of IT energy efficiency is that most systems today do not gracefully adjust their power consumption according to the amount of useful work they are doing. Ideally, computers would consume an amount of power directly corresponding to their current level of utilization. Far more commonly, current IT systems consume a disproportionate amount of power compared to their level of utilization, particularly at low workloads. For example, at a 10% level of utilization a typical blade server will consume 50% of its peak power; only after some initial threshold is power consumption directly affected by workload. It is therefore possible to reduce energy consumption by introducing virtualisation to increase physical machine utilisation levels and reduce the number of devices required. The energy required to run a particular Virtual Machine is dictated by the activity of the application and the resources it consumes (e.g. CPU, Disk, Memory, Network) [3].

Private Cloud

At present there are a number of barriers to creating an energy efficient workload scheduler for a Private Cloud based data center. Firstly, the relationship between different workloads and power consumption must be investigated. Secondly, current hardware-based solutions to providing energy usage statistics are unsuitable in warehouse scale data centers, where low cost and scalability are desirable properties. Recent work at the Cloud Computing Co-Laboratory in St Andrews (StACC) to ascertain the connection between workload and power consumed by computing devices has uncovered a noticeable, yet predictable, difference in energy consumption when servers are given tasks that dominate various resources (CPU, Memory, Hard Disk and Network). This insight led to the development of CloudMonitor, a software utility capable of power predictions with greater than 95% accuracy from monitoring the resource consumption of workloads, after a training phase in which a dynamic power model is calculated. More discussion of CloudMonitor beyond that presented here, including a statistical breakdown of its power model, can be found in [4].

Informing Workload Schedulers

CloudMonitor reads the resource (CPU, Memory, Hard Disk and Network) usage levels of the system it is monitoring in real time. In the initial training phase, these are correlated with power usage levels from an attached Power Distribution Unit (PDU). Linear regression analysis is performed to develop a power model, after which the PDU can be disconnected and the software will be able to use this model to produce energy utilisation values based solely on the computational resources consumed. CloudMonitor can then be deployed across all machines of the same configuration, providing a low-cost, scalable, cross-platform solution for monitoring energy consumption and computational performance in Cloud Computing systems [5]. Evaluating the work done and energy used by a VM may lead to the possibility of producing a mix of work, a collection of running VMs, which is optimal for a given physical machine or Private Cloud system. This opens the door to the possibility of scheduling workload mixes for optimal performance or for energy efficiency. If VMs consistently execute specific workloads, then each of those tasks can be profiled, allowing great accuracy in the scheduling decisions for each subsequent deployment. Additional research is now required to investigate optimal mixes of workload types, with a view to scheduling in such a manner.
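To make the training step concrete, the sketch below fits a least-squares linear power model mapping resource counters to measured PDU readings. It illustrates the general technique only: the column layout, sample values and function names are invented for the example, not CloudMonitor's actual interface.

```python
import numpy as np

# Each row: [cpu_util, memory_util, disk_io, network_io], sampled while an
# attached PDU logs the machine's true power draw (watts) for training.
resources = np.array([
    [0.10, 0.30, 0.05, 0.02],
    [0.55, 0.40, 0.10, 0.05],
    [0.90, 0.60, 0.20, 0.30],
    [0.20, 0.35, 0.60, 0.10],
    [0.70, 0.50, 0.15, 0.70],
])
measured_power = np.array([112.0, 145.0, 178.0, 131.0, 169.0])

# Append a constant column so the model learns an idle-power intercept.
X = np.hstack([resources, np.ones((len(resources), 1))])
coeffs, *_ = np.linalg.lstsq(X, measured_power, rcond=None)

def predict_power(cpu, mem, disk, net):
    """Estimate power (watts) from resource usage alone, PDU detached."""
    return float(np.dot(coeffs, [cpu, mem, disk, net, 1.0]))

print(predict_power(0.5, 0.4, 0.1, 0.1))
```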

Bibliography
[1] Environmental Protection Agency. Report to Congress on Server and Data Centre Energy Efficiency. Public Law, Vol 109, p109-431, 2007.
[2] Facebook Inc. Facebook Launches Open Compute Project; Press Release. https://www.facebook.com/press/releases.php?p=214173, 2011.
[3] A. Kansal, F. Zhao and A.A. Bhattacharya. Virtual Machine Power Metering and Provisioning. ACM Symposium on Cloud Computing (SOCC), 2010.
[4] J.W. Smith and I. Sommerville. Workload Classification and Software Energy Measurement for Efficient Scheduling on Private Cloud Systems. http://arxiv.org/pdf/1105.2584, 2011.
[5] A.E.H. Bohra and V. Chaudhary. VMeter: Power modelling for virtualized clouds. IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2010.
[6] M. Sharifi, H. Salimi and M. Najafzadeh. Power-efficient distributed scheduling of virtual machines using workload-aware consolidation techniques. The Journal of Supercomputing, Springer Netherlands.

Socio-technical Simulations
Matthew Hudson
Department of Computer Science The University of York hudson@cs.york.ac.uk

Introduction

This abstract introduces the research topics central to the industry-based work which I will carry out during the course of the EngD. The leading motivation of this research is to investigate immersion, learning and social presence in Virtual Environments (VEs), specifically Live Virtual Constructed (LVC) environments. LVC simulations are used in military training and involve real pilots in real planes interacting with real pilots flying in simulators, virtually constructed enemies, and so on. Simply put, an LVC is a socio-technical system used for multi-user training: a virtual environment populated by constructed entities and accessed by real humans via virtual or augmented reality technology from a variety of geographical locations. The purpose of the research is to establish the importance of various features and phenomena within LVC simulations, finding a balance between the technology and human elements so that effective transfer of skills, knowledge and competency is achieved by all trainees involved.

Background

Much of the current literature on simulation focuses on the notion of physical fidelity and physical presence (the feeling of physically being in the VE). However, there are other, more subtle issues in simulation training which are rarely addressed yet are potentially more important to the efficient and effective training of personnel. The research which I will conduct over the next 3 years will focus on social presence (social connections within the VE) and psychological fidelity (conceptual realism), and how socio-technical systems such as LVC simulations might evolve to encourage these cognitive phenomena. In essence, it is about using the technology to teach a potential pilot not simply how to fly, but how to fight. It is often the case that the advancement of computational technology drives advances in simulation [6][7], and that these advances go on to influence training [5]. However, this research aims to show that the human mind
Supervisor: Paul Cairns, Department of Computer Science, The University of York. Industrial Supervisor: Bruce Daughtrey, BAE Systems.

is at the center of any training exercise, and thus the human mind should be the driving force for future advances in simulation training. Technology is now at a point where we are able to create high quality simulations which elicit real physiological responses [1][3][9], even when using relatively simple simulators. However, while these may give the trainee physical realism, in reality flying the plane is only half the battle: learning to fight in a pragmatic and strategic way is critical for success. This heightened state of knowledge, skill and experience is known as competency. This research hopes to aid in training pilots to this competency by discovering ways to create psychological fidelity within LVC simulations, exploring how to manipulate the socio-technical systems to create realism in the social aspects of the environment. To accomplish this, it will be important to clarify and explore key notions in simulations and virtual worlds: issues such as the difference between presence and telepistemic access [2] (remote viewing), the difference between a virtual space and a virtual place [8], and the debate over how the concept of 'mediation' plays a role in presence and immersion into virtual worlds [4].

Current Work

The research conducted for this project will consist of a number of experimental, sociology- and ethnography-based studies. One methodology which will be used throughout the EngD is the use of games as research instruments. Games are similar to simulations in many ways, differing mostly in their purpose, and can provide a useful experimental environment in which to test social and cognitive phenomena. The first study planned using this methodology will attempt to capture the essence of social connections through socio-technical systems such as LVC simulations, and explore how physical and conceptual differences between users can affect social presence. In this experiment, participants will play a cooperative version of Tetris in a number of experimental conditions designed to explore how people react to physical and conceptual distance between anonymous operators working on a collaborative task over a network.

Future Research

Future work in this study will investigate immersion and social presence in Virtual Environments (simulations, augmented reality environments and games). To work towards this, the aims of the future research are to investigate cognitive and behavioural differences between actors working in real and virtual environments, to explore the notions of immersion and social presence in the context of LVC environments, and to empirically investigate learning within virtual environments. To this end a number of studies have

been planned: experimental and sociological studies of attitudes towards collaborative action in virtual environments, and investigations of effective methodologies for assessing cognitive and behavioural differences between virtuality and reality.

Bibliography
[1] N. Dahlstrom and S. Nahlinder. Mental workload in aircraft and simulator during basic civil aviation training. The International Journal of Aviation Psychology, 19(4), 2009.
[2] L. Floridi. The philosophy of presence: From epistemic failure to successful observability. PRESENCE: Teleoperators and Virtual Environments. Special Issue on Legal, Ethical, and Policy Issues Associated with Wearable Computers, Virtual Environments, and Computer Mediated Reality, 2005.
[3] S. Magnusson. Similarities and differences in psychophysiological reactions between simulated and real air-to-ground missions. The International Journal of Aviation Psychology, 12(1), 2002.
[4] G. Mantovani and G. Riva. Real presence: How different ontologies generate different criteria for presence, telepresence, and virtual presence. Presence: Teleoperators and Virtual Environments, 8(5), 1999.
[5] N.J. Maran and R.J. Glavin. Low- to high-fidelity simulation: a continuum of medical education? Medical Education, 37 (Suppl. 1):22-28, 2003.
[6] J. McHale. High-fidelity COTS technology drives flight simulation. [Online] Retrieved: 09/10/11, Available: http://tinyurl.com/6crojdz, 2009.
[7] W. Pang, J. Qin, Y. Chui, and P. Heng. Fast prototyping of virtual reality based surgical simulators with PhysX-enabled GPU. Transactions on Edutainment IV, LNCS, Volume 6250, 176-188, 2010.
[8] A. Spagnolli and L. Gamberini. A place for presence: understanding the human involvement in mediated interactive environments. PsychNology Journal, 3(1), 2005.
[9] J.A. Veltman. A comparative study of psychophysiological reactions during simulator and real flight. The International Journal of Aviation Psychology, 12(1), 2002.

Multiple Time Series Analysis Through Dimensionality Reduction and Subspace Tracking
Chris Musselle
University of Bristol

Multiple data stream analysis has received increasing attention in recent years as many applications are now able to generate large quantities of continuous streaming data in parallel. Such applications include computer network analysis, sensor network monitoring, moving object tracking, financial data analysis, and scientific data processing, among many others. In LSCITS, a major application area is in monitoring large scale data centres, where multiple traffic volumes, CPU temperatures, and power usages are continually producing data. It is desirable to carry out monitoring, prediction and anomaly detection across these data streams; however, their properties present their own set of problems to overcome. As data is continuously generated at a high rate, and any delay in processing must be minimised, traditional batch machine learning approaches are typically not sufficient for the task, as mining the data off-line is simply not an option. Any online testing will also require substantial training, with no guarantee that the modelled normal or anomalous behaviour remains constant; this can be very problematic in cases where the data streams drift over time. Another approach is to use fast online/incremental methods, which utilise a sliding window approach or sequential updating, so as to operate in a single pass over the data as it is generated. Such algorithms are typically a lot faster, and so can provide prompt feedback, though at the expense of some accuracy. My PhD looks at synthesising a novel algorithm to perform anomaly detection in multiple data streams. Here, an anomaly is defined as a sudden change point in the data, e.g. a brief spike in traffic patterns, a sudden shift in temperature, or a drop in power usage. To do this I will be using a technique from signal processing known as subspace tracking [2], used in adaptive filters. Subspace trackers are pre-processing algorithms which recursively estimate the principal subspace of the approximated covariance matrix of the input data. They are similar in spirit to Principal Component Analysis (PCA), picking up the dominant subspace that accounts for the greatest amount of data variation; the main difference between the two is the iterative nature of subspace trackers, compared to the batch methods used to conduct PCA. The plan is then to use the subspace tracker in [2] to monitor for changes in variation across the data streams. The assumption being that a change in

variation across multiple data streams could signal the presence of an anomaly or change point in the data. In this talk I will present a recent algorithm that aims to employ this idea in an adaptive fashion (the Fast Rank Adaptive Householder Subspace Tracker, or FRAHST, algorithm [1]). I will then present my own empirical evaluations of the algorithm on synthetically generated data sets designed to assess FRAHST's input and parameter sensitivity.
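As a rough illustration of the subspace tracking idea (not FRAHST itself, which is considerably more sophisticated), the sketch below maintains a rank-k orthonormal basis with a simple Oja-style update and treats the residual energy of each new observation as anomaly evidence. All data and parameters are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_streams, k, eta = 5, 2, 0.05   # number of streams, subspace rank, learning rate
W = np.linalg.qr(rng.normal(size=(n_streams, k)))[0]   # orthonormal basis estimate

def step(x):
    """Update the tracked subspace with observation x; return residual energy."""
    global W
    y = W.T @ x                  # coordinates of x within the tracked subspace
    e = x - W @ y                # residual: the part of x the subspace misses
    W = np.linalg.qr(W + eta * np.outer(e, y))[0]   # incremental update, re-orthonormalised
    return float(e @ e)

residuals = []
for t in range(300):
    x = rng.normal(size=n_streams)
    if t == 200:
        x += 8.0                 # inject a sudden change point
    residuals.append(step(x))

threshold = 5 * np.median(residuals[:100])
print([t for t, r in enumerate(residuals) if r > threshold])   # should flag t = 200
```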

Bibliography
[1] Pedro Henriques dos Santos Teixeira and Ruy Luiz Milidiu. Data stream anomaly detection through principal subspace tracking. Proceedings of the 2010 ACM Symposium on Applied Computing, pages 1609-1616, September 2010.
[2] Peter Strobach. The fast recursive row-Householder subspace tracking algorithm. Signal Processing, 89(12):2514-2528, December 2009.


Making Smart Devices more Reliable


Lu Feng
Department of Computer Science, University of Oxford, Parks Road, Oxford, OX1 3QD, UK lu.feng@cs.ox.ac.uk

We are entering an era of ubiquitous computing [4], in which smart devices appear everywhere in our everyday life: from smart phones to pollution sensors, from personal mobile offices to intelligent household devices, and from invisible embedded computing miniatures to wireless human body health monitoring systems. Our reliance on the functioning of smart devices is growing rapidly; however, the breakdown of these devices may cause catastrophic damage. For example, a single faulty piece of embedded software on a network card brought the entire system of the LA Airport to a halt in 2007, leaving over 17,000 passengers stranded for the duration of the outage; and the Mars Polar Lander failure in 1999, due to a software flaw, cost about $154 million. So, how can we make smart devices more reliable? I think (probabilistic) model checking offers a good solution. For a given system, model checking is capable of analysing every possible execution path of a computer program and detecting any potential errors. Model checking provides more rigorous results than software testing, and it has been used in industry, e.g. by Microsoft to verify the Windows device drivers and by Coverity to detect flaws in the HTC Incredible smartphone software. Meanwhile, probabilistic model checking targets more complex programs, and can be used to analyse quantitative, dependability properties, e.g.: what is the expected time to transfer data between two wireless devices? What is the probability of device failure in the first hour of operation? My research aims to improve the scalability of probabilistic model checking for real-world applications, by developing automated compositional verification techniques. The key idea is to use assume-guarantee reasoning, in which a large program is decomposed into several small components and individual components are verified under assumptions about their environment. I have contributed to automated assumption generation in particular, developing novel algorithmic learning techniques. I have also implemented a prototype and successfully applied it to several case studies, such as the flight control software for JPL's Mars Exploration Rovers, to help avoid problems like those mentioned above. For more details please refer to [2], [3] and [1]. I am very optimistic about the future of my research, and believe that it could be used to improve the reliability of smart devices everywhere!
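As a toy example of the kind of quantitative query involved, the sketch below computes bounded reachability ("what is the probability that the device has failed within n steps?") on a small discrete-time Markov chain. The three-state model and its numbers are invented for illustration; real probabilistic model checkers, and the compositional techniques described above, scale this idea to vastly larger systems.

```python
import numpy as np

# States: 0 = working, 1 = degraded, 2 = failed (absorbing).
# Entry P[i, j] is the probability of moving from state i to state j.
P = np.array([
    [0.95, 0.04, 0.01],
    [0.20, 0.70, 0.10],
    [0.00, 0.00, 1.00],
])

def failure_within(n_steps, start=0):
    """Probability of having reached the failed state within n_steps."""
    dist = np.zeros(3)
    dist[start] = 1.0
    for _ in range(n_steps):
        dist = dist @ P          # advance the state distribution one step
    return dist[2]               # failure is absorbing, so this accumulates

print(failure_within(60))        # e.g. failure probability within 60 time steps
```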


Acknowledgments. The author is sponsored by the Predictable Software Systems (PSS) project of the UK Large-Scale Complex IT Systems (LSCITS) Initiative (EPSRC grant EP/F001096/1).

Bibliography
[1] Feng, L., Han, T., Kwiatkowska, M., Parker, D.: Learning-based compositional verification for synchronous probabilistic systems. In: Proc. 9th International Symposium on Automated Technology for Verification and Analysis (ATVA'11). LNCS, vol. 6996, pp. 511-521. Springer (2011), to appear.
[2] Feng, L., Kwiatkowska, M., Parker, D.: Compositional verification of probabilistic systems using learning. In: Proc. 7th International Conference on Quantitative Evaluation of SysTems (QEST'10). pp. 133-142. IEEE CS Press (2010).
[3] Feng, L., Kwiatkowska, M., Parker, D.: Automated learning of probabilistic assumptions for compositional reasoning. In: Giannakopoulou, D., Orejas, F. (eds.) Proc. 14th International Conference on Fundamental Approaches to Software Engineering (FASE'11). LNCS, vol. 6603, pp. 2-17. Springer (2011).
[4] Kwiatkowska, M., Rodden, T., Sassone, V. (eds.): Proc. From computers to ubiquitous computing by 2020. Royal Society (November 2008).


Lazy generation of canonical programs


Jason S. Reich, Matthew Naylor and Colin Runciman
Department of Computer Science, University of York, UK {jason,mfn,colin}@cs.york.ac.uk

Testing, when used effectively, can reveal a wide variety of programming errors. For the time invested in the implementation of testing, it can return a large improvement in confidence of software correctness. For example, a suite of programs is often used for verifying a compiler's correct behaviour. Property-based testing [1] is a lightweight but often highly effective form of verification. Under this scheme, properties are defined in the host language as functions returning a Boolean value. A property-based testing library then instantiates the arguments of these functions, searching for negative results. Its effectiveness, however, relies critically on the method used to generate test cases. Compilers and related tools are frequently used to manage the complexity of software development. Any errors in the implementation of these tools could propagate bugs through to output executables. Therefore, one could verify the properties of compilers and minimise these risks, if a generator for test source-programs can be defined. Writing an effective program generator for property-based testing is typically challenging because (1) the conditions for a valid program are quite complex, (2) a large number of programs are produced, and (3) many of these programs are distinct only for insignificant reasons. The data generation technique available in Lazy SmallCheck [2] offers a way to overcome these challenges. Lazy SmallCheck supports the lazy evaluation of constraining conditions on partially-defined test data. A single test for which a condition fails may allow a large family of cases to be pruned from the search space. A test for which a condition is undefined (because of the partiality of data) drives a refinement of the test data at just the place where it is needed. We describe experiments generating functional programs in a core first-order language with algebraic data types. Candidate programs are generated freely over a syntactic representation with positional names indexed by lazy natural numbers. Static conditions for program validity are defined quite separately. So too are conditions for a program to be a canonical representative of a large equivalence class. The goal is an exhaustive enumeration of all valid, canonically representative programs up to some small but useful size.
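The sketch below is a much-simplified, eagerly evaluated Python analogue of the generate-then-filter idea: enumerate all candidate terms of each size, then keep only canonical representatives (validity conditions would be filtered the same way). The toy term language and canonicity rule are invented for the example; Lazy SmallCheck's real advantage, pruning whole families of candidates via partially-defined data, depends on lazy evaluation and is not captured here.

```python
def terms(size):
    """All expression trees with exactly `size` leaves over {x, 1, +, *}."""
    if size == 1:
        return ["x", "1"]
    out = []
    for left in range(1, size):
        for op in ("+", "*"):
            for l in terms(left):
                for r in terms(size - left):
                    out.append((op, l, r))
    return out

def canonical(t):
    """Toy canonicity rule: + and * are commutative, so admit only the variant
    whose operands are in sorted order, giving each equivalence class under
    argument swapping exactly one representative."""
    if isinstance(t, str):
        return True
    op, l, r = t
    return canonical(l) and canonical(r) and str(l) <= str(r)

candidates = [t for n in (1, 2, 3) for t in terms(n)]
survivors = [t for t in candidates if canonical(t)]
print(len(candidates), "candidates,", len(survivors), "canonical")
```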

Acknowledgements. This research was supported, in part, by the EPSRC through the Large-Scale Complex IT Systems project, EP/F001096/1.


Bibliography
[1] Koen Claessen and John Hughes. QuickCheck: a lightweight tool for random testing of Haskell programs. In Proceedings of the fifth ACM SIGPLAN International Conference on Functional Programming, ICFP '00, pages 268-279. ACM, 2000.
[2] Colin Runciman, Matthew Naylor, and Fredrik Lindblad. SmallCheck and Lazy SmallCheck: automatic exhaustive testing for small values. In Proceedings of the first ACM SIGPLAN symposium on Haskell, Haskell '08, pages 37-48. ACM, 2008.


Optimising the Speed Limit: Improving Speed Scaling by Limiting Maximum Processor Speed
Leon Atkins
Department of Computer Science, University of Bristol

In 1995, the groundbreaking paper of Yao et al. [1] introduced the idea of speed scaling a processor in order to save energy. The concept was extended in [7] to the online flow time model we introduce here. In this model, a processor consumes power at a rate of P(s) = s^α, where α > 1 is a fixed constant, usually around 2 or 3 for current generation CMOS based processors [2]. Each job i is released at time r_i, and has an associated processing requirement p_i. The processor runs at a speed s between 0 and infinity, where the processor is able to process s units of work in one unit of time. The goal is to select which job to work on from the set of released but uncompleted jobs, and at what speed to run. As such, any algorithm will have two components: a job selection policy and a speed scaling policy. The overall goal of the algorithm is to minimise the total flow time plus energy used, where the flow time F_i of a job is the time between the job being released (r_i) and being completed (c_i). Energy usage is defined as the integral of power with respect to time. The problem of minimising a linear combination of total flow time plus energy can be viewed from an economic perspective as asking how much energy (and by extension, money) a user is willing to spend for an improvement of one unit of flow time. The standard measure for success or failure of an online algorithm is competitive analysis, where the performance of the online algorithm is measured against an adversary who decides the job sizes and release schedule. The adversary is seen as choosing the release times and job sizes in order to exploit any weaknesses in the online algorithm, and because of this, competitive analysis can be viewed as a worst-case measure. In our work, we have created a simulator to explore the real world performance of speed scaling algorithms. With the advent of cloud computing services such as Amazon EC2, the worst case of an algorithm is of less importance than having a good average case, and making good use of standard resources. By using our simulator we can examine many different quality of service measures of given algorithms when run on a real world dataset. In our testing, we have used publicly available data representing a job execution trace over a production Google cluster (http://code.google.com/p/googleclusterdata/). This dataset gives us the details of processes over a seven hour period on this cluster. We have also sampled the data to create a second, smaller dataset with the same job size distribution.
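The sketch below is a minimal time-stepped version of this model: jobs arrive with a processing requirement, the speed is chosen from the active job count by an AJC-style rule (speed = n^(1/α)), optionally clipped to a maximum, and total flow time plus energy is accumulated with P(s) = s^α. The four-job trace, the value of α and the cap are illustrative values, and the discretisation is far cruder than the simulator described here.

```python
ALPHA, DT = 2.0, 0.001
jobs = [   # (release time r_i, processing requirement p_i) -- invented trace
    (0.0, 1.0), (0.1, 0.5), (0.15, 2.0), (3.0, 1.0),
]

def simulate(speed_cap=float("inf")):
    """Total flow time plus energy under an AJC-style speed scaler with a cap."""
    remaining, done_at = {}, {}
    t, energy = 0.0, 0.0
    while len(done_at) < len(jobs):
        for i, (r, p) in enumerate(jobs):
            if r <= t and i not in remaining and i not in done_at:
                remaining[i] = p                       # job becomes active
        n = len(remaining)
        speed = min(n ** (1 / ALPHA), speed_cap) if n else 0.0
        if n:
            i = min(remaining, key=remaining.get)      # SRPT job selection
            remaining[i] -= speed * DT
            if remaining[i] <= 0:
                done_at[i] = t + DT
                del remaining[i]
        energy += speed ** ALPHA * DT                  # P(s) = s^alpha
        t += DT
    flow = sum(done_at[i] - jobs[i][0] for i in done_at)
    return flow + energy

print(simulate())                 # unlimited maximum speed
print(simulate(speed_cap=1.2))    # capped maximum speed
```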



The majority of previous work has focussed on optimising for the infinite maximum speed case, where the maximum speed of the processor is unlimited, although some previous work on speed bounded processors can be found in [4]. Unbounded maximum speed processors are clearly unrealistic, and their use is a simplification to ease analysis. In our work we explore intelligently limiting the maximum processor speed, in addition to speed scaling, to improve the real world performance of speed scaling algorithms. Through simulation work on both the original Google dataset and our sampled version of the dataset, we find that limiting the maximum speed of the processor can improve the performance of a speed scaling algorithm. Furthermore, when speed limited, the best speed scaling algorithm may change. Most interestingly, in many cases a good flow/energy trade-off can be found simply by limiting the maximum speed and running at this constant speed, without any speed scaling. This is important because changing the frequency and voltage in the real world has a time and energy penalty which is not accounted for in either the theoretical literature or the simulator. Specifically, we show that when a server is underloaded, processors should use a speed scaling algorithm that increases in speed quickly when jobs arrive, but the processor should be limited to a comparatively low maximum speed. When the processor is becoming more overloaded, however, the processor should use a speed scaling algorithm that increases in speed more slowly as more jobs arrive, but the overall maximum speed should be raised. We show on our sampled Google dataset that the AJC speed scaler introduced in [5] was the best speed scaling algorithm in the unlimited case, but when we optimally limit the maximum speed, the faster LAPS speed scaler [6] was better. On the original Google dataset, where job arrivals are overloaded, we find that the AJC* speed scaling algorithm [5] is the best speed scaler in both the limited and unlimited cases, but that by limiting the maximum speed the total flow time plus energy used can be improved significantly. In our current work we are looking at creating a speed scaling algorithm using these findings, speeding up and slowing down based on the rate of arrivals. We are also looking for a way to set the maximum processor speed to improve the overall flow time plus energy used, based on knowledge of the arrival rate and job size distribution.

Bibliography
[1] F. Yao, A. Demers, and S. Shenker. A scheduling model for reduced CPU energy. Proceedings of the 36th Annual Symposium on Foundations of Computer Science (1995), pp. 374-382.
[2] D. M. Brooks, P. Bose, S. E. Schuster, H. Jacobson, P. N. Kudva, A. Buyuktosunoglu, J.D. Wellman, V. Zyuban, M. Gupta, and P. W. Cook. Power-aware microarchitecture: Design and modeling challenges for next-generation microprocessors. IEEE Micro, 20(6):26-44, 2000.
[3] N. Bansal, T. Kimbrel, and K. Pruhs. Speed scaling to manage energy and temperature. Journal of the ACM (2007), vol. 54(1).
[4] N. Bansal, H.L. Chan, T.W. Lam, and L.K. Lee. Scheduling for speed bounded processors. Lecture Notes in Computer Science (2008), vol. 5125, pp. 409-420.
[5] T.W. Lam, L.K. Lee, I. To, and P. Wong. Speed scaling functions for flow time scheduling based on active job count. ESA '08: Proceedings of the 16th Annual European Symposium on Algorithms (2008), pp. 647-659.
[6] H.L. Chan, J. Edmonds, T.W. Lam, L.K. Lee, A. Marchetti-Spaccamela, and K. Pruhs. Nonclairvoyant speed scaling for flow and energy. Algorithmica, pp. 1-11.
[7] S. Albers and H. Fujiwara. Energy-efficient algorithms for flow time minimization. ACM Transactions on Algorithms (TALG) (2007), vol. 3(4), pp. 49.

Part II

Poster Abstracts


Securing offline integrity and provenance of virtual machines


Mustafa Aydin
University of York, BT mustafa.aydin@bt.com

Security is one of the main concerns of cloud computing users, with the isolation of virtual machines being a particular worry. Part of this uncertainty comes from the ability to suspend, resume, copy, migrate, and roll back virtual machines to previous states. Each of these provides new ways to compromise and attack the integrity of virtual machines and the data residing on them. With so many ways to attack virtual machines offline, there is a worry about whether a resumed virtual machine can be trusted. Integrity measurements are able to provide uniquely identifiable measurements of the state of a system by using trusted computing: a hardware chip, kept separate from the processor, able to provide a unique digital signature which only it has access to. Signing integrity measurements using this technique can provide a higher level of trust when checking them at a later date. In addition, this technique could be used to provide a history of a virtual machine from its inception, providing a higher level of trust for those who require more knowledge about each interaction a virtual machine has had. A further aim would be to provide this through a cloud broker, an intermediary who looks after the cloud services of a user on multiple cloud providers. This would enable the integrity measurements to be used even when migrating to different services. By incorporating these techniques, the offline security of virtual machines can be strengthened, and users can be given better guarantees that their security needs are being met both on- and offline.
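As a rough sketch of the mechanism, the snippet below chains integrity measurements in the style of a TPM's extend operation, so a virtual machine's history of states folds into a single value that any offline tampering would change. The event data is invented, and the hardware-backed signing step that gives the chain its trustworthiness is deliberately omitted.

```python
import hashlib

def measure(image_bytes: bytes) -> bytes:
    """Hash a VM image (or other state) into a fixed-size measurement."""
    return hashlib.sha256(image_bytes).digest()

def extend(chain: bytes, measurement: bytes) -> bytes:
    """Fold a new measurement into the running chain value."""
    return hashlib.sha256(chain + measurement).digest()

chain = bytes(32)   # initial all-zero register
for event in [b"vm-image-at-creation", b"vm-image-after-resume"]:
    chain = extend(chain, measure(event))

# Attesting this single value (signed by the hardware chip in a real system)
# reveals any divergence in the machine's recorded history.
print(chain.hex())
```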


Sleep Stage Classification in Rodents


Alex Fargus
University of York, Cybula Ltd. alex@cybula.com

Removed from published proceedings due to commercial sensitivity.


Cloud Objects: Programming the Cloud with Data-Parallel, Multicast Objects


Julian Friedman
University of York julz.friedman@uk.ibm.com

We describe our library, Cloud Objects, which allows object-oriented programs written in Java to be easily parallelised using Map/Reduce through the addition of simple, declarative annotations to the code. We show that, using Cloud Objects, we can rewrite many data-intensive programs in a simple, readable, object-oriented manner, while maintaining the performance and scalability of the Map/Reduce platform. The system has been validated using a set of sample applications.


Control-theoretical approach for workflow management


Hashem Ali Ghazzawi
The University of York, Computer Science Department, Real-Time Systems Group hag@cs.york.ac.uk

Modern aerodynamic simulation techniques demand utilizing complex processes run on a range of computer resources. These complex processes include workflows of ever-growing complexity. These workflows represent simulation jobs, where each job contains several tasks, and there are dependencies between workflows, between jobs, and between tasks. Identifying and incorporating the required computational power for these simulation jobs into a dynamic computer platform, subject to end-user requirements such as deadlines and priorities, forms a real-time scheduling problem [1]. The literature suggests looking for dynamic approaches that guarantee end-to-end tasks while maintaining good utilisation of the available compute resources [4]. Traditional scheduling approaches have some answers but lack adaptivity to sudden changes in, for instance, workload and CPU/memory availability. The author's initial work is based on Lu's admission control [2]. A PID controller is composed of Proportional, Integral and Derivative parts. The main objective of such a controller is to minimise the system error in order to achieve the desired performance, expressed as a function of time. In other words, it takes into consideration the current system error (the proportional part) along with the accumulated past error (the integral part) and the error's rate of change (the derivative part) [3]; see the sketch after the references below. There was a need to adopt a multi-CPU platform as well as to incorporate multiple optimisation objectives; thus, model predictive control (MPC) was adopted for the current work. MPC supports non-square, non-linear, multi-input multi-output (MIMO) systems. The controller predicts how much each system output deviates from its set-point (desired value) within the prediction horizon. It multiplies each deviation by the system output's weight (i.e. its worthiness in the control computation), and computes the sum of squared deviations accordingly [3].

References
1. Hellerstein, J. L., Diao, Y., Parekh, S. and Tilbury, D. M. (2004). Feedback Control of Computing Systems. John Wiley & Sons.
2. Lu, C., Stankovic, J. A., Tao, G. and Son, S. H. (1999). Design and Evaluation of a Feedback Control EDF Scheduling Algorithm. The 20th IEEE Real-Time Systems Symposium, 56-67.
3. Ogata, K. (2001). Modern Control Engineering. 4th edition, Prentice Hall PTR, Upper Saddle River, NJ, USA.


4. Sha, L., Abdelzaher, T., Årzén, K.-E., Cervin, A., Baker, T., Burns, A., Buttazzo, G., Caccamo, M., Lehoczky, J. and Mok, A. K. (2004). Real Time Scheduling Theory: A Historical Perspective. Real-Time Systems, 28, 101-155.
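For illustration, the snippet below implements the textbook discrete-time PID loop described above against a crude first-order plant. The gains, time step and plant are invented placeholders, not the admission controller or MPC formulation used in this work.

```python
KP, KI, KD, DT = 0.8, 0.4, 0.05, 0.1   # illustrative gains and time step

def pid_step(error, state):
    """One discrete PID update; state carries (integral, previous error)."""
    integral, prev_error = state
    integral += error * DT                  # accumulated past error (I term)
    derivative = (error - prev_error) / DT  # rate of change of error (D term)
    output = KP * error + KI * integral + KD * derivative
    return output, (integral, error)

setpoint, value, state = 0.9, 0.0, (0.0, 0.0)   # e.g. target CPU utilisation
for _ in range(50):
    control, state = pid_step(setpoint - value, state)
    value += 0.5 * control * DT             # crude first-order plant response
print(round(value, 3))                      # steered towards the set-point
```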


Socio-technical Analysis of Enterprise-scale Systems


David Greenwood and Ian Sommerville
School of Computer Science, University of St Andrews, UK dsg22@st-andrews.ac.uk www.dgreenwood.com

Understanding socio-technical issues and risks is particularly important to managers and engineers of enterprise systems, because their careful management reduces the likelihood of a system failing to deliver its intended benefits. The principal aim of my research is to develop approaches to socio-technical analysis for enterprise systems. My thesis provides four key contributions. Firstly, the notion of responsibility provides a suitable abstraction for assessing the socio-technical dependability of coalitions-of-systems, that is, systems whose continuing operation depends upon a coalition of technical and organisational agents fulfilling obligations. Secondly, the notion of responsibility provides a suitable abstraction for representing and understanding system deployment and adoption situations where human agents' abilities to resist or conflict with a change need to be taken into account. Thirdly, responsibilities provide a suitable component for an abstraction for troubleshooting problematic enterprise-scale information systems. Fourthly, responsibilities form part of a potentially highly scalable abstraction when used with techniques from social network analysis. These techniques aid an analyst by indicating elements that may be important or complex in situations where the number of nodes and their interconnections are too large for a human to analyse in a timely manner.


The Cloud Adoption Toolkit


Ali Khajeh-Hosseini
University of St Andrews ak562@cs.st-andrews.ac.uk www.cs.st-andrews.ac.uk/akh

Enterprises are interested in using public clouds, especially infrastructure-as-a-service (IaaS), because of their scalability, flexibility and apparent cheapness. However, unlike start-ups that develop systems from scratch, enterprises often have to deal with so-called brownfield development, where new systems have to inter-operate with a range of existing systems. Decisions to migrate existing systems to public IaaS clouds can be complicated, as evaluating the benefits, risks and costs of using cloud computing is far from straightforward. Organisational and socio-technical factors must also be considered during the decision making process, as the shift towards the cloud is likely to result in noticeable changes to how systems are developed and supported. The St Andrews Cloud Computing Co-laboratory is developing the Cloud Adoption Toolkit, which aims to support decision makers using a collection of impartial tools:

1. Technology Suitability Analysis: is IaaS suitable for a given system?
2. Benefits and Risks Assessment: what are the risks and benefits?
3. Stakeholder Impact Analysis: how will it affect my organisation?
4. Cost Modelling: how much will it cost?
5. Service Pricing Analysis: how much should I charge for my computing services if I deploy them on the cloud?

We are also investigating the use of private clouds and how such clouds can be optimised for energy use.


Stack usage analysis for large complex embedded real-time systems


William Lunniss
University of York wlunniss@cs.york.ac.uk

As today's embedded real-time systems become increasingly large and complex, it is becoming ever more difficult to manually verify that they function correctly. In addition to timing and code coverage, systems that wish to comply with DO-178B or ISO 26262 must verify their memory and stack usage. This poster presents a measurement-based stack analysis tool that can analyse the stack usage of a system. Rather than using static analysis, it analyses the system running on the target hardware. The process of going from C, C++ or Ada code to a report is explained, and an example report is shown. Limitations of the tool are noted, and some further work to overcome these limitations is suggested.
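A minimal sketch of the watermark idea behind measurement-based stack analysis follows: the stack region is pre-filled with a known pattern before the test run, and afterwards the deepest overwritten byte marks the worst-case usage. The dump format, fill pattern and sizes are invented for the example; the actual tool targets C, C++ and Ada systems running on real hardware.

```python
FILL = b"\xde\xad\xbe\xef"   # hypothetical fill pattern written before the run

def high_water_mark(stack_dump: bytes, stack_size: int) -> int:
    """Worst-case stack usage in bytes, assuming the stack grows downwards."""
    pattern = (FILL * (stack_size // len(FILL) + 1))[:stack_size]
    # The lowest address whose byte no longer matches the fill pattern
    # marks the deepest point the stack reached during the run.
    for offset in range(stack_size):
        if stack_dump[offset] != pattern[offset]:
            return stack_size - offset
    return 0   # pattern fully intact: the stack was never used

# A 1 KiB stack whose deepest 100 bytes were touched during the test run.
dump = bytearray((FILL * 256)[:1024])
dump[924:] = b"\x00" * 100
print(high_water_mark(bytes(dump), 1024))   # -> 100
```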


Examining the Relative Agreement model with Clustered Populations


Michael Meadows
University of Bristol mm8030@bris.ac.uk

Following on from the corrections presented in (Meadows and Cliff 2011) to Deffuant et al.'s Relative Agreement model of opinion dynamics (Deffuant et al. 2002; Deffuant 2006), a further examination is presented of agents' interactions when the population is not fully connected. The population has been grown into a clustered arrangement using the Klemm-Eguíluz algorithm (Klemm and Eguíluz 2002) so that the specific effects of different population structures can be examined in isolation. It was found that while the underlying pattern of social conformity and convergence could be seen in many of the various experiments, there exist a number of parameters that can significantly alter a population's susceptibility to external influences. Examples of these variations are presented in this poster, along with a discussion of the causes behind the variances, their corresponding real-world situations, and ways in which this research could be applied.
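For reference, the core Relative Agreement interaction rule (after Deffuant et al. 2002) is sketched below for a single influencing pair. The poster's experiments apply this rule over a Klemm-Eguíluz clustered network rather than randomly chosen pairs, and the parameter values here are illustrative.

```python
MU = 0.3   # convergence rate (illustrative)

def ra_update(xi, ui, xj, uj):
    """Agent i (opinion xi, uncertainty ui) influences agent j; returns j's
    updated opinion and uncertainty."""
    overlap = min(xi + ui, xj + uj) - max(xi - ui, xj - uj)
    if overlap > ui:                  # interact only under sufficient agreement
        ra = overlap / ui - 1.0       # the "relative agreement" weight
        xj += MU * ra * (xi - xj)
        uj += MU * ra * (ui - uj)
    return xj, uj

# Agent i pulls agent j's opinion and uncertainty towards its own.
print(ra_update(0.0, 0.4, 0.3, 0.5))
```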


Techniques for Learning the Parameters of QoS Engineering Models


Yasmin Rafiq
Aston University rafiqy@aston.ac.uk

Computer systems are increasingly important in modern society. They are widely used in critical applications in areas ranging from banking and finance to healthcare and government. The likely consequences of failures in such applications vary from financial loss to loss of human life. As a result, significant research has focused on monitoring, modelling, analysing and verifying the compliance of computer systems and of their components with their functional and non-functional requirements. Many of the formalisms proposed by this research are targeted at the analysis of quality-of-service (QoS) properties of computer systems, such as reliability, performance and resource usage. Nevertheless, the effectiveness of such analysis depends on the accuracy of the models used. Building accurate formal models of computer systems is a great challenge for two reasons. First, the necessary knowledge may not be available until very late in the lifecycle of the system. Second, computer systems are increasingly used in applications characterised by continual change in workload, objectives and environment, so models are often rendered obsolete as the system evolves. Our research addresses this key challenge by developing a set of rigorous approaches for the online learning of the models used in the analysis of QoS properties of computer systems, based on observations of their behaviour. In particular, the project proposes a broad range of new algorithms for learning and updating the parameters of several types of Markovian models and queueing networks used in QoS engineering for computer systems. This includes discrete-time Markov models (used to model reliability properties of computer systems), continuous-time Markov models, and queueing networks (both used to model performance-related properties).
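As a minimal example of this kind of parameter learning, the snippet below computes the maximum-likelihood estimate of a discrete-time Markov model's transition matrix from an observed state trace, which is simply normalised transition counts. The project's online algorithms refine such estimates incrementally as new observations arrive; the trace here is invented.

```python
import numpy as np

trace = [0, 0, 1, 0, 2, 2, 0, 1, 1, 0, 0, 2]   # observed state sequence
n_states = 3

counts = np.zeros((n_states, n_states))
for s, s_next in zip(trace, trace[1:]):
    counts[s, s_next] += 1                      # tally observed transitions

# Normalise each row; rows never visited fall back to a uniform distribution.
row_sums = counts.sum(axis=1, keepdims=True)
P_hat = np.where(row_sums > 0, counts / np.maximum(row_sums, 1), 1 / n_states)
print(P_hat)
```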


Dynamic Energy Footprinting of Online Media Services


Daniel Schien, Chris Preist, Paul Shabajee and Mike Yearworth
University of Bristol schien@cs.bris.ac.uk http://sympact.bris.ac.uk

The energy consumption of digital media services on the Internet has previously been assessed based on assumptions about average use patterns and the power draw of devices. Existing estimations either vastly overestimate service power consumption or only consider certain subsystems. With the increasing popularity of these services grows the need for models which accurately assess the power consumption of their provision. We present a dynamic modelling approach based on real-time measurements at datacentre servers, in the network infrastructure and on end user devices. For the servers, we allocate power consumption based on customer demand. For the network, we estimate power consumption based on the network distance in hops. For the end user devices, we approximate power consumption based on the device class. We find that each subsystem (server, network or consumer device) can have the most significant power draw, depending on the type of service, location in the network and consumer device. Our method can serve to corroborate previous results and aid the future design of more energy efficient digital services. We recommend better reporting of power consumption for services; our assessment method can act as a guideline for the development of such standards.
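A stripped-down sketch of this three-part allocation is shown below. Every constant (per-request server energy, per-hop network energy per megabyte, device power) is an invented placeholder, not a measured value from this study.

```python
def service_energy_joules(server_j_per_request, requests,
                          hops, j_per_hop_per_mb, mb_transferred,
                          device_watts, viewing_seconds):
    server = server_j_per_request * requests             # allocated by customer demand
    network = hops * j_per_hop_per_mb * mb_transferred   # scales with hop count
    device = device_watts * viewing_seconds              # depends on the device class
    return server + network + device

# One hypothetical video view: 3 requests, 12 network hops, 25 MB, 5 W tablet.
print(service_energy_joules(40, 3, 12, 2.0, 25, 5, 300))   # -> 2220.0 joules
```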


Continuous dynamics of automated algorithmic trading


Charlotte Szostek
University of Bristol charlotte.szostek@cs.bris.ac.uk

Automated trading in global financial markets has systemic-level impacts. The need to understand these impacts (for example, the flash crash) calls attention to the possibilities and limitations of modelling financial markets as a system. This project explores the capabilities of novel methods to illuminate the dynamics of heterogeneous financial markets, that is, those populated by both human and automated trading agents. Our research hypothesis is that this human-automation heterogeneity also leads to different market dynamics in a broader sense. Continuous-time mathematical foundations are an appropriate method of describing a heterogeneous-agent populated market at microsecond resolution. This research is aimed at understanding the impact of algorithms on the financial market system. Adaptation of existing algorithm descriptions could allow the use of traditional continuous analysis. This work asks: what can the analysis of a continuously described system contribute to our understanding of global financial markets? Although translating established algorithms requires significant interpretation and dissection, it is possible to produce a continuous system that can be validated against the discrete system. This is achieved by building a validation model, which provides a solid grounding for continuous system analysis.
