
CTC-Swarm: a cloud resource manager used to scrape large amounts of online data

Marc Juchli, Technical University of Delft, Email: m.b.juchli@student.tudelft.nl
Mathijs Hoogland, Technical University of Delft, Email: m.h.hoogland@student.tudelft.nl
Mateusz Garbacz, Technical University of Delft, Email: m.garbacz@student.tudelft.nl

Course Instructor: Dick Epema, Distributed Systems Group, EEMCS, TU Delft, Email: D.H.J.Epema@tudelft.nl
Lab Assistant: Apourva Parthasarathy, Email: a.parthasarathy@student.tudelft.nl
Abstract—In this report, we present CTC-Swarm, a cloud resource manager that runs multiple data retrieval tasks as a preparation for automated crypto-currency trading. The software automatically deploys the tasks on the cloud, monitors them, and restarts them in case of errors. Two provisioning policies are introduced, and tests show that choosing the optimal policy requires a trade-off between lower costs and higher execution speed. We also manage to lower the downtime of streaming services to one minute in case of an error, by constantly observing the pressure of the incoming data.

Keywords—IaaS, Cloud, Cluster Management
I. INTRODUCTION

Cloud services have caused a revolution in the dynamics of the IT industry by providing scalable and virtualizable resources. Infrastructure as a Service (IaaS) is one of the three types of cloud service models and supplies its users with access to virtual computing resources such as server space or hardware [1]. However, the user of such an infrastructure often does not receive support in deploying and managing his application. In this report, an application is introduced that orchestrates and manages the extraction of large amounts of data from the internet. This data is used as input for an automated trading algorithm for crypto-currencies. In such trading applications, it is crucial to obtain data as soon as it is available and to employ it. Because of this requirement, the resource usage of our application can vary over time.

To cope efficiently with these peaks in demand, this report presents CTC-Swarm, an auto-scaling resource management system that schedules and scales computing resources based on demand. The software utilizes EC2 computing instances from Amazon Web Services (AWS) in order to run several data extraction and data mining applications. Furthermore, it runs scripts packed into Docker containers in an automated and elastic fashion and allows monitoring and scheduling of the jobs.

The report is structured as follows. We start with an overview of CTC-Swarm's main features and capabilities in section II. We then describe our service level objectives in section III. In section IV the architecture of our software is presented. In section V we present the implemented features and characteristics of CTC-Swarm. Further, in section VI the experimental setup is presented together with the evaluation of specific features of the system. We end our report with section VII, containing our conclusions.
II. BACKGROUND ON CTC-SWARM

CTC-Swarm is an application that can be used for many purposes. However, in this report, we focus on data collection tasks, especially online information related to the trading of crypto-currencies. This information is later fed to an automated trading application developed by the WantCloud company. Therefore, the purpose of CTC-Swarm is to manage 3 data retrieval tasks which are provided by the company:

1) News extraction from websites: the task downloads a provided RSS feed with an arbitrary number of elements, specifying the most recent crypto-currency related articles, their addresses, titles and release dates. For each article, a dedicated crawler extracts the content and stores this information in the database. Once all the provided URLs are crawled, the program terminates.
2) Tweet streaming: this task consists of streaming, processing and storing tweets that contain certain pre-defined keywords. From each object, the program extracts the text of the body, converts it and performs the pre-processing. Finally, each tweet is stored in the database.
3) Trade information processing: this program continuously streams the current data regarding trades executed on one of the online crypto-currency markets and summarizes them into one-minute windows of the data. Therefore, every minute thousands of trades are loaded, accumulated, and stored in the database.

CTC-Swarm is responsible for the execution of these jobs and needs to monitor and optimize the resource usage of the application.
III. SERVICE LEVEL OBJECTIVES

It is crucial for CTC-Swarm to ensure that any employed finite task (i.e. the news extraction) is executed successfully and that any continuous streaming task (e.g. the tweet streaming) does not break for too long, even if an error occurs. We define Service Level Objectives (SLOs) for our application to make this requirement measurable. For the finite tasks, the SLO is to successfully finish any task scheduled or delegated by a user.
In case of an error, the task needs to be restarted and finished. When it comes to the streaming jobs, the SLO of our application is to stream the data continuously. Whenever a break occurs, it may not be longer than one minute. This means that if, for example, a Tweet streaming task crashes, our application needs to restart this Tweet streaming task in less than a minute. Finally, an SLO violation is any violation of the two previously defined objectives. Violations are counted by summing the number of finite tasks that were not successfully executed and the number of minutes during which the streaming tasks did not return any data.

IV. SYSTEM DESIGN

Our application uses a micro-service architecture. Each of the 3 tasks is designed as a standalone dockerized application and, in our case, deployed on an EC2 computing instance from AWS. Multiple applications can, therefore, be deployed on one instance.

A. Interaction

Figure 1 contains a high-level overview of the architecture of our application. There are three types of initiators which start an interaction with the underlying components:

1) Users interact with the application through a RESTful API (see description in IV-C) whose requests are handled by the Request Handler. This class translates user requests for the Manager, which manages the task execution (see IV-B). In case the user wishes to schedule future tasks upon request, the translated task, which is to be completed, is sent to the task queue S together with the UTC timestamp of its preferred execution.
2) The Scheduling thread not only plans future tasks but is also responsible for maintaining the company's SLOs (III). It does so by approaching the StatisticsManager for past container utilization, CPU utilization and application-specific information such as trade, news and tweet pressure.
3) The Monitoring thread, in turn, continuously observes the utilization of infrastructure and applications and reports to the internal database. Specifically, it uses the AWSManager to retrieve CPU utilization, the DockerManager to observe the number of deployed containers per instance and the StatisticsManager to report the pressure of trades, tweets and news per minute and per hour.

B. Components

The Manager class serves as an intermediary for handling interactions on both AWS and Docker. Its other duty is to provide a task queue T, where provisioning tasks can be managed using the interfaces addTask and removeTask. We define a Task to be a tuple consisting of a service (described within the DockerManager) and an operation (either start, stop or restart): Task: (Service, Operation). The reason for handling tasks on the manager level instead of the Docker level is that services can run on an arbitrary AWS instance, depending on the provisioning policy. Hence, the manager holds a dependency on both classes. Furthermore, the Manager provides the interface run(policy), which allows defining which policy (see V-A) the manager should follow while processing the tasks from the queue.
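To make the task abstraction concrete, the following minimal Python sketch (illustrative only, not the actual CTC-Swarm implementation; class and attribute names are assumptions) shows a Task tuple and a Manager-level queue T with addTask, removeTask and run(policy):

from collections import namedtuple
from enum import Enum

class Operation(Enum):
    START = "start"
    STOP = "stop"
    RESTART = "restart"

# A task couples a service (a dockerized application) with an operation.
Task = namedtuple("Task", ["service", "operation"])

class Manager:
    def __init__(self):
        self.tasks = []                  # provisioning task queue T

    def addTask(self, task):
        self.tasks.append(task)

    def removeTask(self, task):
        self.tasks.remove(task)

    def run(self, policy):
        # The chosen provisioning policy decides where each queued task runs.
        while self.tasks:
            policy.assign(self.tasks.pop(0))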
The AWSManager class handles the resources we demand from Amazon Web Services (AWS), such as RDS databases and EC2 instances, and their utilization using CloudWatch. For this purpose we use the API wrapper library boto3 [2]. Next to the obvious functionality that lists currently running resources, more complex operations are implemented as well. For example, launchEC2Instance(name) and launchRDSInstance serve the purpose of provisioning a resource, EC2 or RDS respectively, and ensure that the resource is started correctly. As those functions are time-consuming, and hence blocking, we used Python's asyncio library in order to work in an asynchronous fashion. Once the Future yields the expected result, the instances are ready for services to be deployed on.
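One way to make a blocking boto3 provisioning call awaitable, in the spirit of the asyncio usage described above, is to run it in an executor; the sketch below is an illustration only (the AMI id and launch parameters are placeholders, not values taken from CTC-Swarm):

import asyncio
import boto3

ec2 = boto3.resource("ec2")

def _launch_blocking(name, instance_type="t2.micro"):
    # create_instances returns once the request is accepted; the waiter then
    # blocks until the instance reaches the 'running' state.
    (instance,) = ec2.create_instances(
        ImageId="ami-00000000",          # placeholder AMI
        InstanceType=instance_type,
        MinCount=1, MaxCount=1,
        TagSpecifications=[{"ResourceType": "instance",
                            "Tags": [{"Key": "Name", "Value": name}]}],
    )
    instance.wait_until_running()
    return instance

async def launch_ec2_instance(name):
    loop = asyncio.get_event_loop()
    # Run the blocking call in a thread pool so other coroutines keep running;
    # awaiting the returned Future resolves once the instance is ready.
    return await loop.run_in_executor(None, _launch_blocking, name)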
The DockerManager is responsible for handling container (i.e. service) related concerns. We therefore rely on the Docker API wrapper library docker-py [3]. This includes the deployment of our images ctc-news, ctc-trades-watch and ctc-twitter, which are deployed with the functions deployNews, deployTrades and deployTwitter. In order to deploy any of the images, the DockerManager needs to connect to a Docker installation on one of the deployed instances on AWS. This can be done in two ways. The simple approach is to connect to a machine that is passed directly to the class using the function setClientByInstance(instance). However, in some cases we have multiple instances running and wish to connect to the most suitable one based on some strategy. Therefore, the function setClientByLowestContainers(instances) connects to the instance which currently maintains the fewest running Docker containers.
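A minimal sketch of this selection strategy with docker-py is shown below; it assumes, for illustration only, that the instances are boto3 EC2 objects exposing public_ip_address and that the remote Docker daemons listen on TCP port 2375:

import docker

def client_by_lowest_containers(instances):
    # Connect to every candidate instance and pick the Docker client that
    # currently manages the fewest running containers.
    best_client, best_count = None, None
    for instance in instances:
        client = docker.DockerClient(
            base_url="tcp://%s:2375" % instance.public_ip_address)
        count = len(client.containers.list())
        if best_count is None or count < best_count:
            best_client, best_count = client, count
    return best_client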
Once services are deployed and running, the StatisticsManager serves essentially two purposes: reporting and retrieving the pressure of services. Reporting pressure is handled by the monitoring thread (running on the server), which connects to the Results-Database, determines the pressure for trades, tweets or news, and writes the observed pressure into our Application-Database. We define the table that reports pressure for an arbitrary service s as follows:
pressure(s): id, timestamp, minute, hour, option
The ids and timestamps are incrementing, the minute and hour fields describe the pressure over the last minute or hour, and the option field allows defining e.g. the instance (for CPU pressure) or the market (for trade pressure) that was observed. Retrieving the pressure is required in the scheduler and simply reads out the previously reported pressure of a service.
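Since the class documentation in the appendix shows that peewee is used for database access, the pressure table could be expressed as a peewee model roughly as follows; the field types, connection parameters and table name are assumptions made for illustration:

import datetime
import peewee

db = peewee.MySQLDatabase("ctc", host="localhost", user="ctc", password="secret")

class Pressure(peewee.Model):
    # peewee adds an auto-incrementing 'id' primary key implicitly.
    timestamp = peewee.DateTimeField(default=datetime.datetime.utcnow)
    minute = peewee.IntegerField()        # items observed in the last minute
    hour = peewee.IntegerField()          # items observed in the last hour
    option = peewee.CharField(null=True)  # e.g. instance id or market name

    class Meta:
        database = db
        table_name = "pressure_trades"    # assumed: one table per observed service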
The Scheduler class is an extension used by the Scheduling thread and serves a queue S that keeps track of all tasks that need to be executed in the future. The function addTask(task, timestamp) therefore accepts the task to be executed and the timestamp of its execution. The task again is a tuple of a service and its operation, whereas the service is, in our case, a Docker container. In addition, we also provide the capability of accepting cron tasks, by specifying an interval instead of a timestamp: addTask(task, cron).
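A possible implementation of this future-task queue, sketched here with a heap keyed on the execution timestamp (illustrative only; the cron interval is assumed to be given in seconds), is:

import heapq
import time

class Scheduler:
    def __init__(self):
        self._queue = []    # heap of (timestamp, counter, interval, task)
        self._counter = 0   # tie-breaker so tasks are never compared directly

    def addTask(self, task, timestamp=None, cron=None):
        when = timestamp if timestamp is not None else time.time() + cron
        heapq.heappush(self._queue, (when, self._counter, cron, task))
        self._counter += 1

    def due_tasks(self, now=None):
        now = now or time.time()
        due = []
        while self._queue and self._queue[0][0] <= now:
            _, _, cron, task = heapq.heappop(self._queue)
            due.append(task)
            if cron is not None:             # cron task: re-schedule itself
                self.addTask(task, timestamp=now + cron, cron=cron)
        return due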
C. REST API

As mentioned in IV-A, CTC-Swarm exposes a RESTful API such that users can interact with the infrastructure. The endpoints can be described as follows:
POST /manager/strategy: name, args
POST /task: service, operation
DELETE /task: service, operation
DELETE /allTasks
POST /schedule/task: service, operation
As one can see from the naming of the endpoints, the API serves a subset of the capabilities provided by the components described in IV-B. The RequestHandler therefore translates the requests and calls the appropriate function in the specific class.
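For illustration, the calls below show how a client could use these endpoints; the host, port, JSON payload format and the extra timestamp field for scheduled tasks are assumptions, as the report does not specify them:

import requests

BASE = "http://localhost:5000"    # placeholder address of the Request Handler

# select a provisioning policy (strategy) for the manager
requests.post(BASE + "/manager/strategy",
              json={"name": "utilization", "args": {"threshold": 50}})

# ask the manager to start the news service
requests.post(BASE + "/task",
              json={"service": "ctc-news", "operation": "start"})

# schedule a restart of the twitter service for a later point in time
requests.post(BASE + "/schedule/task",
              json={"service": "ctc-twitter", "operation": "restart",
                    "timestamp": "2017-10-25T21:07:00Z"})

# remove all queued tasks
requests.delete(BASE + "/allTasks")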
V. IMPLEMENTED FEATURES

A. Automation

The automation part of this project involved the handling of RDS instances, EC2 instances and Docker containers, laying out a foundation for the other features to rely on.

The AWSManager provides interfaces to communicate tasks concerning a resource provided by Amazon AWS. The full class documentation is listed in Appendix -A. As one can see from the function descriptions, the functionality is mostly concerned with launching, reading and terminating instances.

The DockerManager provides interfaces to handle containers (to be) deployed on EC2 instances. Much of the functionality we use is already provided by the API; however, our project involves, for example, the deployment of containers which require extensive configuration at deployment. Therefore, Appendix section -B describes the interfaces we developed.

Whereas the previous two classes describe infrastructure functionality, the class StatisticsManager is targeted more at observing and reporting application-level behavior. As can be seen in Appendix section -C, for every type of service there is a reporting function as well as a getter.
B. Elasticity / Load-balancing

When it comes to elasticity, the Manager takes care of booting new instances whenever necessary and closing them whenever they are idle and there are no jobs to be executed. The strategy the manager uses in order to launch instances and containers is what we call the Provisioning Policy. Depending on the policy, the Manager manages the VMs differently. We set a maximum of 10 VMs that can run at the same time, preventing our application from unexpectedly becoming more expensive than anticipated. The evaluation of our tests (VI) shows that this number is sufficient. There are two main policies managing the load balancing: one based on a threshold of containers per VM, and one based on a utilization threshold of the VMs. These will be discussed in more detail in the upcoming paragraphs.

a) Maximum number of tasks run on a single instance: The first policy takes the maximum number of containers to be run on a single VM as a parameter. Once the Manager runs with this policy, it boots the minimum number of instances that, given this threshold, it expects to need in order to run the number of tasks in the queue. Then, in each iteration, it first checks whether any of the VMs can be closed or any new ones need to be booted, based on the current number of tasks running and the queue size. Then, it tries to run as many tasks as possible on the instances with the lowest number of containers running. However, the main limitation of this policy is the fact that it does not take into consideration how resource-demanding some applications may be. This problem is addressed by the second policy.

b) Utilization threshold of the VMs: The second policy takes two parameters as input. It requires the maximum utilization threshold, which specifies up to what level of utilization of a VM the manager will still assign tasks to it, and the initial instance allocation, which is a guess by the user of how many containers an instance can handle at the same time. The second parameter is used only when many tasks are added to the queue at the same time, making the manager boot multiple instances at once, instead of booting one at a time and waiting for it to be fully utilized. The policy performs similar steps in each iteration; the only modifications are that it additionally boots a new instance if all instances reach the utilization limit, and that it assigns tasks to the instances with the lowest utilization. The policy improves upon the previous policy in terms of monitoring the CPU usage of the instances and possible cost savings.

These two policies are evaluated in the next section for multiple parameter sets.
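The decision step of both policies can be summarized in the following simplified sketch; real EC2 instances and Docker containers are reduced to plain counters here, and all names are illustrative rather than taken from the implementation:

MAX_VMS = 10

def assign_policy1(instances, max_containers_per_vm):
    # Policy 1: cap the number of containers per VM.
    target = min(instances, key=lambda i: i.num_containers, default=None)
    if target is None or target.num_containers >= max_containers_per_vm:
        return "boot new instance" if len(instances) < MAX_VMS else "wait"
    return target

def assign_policy2(instances, max_utilization):
    # Policy 2: cap the CPU utilization per VM.
    candidates = [i for i in instances if i.cpu_utilization < max_utilization]
    if not candidates:
        return "boot new instance" if len(instances) < MAX_VMS else "wait"
    return min(candidates, key=lambda i: i.cpu_utilization)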
C. Reliability

CTC-Swarm implements the reliability feature by making sure that the streaming tasks are down for as short a time as possible and that the finite tasks are quickly restarted in case of an error. This is done on two levels, as illustrated in the sketch after the following list.

1) Firstly, the Manager class does this at the container level. In each iteration it checks whether any of the running Docker containers has the failed status, meaning that there was a task error. Such containers are restarted directly.
2) Secondly, the Scheduler does this in a streaming-application-specific way, by monitoring the rate of incoming trades and warning the Manager in case of a drastic decrease. The Manager then checks the status of the container again, and if Docker indicates that it is running correctly and it has not just been restarted by the Manager, the restart process is invoked. Thanks to this, the task gets restarted even if an error did not cause the container to fail but negatively influenced the streaming.
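The two checks can be sketched with docker-py as follows; get_pressure and recently_restarted are hypothetical stand-ins for the StatisticsManager lookup and the Manager's bookkeeping, and the "failed" status described above is approximated here with Docker's "exited" state:

def check_containers(client):
    # `client` is a docker.DockerClient connected to the instance.
    # Container level: restart every container that Docker reports as exited.
    for container in client.containers.list(all=True, filters={"status": "exited"}):
        container.restart()

def check_pressure(container, get_pressure, recently_restarted):
    # Application level: restart a streaming container whose pressure dropped
    # to zero although Docker still reports it as running.
    if get_pressure() == 0 and container.status == "running" \
            and container.id not in recently_restarted:
        container.restart()
        recently_restarted.add(container.id)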
D. Monitoring

The monitoring feature is implemented in CTC-Swarm on the application-specific level for the streaming tasks. The system monitors the number of tweets and trades for the different markets streamed per minute and stores them in a separate database. Additionally, this allows for monitoring whether some of the streaming tasks have broken, so that the manager can restart these tasks if the rate of streamed data is too low.
Fig. 1. CTC-Swarm Architecture Overview

E. Scheduler

Finally, there is a scheduler feature that enables users to schedule tasks that need to be executed in the future. Users can submit tasks in combination with a timestamp through the Request Handler API to CTC-Swarm. The scheduler will then make sure that these tasks are executed at the given timestamp.
VI. EXPERIMENTS AND RESULTS

This section describes the experiments performed to evaluate the system. The following subsections present the overall experimental setup and the specific test cases for each feature.

A. Experimental setup

The general setup for the experiments is established as follows. The program is capable of running on any machine, as we specify AWS accounts using environment variables. This allows us to develop and run tests from our local machine. Using the AWS API, EC2 instances can be listed, which allows us to connect to the exposed Docker API within each machine. In a production setup, the CTC-Swarm application would run as a Docker container within one of the available instances. In fact, CTC-Swarm is a stateless application and hence can be run simultaneously on multiple instances in order to contribute as an extended reliability factor. The specification of the machine CTC-Swarm is run on can generally be neglected, as no computationally intensive tasks will be executed. More important are the servers (e.g. AWS EC2 instances and RDS instances) that are used for the deployed containers. In order not to exceed costs, we defined a fixed instance type for both EC2 and RDS: t2.micro and db.t2.micro respectively.
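Because boto3 reads the account to use from environment variables, the setup described above needs no configuration files; the sketch below illustrates this, with placeholder credential values and an assumed region:

import os
import boto3

os.environ.setdefault("AWS_ACCESS_KEY_ID", "AKIA...")         # placeholder
os.environ.setdefault("AWS_SECRET_ACCESS_KEY", "...")          # placeholder
os.environ.setdefault("AWS_DEFAULT_REGION", "eu-west-1")       # assumed region

# With the credentials in place, the running EC2 instances can be listed.
ec2 = boto3.client("ec2")
for reservation in ec2.describe_instances()["Reservations"]:
    for instance in reservation["Instances"]:
        print(instance["InstanceId"], instance["State"]["Name"])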
B. Experiments

There are three main types of tests performed:
1) General Benchmarking
2) Performance Tests for different Provisioning Policies
3) Reliability Tests
The aim of the first type of tests is to present general task execution properties as well as cloud-specific properties. The performance tests compare the different Provisioning Policies of the system; they therefore test the automation, elasticity and load balancing. Finally, the reliability tests evaluate the fault-tolerance and monitoring features.
1) General benchmarks: These tests were executed to evaluate the general properties of the system. The results were used as a basis to discuss the results of the further tests and to get a feeling for the response times of our application. We measured the general benchmarks on two levels: cloud level and task level. Running these tests took roughly 4 hours of charged time; the cost therefore amounts to 40 euro-cents.

a) Cloud level: To test the general benchmarks from a cloud-level perspective, we ran the following test: launch 10 new EC2 instances separately. We measured the following values:
1) Time it took for our EC2 instance to be created
2) Time it took for our EC2 instance to be responsive
3) Time it took for our EC2 instance to be shut down

TABLE I. CLOUD LEVEL BENCHMARKS
                                   Mean    Standard deviation
Creation time EC2 instance         35.8    5.27
Time for EC2 instance to react     241.2   8.9
Time to shut EC2 instance down     38.2    1.8

Table I shows the averages and the standard deviation of our measurements. As can be seen, the standard deviation is quite low. This is because the response time of AWS is fairly consistent.
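A sketch of how such a cloud-level measurement can be scripted with boto3 is given below; the AMI id is a placeholder, and "responsive" is approximated here by the instance reaching the running state, which is an assumption about the original measurement:

import time
import boto3

ec2 = boto3.resource("ec2")

def benchmark_once():
    t0 = time.time()
    (instance,) = ec2.create_instances(ImageId="ami-00000000",
                                       InstanceType="t2.micro",
                                       MinCount=1, MaxCount=1)
    created = time.time() - t0            # time until the create call returns

    instance.wait_until_running()
    responsive = time.time() - t0         # time until the instance is running

    t1 = time.time()
    instance.terminate()
    instance.wait_until_terminated()
    shutdown = time.time() - t1           # time until the instance is gone
    return created, responsive, shutdown

samples = [benchmark_once() for _ in range(10)]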
b) Task level: To measure the task-level benchmarks, we executed the following tests 10 times: deploy each of the three tasks on an empty EC2 instance, deploy each of the three tasks on an instance that already has a container running, and measure the execution time of the news task.

TABLE II. TASK LEVEL BENCHMARKS
                                   Mean    Standard deviation
Deployment time empty EC2          0.9     0.01
Deployment time non-empty EC2      0.9     0.01
Execution time news task           523.4   10.4

The results of these measurements can be seen in Table II. As can be seen, the standard deviation of the task deployment is almost zero. This is because the deployment of tasks with Docker is extremely consistent. Since we are only using these benchmark tests to get a feeling for the execution times, we did not measure the deployment time for instances with a higher CPU utilization. This is measured in the following tests.
2) Performance Tests for different Provisioning Policies: In this report, we introduce two provisioning policies. The aim of this section is to evaluate these policies on the news scraping task, since it is the only finite task we can apply. We set up an experiment where we test these policies for two different workloads. The aim of the experiment is to successfully execute a number of finite tasks, with a high utilization of the instances for cost savings and a low execution time. In a time span of 10 minutes, we run several news loading tasks at random times, 25 and 50 tasks respectively. These workloads show how well the policies deal with low and high numbers of tasks to be executed. The results are averaged over 3 runs. Experimentally, we conclude that running 10 containers at the same time on a single instance results in a CPU utilization of around 50% while the instance does not slow down significantly. Therefore, we evaluate the program for 3 policy and parameter variants:
• P1: Policy 1 with a maximum of 10 containers running at the same time on a single machine
• P2: Policy 2 with a maximum CPU utilization of 50% per instance and an initial container allocation of 10
• P3: Policy 2 with a maximum CPU utilization of 70% per instance and an initial container allocation of 10
In the experiment, we focus on multiple aspects of the execution. In order to evaluate the policies we measure the average utilization, the number of instances running, the execution time of the workload and of the individual containers, and finally the cost of each execution. The total charged time for EC2 instances in this experiment is 19 hours and the cost is 1.9 euros.

Fig. 2. Average Instance utilization for the defined policies with different workloads

Firstly, the average CPU utilization of the running instances is reported in Figure 2. Clearly, P2 has the lowest utilization for both workloads, while P1 and P3 have comparable ones. This indicates that either P2 runs more instances than P1 and P3 to execute the same workload, or that, by employing the containers while respecting the maximum instance utilization, it allows the tasks to run more smoothly, without hitting the resource bottleneck. Since the news task has the highest CPU utilization at the start, employing P2 may prevent starting too many containers at the same time, and thereby prevent a slow-down of the execution.

Fig. 3. Average number of instances running and executing the tasks over the time of execution of different workloads for the defined policies

To confirm this, we also compare the average number of instances running in general and the average number of instances executing any tasks over the time of the execution of the different policies and workloads. The results are presented in Figure 3. For the lower workload, the P1 and P2 policies on average run the fewest instances; however, the difference is minimal. When it comes to the high workload, P2 and P3 have the highest number of instances running. This indicates that whenever the workload is high, these policies detect high CPU utilization and open new instances.

Fig. 4. Average time of execution of a single container and full workload using different policies for different workloads

Next, we analyze the average time of workload and container execution, see Figure 4. We see that in general P2 has the lowest execution times, with one exception. This shows that even though it requires running more instances, it also manages the running tasks in a way that does not cause a large elongation of the execution time.

Finally, we compare the costs of employing the workloads with each policy. The costs are 6.7, 7.19 and 7.45 cents for workload 1 and 11.2, 13.6 and 12.4 cents for workload 2, for policies P1, P2 and P3 respectively. Therefore, we can conclude that the first provisioning policy is, in general, cheaper to employ but comes with longer execution times, while the second policy ensures a low container running time.
3) Reliability tests: The final type of experiments measures how reliable the application is when it comes to random faults during task execution. In this experiment, we run a streaming task and simulate an error on the application level during the instance execution, such that the pressure of the service (e.g. tweets or trades) drops to zero. Table III provides an overview and summarizes the test scenario.

TABLE III. OVERVIEW TEST 3
Category               Reliability
Purpose                Reduce data loss in case of application error
Parameters             2 (observer frequency, pressure window)
Nr. of executed tests  5
Outcome                Data loss, SLO violations

The procedure of this test looks as follows:
1) Stop the monitoring of a specific service, which leads to an immediate pressure drop.
2) Then, the longest time between two consecutive tweets or trades is measured, which, in comparison to the general benchmarks, may indicate how long it took for CTC-Swarm to automatically restart the instance and how many tweets or trades were lost in that time.
3) The results are averaged over 5 tries.

This experiment focuses on the streaming tasks, since for the news loading task, in case of a random error, the preparation for news scraping simply has to be run again. This is straightforward and therefore of no concern for the company.

Using log files, we are able to determine at what point in time the Scheduling thread observes the pressure, detects a drop and restarts the particular container that runs the affected service. An example of such a log of trade pressure looks as follows:
INFO: Market gdax normal: 177.00000
INFO: Market bitfinex normal: 513.00000
INFO: Market bitstamp normal: 176.00000
----------------------------------------
INFO: Market gdax normal: 47.00000
INFO: Market bitfinex normal: 192.00000
INFO: Market bitstamp normal: 33.00000
----------------------------------------
INFO: No pressure for market: GDAX.
INFO: Restarting container:
<Container: 2d709537a4>
----------------------------------------
INFO: Market gdax normal: 23.00000
INFO: Market bitfinex normal: 192.00000
INFO: Market bitstamp normal: 19.00000
----------------------------------------

The log holds 4 observations (indicated by the separator lines) of the 3 running services each time.

There are two parameters which have a direct impact on the data loss: the observer frequency and the pressure window.
1) The observer frequency, at which the scheduler observes the pressure, determines how responsively the scheduler reacts to a pressure drop.
2) The pressure is defined as the number of items received in one minute. The pressure window is the number of minutes that is being observed. If the pressure window is set to 10 minutes, then we sum the one-minute trade (or tweet) counts over the last 10 minutes, regardless of the observer frequency (see the sketch below this list). This is important because different types of services naturally experience a different pressure load: e.g. GDAX provides a much lower pressure (42 trades/minute on average) compared to Bitfinex (235 trades/minute on average), because we are watching EUR trades on GDAX and USD trades on Bitfinex.
3) Note: in theory, there would be room for a third parameter, a pressure threshold, which defines at what stage a service experiences faulty behavior. However, for the services which are executed in the CTC environment, a pressure threshold fixed at 0 is sufficient.
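The check that the Scheduling thread performs with these two parameters can be sketched as follows; get_pressure_last_minutes and restart are hypothetical helpers standing in for the StatisticsManager and Manager calls:

def check_service(service, window_minutes, get_pressure_last_minutes, restart):
    # Per-minute pressure values observed over the last `window_minutes` minutes.
    recent = get_pressure_last_minutes(service, window_minutes)
    # Pressure threshold fixed at 0: only a completely silent window
    # is treated as a failure and triggers a restart of the service.
    if sum(recent) == 0:
        restart(service)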
The resulting downtimes, given different observing frequencies, are indicated in Table IV.

TABLE IV. EVALUATION OF DOWNTIME (DT) AND NUMBER OF VIOLATIONS (V)
Service            Pressure Window   DT: observe every 1min   DT: observe every 2min   DT: observe every 5min
Trades (GDAX)      3min              ~180s / 3V               ~210s / 4V               300s / 5V
Trades (Bitfinex)  1min              ~75s / 2V                120s / 2V                300s / 5V
Trades (Bitstamp)  1min              ~90s / 2V                120s / 3V                300s / 5V
Tweets             5min              ~348s / 6V               ~354s / 6V               ~445s / 8V

First of all, one of the goals was to find the right pressure-window parameter. Setting the window too low might lead to an empty pressure reading although the system is running normally. Setting it too high results in a much longer response time, because the system will assume there is still pressure while the pressure is actually fading off. Indicated in the table are the parameters which, through empirical testing, provide just enough margin to not result in a 0-pressure reading due to naturally low pressure.

As can be seen from the results, the downtime obviously increases with the observation interval, up to the point where the downtime equals the interval, which implies that waiting time was added by observing too infrequently. We can further see that with a frequency of one minute, the downtime was only slightly more than the incorporated pressure window, which implies that the observing rate is chosen well and very little time is wasted while the service was not working. Regarding the violations, it can be seen that for trade services with high pressure (Bitfinex), observing every two minutes is actually sufficient, whereas the lower-pressured ones require a one-minute observing frequency. Likewise, the violations that occurred during a pressure drop in the tweet service are the same whether we observe every one or two minutes. In favor of saving computational power, the less frequent observation is then preferable.
In addition to the Scheduler-thread which provides reliabil-
ity on the application level, we have also implemented methods
to restart failed services on container level. Depending on the
selected provisioning policy, as explained in V-B, the amount
of time until the Manager attempts to restart the particular
container can change. That is because, for example, P1 needs
to wait for CPU utilization information. However, tests have
shown that there is no significant difference, and we therefore consider any of the policies as supportive in terms of reliability. Moreover, as an infrastructure provider, we have to expect the applications, provided as Docker containers, to be fully functional. Hence, we do not expect these kinds of errors in well-developed applications running in the cloud and see no need to compare the difference introduced by the policies we provide. What matters most is that, if a container was unable to launch, an attempt to restart it will follow.

VII. CONCLUSION
In this report, we designed and evaluated the cloud resource
manager CTC-Swarm. CTC-Swarm is able to orchestrate the
execution and the scheduling of 3 different types of data
extraction tasks. The application is also able to scale in
resources whenever necessary. We described the architecture
and implemented several features.
We evaluated CTC-Swarm with 3 types of tests: benchmark
tests, tests on different provisioning policies and reliability
tests. The tests show that, when choosing a provisioning policy,
a trade-off must be made between increasing the performance and lowering the costs. The tests also show that, in order to minimize SLO violations, the pressure window of streaming
services needs to be chosen based on the different pressure
loads of services that are streamed.
While developing a cloud-based application we came to
experience that it is very difficult to think of all the edge
cases. For example, the simple task of starting an instance
is not that trivial considering that the amount of time until the
instance is fully ready for containers to be deployed on can
differ a lot. We also noticed that the computing power provided by our free-tier AWS instances can vary. We mitigated this by averaging the results over multiple runs of each experiment and by introducing extensive error handling within the code.

REFERENCES
[1] S. Goyal, “Software as a Service, Platform as a Service, Infrastructure
as a Service A Review,” International journal of Computer Science
Network Solutions, vol. 1, no. 3, pp. 53–67, 2013.
[2] “Boto 3 documentation.” [Online]. Available:
https://boto3.readthedocs.io/en/latest/
[3] “docker-py.” [Online]. Available: https://github.com/docker/docker-py
CLASS DOCUMENTATION

A. AWSManager

NAME
    aws_manager

CLASSES
    builtins.object
        AWSManager

class AWSManager(builtins.object)
 |  Methods defined here:
 |
 |  __init__(self)
 |      Initialize self. See help(type(self)) for accurate signature.
 |
 |  deleteRDSInstance(self)
 |
 |  getAllInstances(self)
 |
 |  getAllInstancesReady(self)
 |
 |  getInstanceById(self, id)
 |
 |  getInstanceUtilization(self, instanceId)
 |      Determines latest known average CPU utilization using CloudWatch.
 |
 |      Returns only the latest known observed timestamp.
 |
 |  getInstancesById(self, ids)
 |
 |  getRDSInstance(self)
 |
 |  isInstanceReady(self, instanceId)
 |
 |  launchInstance(self, name)
 |
 |  provisionInstance(self, instanceType='t2.micro')
 |
 |  provisionRDS(self, instanceClass='db.t2.micro', user, password)
 |
 |  terminateAllInstances(self)
 |
 |  terminateInstance(self, instance)
 |
 |  terminateInstances(self, ids)

B. DockerManager

NAME
    docker_manager

CLASSES
    builtins.object
        DockerManger

class DockerManger(builtins.object)
 |  Methods defined here:
 |
 |  __init__(self)
 |      Initialize self. See help(type(self)) for accurate signature.
 |
 |  close_when_empty(self, instance)
 |
 |  deployCandleProcessor(self, containerName)
 |      Available containers:
 |          ctc-trade-candles-bitstamp-1m
 |          ctc-trade-candles-gdax-1m
 |          ctc-trade-candles-bitfinex-1m
 |          ctc-trade-candles-bitfinex-15m-shifted
 |          ctc-trade-candles-gdax-15m-shifted
 |          ctc-trade-candles-bitstamp-15m-shifted
 |          ctc-trade-candles-gdax-60m-shifted
 |
 |  deployNews(self, serviceName='ctc-news')
 |
 |  deployTrades(self)
 |
 |  deployTwitter(self)
 |
 |  getAllRunningContainers(self, instances)
 |      Lists running containers for a list of instances.
 |
 |      A zip list in the form of: [(instance, [containers])] is returned.
 |
 |  getClient(self)
 |
 |  getClientByInstance(self, instance)
 |
 |  getImageName(self, container, version=False)
 |
 |  getNumOfRunningContainers(self, instance)
 |
 |  getRunningContainers(self, instance)
 |      Lists the running containers for an instance.
 |
 |  notifyWhenLessThanTargetNumOfContainers(self, instance, target)
 |
 |  restartContainer(self, container)
 |
 |  setClient(self, client)
 |
 |  setClientByInstance(self, instance)
 |
 |  setClientByLowestContainers(self, instances)
 |
 |  toEnviron(self, l)
 |      Converts and resolves a list of environment variables to a docker
 |      environment entry.
 |      E.g. ['CTC_DB'] results in: ['CTC_DB=value']

C. StatisticsManager

NAME
    statistics_manager

CLASSES
    builtins.object
        StatisticsManager

class StatisticsManager(builtins.object)
 |  Methods defined here:
 |
 |  __init__(self, awsManager=None)
 |      Initialize self. See help(type(self)) for accurate signature.
 |
 |  getCurrentNewsPressure(self, minutes=1)
 |
 |  getCurrentTradePressure(self, market, minutes=1)
 |      Note: Trades are stamped with UTC and the field is a VARCHAR.
 |      Therefore we cannot use MySQL timestamp functionality, however,
 |      we can filter minutes using e.g. 20171025 21:07:%
 |
 |  getCurrentTweetPressure(self, minutes=1)
 |
 |  getNewsPressure(self, minutes=1)
 |
 |  getPressure(self, cursor, minutes=1)
 |
 |  getTimeNow(self)
 |
 |  getTradePressure(self, market, minutes=1)
 |
 |  getTradeTable(self, market)
 |      Finds trade table according to market short tag:
 |          - gdax
 |          - bitfinex
 |          - bitstamp
 |
 |      The reason for this inconvenience is the fact that the trade watcher
 |      library in use appends the date when the job is started in hard-coded
 |      fashion. E.g. "exch_gdax_btc_eur_snapshot_20171025". Thus we find this
 |      table by looking up "gdax".
 |
 |  getTwitterPressure(self, minutes=1)
 |
 |  reportCpuUtilizations(self)
 |
 |  reportNewsPressure(self)
 |
 |  reportTradePressure(self, market)
 |
 |  reportTweetPressure(self)
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |
 |  __dict__
 |      dictionary for instance variables (if defined)
 |
 |  __weakref__
 |      list of weak references to the object (if defined)

DATA
    db = <peewee.MySQLDatabase object>
    news_db = <peewee.MySQLDatabase object>
    trade_db = <peewee.MySQLDatabase object>
    twitter_db = <peewee.MySQLDatabase object>
TIME SHEET
D. Report and Code
• the total-time = 200h
• think-time = 50h
• dev-time = 60h
• xp-time = 40h
• analysis-time = 10h
• write-time = 20h
• wasted-time = 20h

E. Experiments
1) Experiment 1:
• total-time = 5
• dev-time = 4
• setup-time = 1
2) Experiment 2:
• total-time = 15
• dev-time = 10
• setup-time = 5
3) Experiment 3:
• total-time = 15
• dev-time = 10
• setup-time = 5
