Degree project 30 credits (Examensarbete 30 hp)
June 2014
Abstract
Detecting changes in UERC switches
Jacob Hellman, Lars-Gunnar Olofsson
Popular science summary (Populärvetenskaplig sammanfattning)

Over the past decades, the mobile telephone industry has evolved from simple phones with limited functionality into high-technology devices with voice, data and video transfers. This extreme development has placed high demands on the actors that distribute the mobile network to the end users. To assert itself on the market and remain competitive, the mobile network must therefore be continuously developed to meet the users' demands. Development is normally done in a lab environment to test possible
Preface
This master thesis was written by Jacob Hellman and Lars-Gunnar Olofsson, Engineering Physics students at Uppsala University. The thesis was carried out in Kista, Stockholm, in collaboration with Ericsson AB.

Both students participated throughout the whole project and have a good understanding of all its parts. Jacob had the main responsibility for the ASN.1 parser, constructing the sequences and the methods in discrete time. Lars-Gunnar's main responsibility was the methods in continuous time: the BEST analysis as well as constructing the continuous-time Markov process.
Contents

1 Introduction
  1.1 Problem description
  1.2 Goal

2 Background
  2.1 WCDMA network
    2.1.1 RRC and RAB
    2.1.2 User Equipment RAB Combination
  2.2 General Performance Event Handling
    2.2.1 Storing and retrieving GPEH logs
  2.3 User Equipment Real-Time Trace

3 Theory
  3.1 Null Hypothesis
  3.2 UERC Sequences
    3.2.1 Introduction
    3.2.2 Definition
    3.2.3 Normalized Sequences
    3.2.4 Limitations
  3.3 Sequence Analysis
  3.4 n-gram
    3.4.1 Introduction
    3.4.2 Definition
  3.5 Discrete-time Markov Process
    3.5.1 Definition
    3.5.2 Transition Matrix and Absorbing States
    3.5.3 Example
  3.6 Non-parametric Chi-Square
    3.6.1 Introduction
    3.6.2 Definition
  3.7 Continuous-time Markov Process
    3.7.1 Introduction
    3.7.2 The Q-matrix
    3.7.3 Exponential time and Kolmogorov equations
    3.7.4 Example
  3.8 Bayesian estimation testing
    3.8.1 Introduction
    3.8.2 Variable distributions
    3.8.3 Mean and standard deviation
    3.8.4 Effect size
    3.8.5 Region of Practical Equivalence
    3.8.6 Selecting sample size

5 Results
  5.1 User Equipment Real-Time Trace
  5.2 Detecting Events and Sequences
    5.2.1 ASN.1 Parser
  5.3 n-grams and Markov Chains
    5.3.1 Unigrams
    5.3.2 Bigrams
    5.3.3 Trigrams
  5.4 BEST
    5.4.1 Define range of ROPE effect size
    5.4.2 UERC 4 (FACH) and 21 (URA)
    5.4.3 UERC 25 (EUL/HS)
  5.5 Continuous-time Markov process

6 Discussion
  6.1 ASN.1 parser
  6.2 Sequence Analysis
    6.2.1 UERC Switching Patterns
    6.2.2 UERC Holding Times

7 Conclusion
  7.1 Pre-processing
  7.2 Sequence Analysis
List of Abbreviations

CN      Core Network
CS      Circuit Switched
CTMP    Continuous-time Markov Process
DTMP    Discrete-time Markov Process
GPEH    General Performance Event Handling
OSS     Operational Support System
PS      Packet Switched
RAB     Radio Access Bearer
RBS     Radio Base Station
RNC     Radio Network Controller
RRC     Radio Resource Control
UE      User Equipment
UERC    User Equipment RAB Combination
UERTT   User Equipment Real-Time Trace
UMTS    Universal Mobile Telecommunications System
WCDMA   Wideband Code Division Multiple Access
Chapter 1
Introduction
Wideband Code Division Multiple Access (WCDMA) is a technology within the standardized 3G mobile communication system, and although more than a decade has passed since the first version was released, it is the most widely spread mobile network system in the world, with over 1 billion subscribers. Ericsson supports more than 240 different customers around the world with WCDMA technology, and about 45 percent of all smartphone traffic goes through Ericsson networks (Ericsson 2014).

With this many users it is essential that the network is always operational and delivers a stable user experience. One of Ericsson's tools for monitoring and troubleshooting the network is the set of log files called General Performance Event Handling (GPEH). They contain information about the activities of every user equipment (UE) in the network. Every network is administrated by a Radio Network Controller (RNC), which can handle up to 100,000 users at the same time. The log files quickly become very large, which makes fast manual analysis impractical.
User Equipment Real-Time Trace (UERTT) is another system, recently deployed, that enables real-time monitoring for faster analysis and troubleshooting of a WCDMA network. By changing the output format to protocol buffers, Ericsson managed to shrink the data size, which made a real-time trace possible. The UERTT system is currently used for monitoring UEs experiencing problems in the network.
The WCDMA network is continuously evolving, with upgraded software and tweaked settings. It is very important to monitor the network after any changes have been made to it in order to find abnormal behaviour. One way of doing that is by examining the GPEH or UERTT log files. Although built on different technologies, the data they handle have a similar structure (key-value pairs) and represent the same events, which means they can be examined with the same analysis.
There are several ways of troubleshooting a network, and finding abnormal behaviour quickly becomes a philosophical question rather than a scientific one. This project is therefore limited to analysing sequences of switches between different Radio Access Bearers (RABs). A RAB is essentially a channel that handles data or voice with a certain quality. If a UE requires both, it can have two RABs simultaneously, a so-called multi-RAB. Every RAB and multi-RAB has a unique ID number, conveniently referred to as the User Equipment RAB Combination (UERC).

1.1 Problem description
GPEH and UERTT logs can be used to track the traffic of a specific UE and create sequences of UERC switches. This opens up a new type of analysis, based on the sequences, with unexplored potential.

There are also latency issues with the current analysis methods, and it is desirable to automate as much as possible by utilizing data mining and machine learning methods. UERTT could, theoretically, provide real-time analysis and find abnormal behaviour much quicker than the GPEH logs could ever offer.

Given access to UERC sequences with normal behaviour, the model should be able to detect abnormal behaviour in new sequences. The question is: given two data sets, are there any statistical differences between them, i.e. can we reject the null hypothesis?
1.2 Goal

In order to improve troubleshooting in the WCDMA network by analysing sequences of UERCs, several questions need to be answered. Which events are relevant to UERC switches? Can UERTT offer a quicker way of troubleshooting? What can UERC sequences tell us?

Besides answering these questions, this project aims to develop and implement methods for sequence analysis that detect changes in UERC switches given two data sets recorded under different conditions. The analysis should detect the following changes:

- A UERC has either been removed or added
- The switching behaviour has changed, e.g. switching between two UERCs has been added or removed
- The holding time in a UERC has changed

To reach these goals, methods in both discrete and continuous time need to be implemented. The discrete-time methods should detect changes in the switching pattern, and the continuous-time methods should work with the holding times and detect changes there.
Chapter 2
Background
The UERC sequences are based on events in Ericsson's WCDMA network, so a basic understanding of how the network works and what the different events really mean has been essential for the success of this master thesis. The way to construct sequences was not known from the beginning, and it proved to be as much a philosophical question as a scientific one.

The background section covers how the WCDMA network works, its components and other basic facts about the network. It takes a deeper look at the RRC and RAB, which play a major part in event handling, and also explores the background of GPEH and UERTT.
2.1 WCDMA network
The WCDMA Radio Access Network (RAN) provides the connection between the Core Network (CN) and the UE. The network is comprised of RNCs and Radio Base Stations (RBSs). They communicate over different interfaces (see Figure 2.1), where transport of signaling and user data is performed.

- The UE includes all equipment types used by subscribers. It is divided into two parts: the mobile equipment (ME) and the identity module (USIM). The ME part is used for radio communication, and the USIM holds the subscriber's identity, performs authentication and stores information from the UE. (Eri 2002)
- The RBS is the component that serves one or more cells with radio coverage. It is also responsible for radio transmission to and reception from the UE. (Eri 2002)
- The RNC controls the use and integrity of the system. It manages the RABs (between the UE and the CN) for data transport and mobility. (Eri 2002)
- The CN is responsible for switching and routing calls and data connections to external networks. It contains the physical entities that provide support for the network features and telecommunication services.

When traffic gets more sophisticated, with both packet-oriented Internet traffic (3G and 4G) and voice communication, there is a need to track network components, usage and traffic patterns, billing and reporting. The Operational Support System (OSS) provides this support.
2.1.1 RRC and RAB
Two of the most important concepts to understand when working with wireless networks such as WCDMA are the functions of the RRC and the RAB. Together they play a major part in the establishment and maintenance of a call, whether it is voice (CS) or data (PS). The RRC is responsible for the signaling (or control) and the RAB for the data (or information) of a call. It is the job of the RRC to send requests and receipts, and of the RAB to carry the information across. Essentially, a RAB is a channel that connects two points. There are a number of different RAB configurations (channels), which are used depending on the requested service (voice, data or both) and quality. A UE can have multiple RABs, one for voice and one for data (Pedrini 2013).
Figure 2.2 illustrates the use of the RRC and RAB in a mobile network. All information sent within the network is carried by the RAB, while the RRC controls the connection between the UE and the RNC. Figure 2.2 also illustrates how a call is set up by the RRC. First, the UE sends the RRC Connection Request event to tell the RNC it wants to establish a connection. The RNC responds with an RRC Connection Setup containing setup information.

Figure 2.2: Illustration of the communication channels between UE, RBS, RNC and CN.

When the setup has completed successfully, an RRC Connection Setup Complete event is transmitted from the UE, and the UE has now gone from idle to a RAB called SRB. Shortly after that, a Radio Bearer Setup is sent with information about the available RABs to the UE, and when it has connected to the requested RAB(s) it will send a Radio Bearer Setup Complete.
2.1.2 User Equipment RAB Combination
The UERC is a code formalism that tells which RAB(s) a UE is currently connected to. Since a UE can have several RABs at the same time, it helps to have a system that can identify all RABs as well as their combinations. This way, the number of connections is reduced to just one UERC instead of (possibly) two RABs.

Table 7.1 in Appendix A contains the full list of available RAB combinations, their IDs and a short description.
2.2 General Performance Event Handling
The communication in the WCDMA network can be monitored by examining the event logs created by the RNC and OSS. This binary data is called General Performance Event Handling (GPEH), and for this project it is the main source of information. The logs contain events regarding setting up a connection between the UE and the RAN, measurement reports, changing RAB, and much more. This report focuses on the events needed to trace which UERCs have been used during a session.

In order to analyse the GPEH data it needs to be decoded from its binary format. For this purpose a Perl parser is used that decodes the data into an ASN.1-formatted text file. Listing 2.1 shows a sample output from one of those text files, showing the format for an event called internal-ue-move.

Abstract Syntax Notation One (ASN.1) defines a formalism for representing, encoding and decoding data regardless of language implementation and application. It is a standard of rules and structures commonly used for describing telecommunication protocols (Union 2014).
All events contain information about when they happened, in which rnc-module they were recorded and to which ue-context they belong. Different events contain different information; e.g. an internal-channel-switching event also contains information about the source and target UERC.

A complication with the GPEH data is that most events do not contain the unique ID of the UE (the IMSI), which can make it hard to know to which UE a specific event belongs. The ue-context is only unique as long as the connection to the specific UE is maintained. When the connection is lost, another UE will be assigned the same ue-context, and it can be hard to distinguish when this has happened.
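One way to make this concrete is a minimal sketch that splits a time-ordered event stream into per-UE sessions, cutting a session whenever a release-type event is seen so that a reused ue-context starts a new session. The event dictionaries and event names here are illustrative assumptions, not the actual GPEH schema.

```python
from collections import defaultdict

def split_sessions(events, release_names=("rrc-connection-release",)):
    """Group events by ue-context and cut a new session whenever a
    release-type event is seen, since the context id may then be reused."""
    open_sessions = defaultdict(list)   # ue_context -> current event list
    finished = []
    for ev in events:                   # events assumed time-ordered
        ctx = ev["ue_context"]
        open_sessions[ctx].append(ev)
        if ev["name"] in release_names:  # connection lost: context reusable
            finished.append(open_sessions.pop(ctx))
    finished.extend(s for s in open_sessions.values() if s)
    return finished

events = [
    {"ue_context": 7, "name": "rrc-connection-setup"},
    {"ue_context": 7, "name": "internal-channel-switching"},
    {"ue_context": 7, "name": "rrc-connection-release"},
    {"ue_context": 7, "name": "rrc-connection-setup"},  # same id, new UE
]
print(len(split_sessions(events)))  # 2 sessions
```

Any heuristic of this kind remains approximate: if the release event itself is missing from the log, two UEs are still merged into one session.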
2.2.1 Storing and retrieving GPEH logs
The GPEH logs are stored in the OSS every fifteen minutes. They often occupy several gigabytes of space and contain millions of events. Every RNC is divided into smaller modules, and each of these modules creates its own GPEH log file.

GPEH log files can be obtained either by using one of Ericsson's test sites to set up a network and record data from there, or, as we did, by contacting one of Ericsson's customers and asking for real data from one of their networks.
International Mobile Subscriber Identity (IMSI) is a unique identification number that belongs to the SIM card.
2.3 User Equipment Real-Time Trace
The UERTT is a relatively new feature in the WCDMA network that enables instantaneous analysis of the network. The real-time trace is built on the same events that GPEH is comprised of, but instead of storing the data locally, the OSS streams the data directly through an outgoing port. This is possible through the use of Google Protocol Buffers (GPB), which use significantly less space than GPEH data.

GPB is a flexible and efficient way of serializing structured data. It is similar to XML but smaller and faster. GPB is language and platform neutral, with a primary target on communication protocols, data storage and more (Google 2012).
Chapter 3
Theory
The analysis is built on the null hypothesis, an essential part of statistical inference: whether it is retained or rejected determines if the two data sets have any statistical difference. The theory section ranges from thesis-specific topics, such as UERC sequences and normalized sequences, to more general statistics and data mining methods.
3.1 Null Hypothesis
3.2 UERC Sequences
The UERC sequences introduce a new way of analysing a mobile network, and since there is no known previous work in the area, it is important that the concept is well-defined.
3.2.1 Introduction
Whether a UE is using voice or data in the mobile network, it always has to be connected to at least one RAB throughout the whole session. As mentioned in section 2.1.1, there are different RABs for voice (CS) and data (PS), but to simplify matters the term UERC is introduced to cover the RAB combinations and assign each a unique UERC ID. A UERC sequence is basically a linked list of integers that describes the order in which the UE has switched between the different UERC IDs.
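As an illustration, such a sequence can be assembled from channel-switch events. The `source`/`target` field names below are assumptions for the sketch, not the real event format.

```python
def uerc_sequence(events):
    """Build a UERC sequence (list of UERC ids in switch order) from
    channel-switch events carrying 'source' and 'target' ids."""
    seq = []
    for ev in events:
        if not seq:
            seq.append(ev["source"])  # starting UERC of the session
        seq.append(ev["target"])      # each switch appends the new UERC
    return seq

switches = [{"source": 1, "target": 4}, {"source": 4, "target": 25},
            {"source": 25, "target": 4}]
print(uerc_sequence(switches))  # [1, 4, 25, 4]
```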
3.2.2 Definition
3.2.3 Normalized Sequences
A normalized sequence is not a scientifically established term but rather something specific to this thesis. The UERC sequences generated from the data will contain some UERCs that occur only a very limited number of times, not enough to analyse or draw conclusions from regarding switching patterns and holding times. This raised the need for a normalization of the sequences that disregards the rare UERCs and only considers the frequent ones. The normalization is based on how frequent a pattern is across all sequences.
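A minimal sketch of one possible normalization rule, keeping only UERC ids whose overall share of occurrences reaches a threshold; the thesis' actual frequency criterion may differ.

```python
from collections import Counter

def normalize(sequences, min_share=0.01):
    """Keep only UERCs whose share of all observed ids is at least
    min_share; rare ids are removed from every sequence."""
    counts = Counter(u for seq in sequences for u in seq)
    total = sum(counts.values())
    frequent = {u for u, c in counts.items() if c / total >= min_share}
    return [[u for u in seq if u in frequent] for seq in sequences]

seqs = [[1, 4, 25, 4], [1, 4, 99], [4, 25]]   # 99 occurs only once
print(normalize(seqs, min_share=0.2))          # 99 is dropped everywhere
```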
3.2.4 Limitations
In this thesis, only sequences that start and end in the specified RNC are considered; e.g. UEs coming from soft or hard handovers from other RNCs are not included in the analysis.
3.3 Sequence Analysis
Analysing and comparing UERC sequences towards either rejecting or retaining the null hypothesis requires several different methods. A mind map of the different methods used in the sequence analysis and their key components can be seen in Figure 3.1. There are two groups of methods: those operating in discrete time and those operating in continuous time. Each group has two methods, with n-gram and the discrete-time Markov Process (DTMP) belonging to discrete time, and Bayesian Estimation Supersedes the T-test (BEST) and the continuous-time Markov Process (CTMP) belonging to continuous time.

Figure 3.1: Mind map of the sequence analysis. Discrete time: n-gram and DTMP (P-matrix, chi-square); continuous time: BEST (ROPE, effect size) and CTMP (Q-matrix).
3.4 n-gram
A first approach to extracting information from a sequence is to look at the frequency of the items and the order in which they usually appear. For this purpose the n-gram method is particularly useful. Christopher D. Manning and Hinrich Schütze (Manning & Schütze 1999) provide a good introduction to this topic, which forms the foundation for this section.
3.4.1 Introduction
3.4.2 Definition

P(w_n | w_1, ..., w_{n-1}) ≈ P(w_n | w_{n-(N-1)}, ..., w_{n-1}),   (3.4.1)

which basically says that the history (the previous items) is used as a classification to predict the next item. The estimation follows the Markov assumption: the next item is only affected by the (n − 1) previous items. In other words, an n-gram is equivalent to a Markov process of order (n − 1). Table 3.1 lists the most common n-grams, what they are referred to as and their corresponding order of DTMP.

Table 3.1: Attributes for n-grams for n = 1, 2, 3.

                   1-gram sequence   2-gram sequence   3-gram sequence
  Referred to as:  Unigram           Bigram            Trigram
  Order of DTMP:   0                 1                 2

The reason higher orders are not so common is that they quickly grow in size. If a sequence consists of N distinct items, the parameter space is equal to (N^n − N) for an n-gram of order n (Manning & Schütze 1999).
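The counting itself is straightforward; a short sketch of an n-gram frequency table over a UERC sequence:

```python
from collections import Counter

def ngrams(seq, n):
    """Return a frequency table of the n-grams (length-n windows) in seq."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))

seq = [1, 4, 25, 4, 25]
print(ngrams(seq, 2))   # bigram counts, e.g. (4, 25) occurs twice
```

A sequence of length L yields L − n + 1 n-grams, so the table for higher n is both sparser and larger, which is the size problem noted above.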
3.5 Discrete-time Markov Process

This section shows how a DTMP of order 1 works and its applications. A discrete Markov chain is a mathematical system that experiences transitions between different states in a state space. It is a memoryless process where the next step depends only on the current state and not on the sequence of events that preceded it (this only holds for order 1). Kai Lai Chung (Chung 2000) and Jacques Janssen and Raimondo Manca (Janssen & Manca 2006) provide good introductions to this topic, which form the foundation for this section.
3.5.1 Definition

Introduce a system S with m possible states and state space I = {1, 2, 3, ..., m}. The system S acts randomly in the discrete time T = {t_0, t_1, t_2, t_3, ...}. Define the state X_{t_n} of system S at time t_n ∈ T and assume that within a small interval of time there are no changes. The random sequence {X_{t_n}, t_n ∈ T} is a Markov process when, for all x_{t_0}, x_{t_1}, ..., x_{t_n} ∈ I,

P[X_{t_n} = x_{t_n} | X_{t_0} = x_{t_0}, ..., X_{t_{n-1}} = x_{t_{n-1}}] = P[X_{t_n} = x_{t_n} | X_{t_{n-1}} = x_{t_{n-1}}].   (3.5.1)

The process is homogeneous in time if this conditional probability does not depend on n.   (3.5.2)
3.5.2 Transition Matrix and Absorbing States

Define the transition probabilities

p_{ij}(n) = P[X_{t_n} = j | X_{t_{n-1}} = i],   i, j ∈ I, t_n ∈ T.   (3.5.3)

Define the vector p = (p_1, p_2, ..., p_m) of initial condition probabilities, with p_i = P[X_0 = i], i ∈ I. The probabilities should satisfy

p_i ≥ 0 for all i ∈ I, and Σ_{i∈I} p_i = 1,   (3.5.4)

p_{ij}(0) = 1 if i = j, and 0 otherwise.   (3.5.5)

A state i ∈ I is absorbing if

p_{ij}(n) = 0 for all j ≠ i, n ∈ T, and p_{ii}(n) = 1 for all n ∈ T.

The system cannot leave an absorbing state once it has entered one. A DTMP with at least one absorbing state is called absorbing, which means that the process can migrate from any non-absorbing state to an absorbing one. With at least one absorbing state the process can give us the following information:

- the discrete time it takes before the process reaches an absorbing state,
- the limiting probabilities lim_{t→∞} p_{ij}(t), i, j ∈ I.
3.5.3 Example

Suppose a system S with four states where two of them are absorbing (see Figure 3.2, which shows the transition probabilities P_{12}(t), P_{21}(t), P_{23}(t), P_{14}(t), P_{33}(t) and P_{44}(t)). Assume that S is homogeneous in time with the following probability matrix:

        | 1-4a    a     0    3a |
  P =   |  a    1-3a   2a    0  | ,   0 < a < 1/4.   (3.5.6)
        |  0     0     1     0  |
        |  0     0     0     1  |

The migration matrix between the non-absorbing states is given by A, taken from the P matrix in equation 3.5.6:

  A = | 1-4a    a   |   (3.5.7)
      |  a    1-3a  |
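In practice the transition matrix is not given but estimated from observed sequences. A sketch of the maximum-likelihood estimate (row-normalized transition counts), where a state that is never left in the data is treated as absorbing:

```python
def transition_matrix(sequences, states):
    """Maximum-likelihood estimate of the DTMP transition matrix:
    counts of i -> j switches, normalized per row."""
    idx = {s: k for k, s in enumerate(states)}
    counts = [[0] * len(states) for _ in states]
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[idx[a]][idx[b]] += 1
    P = []
    for k, row in enumerate(counts):
        total = sum(row)
        if total == 0:  # never left in the data: treat as absorbing
            P.append([1.0 if j == k else 0.0 for j in range(len(states))])
        else:
            P.append([c / total for c in row])
    return P

P = transition_matrix([[1, 2, 1, 2, 3]], states=[1, 2, 3])
print(P)  # state 3 is absorbing in this sample
```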
3.6 Non-parametric Chi-Square
The problem of comparing two transition matrices from different Markov processes can be seen as a comparison of two proportions. In this case the proportions are the elements of the transition matrices, each describing the proportion of transitions from one state to another out of the total number of switches from that specific state. One way of doing this is with the non-parametric chi-square test.
3.6.1 Introduction
3.6.2 Definition

For two proportions, the chi-square test estimates the pooled proportion

P = (n_1 + n_2) / (N_1 + N_2),   (3.6.1)

where n_1 and n_2 are the number of times the particular event happened and N_1 and N_2 are the total number of events. The expected outcomes n_e are then calculated as

n_{e1} = N_1 P,   n_{e2} = N_2 P.   (3.6.2)

O = [n_1, N_1 − n_1, n_2, N_2 − n_2]
E = [n_{e1}, N_1 − n_{e1}, n_{e2}, N_2 − n_{e2}]   (3.6.3)

where O are the observed values and E the expected ones. The chi-square value is then calculated as

χ² = Σ_{i=1}^{4} (O[i] − E[i])² / E[i],   (3.6.4)

and it is a measure of how big the difference between the two proportions is. The obtained χ² is compared to a chi-square distribution table in order to get the p-value, which is used to test the null hypothesis. (McClean 2000)
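Equations 3.6.1-3.6.4 translate directly into code; a minimal sketch:

```python
def chi_square_two_proportions(n1, N1, n2, N2):
    """Chi-square statistic for comparing two proportions n1/N1 and n2/N2
    via the pooled proportion P (equations 3.6.1-3.6.4)."""
    P = (n1 + n2) / (N1 + N2)            # pooled proportion (3.6.1)
    e1, e2 = N1 * P, N2 * P              # expected counts (3.6.2)
    O = [n1, N1 - n1, n2, N2 - n2]       # observed table (3.6.3)
    E = [e1, N1 - e1, e2, N2 - e2]       # expected table (3.6.3)
    return sum((o - e) ** 2 / e for o, e in zip(O, E))  # (3.6.4)

# identical proportions (0.25 in both groups) give a statistic of zero
print(chi_square_two_proportions(10, 40, 30, 120))  # 0.0
```

With one degree of freedom, a statistic above 3.84 corresponds to p < 0.05.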
3.7 Continuous-time Markov Process

This section shows how a CTMP works and its applications. A continuous-time Markov chain is a mathematical system that takes values from a finite set and analyses the time spent in each state before moving on to the next one; it can only handle non-negative real holding times. As with the discrete-time process of order 1, it depends only on the current state and not on historical behaviour. Kai Lai Chung (Chung 2000) and Jacques Janssen and Raimondo Manca (Janssen & Manca 2006) provide good introductions to this topic, which form the foundation for this section.
3.7.1 Introduction

Consider a process {X(t), t ≥ 0} taking values in the state space I, X(t) = i, i ∈ I.   (3.7.1)

The process is a continuous-time Markov process if

P[X(t+s) = j | X(s) = i, X(u), 0 ≤ u < s] = P[X(t+s) = j | X(s) = i],   t, s ∈ R.   (3.7.2)

3.7.2 The Q-matrix

The probability of jumping within a small interval of time h > 0 can be defined by the function

f(h) = P(X(t+h) = j | X(t) = i),   i, j ∈ I,   (3.7.3)

with f(0) = 0 for all i ≠ j,   (3.7.4)

and a remainder term O(h) satisfying lim_{h→0} O(h)/h = 0.   (3.7.5)

For all i ≠ j the jump probability can then be written   (3.7.6)

P(X(t+h) = j | X(t) = i) = λ_{ij} h + O(h),   (3.7.7)

and the probability of remaining in state i is

P(X(t+h) = i | X(t) = i) = 1 − Σ_{j≠i} λ_{ij} h + O(h),   (3.7.8)

which defines λ_{ii} = −Σ_{j≠i} λ_{ij}. The Q-matrix is now defined as Q = [λ_{ij}]_{M×M}, for all i, j ∈ I, and it can be shown that f'(0) = λ_{ii}. The matrix has the following properties:

- λ_{ii} ≤ 0, i ∈ I,
- λ_{ij} ≥ 0, i ≠ j, i, j ∈ I,
- Σ_{j∈I} λ_{ij} = 0, i ∈ I.

λ_{ij} is known as the instantaneous rate from i to j, with the assumption that it is constant through time.
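Since λ_{ij} is a rate per unit time, it can be estimated from logged holding times as (number of i → j jumps) / (total time spent in i); this is the standard maximum-likelihood estimate for CTMP rates. A sketch on simulated data; the two-state chain and its rates are illustrative assumptions:

```python
import random

def estimate_q(jumps):
    """Estimate Q-matrix rates from (state, holding_time, next_state)
    records: lambda_ij = (# of i -> j jumps) / (total time spent in i)."""
    time_in, counts = {}, {}
    for state, hold, nxt in jumps:
        time_in[state] = time_in.get(state, 0.0) + hold
        counts[(state, nxt)] = counts.get((state, nxt), 0) + 1
    states = sorted(time_in)
    q = {i: {j: 0.0 for j in states} for i in states}
    for (i, j), n in counts.items():
        q[i][j] = n / time_in[i]
    for i in states:
        q[i][i] = -sum(q[i][j] for j in states if j != i)  # rows sum to 0
    return q

# Simulate a two-state chain with true rates lambda_1 = 1.0, lambda_2 = 2.0
random.seed(1)
rates = {1: 1.0, 2: 2.0}
state, jumps = 1, []
for _ in range(20000):
    hold = random.expovariate(rates[state])  # exponential holding time
    nxt = 2 if state == 1 else 1
    jumps.append((state, hold, nxt))
    state = nxt
q = estimate_q(jumps)
print(round(q[1][2], 2), round(q[2][1], 2))  # close to 1.0 and 2.0
```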
3.7.3 Exponential time and Kolmogorov equations

Assume a random time V such that, if the system has not jumped after that time, the remaining time has the same waiting-time distribution as from the beginning. The random time V satisfies the memoryless Markov property

P[V > t+h | V > t, X(t) = i] = P[V > t+h | X(t) = i] = P[V > h],   i ∈ I, t ≥ 0.   (3.7.9)

With an exponential waiting-time distribution P[V > t] = e^{−λt} and instantaneous transition rate λ, it can be shown that

P[V > t+h | V > t] = e^{−λ(t+h)} / e^{−λt} = e^{−λh} = P[V > h].   (3.7.10)

The transition matrix P(t) solves the Kolmogorov equations P'(t) = P(t)Q = QP(t),   (3.7.11)

with solution given by the matrix exponential

P(t) = e^{Qt} = Σ_{k=0}^{∞} Q^k t^k / k!.   (3.7.12)

If Q is diagonalizable, Q = C D C^{−1}, where D = diag(λ_1, λ_2, ..., λ_d) holds the eigenvalues of Q and the columns c_1, c_2, ..., c_d of C hold the corresponding eigenvectors. The matrix exponential then simplifies to

e^{Qt} = Σ_{k=0}^{∞} (Qt)^k / k!   (3.7.13)
      = C ( Σ_{n=0}^{∞} D^n t^n / n! ) C^{−1} = C e^{Dt} C^{−1}.   (3.7.14)

3.7.4 Example
Suppose that we have a two-state process with the following Q-matrix:

  Q = | −λ_1   λ_1 |   (3.7.15)
      |  λ_2  −λ_2 |

Each row of the transition matrix must sum to one,

Σ_{j∈I} p_{ij}(t) = 1, e.g. 1 = p_{11}(t) + p_{12}(t),   i = 1, 2 ∈ I.   (3.7.17)

Solving the Kolmogorov forward equation for the first element,

p'_{11}(t) = λ_2 − (λ_1 + λ_2) p_{11}(t),   (3.7.18)

gives

p_{11}(t) = λ_2/(λ_1+λ_2) + λ_1/(λ_1+λ_2) · e^{−(λ_1+λ_2)t}.   (3.7.19)

With the same solution method for the other equations we obtain the following results for p_{11}(t), p_{12}(t), p_{21}(t), p_{22}(t):

p_{11}(t) = λ_2/(λ_1+λ_2) + λ_1/(λ_1+λ_2) · e^{−(λ_1+λ_2)t}
p_{12}(t) = λ_1/(λ_1+λ_2) − λ_1/(λ_1+λ_2) · e^{−(λ_1+λ_2)t}
p_{21}(t) = λ_2/(λ_1+λ_2) − λ_2/(λ_1+λ_2) · e^{−(λ_1+λ_2)t}
p_{22}(t) = λ_1/(λ_1+λ_2) + λ_2/(λ_1+λ_2) · e^{−(λ_1+λ_2)t}   (3.7.20)

As t → ∞ the exponential terms vanish and the transition matrix approaches its limit,

  P(t) → | λ_2/(λ_1+λ_2)   λ_1/(λ_1+λ_2) |   (3.7.21)
         | λ_2/(λ_1+λ_2)   λ_1/(λ_1+λ_2) |

As a check, each row sums to one:

λ_2/(λ_1+λ_2) + λ_1/(λ_1+λ_2) = (λ_2+λ_1)/(λ_1+λ_2) = 1, giving the row sums [1; 1].   (3.7.22)

3.8 Bayesian estimation testing
The Bayesian Estimation Supersedes the T-test (BEST) algorithm was chosen because it is very flexible and has a few advantages compared to simpler methods such as the t-test. This section shows how Bayesian data analysis works and its applications (Kruschke 2013).
3.8.1 Introduction
3.8.2 Variable distributions

As mentioned earlier, real-world data often contains outliers. These outliers can carry significant meaning for the result and should be included in the distributions. A normal distribution has difficulty handling them, which is why the BEST algorithm uses a t-distribution with a variable normality parameter ν. If the data contains outliers (ν → 0), the distribution has taller tails (see Figure 3.3). With compact data without outliers (ν → ∞), the t-distribution looks more like a normal distribution (see Figure 3.3).
3.8.3 Mean and standard deviation

The mean and standard deviation of the data reveal information about the change between two groups of data. The BEST algorithm uses the most fitting distribution, which affects the certainty of both mean and standard deviation; everything is calculated at 95% credibility. Every histogram is marked with a 95% highest density interval (HDI), which shows where the bulk of the most credible values falls. By definition, every value inside the interval has a higher probability density than any value outside the HDI, and the total mass of values inside is 95% of the distribution.

To better understand the difference, two cases are presented: one with a moderate sample size and one with a small sample size. The first case considers two groups of people, N_1 = 47 and N_2 = 42. Figure 3.4c shows that the 95% HDI falls above zero, with 98.8% of the credible values greater than zero. This means the groups are credibly different and that there is a significant change in means between them. The second case considers a small sample size, N_1 = 8 and N_2 = 8, and even though the means of the two groups differ, the posterior distribution reveals great uncertainty in the estimation. This is because zero falls within the 95% HDI, so it cannot be concluded that an actual change has occurred between the two data sets (see Figure 3.5c). The standard deviation is calculated in the same way and presented with a 95% HDI.
3.8.4 Effect size

The effect size measures the magnitude of a difference as the number of standard deviations that separates two groups of data. It can be calculated as the difference in means between two groups (μ_1 − μ_2) divided by the standard deviation (σ) of the population from which they are sampled.

(Figures 3.4a-b and 3.5a-b show the posterior distributions of the group parameters.)

There are at least three different calculations with various advantages. They are referred to as Cohen's d, Glass's Δ and Hedges' g:
Cohen's d = (μ_1 − μ_2) / SD_pooled = (μ_1 − μ_2) / sqrt( (Σ(X_1 − μ_1)² + Σ(X_2 − μ_2)²) / (n_1 + n_2 − 2) ),   (3.8.1)

Glass's Δ = (μ_1 − μ_2) / σ_2,   (3.8.2)

Hedges' g = (μ_1 − μ_2) / SD*_pooled = (μ_1 − μ_2) / sqrt( ((n_1 − 1)σ_1² + (n_2 − 1)σ_2²) / (n_1 + n_2 − 2) ).   (3.8.3)
The only difference between the equations is the method for calculating the standard deviation. If the group standard deviations are roughly the same, it is reasonable to expect that they estimate a common population standard deviation; this favors Cohen's d, and pooling the standard deviation is appropriate. If, on the other hand, the standard deviations differ, it would not be appropriate to pool them because of the violation of homogeneity of variance. Then Glass's Δ can be used: Glass argued that the standard deviation of the control group is untainted by the effects of the treatment and will therefore more closely reflect the population. This is in direct relationship with the sample size, where more samples resemble the population more closely. The last approach, Hedges' g, is recommended if the groups have different sample sizes and a weighted standard deviation is appropriate. (Ellis 2010)

While it could be appropriate to weight the standard deviation, there is a perspective that the effect size is merely a re-description of the posterior distribution. Many different data sets could have generated the posterior parameter distribution, and because of that the data should not be used in re-describing the posterior. (Kruschke 2013)
δ = (μ_1 − μ_2) / sqrt( (σ_1² + σ_2²) / 2 )   (3.8.4)

This form, equation 3.8.4, is merely a copy of Hedges' g formula without the weighting, and it does not change the sign or the magnitude to any great extent. The proportion of the effect-size distribution that is greater or less than zero remains unaffected (Kruschke 2013).
One rule of thumb for interpreting the effect size was proposed by Cohen (1988):
a small effect size starts at 0.2, a medium at 0.5 and a large at 0.8.
However, Cohen also warned that the rule can differ between fields of study and that the
user should define their own thresholds depending on purpose and area. (Cohen 1988)
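As an illustration, Equations 3.8.1–3.8.3 can be computed directly from two samples. This is a minimal Python sketch; the thesis itself computes effect sizes via the BEST library in R, so the function names below are our own:

```python
import math

def cohens_d(x1, x2):
    # Pooled standard deviation over both groups (Eq. 3.8.1)
    m1 = sum(x1) / len(x1)
    m2 = sum(x2) / len(x2)
    ss = sum((x - m1) ** 2 for x in x1) + sum((x - m2) ** 2 for x in x2)
    sd_pooled = math.sqrt(ss / (len(x1) + len(x2) - 2))
    return (m1 - m2) / sd_pooled

def glass_delta(x1, x2):
    # Standard deviation of the control group only (Eq. 3.8.2)
    m1 = sum(x1) / len(x1)
    m2 = sum(x2) / len(x2)
    sd2 = math.sqrt(sum((x - m2) ** 2 for x in x2) / (len(x2) - 1))
    return (m1 - m2) / sd2

def hedges_g(x1, x2):
    # Sample-size weighted pooled standard deviation (Eq. 3.8.3)
    n1, n2 = len(x1), len(x2)
    m1 = sum(x1) / n1
    m2 = sum(x2) / n2
    v1 = sum((x - m1) ** 2 for x in x1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in x2) / (n2 - 1)
    sd_w = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / sd_w
```

With equal group sizes and equal variances the three measures coincide, which is exactly the situation where pooling is uncontroversial.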
3.8.5
Decision Rule
The posterior distribution can be used to make a decision about the credibility of the test.
The way to do this is with the 95 % HDI, the range of the ROPE, and the value zero. If
the 95 % HDI is narrow enough and falls entirely within the ROPE, it means that with
95 % credibility the values are practically equivalent to each other and the null hypothesis
can be accepted (Kruschke 2013).
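The decision rule reduces to comparing interval end points. A minimal Python sketch, where the HDI limits are hypothetical inputs that would in practice come from the posterior samples:

```python
def rope_decision(hdi_low, hdi_high, rope_low=-0.2, rope_high=0.2):
    """Decide on the null hypothesis from a 95 % HDI and a ROPE around zero."""
    if rope_low <= hdi_low and hdi_high <= rope_high:
        return "accept null"    # HDI entirely inside the ROPE
    if hdi_high < rope_low or hdi_low > rope_high:
        return "reject null"    # HDI entirely outside the ROPE
    return "undecided"          # HDI overlaps a ROPE boundary
```

The third outcome matters: when the HDI straddles a ROPE limit, neither accepting nor rejecting is warranted and more data is needed.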
3.8.6
Statistical Power and Sample Size
The ability to detect effects has a direct correlation with the number of samples. In
many cases it is fair to say that the success or failure of a project to reach a statistically
significant result hinges on its sample size. Say simulations are run to detect
a particular effect size but are based on too few samples; statistical
significance could then occur randomly. E.g. if you expect the difference between two groups
to be equivalent to an effect size of d = 0.20, and you wish to have at least an 85 % chance
of detecting this difference, you will need at least 900 participants in your sample. As this
effect size relates to differences between groups, the implication is that you will need a
minimum of 450 participants in each group. If you wish to further reduce the possibility
of overlooking real effects by increasing power to 0.95, you will need a minimum of
1302 participants, or 651 in each group, see Table 3.2. So it is useful to have an idea of
the sensitivity of the research design, where the risk can only be reduced by increasing
samples. Minimum sample sizes for detecting a statistically significant difference between
two groups are presented in Table 3.2. (Ellis 2010)
Table 3.2: Minimum sample sizes for different effect sizes and power levels. Note: The
sample sizes reported for d are combined N1 + N2 (Ellis 2010).

            Power
d      0.75   0.80   0.85   0.90   0.95
0.10   2779   3142   3594   4205   5200
0.20    696    787    900   1053   1302
0.30    311    351    401    469    580
0.40    176    199    227    265    327
0.50    113    128    146    171    210
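The numbers in Table 3.2 can be approximated with the standard normal-approximation power formula for two independent groups, n per group ≈ 2((z_{1-α/2} + z_{power})/d)². A small Python sketch (the table's exact values are presumably t-based, so this approximation lands slightly off them):

```python
from statistics import NormalDist
import math

def min_n_per_group(d, power, alpha=0.05):
    """Approximate minimum per-group sample size for detecting effect
    size d in a two-sided, two-group comparison at significance alpha."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)
```

For d = 0.20 and power 0.85 this gives 449 per group (898 combined), close to the 900 reported in Table 3.2; for power 0.95 it gives 650 per group (1300 combined) against the table's 1302.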
Chapter 4
Work Flow, Technology and Data
This chapter will explain the work flow, the technologies used and the data behind our
results and conclusions. It will also give the reader a better understanding of the results
and connect them with the theory. The work flow describes the process step by step
and gives a detailed description of what every step involves. The technology section gives
a short introduction to the technologies used in this thesis and states why they were
chosen. The last section of this chapter gives a description of the data which the
results are based upon.
4.1
Work Flow
The first step is to go through the data and find which events can be useful.
When the desired events have been identified, the ASN.1 parser will extract them from
the raw data and create call sequences. The second step is to apply our theory to the
gathered data and try to get as much information out of it as possible.
4.1.1
Knowledge Acquisition
A major part of the thesis has been trying to understand the WCDMA standard, collect
the right type of data and merge it into consistent sequences of events. The WCDMA standard is complex and has increased in complexity over time to cater for more advanced
use-cases as more and more advanced end user terminals have been introduced into
the market. For Ericsson to maintain a high performance in the network, new functions
have been developed and released in software updates. This increased complexity of the
network has made the collection of call sequences very difficult, and for some parts almost
impossible.
4.1.2
Pre-processing
Before the data can be used it needs to be pre-processed and structured in a way that
makes it compatible with our analysis tools. The data is originally in a binary format, but there
exists a decoder that turns it into an ASN.1-formatted file. As there are no suitable
open source decoders available for ASN.1 we need to construct our own. The goal of the
pre-processing is to get two comma-separated values (csv) files where

1. Each line represents a sequence of UERC ids that all belong to the same UE and
were recorded during the same session

2. Each line is a switch with source and target UERC and a timestamp

The holding time is then calculated from the timestamps for entering and leaving the
UERC. The csv format is very practical when working with large data sets and many
programs support the format.
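As a sketch, the holding-time calculation from the second csv file could look as follows in Python. The (source, target, timestamp) tuple layout is an assumption for illustration, not the thesis' exact file format:

```python
def holding_times(switches):
    """Compute per-UERC holding times from one UE's ordered switch rows.
    Each row is a (source_uerc, target_uerc, timestamp_seconds) triple."""
    times = []
    for prev, cur in zip(switches, switches[1:]):
        # The UE stayed in prev's target UERC until the next switch fired.
        times.append((prev[1], cur[2] - prev[2]))
    return times
```

For example, a UE that enters UERC 25 at t = 0.0, switches to UERC 4 at t = 9.0 and goes idle at t = 14.5 yields holding times of 9.0 s in UERC 25 and 5.5 s in UERC 4.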
4.1.3
Discrete-time analysis
The discrete-time analysis only considers switches between UERCs, meaning that the
time spent in each UERC is not taken into account. The sequence analysis in discrete-time follows these steps:
1.
2.
3.
4.
5.
6.
7.
The Markov chains are drawn with the R library igraph and the n-gram program from
the library tau.
4.1.4
Continuous-time analysis
In continuous-time it is the holding time that is analysed. This is the actual
time each UE spends in every UERC before moving on to a new one. The holding
time is both deterministically specified in the system and determined by the behaviour of the user.
These holding times have been analysed with the continuous-time Markov process
and the BEST algorithm.
Material has been collected from databases with high-ranking publications; the databases
which provided the BEST information were:
IEEE Xplore Digital Library
ACM Digital Library
The programming environment used to develop the CTMP is MATLAB. The sequence
analysis for continuous time follows these steps:
1.
2.
3.
4.
The BEST algorithm, on the other hand, was developed in R and came as a ready-to-use
product in an R library called BEST. This specific library needed the following packages
to be installed:

JAGS
rjags

To be able to define the range of the ROPE for the effect size, data from part one was
chopped up in batches of 2600 samples. The number of samples was chosen in relation
to the simulation time and the maximum chance to detect a statistically significant difference
(see Table 3.2). These batches of data were compared to each other, and because part one
did not include any known changes, the corresponding magnitude of the effect size could
be treated as no change at all.
4.2
Technologies
This project requires the use of several different programs and technologies in both
Linux and Windows environments. Some of them have been developed internally at
Ericsson, but the ones most used are either open-source or commercial software. For
the data gathering and pre-processing, a Linux lab computer with 8 processors is used
for its superior computational power. The data used in this project was small enough to
analyse on a standard PC; otherwise it would have been more efficient to do everything
on the lab computer. The primary technologies are the programming language Python,
the commercial high-level language and environment MATLAB, and the free high-level
language and environment R.
4.2.1
Python
Python is a powerful and versatile programming language that can be used for both
object oriented programming as well as functional programming. It has a rich standard
library that supports many of the most common programming tasks, such as connecting
to web servers, searching text and more. One of its main features is that it is easy to read,
because Python does not end each line with a semicolon or use brackets to define code
sections; instead, Python uses new lines and indentation. (Holden 2014)

Python is an excellent scripting language and has large support for searching in text.
That, and the ability to mix object oriented programming with functional, makes Python
the perfect language to write the ASN.1 parser in.
4.2.2
MATLAB
MATLAB is an abbreviation of MATrix LABoratory and, as the name suggests, it is particularly good at manipulating matrices. It is a high-level language developed by MathWorks
that requires a license to use. By offering it for free to universities and similar institutions it has become widely used as a research and learning tool. Its primary function
is numerical calculation, but there exist several toolboxes that allow for more specific
applications, such as the Statistics Toolbox, which includes many pre-written methods from
data mining and machine learning. (The MathWorks 2014)

At Ericsson, MATLAB is a common tool for analysing the GPEH data, mostly for drawing
graphs based on certain KPIs. With an extensive toolbox for statistics and the close
connection to GPEH data it was natural that MATLAB would play a part in this project.
4.2.3
R
4.3
Data
The offline test data is from a major sport event in Sweden. The data is often parsed from
GPEH to a MATLAB file. The benefit of collecting data from sport events is that
users tend to stay in one place for a long time. This means that a very high number of
users are connected to a few cells for a longer time and it is easier to follow their mobile
traffic. This is very important, because users that move around constantly change RBSs,
which would make it harder to keep track of the right user. There is also a
benefit in having a high number of users in one concentrated place: live stress tests of
the network can be performed. This is important in order to analyse and understand
how the network could perform in peak hours. As most games have period breaks, the
interesting part of the analysis happens during these breaks, when most users start
to use their mobile phones. The network operator and Ericsson usually have an agenda
for every recording and change settings in the network after half-time. This is done to
evaluate and tweak the system for better performance.
1
A GNU Project means that it is free software, available under the Free Software Foundation's GNU
General Public License.
The two data sets used for comparison in this report actually come from the same sport
event, where some settings were changed in a period break. The data will be divided into
two parts, where part 1 consists of the data before the change and part 2 of the data after the
change. Methods from the sequence analysis part of the theory will be applied to the two
data sets in order to compare them, in the hope of identifying the changed settings.
Chapter 5
Results
The results chapter will cover the findings regarding the investigation of the possible use
of UERTT, the identification of the events needed to create UERC sequences and an
explanation of how the ASN.1 parser that extracts the events works. The chapter will
also show the results from analysing two different GPEH-files in order to find changes
regarding UERC switches.
5.1
UERTT
The first task was to determine whether or not UERTT could be used to analyse the
network with data mining and machine learning methods. The UERTT works by tracking
a certain UE in the network on the OSS and then streaming the events connected to that
UE to a port that can be listened to on another machine.
5.2
UERC Sequences
The first step in creating UERC sequences was to find which events contained relevant
information and to assign each event to a specific UE. This proved a lot harder than first
imagined. Some events are difficult to match to a certain UE since they do not carry
an IMSI number; they only contain the fields rnc-module-id and ue-context to identify
which UE they should belong to.

Figure 5.1 shows the logic behind an RNC and how it keeps track of all the UEs and
their events. Every RNC contains several modules, which in their turn handle several
different UEs. Every UE has a corresponding list of events which, among other things, tells
Figure 5.1: The logic behind an RNC, its modules and UEs, and their associated events.
in which order the UE has moved between different UERCs. The events that contain an
IMSI number can easily be matched to a specific UE, but the ones that do not have
to be matched by their rnc-module-id and ue-context. At every instant of time, the
ue-context is unique within a module, but it is reused, which means that when
a connection is dropped for one UE a new one will take its place and have the same
ue-context as the previous one. This problem was solved by finding start and end events
which determine when a call sequence starts and ends respectively.
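The session-splitting logic described above can be sketched as follows. This is hypothetical Python: the dict field names mirror Table 5.2, but the data layout and function name are assumptions, not the parser's actual implementation:

```python
# Event ids from Table 5.1 that end a UE session.
END_EVENTS = {19, 438}  # rrc-connection-release, internal-system-release

def split_sessions(events):
    """Group events into per-UE call sequences keyed by
    (rnc-module-id, ue-context). An end event closes the session, so the
    same ue-context can later be reused by a new UE without mixing them."""
    open_sessions = {}
    finished = []
    for ev in events:
        key = (ev["rnc_module_id"], ev["ue_context"])
        open_sessions.setdefault(key, []).append(ev)
        if ev["event_id"] in END_EVENTS:
            finished.append(open_sessions.pop(key))
    return finished
```

Two consecutive calls that reuse the same ue-context end up in two separate sessions because the first end event removes the key from the open-session map.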
Table 5.1: The events with id and description that are needed to create UERC sequences.

Event id  Name                        Description
415       internal-rab-establishment  Establishing a connection to a RAB
416       internal-rab-release        Releasing a connection to a RAB
387       internal-channel-switching  The RAB connection has been reconfigured
                                      (the UE switched to another UERC)
19        rrc-connection-release      The connection with the RNC has been
                                      released (UE has gone to idle)
438       internal-system-release     One or several RABs, or a standalone RNC,
                                      have been released but the release was not
                                      a normal one
After consulting several employees at Ericsson and analysing the events by hand, five
different events could be identified as needed to create the UERC sequences that are unique
for every UE. The identified events can be seen in Table 5.1 with their id, name and a
short description.

A major part of the identification of the desired events was examining which fields
they contained. As mentioned in section 2.2, some fields are required in all events, but
most events also have their own unique fields.
Table 5.2: Relevant fields in the events.

Field name     Description
timestamp      Date and time for when the event took place
event-id       Unique id for every event
rnc-module-id  Id for the module that handled the event
ue-context     Id for the UE assigned by the module
cell-id        The cell (RBS) that handles the UE
source         UERC source of a channel switch
target         UERC target of a channel switch
imsi           Unique id for the UE
exception      Integer that tells if the event was successful or not
Table 5.2 displays the relevant field names. The first five fields (timestamp, event-id, rnc-module-id, ue-context and cell-id) are required fields for every event and are also needed
to create UERC sequences. The remaining fields in the table only accompany some of the
events; e.g. the fields source, target and exception are related to events regarding UERC
switches, so they are not present in the event rrc-connection-release, and the imsi number is only
present in the internal-channel-switching event.
5.2.1
ASN.1 Parser
Based on the identified events required to create UERC sequences, the ASN.1 parser could
be developed to extract the events and their relevant fields from the large GPEH files.
The parser was written in Python with ASN.1-formatted data as input (GPEH-files or
UERTT from a port) and two csv-files as output.
Listing 5.1: Pseudo code for the ASN.1 parser.

rnc = RadioNetworkController()
for line in input:
    if line has event_id:
        event_id = line.getInt()
    else:
        continue
    if event_id not in wantedEventList:
        continue
    module_id = line.nextLine().getInt()
    timestamp = line.nextLine().getInt()
    cell_id = line.nextLine().getInt()
    ue_context = line.nextLine().getInt()  # identifies the UE within the module
    if line.nextLine() has source:
        source = line.getInt()
        target = line.nextLine().getInt()
        imsi = line.nextLine().getInt()
        exception = line.nextLine().getInt()
        event = SwitchEvent(event_id, timestamp, cell_id)
        event.setSwitch(source, target)
        event.setException(exception)
        event.setImsi(imsi)
    else:
        event = Event(event_id, timestamp, cell_id)
    rnc.addEvent(module_id, ue_context, event)
rnc.toFile()
Listing 5.1 shows the main traits of the ASN.1 parser in pseudo code. Basically, it reads
every line in order, and if it finds an event with an id that matches any of the
events in Table 5.1, the parser continues to read the required fields and adds that event
to the RadioNetworkController object.

In Figure 5.2 the work flow and object classes of the ASN.1 parser can be seen. The
work flow demonstrates how the parser is built from a higher perspective, with input
and output specified. The object classes are the implemented Python classes, and the
figure also shows their dependencies on each other. The parser is inspired by the
RNC logic presented in Figure 5.1 when it handles the events. The main script (shown in
the parser flow under ASN1 Parser) implements the pseudo code in Listing 5.1 and its
primary function is to add events to the rnc object, which in turn assigns them to the
relevant module and then session (unique UE sequence).
5.3
n-gram Analysis
For both parts of the data, n-grams were calculated for n = 1, 2 and 3, i.e. the unigrams,
bigrams and trigrams respectively. The unigrams are basically just a list of how many
times a certain UERC is visited, while the bigrams tell how many times a switch between
two UERCs has happened. The trigrams add one more dimension; here it is possible
to see which UERC is most likely to be next after a certain switch, or the other way around,
given a certain state, which switch is most likely to happen.
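The n-gram counting described above (performed in the thesis with the R library tau) can be sketched in a few lines of Python:

```python
from collections import Counter

def ngram_counts(sequences, n):
    """Count n-grams over a list of UERC id sequences.
    Returns a Counter mapping each n-tuple of UERC ids to its frequency."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return counts
```

With n = 1 this yields the unigram visit counts, with n = 2 the switch counts, and with n = 3 the switch-plus-next-state counts used for the trigram analysis.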
5.3.1
Unigrams
The unigram algorithm counts every occurrence of a specific UERC in all sequences.
Table 5.3 shows the results for both part 1 and part 2 of the data sets. The percentages
show the proportion of a specific UERC's occurrences compared to the total number of counts.
Although the percentages differ somewhat between the data sets, the order of the most common
UERCs is the same. There are three dominant states, UERC 4, 21 and
25, with a combined occurrence of over 93 % and 91 % respectively. Comparing the individual
percentages for these three dominants in the two data sets shows that they are almost
the same for UERC 25, while UERC 4 occurs less frequently and 21 more frequently in
the second part.
Table 5.3: Unigram counts and occurrence percentages for part 1 and part 2.

            Part 1              Part 2
UERC    Counts      %       Counts      %
4        88195  44.50        61923  41.22
21       51078  25.77        40821  27.18
25       45473  22.94        34252  22.80
1         5437   2.74         5119   3.41
0         4057   2.05         4402   2.93
53        1511   0.76         1281   0.85
9          932   0.47          892   0.59
123        668   0.34          670   0.45
15         400   0.20          481   0.32
2          252   0.13          169   0.11
438        115   0.06          133   0.09
62          53   0.03           46   0.03
113         14   0.01            3   0
69           4   0              6   0
52           3   0              4   0
67         N/A   N/A            3   0
456          2   0              2   0
19         N/A   N/A            1   0
10           1   0            N/A   N/A
It can also be seen in Table 5.3 that some UERCs only show up in one part and not the
other. There are also several that occur so few times that their occurrence percentage is
rounded to zero (less than 0.005 % occurrence).
5.3.2
Bigrams
The tables for the bigrams are a lot longer than for the unigrams, so they have been placed
in Appendix B, where the bigrams for part 1 are in Table 7.2 and part 2 in Table 7.3.
Due to the large number of bigrams a visual analysis is hard.
As mentioned in Section 3.4, a bigram is equivalent to a Markov process of order 1. This
will be used to draw Markov chains for both part 1 and part 2 as well as to compare their
transition matrices. The most frequent switches will also be analysed by drawing their
Markov chains based on the so-called normalized sequences.
Markov Chains
The Markov chains are based on the bigrams and represent switches between the different
UERCs in the network. Figure 5.3 shows the Markov chain for the first part of the data
set. The background grouping tells whether the UERC is for voice (CS, sky blue), data
(PS, light red), or both. The arrows between the states represent occurred switches, and
arrows that go back to the same state are non-fatal errors (e.g. a UE tried to access
UERC 25 but it was full so it had to stay in its current UERC). The green state (SRB)
represents UERC 1, which is the starting state of a sequence, and the orange state (idle)
represents UERC 0, the end state. All other UERCs are represented by a blue colored
state. The red states do not represent an actual UERC id but rather the event id
associated with the errors:

438: internal-system-release - indicates an error that was so severe the session had to
be terminated
456: internal-call-setup-fail - indicates a failed RNC connection setup

(Figures 5.3 and 5.4: Markov chains for part 1 and part 2 of the data set. Legend: SRB, UERC, Idle, Error, CS, PS.)
The transition matrix gives the probability to jump to a specific UERC given that you are
in another. Figure 5.5 shows the transition matrices for both part 1 and part 2. It also
displays the results from the chi-square analysis: which transitions accepted or rejected
the null hypothesis, as well as the chi-square value for every transition. The matrix colors
are normalized, meaning that the colors depend on the value; a higher value gives a darker
color and a value close to zero will be almost invisible. This has been done to magnify
the results and important elements in the matrices.
Figure 5.6a shows whether the null hypothesis was rejected or accepted for each element.
The test was performed using the chi-square method on each element of the transition
matrices. A dark red color means that the test was rejected, a light red means it passed,
and no color (white) means that the test could not be conducted (one or both parts had a
zero in the element).

The last matrix, in Figure 5.6b, displays the chi-square value, a measurement of the
change. As for the other matrices, the color is normalized, meaning that it will
highlight large values and hide small ones, i.e. highlighting big changes and hiding small ones.
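One way such a per-element test could be set up is as a 2x2 chi-square comparison of a single transition's counts in the two parts. This is a hypothetical Python sketch; the thesis performed the actual analysis in MATLAB and its exact test construction may differ:

```python
import math

def chi2_switch_change(a, b, c, d):
    """Pearson chi-square for a 2x2 table comparing one transition between
    two data sets: (a, b) = times the switch occurred / did not occur in
    part 1, (c, d) = the same for part 2. Returns (chi2, p) with df = 1."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, df = 1
    return chi2, p
```

For example, a switch seen 10 times out of 100 opportunities in part 1 but 20 times out of 100 in part 2 gives a chi-square of about 3.92, just below the 99 % confidence threshold used in the thesis (6.63 for df = 1).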
Normalized Sequences - Bigrams

These normalized sequences are based on the bigrams in Table 7.2 and Table 7.3 in
Appendix B. They represent the most common UERC switches, which together make up
more than 90 % of the total number of switches. There are five frequent UERC ids: 0, 1,
4, 21 and 25. The normalized sequences were used to draw the Markov chains in Figure
5.7.
(Figure 5.7: Markov chains of the normalized sequences, with transition probabilities between the states SRB, FACH, URA, EUL/HS and IDLE.)
5.3.3
Trigrams
The tables for the trigrams can be seen in Appendix C. The trigrams for part 1 are in
Table 7.4 and part 2 in Table 7.5. The trigram tables are very long and contain many
sequences that only occur a couple of times, much too few to be able to compare and
draw any conclusions from. For the most common sequences there is, however, a lot of
data that can be analysed.
5.4
BEST
This section will primarily focus on UERC 4, 21 and 25 due to the call sequence analysis
in Table 5.3. The table shows that more than 90 % of the visits in both part 1 and 2 went
through these states and that the others have too few samples to maintain a high power
level (see Table 3.2). UERC 1 has enough samples but contains too many deterministically
set values, and UERC 0 is an end state and will therefore not be evaluated. If the effect size
detects major changes, a deeper analysis will be performed for that UERC. The figures
presented in this section are based on the BEST algorithm and will show mean (μ),
standard deviation (σ) and effect size.
5.4.1
Range of Effect Size
To measure the range of the ROPE for the effect size in UERC 4, 21 and 25, the data in part one
was chopped up in batches of 2600 samples. Five simulations were done for every UERC
(see Table 5.4).
Table 5.4: Range of effect size.

      UERC 4              UERC 21             UERC 25
-0.063    0.054      -0.016    0.146      -0.023    0.139
-0.075    0.046      -0.023    0.132      -0.042    0.119
-0.068    0.055      -0.010    0.162      -0.032    0.142
-0.071    0.050      -0.020    0.147      -0.022    0.136
-0.069    0.049      -0.009    0.149      -0.030    0.143
The result in Table 5.4 shows that a range for the effect size of [−0.2, 0.2] would fit
as limits for every UERC. This range is now used to accept or reject the null hypothesis.
5.4.2
UERC 4 and 21
The BEST algorithm did not detect any major changes for UERC 4 and 21, as Figures
5.9 and 5.10 show. UERC 21 has almost twice as big an effect size as state 4
and the distribution is wider; the range of the 95 % distribution is 0.0421 compared to
0.0248 for UERC 4. Both simulations had a very high number of samples: UERC 4 had
n1 = 80943 and n2 = 57092, and UERC 21 had n1 = 43255 and n2 = 34225 samples.
5.4.3
UERC 25 - (EUL/HS)
The distribution of the effect size is far away from the ROPE limits and the null hypothesis is rejected. The magnitude of the change is approximately 1.4 standard deviations,
as shown in Figure 5.11.

There is a significant change in mean for UERC 25, with an increase of 2.08, which
is equivalent to 33 %. The data also appears to be more spread out in part two; the
distribution is wider and more uncertain, which can be seen in Figures 5.12b and 5.13a.
The simulation is based on n1 = 43271 and n2 = 30234 samples.
5.5
Continuous-Time Markov Process
This section will focus on UERC 25 due to the findings in 5.4.3. The continuous-time
Markov process has exponentially distributed holding times, and Figure 7.1 in Appendix D shows all
holding times for UERC 25 plotted against an exponential distribution. Most of the data
can be approximated, except the outliers; 96 % of the data is covered.
The components used to construct the Q matrix were the mean holding time, giving the
rate λi for each state, and the P matrix from discrete time, see Figures 5.5a and 5.5b. The
transition rates from the P matrix were used to weight λi for each state. Equation 5.5.1
is a definition:

Q = \lambda_i p_{ij} =
\begin{pmatrix}
-\lambda_1 & \lambda_1 p_{12} & \cdots & \lambda_1 p_{1n} \\
\lambda_2 p_{21} & -\lambda_2 & \cdots & \lambda_2 p_{2n} \\
\vdots & & \ddots & \vdots \\
\lambda_m p_{m1} & \lambda_m p_{m2} & \cdots & -\lambda_m
\end{pmatrix}   (5.5.1)
The properties of the Q matrix hold (see theory section 3.7.2): every row sums to zero,
the diagonal elements are less than zero and all other elements are non-negative.

When the construction of the Q matrix agreed with the theory, every state was plotted
in time to see how the transition rates changed. As mentioned earlier, this thesis
makes a deeper evaluation of UERC 25, and Figure 5.14 shows the probability density
from zero up to fifteen seconds. The mean holding time was μ25 = 9.19 for part one and
μ25 = 12.98 for part two. All transition rates sum up to 100 %. For the first 15 seconds
there is a higher probability of staying in the source state. The intersection between UERC 21
and 4 happens earlier in the second part compared to the first.
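The construction of Q in Equation 5.5.1, done in MATLAB in the thesis, can be sketched in a few lines of Python; the example holding times and P matrix below are made-up values for illustration:

```python
def build_q(mean_holding, P):
    """Construct the CTMP generator matrix Q (Eq. 5.5.1) from mean holding
    times and a discrete-time transition matrix P with zero diagonal."""
    m = len(P)
    lam = [1.0 / t for t in mean_holding]  # rate = 1 / mean holding time
    return [[-lam[i] if i == j else lam[i] * P[i][j] for j in range(m)]
            for i in range(m)]
```

Because each row of P sums to one (with a zero diagonal), each row of Q automatically sums to zero, with a negative diagonal and non-negative off-diagonal entries, exactly the properties checked above.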
Chapter 6
Discussion
In short, this thesis project describes a method to create user specific sequences of events
from ASN.1-formatted data (GPEH or UERTT) in a mobile network. The thesis also
demonstrates a methodology to detect changes between data sets based on these sequences. The derived methodology was applied
to two GPEH-files where actual changes had been made to the network. The result
of the analysis was that two major changes could be identified in the second part of
the data. The changes were:
1. Switching from UERC 25 to UERC 21 had been enabled
2. The holding time in UERC 25 had been increased
After verifying with employees at Ericsson, it could be concluded that the findings are
consistent with actual changes to the network settings.
6.1
ASN.1 parser
The ASN.1 parser fulfills its purpose, but it is very time-consuming to parse the large GPEH-files. The interesting part is rather how it works: which events are relevant and which fields
need to be stored in order to create a sequence of events specific to a unique UE. Although
it works acceptably well, the methodology is not perfect and there are several holes in
the sequences after parsing which cannot be explained. It appears that the problem lies
within the GPEH-files rather than with the constructed parser, a theory supported by
the same findings using other methods.
6.2
Sequence Analysis
Besides constructing UERC sequences, the main goal of the thesis project was to analyse the sequences and compare two different data sets in order to find any statistical
differences between them. The analysis was performed by looking at the sequences in
discrete-time and finding patterns, as well as analysing the holding times of some of the
more frequent UERCs.
6.2.1
Discrete-Time Analysis
Patterns in the sequences were identified with the n-gram algorithm, which proved to be very
suitable for this purpose. Based on the unigrams, bigrams and trigrams, Markov chains,
transition matrices and normalized sequences could be constructed and used to identify
changes in the sequences. The unigrams were only analysed by inspection, mostly because
they were not believed to contribute anything new.
The Markov chains, which were based on the bigrams, serve as an illustrative way to
see how different UERCs are connected to each other, but their complexity makes it hard
to draw any conclusions from them. Although there are some noticeable differences in the two
Markov chains, such as new states, the bigrams show that there is not enough data
for these states to draw any conclusions.
Analysing the transition matrices with the chi-square method based on the null hypothesis (see Figure 5.6a) reveals many switches that reject the null hypothesis. With 99
% confidence it can be said that there has been a significant change in these switches.
However, there is a difference between a significant change and a practical change. Looking
at the chi-square values in Figure 5.6b reveals that the changes are of very
different magnitude. The biggest changes, by far, were in the switches 25 to 4 and 25 to
21.
The normalized sequence analysis was introduced to provide a more standardized structure of the network and find the essential behaviour. It showed that just five states could
represent more than 90 % of the UERCs being used, and the model clearly shows how
switches from UERC 25 to 4 decreased in favor of switches to 21.

The same conclusion can be drawn based on the trigrams, and due to the extra dimension
it seems like the enabled switch from 25 to 21 does not depend on where the UE came
from.
6.2.2
Continuous-Time Analysis
The continuous-time analysis proceeded with three different states; the states that stood
out and accounted for 93.21 % of all visits in part one and 91.2 % in part two were
UERC 4, 21 and 25. As the goal of the thesis is to find a statistical change between two
data sets, determining the threshold for an actual change is critical. This limit has to
take into account both deterministic values in the system as well as values depending on
the user.

Theoretical and practical changes can sometimes differ, where a small but significant
change does not always have practical importance. The limit for a practical change was
established through several simulations, where the range of the ROPE ended up going from
−0.2 to 0.2 for the effect size. A closer look at Table 5.4 shows that a range from −0.15 to
0.15 could also have worked. This thesis used the same limit for every state, but UERC
4 would have fitted within −0.1 to 0.1. The rule of thumb mentioned earlier in the
theory chapter treated an effect size of ±0.2 as small (Cohen 1988). And even if
that rule was in the same area, no major conclusions could be drawn, and no efforts were
made to define a small, medium or large change, due to the complexity of actually
measuring this.
Both UERC 4 and 21 accepted the null hypothesis and no practical change could be
detected; 100 % of the data is within the ROPE limit. UERC 25, on the other hand, rejected
the null hypothesis; 100 % of the data was outside the ROPE limit. The effect size
measured a change of approximately 1.4 standard deviations, which is seven times
bigger than the limit. Clearly something has happened: users stay longer in state 25.
The threshold for maximum occupation time has changed; it is higher, and the magnitude
of the change is calculated with 95 % probability to be between 2.05 and 2.12 seconds,
as Figure 7.6a shows. The increased time has to be deterministic, meaning that the
system settings have changed. There is still the same number of users in the network; people
usually stay the whole game.
As UERC 25 showed a major change in holding time, it is interesting to see what happens with the state transition probabilities over time. Figure 5.14 shows how users tend to stay longer in the source state before moving on to another one, which corresponds well with what was discovered using the BEST algorithm. Comparing the mean holding times μ25 = 9.19 and μ25 = 12.98 in parts one and two with the transition rates in Figures 5.14a and 5.14b shows that both have a probability of approximately 50 % of still being in the source state, which makes perfect sense. The BEST algorithm uses a t-distribution, and comparing means shows a large difference: BEST calculated a mean of 6.26 (6.24 - 6.28) compared to 9.19, and 8.34 (8.32 - 8.37) compared to 12.98 seconds. Live data contains outliers, so the transition rates calculated with the Markov process can include some errors.
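The gap between the robust BEST means and the raw means is what one would expect when a few very long stays are present. A minimal synthetic illustration (the numbers are made up for the demonstration and are not the thesis data):

```python
import numpy as np

# Holding times with a small fraction of extreme outliers: the arithmetic
# mean (which the exponential/Markov fit effectively relies on) is dragged
# upwards, while a robust centre such as the median stays near the bulk.
rng = np.random.default_rng(1)
bulk = rng.exponential(scale=6.0, size=990)    # typical holding times
outliers = rng.uniform(60.0, 120.0, size=10)   # rare, very long stays
times = np.concatenate([bulk, outliers])

print(np.mean(bulk), np.mean(times), np.median(times))
```

One percent of outliers is enough to shift the mean visibly, while the median barely moves; a t-based estimate behaves more like the median in this respect.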
Chapter 7
Conclusion
The major findings of this thesis are not the identified changes in the two data sets; those were already known. That part is a proof of concept, demonstrating that sequence analysis can be applied to find changes in GPEH data in general and in UERC sequences in particular. The important parts are the identification of the events needed to create UERC sequences, the methodology for combining the events into a unique UE sequence, and the theory for analysing these sequences. This thesis focused only on the UERC-related events, but the methodology could easily be widened to include other events.
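The core of that methodology, turning a stream of events into one UERC sequence per UE, can be sketched as follows. The event format here is hypothetical (timestamp, ue-context id, UERC), chosen only for illustration:

```python
from collections import defaultdict

def build_sequences(events):
    """Group UERC events per ue-context id, in time order, into one
    UERC sequence per UE (hypothetical (ts, ue_context, uerc) records)."""
    per_ue = defaultdict(list)
    for ts, ue_context, uerc in sorted(events):   # sort by timestamp
        seq = per_ue[ue_context]
        if not seq or seq[-1] != uerc:            # record actual switches only
            seq.append(uerc)
    return dict(per_ue)

events = [
    (0, "ue1", 4), (1, "ue2", 4), (2, "ue1", 21),
    (3, "ue1", 4), (4, "ue2", 25), (5, "ue2", 4),
]
print(build_sequences(events))
# → {'ue1': [4, 21, 4], 'ue2': [4, 25, 4]}
```

A real implementation also has to decide when a ue-context id has been reused for a new UE, which, as noted below, proved difficult in practice.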
7.1 Pre-processing
The pre-processing of the data proved to be far more difficult than first imagined, and in the end it took up half of the entire thesis work. This was partly because the initial plan was to use only UERTT data; when this proved impractical, the decision was made to switch to GPEH data instead. It also proved hard to detect when a new UE is assigned the same ue-context id as a previous one. There was also a problem with missing events in the analysed GPEH files (the UERC switches did not make sense, e.g. a UE going to UERC 4 but the next event saying it is in UERC 21). A quick study showed that approximately every tenth event was missing, causing many sequences, over half of them, to be thrown away because of the holes. This was considered too many to ignore, and it was decided to manually add the events presumed missing in order to obtain more sequences to analyse.
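The inconsistency check behind this can be sketched simply: a gap exists wherever the source state of a switch does not match the target state of the previous one. The `(from, to)` pair format is hypothetical, chosen for the illustration:

```python
def find_gaps(switch_events):
    """Flag positions where the 'from' state of a switch does not match
    the previous 'to' state, i.e. at least one event must be missing
    (hypothetical (from_uerc, to_uerc) pairs per event)."""
    gaps = []
    for i in range(1, len(switch_events)):
        prev_to = switch_events[i - 1][1]
        cur_from = switch_events[i][0]
        if cur_from != prev_to:
            gaps.append(i)
    return gaps

# The UE switches to 21, but the next recorded switch starts from 25:
switches = [(0, 4), (4, 21), (25, 4)]
print(find_gaps(switches))  # → [2]
```

Each flagged position is where an event would have to be imputed before the sequence can be used.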
The missing events could be due to an error in our method, e.g. a missed event type that would have explained the gaps; they could also be specific to these GPEH files or caused by the logging in the OSS. A lot of time was spent, both by us and by employees at Ericsson, trying to explain the missing events, but no conclusion could be drawn as to why they occurred. In order to continue with the thesis project, this question was left unanswered. Hopefully, the manually added events will not affect the analysis, since they are discarded in the BEST and CTMP analyses, and for the sequence analysis every added switch was made under a best-guess assumption, i.e. based on logic and the experience of Ericsson's employees.
7.2 Sequence Analysis
Bayesian statistics proved to be a very efficient and effective tool for comparing two data sets of holding times. Outliers and different sample sizes were handled in a credible way, with very extensive and detailed results. For further research it could be interesting to try different distributions: the t-distribution can handle outliers, which is very good, but it can also take on negative values. The BEST algorithm used in this thesis can handle both the t-distribution and the log-normal distribution. As the log-normal distribution never includes negative values, it could be interesting to try it and see how the results differ. Analysing UERC sequences with the BEST algorithm is very time consuming, as these sequences tend to blow up and become very large; the sample sizes ranged between 30 and 80 000, and it is strongly recommended to use a cluster with multiple cores to shrink the simulation time.
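The motivation for trying a log-normal model can be shown directly: draws from a t-centred posterior can dip below zero even when centred on a positive holding time, while log-normal draws never can. A small numerical illustration with made-up parameters (not fitted to the thesis data):

```python
import numpy as np

rng = np.random.default_rng(2)

# Heavy-tailed t-style draws centred on a positive mean holding time:
t_draws = 8.34 + 2.0 * rng.standard_t(df=3, size=100_000)
# Log-normal draws with a comparable median:
ln_draws = rng.lognormal(mean=np.log(8.34), sigma=0.5, size=100_000)

print((t_draws < 0).mean())   # small but strictly positive fraction
print((ln_draws < 0).mean())  # → 0.0
```

Since holding times are physically non-negative, the log-normal variant avoids this modelling artefact, at the cost of assuming a particular right-skewed shape.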
The continuous-time Markov process showed some interesting results for the transition rates, but they have to be treated with some skepticism. The exponential distribution cannot handle outliers, which makes the results very uncertain. The Markov process cannot change the distribution, because it is part of the solution, so a recommendation for further research is to find another algorithm that can track transition rates over time with different distributions.
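The exponential assumption that locks the Markov model in place can be made concrete: the exit rate is the reciprocal of the mean holding time, and the sojourn survival probability is exp(-rate * t). A minimal sketch (helper names are illustrative):

```python
import numpy as np

def exp_rate(holding_times):
    """Maximum-likelihood exit rate under an exponential holding time."""
    return 1.0 / np.mean(holding_times)

def prob_still_in_state(t, rate):
    """P(sojourn > t) under the exponential assumption."""
    return np.exp(-rate * t)

# With a mean holding time of 9.19 s, the chance of still being in the
# state after one mean duration is e^(-1), about 37 %, and the median
# sojourn is mu * ln 2, about 6.4 s.
rate = exp_rate([9.19])
print(round(prob_still_in_state(9.19, rate), 3))  # → 0.368
print(round(np.log(2.0) / rate, 2))               # → 6.37
```

Because a single parameter controls the whole shape, any outlier inflates the mean and therefore distorts every survival probability, which is exactly why the estimated rates have to be treated with caution.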
Bibliography
Chung, K. L. (2000), A Course in Probability Theory, Third Edition, Academic Press, San Diego, USA.
Cohen, J. (1988), Statistical Power Analysis for the Behavioral Sciences, Lawrence Erlbaum Associates, Hove and London.
Ellis, P. D. (2010), The Essential Guide to Effect Sizes: Statistical Power, Meta-Analysis, and the Interpretation of Research Results, Cambridge University Press, Cambridge, United Kingdom.
Ericsson (2002), WCDMA RAN Protocols and Procedures, r1a edn.
Ericsson (2014), WCDMA Radio Access Network, http://www.ericsson.com/ourportfolio/products/wcdma-radio-access-network. [Online; accessed 2014-05-08].
Google (2012), Protocol Buffers Overview, https://developers.google.com/protocol-buffers/docs/overview. [Online; accessed 2014-05-13].
Holden, S. (2014), BeginnersGuide/Overview, https://wiki.python.org/moin/BeginnersGuide/Overview. [Online; accessed 2014-05-04].
Janssen, J. & Manca, R. (2006), Applied Semi-Markov Processes, Springer, New York, USA.
Kruschke, J. K. (2013), Bayesian Estimation Supersedes the t Test, Journal of Experimental Psychology: General 142(2), 573-603.
Manning, C. D. & Schütze, H. (1999), Foundations of Statistical Natural Language Processing, MIT Press.
McClean, P. (2000), The Chi-Square Test, http://www.ndsu.edu/pubweb/~mcclean/plsc431/mendel/mendel4.htm. [Online; accessed 2014-05-19].
Pedrini, L. (2013), What is RRC and RAB, http://www.telecomhall.com/what-is-rrc-and-rab.aspx. [Online; accessed 2014-04-28].
R-project (n.d.), What is R?, http://www.r-project.org/index.html. [Online; accessed 2014-05-04].
Appendix A
Table 7.1: Table of RAB combinations and description.

UERC ID  Description
0    Idle
1    SRB (13.6/13.6)
2    Conv. CS speech 12.2
3    Conv. CS unkn (64/64)
4    PS Interactive (RACH/FACH)
5    PS Interactive (64/64)
6    PS Interactive (64/128)
7    PS Interactive (64/384)
8    Stream. CS unkn. (57.6/57.6)
9    Conv. CS speech 12.2 + PS Interactive (0/0)
10   Conv. CS speech 12.2 + PS Interactive (64/64)
11   SRB (13.6/13.6), pre-configured
12   Conv. CS speech 12.2, pre-configured
13   Stream. PS (16/64) + PS Interactive (8/8)
14   Conv. CS unkn (64/64) + PS Interactive (8/8)
15   PS Interactive (64/HS)
16   PS Interactive (384/HS)
17   Stream. PS (16/128) + PS Interactive (8/8)
18   PS Interactive (128/128)
19   Conv. CS speech 12.2 + PS Interactive (64/HS)
20   Conv. CS speech 12.2 + PS Interactive (384/HS)
21   PS Interactive (URA/URA)
22   Stream. PS (128/16) + Interact. PS (8/8)
23   Conv. CS speech 12.2 + Stream. PS (128/16) + Interact. PS (8/8)
24   Conv. CS speech 12.2 + Stream. PS (16/128) + PS Interactive (8/8)
25   PS Interactive (EUL/HS)
26   2* PS Interactive (64/64)
27   Conv. CS speech 12.2 + 2* PS Interactive (64/64)
28   PS Interactive (128/64)
29   PS Interactive (384/64)
30   PS Interactive (384/128)
31   PS Interactive (128/384)
32   PS Interactive (384/384)
33   Conv. CS speech 7.95
34   Conv. CS speech 5.9
35   Conv. CS speech 4.75
36   Conv. CS speech 12.2 + PS Interactive (64/128)
37   Conv. CS speech 12.2 + PS Interactive (128/64)
38   Conv. CS speech 12.2 + PS Interactive (64/384)
39   2* PS Interactive (64/128)
40   Conv. CS Speech (12.65, 8.85, 6.60)
41   Conv. CS Speech (12.65, 8.85, 6.60), preconfigured
42   Conv. CS Speech (12.65, 8.85, 6.60) + PS Interactive (0/0)
43   Conv. CS Speech (12.65, 8.85, 6.60) + PS Interactive (64/64)
44   Conv. CS Speech (12.65, 8.85, 6.60) + PS Interactive (64/128)
45   Conv. CS Speech (12.65, 8.85, 6.60) + PS Interactive (128/64)
46   Stream. PS (128/HS) + PS Interactive (8/HS)
47   Conv. CS Speech (12.65, 8.85, 6.60) + PS Interactive (64/HS)
48   Conv. CS Speech (12.65, 8.85, 6.60) + PS Interactive (384/HS)
49   Conv. CS Speech 12.2 + Stream. PS (128/HS) + PS Interactive (8/HS)
50   Conv. CS Speech (12.65, 8.85, 6.60) + Stream. PS (16/128) + PS Interactive (8/8)
51   Conv. CS Speech (12.65, 8.85, 6.60) + 2* PS Interactive (64/64)
52   PS Interactive (128/HS)
53   PS Interactive (16/HS)
54   2* PS Interactive (64/HS)
55   2* PS Interactive (128/HS)
56   2* PS Interactive (384/HS)
57   Conv. CS Speech 12.2 + 2* PS Interactive (64/HS)
58   Conv. CS Speech 12.2 + 2* PS Interactive (128/HS)
59   Conv. CS Speech 12.2 + 2* PS Interactive (384/HS)
60   Conv. CS Speech 12.2 + PS Interactive (128/HS)
61   Conv. CS Speech 12.2 + 3* PS Interactive (64/HS)
62   2* PS Interactive (EUL/HS)
63   Stream. PS (16/HS) + 2* PS Interactive (64/HS)
64   Conv. CS Speech 12.2 + Stream. PS (16/HS) + 2* PS Interactive (64/HS)
65   Conv. CS Speech 12.2 + Stream. PS (128/HS) + 2* PS Interactive (64/HS)
66   3* PS Interactive (64/HS)
67   PS Interactive (16/16)
68   PS Interactive (16/64)
69   PS Interactive (64/16)
71   Conv. CS Speech 12.2 + 3* PS Interactive (64/64)
72   Stream. PS (16/HS) + PS Interactive (8/HS)
73   Stream. PS (32/HS) + PS Interactive (8/HS)
74   3* PS Interactive (64/64)
75   Stream. PS (128/HS) + 2* PS Interactive (64/HS)
76   Conv. CS Speech 12.2 + 2* PS Interactive (128/128)
77   Conv. CS Speech 12.2 + Stream. PS (16/HS) + PS Interactive (8/HS)
78   Conv. CS Speech 12.2 + Stream. PS (32/HS) + PS Interactive (8/HS)
79   Conv. CS speech (5.9, 4.75)
80   Conv. CS speech (5.9, 4.75) + PS Interactive (0/0)
94   SRB (3.4/3.4)
95   SRB (3.4/3.4), preconfigured
113  Conv. CS speech 12.2 + PS Interactive (16/HS)
123  Conv. CS Speech 12.2 + PS Interactive (EUL/HS)
124  Conv. CS Speech 12.2 + 2* PS Interactive (EUL/HS)
125  Conv. CS Speech 12.2 + 3* PS Interactive (EUL/HS)
128  3* PS Interactive (EUL/HS)
176  Conv. CS speech (5.9, 4.75) + PS Interactive (EUL/HS)
Appendix B
Table 7.2: Bigram part 1.

Sequence  Counts  %
21-4     35060  25.02
4-21     30370  21.67
4-25     29453  21.02
25-4     25735  18.37
25-21     5393   3.85
1-25      4538   3.24
25-0      2758   1.97
4-0       1027   0.73
4-53       712   0.51
53-4       709   0.51
123-9      594   0.42
9-123      486   0.35
9-4        373   0.27
1-53       362   0.26
4-9        295   0.21
53-0       266   0.19
53-15      248   0.18
15-15      233   0.17
15-53      199   0.14
1-2        159   0.11
21-0       155   0.11
25-25      149   0.11
25-123     102   0.07
2-0         86   0.06
2-123       80   0.06
21-438      75   0.05
123-25      64   0.05
25-438      48   0.03
62-25       46   0.03
53-21       45   0.03
438-21      39   0.03
25-1        36   0.03
9-0         26   0.02
25-62       25   0.02
15-4        23   0.02
4-1         23   0.02
15-0        21   0.01
4-62        21   0.01
4-4         19   0.01
123-2        7   0
53-53        7   0
25-69        6   0
15-52        4   0
52-0         4   0
4-438        3   0
53-438       3   0
69-4         3   0
9-2          3   0
1-438        2   0
1-456        2   0
123-0        2   0
123-123      2   0
2-1          2   0
69-67        2   0
9-113        2   0
113-19       1   0
113-53       1   0
113-9        1   0
123-21       1   0
15-438       1   0
19-9         1   0
2-9          1   0
53-1         1   0
53-113       1   0
53-67        1   0
67-0         1   0
67-25        1   0
67-4         1   0
69-25        1   0
9-21         1   0
9-438        1   0

Bigram part 2.

Sequence  Counts  %
4-21     44799  23.92
21-4     44050  23.52
25-4     42522  22.7
4-25     40577  21.66
1-25      4693   2.51
25-0      2031   1.08
4-0       1284   0.69
4-53      1048   0.56
53-4       989   0.53
25-21      695   0.37
123-9      562   0.3
9-123      483   0.26
9-4        430   0.23
1-53       410   0.22
4-9        356   0.19
53-15      278   0.15
21-0       245   0.13
1-2        239   0.13
53-0       237   0.13
15-4       182   0.1
2-123      128   0.07
15-15      122   0.07
2-0        116   0.06
123-25      96   0.05
1-0         93   0.05
4-1         60   0.03
21-438      58   0.03
25-123      57   0.03
25-1        53   0.03
25-25       53   0.03
15-53       52   0.03
62-25       52   0.03
15-0        41   0.02
25-438      39   0.02
4-62        34   0.02
4-4         20   0.01
25-62       19   0.01
4-438       17   0.01
113-9       13   0.01
9-113       10   0.01
123-2        9   0
2-1          4   0
25-69        4   0
53-21        4   0
9-2          4   0
15-52        3   0
2-113        3   0
52-0         3   0
9-0          3   0
1-456        2   0
69-25        2   0
69-4         2   0
10-0         1   0
113-53       1   0
123-0        1   0
2-9          1   0
53-1         1   0
53-113       1   0
53-438       1   0
62-0         1   0
9-10         1   0
9-21         1   0
Appendix C
Table 7.4: Trigram part 1.

Sequence    Counts  %
4-21-4      27184  20.01
4-25-4      24376  17.95
21-4-25     19640  14.46
25-4-21     15418  11.35
21-4-21     14275  10.51
25-4-25      9723   7.16
4-25-21      4678   3.44
25-21-4      4578   3.37
1-25-0       2515   1.85
1-25-4       1279   0.94
1-25-21       682   0.5
25-4-0        542   0.4
4-53-4        480   0.35
21-4-0        439   0.32
9-123-9       426   0.31
21-4-53       420   0.31
53-4-21       391   0.29
123-9-123     332   0.24
53-4-53       281   0.21
9-4-21        257   0.19
123-9-4       245   0.18
21-4-9        240   0.18
1-53-0        235   0.17
4-25-0        234   0.17
4-53-15       175   0.13
15-53-4       170   0.13
15-15-15      167   0.12
4-9-123       154   0.11
53-15-53      139   0.1
21-0-1        128   0.09
4-9-4         127   0.09
0-1-2         117   0.09
21-1-53       101   0.07
4-21-0         95   0.07
25-123-9       92   0.07
25-25-25       90   0.07
9-4-25         84   0.06
1-2-0          82   0.06
438-1-25       82   0.06
4-25-123       76   0.06
2-0-1          75   0.06
1-2-123        74   0.05
2-123-9        74   0.05
0-21-21        72   0.05
53-15-15       66   0.05
15-15-53       60   0.04
1-53-4         59   0.04
1-53-15        54   0.04
4-21-438       54   0.04
9-123-25       54   0.04
21-438-1       53   0.04
25-21-0        50   0.04
4-25-25        47   0.03
1-0-1          45   0.03
53-0-21        40   0.03
53-21-4        39   0.03
123-25-4       37   0.03
21-1-2         37   0.03
25-4-9         37   0.03
53-4-0         36   0.03
25-1-0         35   0.03
438-21-4       35   0.03
25-438-1       33   0.02
25-25-438      32   0.02
4-53-21        29   0.02
21-0-21        27   0.02
4-53-0         27   0.02
62-25-4        26   0.02
25-62-25       25   0.02
1-25-123       22   0.02
21-438-21      22   0.02
123-25-21      21   0.02
4-62-25        21   0.02
53-15-0        21   0.02
4-1-0          20   0.01
9-0-1          20   0.01
1-25-1         19   0.01
15-0-1         19   0.01
15-53-15       18   0.01
21-4-62        18   0.01
9-4-9          18   0.01
25-25-4        17   0.01
4-4-21         17   0.01
53-15-4        17   0.01
4-25-1         16   0.01
123-9-0        15   0.01
25-438-21      15   0.01
21-4-1         14   0.01
4-25-438       13   0.01
4-25-62        13   0.01
15-4-21        12   0.01
21-4-4         12   0.01
1-0-21         11   0.01
1-25-25        11   0.01
15-4-53        11   0.01
2-0-21         11   0.01
25-21-438      10   0.01
4-9-0          10   0.01
9-4-0          10   0.01
1-53-21         9   0.01
62-25-62        9   0.01
25-123-25       8   0.01
438-1-53        8   0.01
25-4-1          7   0.01
0-21-438        6   0
15-15-4         6   0
15-53-21        6   0
21-21-0         6   0
62-25-21        6   0
9-0-21          6   0
25-25-21        5   0
53-53-53        5   0
62-25-0         5   0
0-21-0          4   0
1-25-69         4   0
123-2-123       4   0
123-25-123      4   0
15-52-0         4   0
25-4-4          4   0
438-1-2         4   0
438-21-21       4   0
53-15-52        4   0
9-123-2         4   0
1-25-438        3   0
1-25-62         3   0
123-2-0         3   0
15-53-0         3   0
2-123-2         3   0
25-4-62         3   0
25-69-4         3   0
52-0-21         3   0
53-21-21        3   0
53-21-438       3   0
53-438-1        3   0
69-4-25         3   0
9-4-4           3   0
0-1-456         2   0
1-2-1           2   0
1-438-1         2   0
1-456-1         2   0
1-53-438        2   0
123-0-1         2   0
123-123-9       2   0
123-25-0        2   0
15-0-21         2   0
2-1-25          2   0
2-123-25        2   0
21-21-438       2   0
21-4-438        2   0
25-25-0         2   0
25-25-69        2   0
25-69-67        2   0
4-1-25          2   0
4-4-25          2   0
4-438-1         2   0
4-9-2           2   0
456-1-25        2   0
9-2-123         2   0
0-1-438         1   0
1-2-9           1   0
1-53-1          1   0
1-53-113        1   0
1-53-67         1   0
113-19-9        1   0
113-53-0        1   0
113-9-4         1   0
123-21-4        1   0
123-9-2         1   0
123-9-438       1   0
15-438-21       1   0
15-53-438       1   0
15-53-53        1   0
19-9-113        1   0
2-123-0         1   0
2-9-0           1   0
21-1-438        1   0
25-1-25         1   0
25-123-123      1   0
25-123-21       1   0
25-25-1         1   0
25-4-438        1   0
25-69-25        1   0
4-1-2           1   0
4-438-21        1   0
4-53-53         1   0
4-9-113         1   0
4-9-21          1   0
52-0-1          1   0
53-1-0          1   0
53-113-53       1   0
53-15-438       1   0
53-4-1          1   0
53-53-15        1   0
53-53-21        1   0
53-67-0         1   0
67-0-1          1   0
67-25-21        1   0
67-4-25         1   0
69-25-25        1   0
69-67-25        1   0
69-67-4         1   0
9-113-19        1   0
9-113-9         1   0
9-123-0         1   0
9-123-123       1   0
9-2-0           1   0
9-21-4          1   0
9-4-1           1   0
9-438-1         1   0

Trigram part 2.

Sequence    Counts  %
4-21-4      40101  22.12
4-25-4      39722  21.91
25-4-21     26139  14.42
21-4-25     25066  13.82
21-4-21     17757   9.79
25-4-25     15409   8.5
1-25-4       2660   1.47
1-25-0       1948   1.07
21-1-25      1540   0.85
25-4-0        889   0.49
4-53-4        805   0.44
4-25-21       661   0.36
21-4-53       541   0.3
53-4-21       478   0.26
53-4-53       427   0.24
9-123-9       408   0.23
9-4-21        309   0.17
21-4-0        308   0.17
21-4-9        290   0.16
123-9-4       283   0.16
123-9-123     273   0.15
25-21-4       271   0.15
25-21-1       239   0.13
4-21-0        220   0.12
4-53-15       214   0.12
1-53-0        212   0.12
4-9-123       209   0.12
53-15-4       142   0.08
4-9-4         141   0.08
1-53-4        137   0.08
21-1-53       122   0.07
1-2-123       119   0.07
1-2-0         114   0.06
2-123-9       103   0.06
15-4-21        99   0.05
2-0-1          99   0.05
9-4-25         99   0.05
123-25-4       91   0.05
4-25-0         81   0.04
15-4-53        79   0.04
53-4-0         79   0.04
9-123-25       73   0.04
21-21-1        72   0.04
21-1-2         71   0.04
438-1-25       65   0.04
1-53-15        59   0.03
4-21-438       56   0.03
25-123-9       51   0.03
53-15-15       50   0.03
0-21-21        48   0.03
25-4-9         48   0.03
4-1-0          48   0.03
15-53-4        46   0.03
4-25-123       46   0.03
25-1-0         45   0.02
21-0-21        43   0.02
53-15-53       43   0.02
53-15-0        41   0.02
15-15-4        40   0.02
62-25-4        39   0.02
21-4-1         37   0.02
21-438-1       35   0.02
15-0-1         34   0.02
4-62-25        34   0.02
438-21-4       34   0.02
1-25-21        33   0.02
25-438-1       32   0.02
1-25-1         29   0.02
53-0-21        28   0.02
21-4-62        26   0.01
4-53-0         25   0.01
25-25-438      24   0.01
4-25-25        24   0.01
21-438-21      23   0.01
1-0-21         20   0.01
4-25-1         20   0.01
25-25-25       19   0.01
25-4-1         19   0.01
25-62-25       18   0.01
2-0-21         17   0.01
2-123-25       17   0.01
9-4-9          17   0.01
4-4-21         16   0.01
4-25-438       15   0.01
21-21-0        13   0.01
21-4-438       13   0.01
21-4-4         12   0.01
25-25-4        10   0.01
4-438-1        10   0.01
438-1-53       10   0.01
1-25-25         9   0
15-15-53        9   0
62-25-62        9   0
9-113-9         9   0
1-25-123        8   0
2-123-2         8   0
25-4-62         8   0
4-1-25          8   0
123-2-123       7   0
15-0-21         7   0
25-1-25         7   0
25-4-4          7   0
25-438-21       7   0
4-438-21        7   0
113-9-113       6   0
113-9-4         6   0
25-123-25       6   0
25-21-0         6   0
4-25-62         6   0
0-21-0          5   0
15-53-15        5   0
1-2-1           4   0
1-25-62         4   0
15-4-0          4   0
2-1-25          4   0
4-1-2           4   0
4-9-113         4   0
53-4-1          4   0
62-25-1         4   0
123-25-123      3   0
123-9-2         3   0
15-52-0         3   0
2-113-9         3   0
25-4-438        3   0
4-4-25          3   0
438-1-2         3   0
52-0-1          3   0
53-21-4         3   0
9-0-1           3   0
9-4-0           3   0
1-2-113         2   0
1-25-69         2   0
123-9-0         2   0
21-1-456        2   0
25-69-25        2   0
25-69-4         2   0
4-25-69         2   0
4-53-21         2   0
438-21-21       2   0
53-15-52        2   0
69-25-0         2   0
9-2-123         2   0
0-0-1           1   0
1-456-1         1   0
1-456-21        1   0
1-53-1          1   0
1-53-21         1   0
10-0-21         1   0
113-53-4        1   0
113-9-0         1   0
123-0-21        1   0
123-2-0         1   0
123-2-9         1   0
123-25-21       1   0
123-25-25       1   0
123-9-21        1   0
15-15-52        1   0
15-53-21        1   0
2-9-123         1   0
21-21-438       1   0
25-1-2          1   0
25-21-438       1   0
25-62-0         1   0
4-0-0           1   0
4-4-4           1   0
4-53-113        1   0
4-53-438        1   0
4-9-10          1   0
4-9-2           1   0
438-21-0        1   0
456-1-25        1   0
456-21-4        1   0
53-1-25         1   0
53-113-9        1   0
53-21-1         1   0
53-4-9          1   0
53-438-1        1   0
62-0-1          1   0
69-4-0          1   0
69-4-21         1   0
9-10-0          1   0
9-113-53        1   0
9-123-0         1   0
9-123-2         1   0
9-2-0           1   0
9-2-113         1   0
9-21-4          1   0
9-4-438         1   0
9-4-53          1   0
Appendix D