Performance Measurement Analysis of Random Access Memory, L2 Cache and L1 Cache On X86 Architecture With Micro Benchmarking Memory Mountain Method

International Research Journal of Computer Science (IRJCS) ISSN: 2393-9842
Issue 05, Volume 5 (May 2018) www.irjcs.com
PERFORMANCE MEASUREMENT ANALYSIS OF RANDOM

ACCESS MEMORY, L2 CACHE AND L1 CACHE ON X86
ARCHITECTURE WITH MICRO BENCHMARKING
MEMORY MOUNTAIN METHOD
Nia Rahma Kurnianda
Faculty of Computer Science, Mercu Buana University, Indonesia
nia.rahma@mercubuana.ac.id
Manuscript History
Number: IRJCS/RS/Vol.05/Issue05/MYCS10090
Received: 07, May 2018
Final Correction: 12, May 2018
Final Accepted: 14, May 2018
Published: May 2018
Citation: Kurnianda (2018). Performance Measurement Analysis of Random Access Memory, L2 Cache and L1
Cache on X86 Architecture With Micro Benchmarking Memory Mountain method, IRJCS:: International Research
Journal of Computer Science, Volume V, 227-235. doi://10.26562/IRJCS.2018.MYCS10090
Editor: Dr.A.Arul L.S, Chief Editor, IRJCS, AM Publications, India
Copyright: ©2018 This is an open access article distributed under the terms of the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author
and source are credited
Abstract— Memory technology is essential in supporting all activities performed by a computer. To observe how
the memory levels in a computer hold data and provide paths for delivering it to the processor, we use the memory
mountain microbenchmarking method, implemented by the mountain.c code, to map memory activity from the
fastest to the slowest level accessed by the CPU. The end result of this study is a measurement of how the capacity
and the speed of a memory affect each other.
Keywords— mountain.c code; memory mountain microbenchmarking; memory performance; performance
measurement; memory mountain;
I. INTRODUCTION
In the current era of globalization, technology develops rapidly along with the progress of human culture and
civilization [1]: the more advanced the culture, the more technology continues to grow. The importance of
computers in developing the Information Technology sector is no longer in question.
Memory technology is essential in supporting all activities and systems running on a computer. A system can be
defined as a collection or set of elements, components, or variables that are organized, interacting, interdependent
and integrated. A system is also a collection of interconnected elements that work together to process inputs and
produce the desired output.[2] One such system is memory. The memory used in computers is subdivided into
several types that together support the performance of the computer, ranging from memory with large storage
capacity but slow access to memory embedded in the processor with fast access but very small capacity. To
observe how the memory levels in a computer hold data and provide a path for delivering it to the processor, the
authors apply the memory mountain technique, using the mountain.c code, to map memory activity from the
fastest to the slowest level accessed by the CPU.
The purpose of this paper is to: 1. measure and evaluate the speed of data transfer between memory and CPU, 2.
view the performance of RAM, L1 cache and L2 cache under workload, and 3. measure the relationship between
storage capacity and access speed.
________________________________________________________________________________________________
IRJCS: Impact Factor Value – SJIF: Innospace, Morocco (2016): 4.281
Indexcopernicus: (ICV 2016): 88.80
© 2014- 18, IRJCS- All Rights Reserved Page -227
To help the reader follow the stages of the research, the authors divide this paper into several sections:
I. Introduction, II. Theoretical Fundamental, III. Methods, IV. Result and Discussion and V. Conclusion.
II. THEORETICAL FUNDAMENTAL

1. Computer Architecture
Computer architecture is the science of designing computer systems. The goal of a computer architect is to design
a system with high performance at a reasonable cost while meeting other requirements.
In the field of computer engineering, computer architecture is the concept of the planning and basic operating
structure of a computer system. It is a blueprint and a functional description of the requirements of the hardware
being designed (processing speed and interconnection system). In this case, the implementation of each part
focuses mainly on how the CPU works and on how data and addresses are accessed from and to cache memory,
RAM, ROM, hard disks, etc. Some examples of computer architectures are the von Neumann architecture, CISC,
RISC, Blue Gene, etc. [3]
Multilayered Machine
The basic levels of computer architecture are then developed by viewing the whole computer system as a
"multilayered machine" consisting of several layers of software on top of the following hardware layers:
a. CPU (Central Processing Unit), which controls all other units of the computer system and transforms inputs
b. Primary storage, containing the data being processed and the program
c. Control unit, making all units work together as a system
d. Arithmetic and logical unit
e. Input unit, entering data into primary storage
f. Secondary storage, providing a place to store programs and data when not in use
g. Output unit, recording the results of processing
2. Memory Mountain MicroBenchmarking
The memory mountain is a method to characterize the performance of the memory system (ISCA95 and HPCA97).
It captures two aspects of the memory hierarchy: its behavior with respect to temporal locality, by varying the
working set size, and with respect to spatial locality, by varying the access pattern (stride). The measured value is
the transfer bandwidth (for large amounts of data). The same graph can also be used to characterize local and
remote transfers, in other words accesses from computation and from communication, and this can be done
regardless of the underlying architecture.[4]
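The two locality dimensions can be made concrete with a small sketch. The function below (strided_sum is a hypothetical name, not part of mountain.c) reads every stride-th element of the first elems entries of an array: elems sets the working set size, and stride sets how far apart consecutive reads are. It returns the sum so that the reads can be checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from mountain.c.
 * Sums every stride-th element of the first `elems` entries of `a`.
 * A larger `elems` means a larger working set; a larger `stride`
 * means consecutive reads are further apart in memory. */
long strided_sum(const int *a, long elems, long stride)
{
    long i;
    long result = 0;

    assert(a != NULL && elems >= 0 && stride > 0);
    for (i = 0; i < elems; i += stride)
        result += a[i];
    return result;
}
```

With an array filled with ones, the return value equals the number of elements read: 1024 elements read at stride 3 touch 342 of them.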
3. C Language
C is a programming language created in 1972 by Dennis Ritchie for the Unix operating system at Bell
Telephone Laboratories.[5]
III. METHODS
1. Memory mountain.c Code
The code of the memory mountain microbenchmarking method used in this study is listed below:[6]
/* mountain.c - Generate the memory mountain. */
/* $begin mountainmain */
#include <stdlib.h>
#include <stdio.h>
/* K-best measurement timing routines */
#include "fcyc2.h"
/* routines to access the cycle counter */
#include "clock.h"

/* Working set size ranges from 1 KB */
#define MINBYTES (1 << 10)
/* ... up to 128 MB */
#define MAXBYTES (1 << 27)
/* Strides range from 1 to 32 */
#define MAXSTRIDE 32
/* increment stride by this amount each time */
#define STRIDESTRIDE 2
#define MAXELEMS MAXBYTES/sizeof(int)

/* The array we'll be traversing */
int data[MAXELEMS];
/* $end mountainmain */

void init_data(int *data, int n);
void test(int elems, int stride);
double run(int size, int stride, double Mhz);

int main()
{
    int size;   /* Working set size (in bytes) */
    int stride; /* Stride (in array elements) */
    double Mhz; /* Clock frequency */

    /* Initialize each element in data to 1 */
    init_data(data, MAXELEMS);
    /* Estimate the clock frequency */
    Mhz = mhz(0);

    /* Not shown in the text */
    printf("Clock frequency is approx. %.1f MHz\n", Mhz);
    printf("Memory mountain (MB/sec)\n");
    printf("\t");
    for (stride = 1; stride <= MAXSTRIDE; stride += STRIDESTRIDE)
        printf("s%d\t", stride);
    printf("\n");

    for (size = MAXBYTES; size >= MINBYTES; size >>= 1) {
        /* Not shown in the text */
        if (size > (1 << 20))
            printf("%dm\t", size / (1 << 20));
        else
            printf("%dk\t", size / 1024);
        for (stride = 1; stride <= MAXSTRIDE; stride += STRIDESTRIDE) {
            printf("%.0f\t", run(size, stride, Mhz));
        }
        printf("\n");
    }
    exit(0);
}

/* init_data - initializes the array */
void init_data(int *data, int n)
{
    int i;

    for (i = 0; i < n; i++)
        data[i] = 1;
}

/* $begin mountainfuns */
void test(int elems, int stride) /* The test function */
{
    int i, result = 0;
    volatile int sink;

    for (i = 0; i < elems; i += stride)
        result += data[i];
    sink = result; /* So compiler doesn't optimize away the loop */
}

/* Run test(elems, stride) and return read throughput (MB/s) */
double run(int size, int stride, double Mhz)
{
    double cycles;
    int elems = size / sizeof(int);

    test(elems, stride);                     /* warm up the cache */
    cycles = fcyc2(test, elems, stride, 0);  /* call test(elems,stride) */
    return (size / stride) / (cycles / Mhz); /* convert cycles to MB/s */
}
/* $end mountainfuns */
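The last line of run() packs a unit conversion into one expression. As a worked sketch (throughput_mb_per_s is a hypothetical name introduced here, and it uses floating-point division where mountain.c divides integers): size / stride is the number of bytes actually read, and cycles / Mhz is the elapsed time in microseconds, so their ratio is bytes per microsecond, which is MB/s with 1 MB taken as 10^6 bytes.

```c
#include <assert.h>

/* Hypothetical restatement of the return expression of run().
 * bytes read        = size_bytes / stride
 * elapsed time (us) = cycles / mhz     (mhz = cycles per microsecond)
 * bytes per microsecond is numerically equal to MB/s (1 MB = 10^6 bytes). */
double throughput_mb_per_s(double size_bytes, double stride,
                           double cycles, double mhz)
{
    double bytes_read;
    double microseconds;

    assert(size_bytes > 0 && stride > 0 && cycles > 0 && mhz > 0);
    bytes_read = size_bytes / stride;
    microseconds = cycles / mhz;
    return bytes_read / microseconds;
}
```

With illustrative numbers that are not measurements from this study: a 1,048,576-byte working set read at stride 1 in 2,400,000 cycles on a 2,400 MHz clock takes 1,000 microseconds, which gives about 1,048.6 MB/s.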
2. Classification and Specification of Research Objects

1. CPU Specification
The CPU is the electronic circuitry within a computer that carries out the instructions of a computer program
by performing the basic arithmetic, logical, control and input/output (I/O) operations specified by the
instructions. The computer industry has used the term "central processing unit" at least since the early
1960s.[7] The specification of the CPU used in this study is listed in TABLE I.
TABLE I - CPU SPECIFICATION
Subject               Specification
Kernel                Linux 3.2.0-4-amd64
Motherboard           Intel Platform Garlow
                      Intel Snowhill S3210SH
                      North Bridge: Intel Bigby 3210
                      South Bridge: Intel 82801IR ICH9R
Architecture          x86_64
CPU op-mode(s)        32-bit, 64-bit
Byte Order            Little Endian
CPU(s)                4
On-line CPU(s) list   0-3
Thread(s) per core    1
Core(s) per socket    4
Socket(s)             1
NUMA node(s)          1
Vendor ID             GenuineIntel
CPU family            6
Model                 15
Stepping              11
CPU MHz               1600
BogoMIPS              4788
Virtualization        VT-x
L1d cache             32K
L1i cache             32K
L2 cache              4096K
NUMA node0 CPU(s)     0-3
2. Operating System Specification

An operating system (OS) is system software that manages computer hardware and software resources and
provides common services for computer programs.
Time-sharing operating systems schedule tasks for efficient use of the system and may also include accounting
software for cost allocation of processor time, mass storage, printing, and other resources. For hardware functions
such as input and output and memory allocation, the operating system acts as an intermediary between programs
and the computer hardware.[8] In this study, we used a Unix-like operating system. Unix was originally written in
assembly language.[9] Ken Thompson wrote B, mainly based on BCPL, based on his experience in the MULTICS
project. B was replaced by C, and Unix, rewritten in C, developed into a large, complex family of inter-related
operating systems which have been influential in every modern operating system.
Unix-like systems run on a wide variety of computer architectures. They are used heavily for servers in business,
as well as workstations in academic and engineering environments. Free UNIX variants, such as Linux and BSD,
are popular in these areas. Four operating systems are certified by The Open Group (holder of the Unix trademark)
as Unix. HP's HP-UX and IBM's AIX are both descendants of the original System V Unix and are designed to run
only on their respective vendor's hardware. In contrast, Sun Microsystems's Solaris can run on multiple types of
hardware, including x86 and SPARC servers, and PCs. Apple's macOS, a replacement for Apple's earlier (non-Unix)
Mac OS, is a hybrid kernel-based BSD variant derived from NeXTSTEP, Mach, and FreeBSD. The operating system
specification for this study is listed in TABLE II.
TABLE II - OPERATING SYSTEM SPECIFICATION
Distributor ID   Debian
Description      Debian GNU/Linux 7.7 (Wheezy)
Release          7.7
Code Name        Wheezy
3. Processor Specification
TABLE III - PROCESSOR SPECIFICATION
Status                       Launched
Launch Date                  Q1, 07
Processor Number             X3220
L2 Cache                     8 MB
FSB Speed                    1066 MHz
FSB Parity                   No
Instruction Set              64-bit
Embedded Options Available   No
Lithography                  65 nm
VID Voltage Range            0.8500V – 1.500V
Number of Cores              4
Processor Base Frequency     2.4 GHz
TDP                          105 W
Physical Address             32-bit
Socket                       LGA 775
3. Implementation of Micro Benchmarking Memory Mountain Method
The steps taken to run the memory mountain implementation are as follows:
a. Download the memory mountain source code with the command
wget -r -l1 www.cs.cmu.edu/afs/cs/academic/class/15213-f05/code/mem/mountain/mountain.c
b. Move to the mountain directory with the command
cd /root/www.cs.cmu.edu/afs/cs/academic/class/15213-f05/code/mem/mountain/
c. Compile with the command
gcc -O -o mountain mountain.c fcyc2.c clock.c
d. Run with the command
./mountain > result.txt
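The compile step depends on fcyc2.c and clock.c, which read the processor's cycle counter. Where those files are not at hand, a rough substitute can be sketched with the standard C clock() function. This is not the method used in this study: measure_mb_per_s is a hypothetical name, the measuring window of about 50 ms is an arbitrary assumption, and clock() is much coarser than the K-best cycle-counter scheme, so the numbers it produces are only indicative.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* One pass over the first `elems` ints of `a`, reading every stride-th one. */
static int strided_pass(const int *a, long elems, long stride)
{
    long i;
    int result = 0;

    for (i = 0; i < elems; i += stride)
        result += a[i];
    return result;
}

/* Hypothetical substitute for run(): repeats the pass for roughly 50 ms of
 * CPU time as reported by clock() and returns the read throughput in MB/s
 * (1 MB = 10^6 bytes). */
double measure_mb_per_s(const int *a, long elems, long stride)
{
    volatile int sink;  /* keeps the compiler from discarding the reads */
    long passes = 0;
    long reads_per_pass;
    clock_t start, now;
    double seconds, bytes;

    assert(a != NULL && elems > 0 && stride > 0);
    reads_per_pass = (elems + stride - 1) / stride;

    sink = strided_pass(a, elems, stride);  /* warm up the cache */
    start = clock();
    do {
        sink = strided_pass(a, elems, stride);
        passes++;
        now = clock();
    } while (now - start < CLOCKS_PER_SEC / 20);
    (void)sink;

    seconds = (double)(now - start) / CLOCKS_PER_SEC;
    bytes = (double)passes * (double)reads_per_pass * sizeof(int);
    return bytes / seconds / 1e6;
}
```

Calling measure_mb_per_s for each working set size and stride would fill a table of the same shape as the one printed by mountain.c, but the values should not be compared directly with cycle-counter results.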
4. Analysis of Implementation Results
Analysis of the results of the memory mountain implementation is done in the following steps:
1. Classification of Implementation Results. This section classifies the implementation results into the RAM, L1
cache memory and L2 cache memory ranges as preliminary research data.
2. Conversion of Implementation Result to Memory Mountain Graph. In this section, the authors convert the
implementation results from table form into a memory mountain graph.
3. Sampling Sessions and Performance Analysis. In this section, the authors take samples of the graph sliced per
stride and analyze the performance per stride.
4. Performance Analysis Results. The authors convert the per-slice results into a performance statistics graph and
draw conclusions about performance from the statistical results.
IV. RESULT AND DISCUSSION
1. Implementation Result
Once the implementation steps are done, the results appear on the console and are stored in the file result.txt,
as shown in the following picture:
Fig. 1. Results of Memory Mountain Implementation

The results shown in Fig. 1 are listed in the following table:
TABLE IV - RESULT OF MEMORY MOUNTAIN IMPLEMENTATION (READ THROUGHPUT IN MB/S)
Size s1 s3 s5 s7 s9 s11 s13 s15 s17 s19 s21 s23 s25 s27 s29 s31
128m 3927 1930 1145 805 643 536 457 396 359 340 326 313 306 304 306 315
64m 3929 1932 1144 805 644 535 457 396 360 341 325 313 306 304 306 315
32m 3934 1936 1143 807 644 536 457 397 359 340 326 314 306 305 306 316
16m 3928 1938 1150 810 648 537 461 401 363 343 331 319 312 312 319 333
8m 3950 2001 1244 884 715 596 511 445 404 391 378 382 394 423 489 609
4m 4295 3834 3347 2907 2529 2232 2005 1787 1742 1736 1780 1831 1854 1901 1976 2031
2m 4372 4181 3883 3489 3067 2688 2385 2129 2021 2019 2012 2012 2024 2020 2023 2036
1024k 4360 4204 3923 3521 3091 2719 2415 2157 2049 2049 2047 2051 2056 2065 2075 2072
512k 4373 4204 3929 3526 3088 2713 2418 2154 2052 2052 2051 2051 2049 2061 2073 2067
256k 4371 4186 3942 3536 3080 2714 2407 2160 2040 2055 2051 2044 2042 2065 2067 2058
128k 4368 4187 3924 3525 3082 2697 2378 2146 2023 2041 2027 2016 2018 2053 2048 2041
64k 4342 4151 3926 3498 3060 2677 2390 2136 2007 1990 2000 2010 2800 2572 2394 2691
32k 4242 4298 4262 4250 4192 4105 4014 3979 4005 3886 3842 3750 3829 3709 3708 3700
16k 4227 4174 4072 3965 3904 3809 3724 3724 3713 3583 3517 3444 3351 3290 3192 3192
8k 4264 4034 3890 3750 3613 3472 3286 3228 3121 3017 2882 2785 2718 2600 2587 2422
4k 4143 3782 3572 3311 3026 2827 2703 2504 1995 2288 1853 2059 1971 1913 1563 1756
2k 4006 3423 2940 2506 2322 2151 1898 1809 1680 1581 1173 1393 1347 1330 1164 1170
1k 3731 2749 2261 1554 1582 1124 1297 1206 1064 940 912 836 760 757 846 798
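The shape of TABLE IV follows directly from the two loops in main() of mountain.c. A small sketch (the function names are hypothetical) that counts the rows and columns those loops generate:

```c
#include <assert.h>

/* Number of working set sizes visited by
 *   for (size = max_bytes; size >= min_bytes; size >>= 1) */
int count_sizes(int max_bytes, int min_bytes)
{
    int size;
    int n = 0;

    assert(min_bytes > 0 && max_bytes >= min_bytes);
    for (size = max_bytes; size >= min_bytes; size >>= 1)
        n++;
    return n;
}

/* Number of strides visited by
 *   for (stride = 1; stride <= max_stride; stride += step) */
int count_strides(int max_stride, int step)
{
    int stride;
    int n = 0;

    assert(max_stride >= 1 && step > 0);
    for (stride = 1; stride <= max_stride; stride += step)
        n++;
    return n;
}
```

With MAXBYTES = 1 << 27, MINBYTES = 1 << 10, MAXSTRIDE = 32 and STRIDESTRIDE = 2, this gives 18 rows (128m down to 1k) and 16 columns (s1 to s31), matching the table.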
2. Analysis of Implementation Result

1. Classification of Implementation Results
The results of classifying the output of the mountain.c code are:
a. The memory range from 1 kilobyte up to 128 megabytes is divided into:
1) L1 cache memory, ranging from 1 kilobyte to 32 kilobytes
2) L2 cache memory, ranging from 64 kilobytes to 8 megabytes
3) Random access memory, ranging from 16 megabytes to 128 megabytes.
b. The columns s1 up to s31 give the stride S in steps of 2, where the stride is the distance, counted in
words (array elements), between consecutive reads. There are therefore 16 experiments per working set
size, with strides from 1 word to 31 words, increasing by 2 words in each trial.
c. The result numbers are the read throughput in MB/s.
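The classification in item a can be written as a simple lookup. The sketch below encodes the ranges exactly as the authors chose them (the enum and function names are hypothetical); the boundaries are this paper's classification of the measured rows, not values queried from the hardware.

```c
#include <assert.h>

enum mem_level { LEVEL_L1, LEVEL_L2, LEVEL_RAM };

/* Maps a working set size in bytes (1 KB to 128 MB) to the level it is
 * assigned to in this study: up to 32 KB -> L1, up to 8 MB -> L2,
 * anything larger -> RAM. */
enum mem_level classify_working_set(long size_bytes)
{
    assert(size_bytes >= 1024L && size_bytes <= 128L * 1024L * 1024L);
    if (size_bytes <= 32L * 1024L)
        return LEVEL_L1;
    if (size_bytes <= 8L * 1024L * 1024L)
        return LEVEL_L2;
    return LEVEL_RAM;
}
```

Under this mapping the 18 rows of the result table split into 6 rows for L1, 8 rows for L2 and 4 rows for RAM.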
2. Conversion of Implementation Result to Memory Mountain Graph

Plotted in graphical form, TABLE IV produces the memory mountain shown in Fig. 2 below:
Fig 2. Conversion Result

3. Sampling Sessions and Performance Analysis
To see how each level performs, we slice the memory mountain at stride 1, stride 5 and stride 31. Fig. 3 shows the
stride 1 slice.
Fig 3. S1 Slice Sampling

In stride 1, reads from RAM are stable, with an average read speed of about 3,929 MB/s. Compared to RAM, L2 is
faster at reading data. Although its readings are less stable, the speed tends to decrease as more data is
processed; the average is about 4,303 MB/s. L1 behaves in the opposite way to L2: its average read speed is about
4,103 MB/s, and the speed tends to increase as more data is processed. This is not much different from the speed
of L2, but still below it.
Fig 4. S5 Slice Sampling
In the stride 5 slice, RAM is still stable but slower, with an average of 1,145.80 MB/s. This shows that RAM read
speed drops as the stride increases. Meanwhile, for the L2 cache, the average speed is about 3,514.75 MB/s and
tends to decrease as more data is processed.
For the L1 cache, the read speed is not much different from L2, with an average of 3,499.5 MB/s. As in the
previous slice, the more data is processed in L1, the faster L1 performs.
Fig 5. S31 Slice Sampling
In the stride 31 slice, the RAM read speed is lower again but remains stable, with an average of 319.75 MB/s. The
L2 cache is also stable, but its access speed decreases to an average of 1,950.63 MB/s; the graph still shows that
the more data is processed, the lower the speed. The L1 cache reverses its position relative to the previous
slices: its access speed keeps increasing and averages 2,173 MB/s. This does not differ much from L2, but shows
that L1 can deliver performance equivalent to L2 when even more data is processed.
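The per-level averages quoted for each slice are plain arithmetic means over the rows of TABLE IV assigned to each level. A minimal sketch (slice_average is a hypothetical helper) that reproduces them:

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of `count` throughput readings (MB/s) taken from one
 * stride column of the result table. */
double slice_average(const int *values, int count)
{
    long sum = 0;
    int i;

    assert(values != NULL && count > 0);
    for (i = 0; i < count; i++)
        sum += values[i];
    return (double)sum / count;
}
```

Feeding it the four RAM rows of the s31 column (315, 315, 316, 333) gives 319.75, and the four RAM rows of the s1 column (3927, 3929, 3934, 3928) give 3929.5.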
4. Performance Analysis Results
From the previous analysis, we obtained the following results:
a. Average Speed Performance Rank
TABLE V - SPEED PERFORMANCE ANALYSIS RESULT (AVERAGE READ SPEED IN MB/S)
Stride Performance   RAM    L2     L1
S1                   3929   4303   4103
S5                   1145   3514   3499
S31                  319    1950   2173
Average Speed        1798   3256   3258
Rank                 3      2      1
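The Average Speed and Rank rows can be recomputed from the three stride rows. A sketch under the assumption that rank 1 means the highest average (mean3 and rank_of are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>

/* Mean of the three slice averages (S1, S5, S31) of one memory level. */
double mean3(double s1, double s5, double s31)
{
    return (s1 + s5 + s31) / 3.0;
}

/* 1-based rank of values[idx] among n values, where rank 1 is the largest. */
int rank_of(const double *values, int n, int idx)
{
    int i;
    int rank = 1;

    assert(values != NULL && n > 0 && idx >= 0 && idx < n);
    for (i = 0; i < n; i++)
        if (values[i] > values[idx])
            rank++;
    return rank;
}
```

Using the table's values, the means are about 1797.7 (RAM), 3255.7 (L2) and 3258.3 (L1), so L1 ranks first by a margin of under 3 MB/s.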
b. Workload Capacity and Performance
TABLE VI - WORKLOAD CAPACITY AND PERFORMANCE
Workload           RAM   L2        L1
Small workload     Low   High      Average
Average workload   Low   High      Average
Big workload       Low   Average   High
c. Storage Capacity and Speed
TABLE VII - STORAGE CAPACITY AND SPEED
Capacity   RAM          L2       L1
Small      Low-Stable   High     High
Average    Low-Stable   Stable   Rise
Big        Low-Stable   Low      Low
V. CONCLUSION
1. The measured speed ranking of data transfer between memory and CPU is: L1 is the fastest, L2 comes next
and RAM is the slowest.
2. The performance of RAM, L1 cache and L2 cache under workload is shown in TABLE VI: for the big workload,
the highest performance is delivered by the L1 cache.
3. The relationship between storage capacity and access speed is shown in TABLE VII: the larger the storage
capacity, the slower the read speed, and the smaller the storage capacity, the faster the read speed.
REFERENCES
1. Yaya Sudarya Triana, Indah Syahputri. Implementation Floyd-Warshall Algorithm for the Shortest Path of
Garage. Vol 3, Issue 2, February 2018.
2. Yuwan Jumaryadi, Tazkiyah Herdi, Riad Sahara. Analysis and Design of KB/TK Bunga Bangsa Islamic School
Information System. International Research Journal of Computer Science (IRJCS) Issue 04 Volume 5. 2018
3. Stallings, William. Computer Organization & Architecture. Prentice Hall. New Delhi. India. 2008
4. Randal E. Bryant and David R. O'Hallaron. Computer Systems: A Programmer's Perspective, 3/E (CS:APP3e),
3rd ed. Carnegie Mellon University. Pittsburgh. USA: Pennsylvania. 1998.
5. Brian W. Kernighan, Dennis M. Ritchie. C Programming Language, 2nd ed. Prentice Hall. New Jersey. USA. 1988
6. Carnegie Mellon University. (2018). Homepage CMU – Carnegie Mellon University [Online]. Available:
http://www.cs.cmu.edu/afs/cs/academic/class/15213-f05/code/mem/mountain/mountain.c
7. Weik, Martin H. A Third Survey of Domestic Electronic Digital Computing Systems. Ballistic Research
Laboratory. Maryland. USA. 1961
8. Stallings, William. Operating Systems: Internals and Design Principles. Prentice Hall. New Delhi. India. 2005
9. Ritchie, Dennis. Unix Manual, 1st Ed. New Jersey. USA. 2008