ACKNOWLEDGEMENT
Gurudatha.M
(M050224CS)
CERTIFICATE
Contents
1 Abstract
2 Introduction
3 Spectral Bloom Filters
4 DineroIV
5 Modifications
6 Conclusion
7 References
1 Abstract
A Spectral Bloom filter is a space-efficient data structure used to test
efficiently whether an object belongs to a given set. Two optimizations of
this randomized structure, Recurring Minimum and Minimal Increase, are
described and compared. DineroIV, a cache simulator tool, and its uses are
then discussed. For Bloom filters to be used effectively, a fast algorithm is
not enough: the data itself must also be checked in a small amount of time.
For Bloom filters, as for all other search techniques, the best way to achieve
this is to keep as much of the data as possible in memory. Finally, the
modifications needed in the DineroIV code to implement Spectral Bloom
filters are outlined.
2 Introduction
Data mining is the process of selecting the required data from a huge
amount of data. In most applications today it has become increasingly
important to search the data in as little time as possible. This led to the
idea of hash functions, with which the average search time per element is
constant, O(1). The need for fast search has since grown even beyond what
hash functions provide.
Spectral Bloom filters are data structures with the ability to compactly
represent a set and to filter out effectively any element that does not
belong to the set, with a small error probability. With this data structure
it is easy to search for an item in a set with a small false-positive error.
Several lookup methods exist for Spectral Bloom filters. In the Minimum
Selection method, the minimum of the element's counter values is returned.
In the Recurring Minimum method, elements with a single minimum are kept in
a secondary SBF; a lookup first checks for a recurring minimum and returns
it if there is one.
DineroIV is a cache simulator tool. Given an input trace, it simulates the
cache, records all the memory requests, counts the misses, hits and memory
references of the trace, and outputs the results.
3 Spectral Bloom Filters
The Spectral Bloom Filter (SBF) is a data structure that represents a
multi-set S of elements by replacing the bit vector V of the Bloom filter
with a vector C of m counters. The counters in C roughly represent the
multiplicities of the items in S; all the counters in C are initially set to
0. In the basic implementation, inserting an item s increases the counters
$C_{h_1(s)}, C_{h_2(s)}, \ldots, C_{h_k(s)}$ by 1, where $h_1, \ldots, h_k$
are the k hash functions. The SBF thus stores the frequency of each item,
and it also allows deletions, which decrease the same counters.
Consequently, updates are also allowed (by performing a delete and then an
insert).
To add a new item x ∈ U to the SBF, the counters
$\{C_{h_1(x)}, C_{h_2(x)}, \ldots, C_{h_k(x)}\}$ are increased by 1. The
Spectral Bloom Filter for a multi-set S can be computed by repeatedly
inserting all the items of S. The same logic applies to streaming data: as
the data flows, it is hashed into the SBF by a series of insertions.
Querying the SBF: A basic query for the SBF on an item x ∈ U returns an
estimate $\hat{f}_x$ of its frequency $f_x$. The SBF error, denoted
$E_{SBF}$, is defined as the probability that, for an arbitrary element z
(not necessarily a member of S), $\hat{f}_z \neq f_z$. The basic estimator,
called the Minimum Selection (MS) estimator, is $\hat{f}_x = m_x$, where
$m_x$ is the minimal value among the counters of x.
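The insert, delete and Minimum Selection query described above can be sketched in Python as follows. This is a minimal illustration, not the implementation of [1]: the class name `SpectralBloomFilter`, the parameters `m` and `k`, and the use of salted SHA-256 digests as the k hash functions are all assumptions made for the example.

```python
import hashlib


class SpectralBloomFilter:
    """Counter vector C of size m with k hash functions (illustrative sketch)."""

    def __init__(self, m, k):
        self.m = m                # number of counters
        self.k = k                # number of hash functions
        self.counters = [0] * m   # all counters start at 0

    def _positions(self, item):
        # Assumed hash family: SHA-256 of the item salted with the index i.
        # Any k independent, uniform hash functions would serve equally well.
        return [
            int.from_bytes(
                hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big"
            ) % self.m
            for i in range(self.k)
        ]

    def insert(self, item):
        # Increase each of the item's k counters by 1.
        for p in self._positions(item):
            self.counters[p] += 1

    def delete(self, item):
        # Decrease the same counters; only meaningful for inserted items.
        for p in self._positions(item):
            self.counters[p] -= 1

    def estimate(self, item):
        # Minimum Selection: the smallest of the item's counters.
        return min(self.counters[p] for p in self._positions(item))
```

As long as only previously inserted items are deleted, `estimate` never falls below the true frequency; collisions with other items can only push it upward.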
3.1 Minimal Increase
The minimal counter of an item holds better information about it than its
other counters: fewer other items have been hashed to it, so it is the
counter least distorted by collisions. The Minimal Increase method takes
this observation into account to find an element in the set effectively.
When an element is inserted, only its minimal counters are increased; its
other counters are not increased until the minimal counters catch up with
them.
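A minimal Python sketch of the Minimal Increase insertion follows. The function names and the salted SHA-256 hash family are assumptions for illustration, and the lookup is assumed to return the minimal counter, as in Minimum Selection.

```python
import hashlib


def positions(item, m, k):
    """Return the k counter indices of item (assumed salted SHA-256 hashes)."""
    return [
        int.from_bytes(
            hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big"
        ) % m
        for i in range(k)
    ]


def minimal_increase_insert(counters, item, k):
    """Insert item, incrementing only the counters equal to its minimum."""
    pos = set(positions(item, len(counters), k))  # touch each counter once
    low = min(counters[p] for p in pos)
    for p in pos:
        if counters[p] == low:
            counters[p] += 1


def estimate(counters, item, k):
    """Return the minimal counter of item as its frequency estimate."""
    return min(counters[p] for p in positions(item, len(counters), k))
```

Each insertion raises the item's minimum by exactly 1, so the estimate tracks the number of insertions while the larger counters, which already carry collisions from other items, are left alone.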
3.2 Recurring Minimum
To search for a particular element in the set, first compute its hash
functions and examine the corresponding counters in sorted order. If the
minimum recurs, that is, it appears in more than one counter, the search
succeeds and the minimum is reported. Otherwise the element has a single
minimum, and it is searched for in the secondary SBF.
The algorithm: When adding an item x, increase the counters of x in
the primary SBF. Then check whether x has a recurring minimum. If so,
continue normally. Otherwise (x has a single minimum), look for x in the
secondary SBF. If it is found, increase its counters there; otherwise add x
to the secondary SBF with an initial value equal to its minimal value in the
primary SBF.
When performing a lookup for x, check whether x has a recurring minimum in
the primary SBF. If so, return the minimum. Otherwise, perform a lookup for
x in the secondary SBF. If the returned value is greater than 0, return it;
otherwise, return the minimum from the primary SBF.
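The insertion and lookup steps above can be sketched in Python as follows. The class names, the sizes passed to the constructor, and the salted SHA-256 hash family are assumptions for illustration. Adding x to the secondary SBF "with an initial value" is interpreted here as increasing its secondary counters by that value, which is one possible reading rather than a confirmed detail.

```python
import hashlib


class CounterArray:
    """A vector of m counters addressed by k salted hash functions."""

    def __init__(self, m, k, salt):
        self.m, self.k, self.salt = m, k, salt
        self.c = [0] * m

    def positions(self, item):
        # Assumed hash family: SHA-256 salted with the array name and index.
        return [
            int.from_bytes(
                hashlib.sha256(f"{self.salt}:{i}:{item}".encode()).digest()[:8],
                "big",
            ) % self.m
            for i in range(self.k)
        ]

    def values(self, item):
        return [self.c[p] for p in self.positions(item)]

    def add(self, item, amount=1):
        for p in self.positions(item):
            self.c[p] += amount


def has_recurring_min(values):
    """True if the minimum appears in more than one counter."""
    return values.count(min(values)) > 1


class RecurringMinimumSBF:
    def __init__(self, m_primary, m_secondary, k):
        self.primary = CounterArray(m_primary, k, "primary")
        self.secondary = CounterArray(m_secondary, k, "secondary")

    def insert(self, item):
        self.primary.add(item)
        vals = self.primary.values(item)
        if has_recurring_min(vals):
            return                                  # continue normally
        if min(self.secondary.values(item)) > 0:
            self.secondary.add(item)                # already in secondary
        else:
            self.secondary.add(item, min(vals))     # seed with primary minimum

    def lookup(self, item):
        vals = self.primary.values(item)
        if has_recurring_min(vals):
            return min(vals)
        sec = min(self.secondary.values(item))
        return sec if sec > 0 else min(vals)
```

The secondary array is typically much smaller than the primary one, since only items with a single minimum are sent to it.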
3.3 Comparison
Error rates: The MS algorithm provides the same error rates as the original
Bloom filter. The Minimal Increase method performs better than both the
Recurring Minimum (RM) and MS algorithms. The RM algorithm is not as good as
Minimal Increase, but is consistently better than the MS algorithm.
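Because the MS error rate matches that of the original Bloom filter, it can be written with the standard Bloom filter approximation. The symbol n, the number of distinct items in S, is introduced here for the sketch; m and k are the numbers of counters and hash functions, and the hash functions are assumed to be independent and uniform.

```latex
E_{SBF} \approx \left(1 - e^{-kn/m}\right)^{k}
```

For example, with m = 8n counters and k = 5 hash functions, this gives an error of about 0.022.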
4 DineroIV
Dinero IV is a trace-driven uniprocessor cache simulator. A trace is a
sequence of memory references made during the execution of a program. The
simulator's main job is to simulate the memory requests and cache misses and
to keep all the information about the execution of a particular program.
The input to DineroIV is a sample trace of the execution of a program,
supplied in the .din format. DineroIV then simulates the behavior of the
cache. For this it maintains the blocks in the form of a stack, which
contains the memory blocks requested by the sample trace. Each stack item
can also be hashed to a number of blocks.
The tool takes the cache sizes, associativity, replacement policies, and
sample traces as input. It first checks the validity of the arguments with
respect to the cache levels and then initializes the stack according to the
arguments. After initializing the stack and setting the replacement
policies, it reads the trace records one by one, updates the corresponding
statistics, and finally summarizes them as output.
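The trace-driven loop described above can be illustrated with a small Python sketch. It is not DineroIV's code: the function `simulate`, its parameters, and the choice of a single set-associative cache with LRU replacement are assumptions for illustration, and the trace is reduced to a list of byte addresses.

```python
from collections import OrderedDict


def simulate(trace, cache_size=1024, block_size=16, assoc=2):
    """Replay byte addresses through one LRU cache; return (hits, misses)."""
    num_sets = cache_size // (block_size * assoc)
    # One ordered dict per set; insertion order tracks recency of use.
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = misses = 0
    for addr in trace:
        block = addr // block_size
        lines = sets[block % num_sets]
        if block in lines:
            hits += 1
            lines.move_to_end(block)        # mark as most recently used
        else:
            misses += 1
            if len(lines) >= assoc:
                lines.popitem(last=False)   # evict the least recently used
            lines[block] = True
    return hits, misses
```

With the default parameters, `simulate([0, 4, 8, 16, 0])` touches block 0 three times, block 1 once, and block 0 again, giving three hits and two misses.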
DineroIV divides the cache into a maximum of five levels, where each level
is divided into three parts: instruction, data, and unified. For each it
maintains the number of blocks and sub-blocks, the associativity, and so on.
Dinero IV keeps track of the size of each access, along with its address and
access type. When reading the din format, Dinero IV forces all accesses to
be 4-byte aligned and assumes a size of 4. The same trace may therefore
produce different results when read in din format than when read directly in
a richer format, e.g., extended din or one of the pixie formats. The richer
formats should have correct size information, while Dinero IV must guess the
size in the din format (always 4 bytes).
References can cross block or sub-block boundaries. A multi-block reference
is handled by doing the first block and deferring the rest as a separate
reference. The number of times this is done is reported in the d4cache multi
block field.
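The alignment and multi-block handling described above can be sketched as follows. The helper names `parse_din_line` and `split_reference` are hypothetical, and the sketch assumes the traditional din record layout of an access label (0 = data read, 1 = data write, 2 = instruction fetch) followed by a hexadecimal address. Dinero IV's own implementation differs in detail.

```python
ACCESS_TYPES = {0: "read", 1: "write", 2: "ifetch"}  # assumed din labels


def parse_din_line(line):
    """Parse one din record into (access type, aligned address, size)."""
    label, addr = line.split()[:2]
    address = int(addr, 16) & ~0x3   # force 4-byte alignment
    return ACCESS_TYPES[int(label)], address, 4   # size is always assumed 4


def split_reference(address, size, block_size):
    """Split a reference at block boundaries into (address, size) pieces."""
    pieces = []
    while size > 0:
        room = block_size - (address % block_size)  # bytes left in this block
        chunk = min(size, room)
        pieces.append((address, chunk))
        address += chunk
        size -= chunk
    return pieces
```

A reference that fits in one block yields a single piece. An 8-byte reference at address 0x1c with 16-byte blocks yields two pieces, the second of which corresponds to the deferred part of the reference.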
5 Modifications
For Bloom filters to be used effectively, a fast algorithm is not enough:
the data must also be checked in a small amount of time. For Bloom filters,
as for all other search techniques, the best way to achieve this is to keep
as much of the data as possible in memory. The necessary modifications are:
• The stack used by DineroIV must be divided into two parts, one for
ordinary cache purposes and another for storing the contents of the set
S in which the element is to be searched, so that most of S remains in
memory rather than being brought in every time.
• The code of the d4ref function must be modified to search for the
element in the set S using either Minimal Increase or Recurring
Minimum, allowing both random search and varying sizes, such that the
hash time is O(1).
6 Conclusion
Spectral Bloom filters can be used more effectively than many searching
methods: they have a small error rate and can be implemented within the
available requirements. DineroIV can be efficiently modified to incorporate
the required static or dynamic policies.
Searching for an element of a set, or data mining, can be done efficiently
by combining the two methods: Bloom filters, and a cache modified to hold
both the set and the ordinary cache contents so that fetching the elements
is easy. Hence, with the specified modifications, there is a better chance
of reducing the search time for an element.
7 References
[1] Saar Cohen, Yossi Matias, "Spectral Bloom Filters", Proceedings of the
2003 ACM SIGMOD International Conference on Management of Data,
San Diego, California, 2003.