Papadakos PHD 2013

UNIVERSITY OF CRETE
DEPARTMENT OF COMPUTER SCIENCE

FACULTY OF SCIENCES AND ENGINEERING
Interactive Exploration
of Multi-Dimensional Information Spaces
with Preference Support
by
Panagiotis Papadakos
PhD Dissertation
Presented
in Partial Fulfillment
of the Requirements
for the Degree of
Doctor of Philosophy
Heraklion, November 2013
UNIVERSITY OF CRETE
DEPARTMENT OF COMPUTER SCIENCE
Interactive Exploration of Multi-Dimensional Information Spaces
with Preference Support
PhD Dissertation Presented
by Panagiotis Papadakos
in Partial Fulfillment of the Requirements
for the Degree of Doctor of Philosophy
APPROVED BY:
Author: Papadakos Panagiotis
Supervisor: Tzitzikas Yannis, Assistant Professor, University of Crete
Committee Member: Plexousakis Dimitris, Professor, University of Crete
Committee Member: Savidis Anthony, Professor, University of Crete
Committee Member: Spyratos Nicolas, Professor Emeritus, University of Paris-South
Committee Member: Vassiliadis Panos, Assistant Professor, University of Ioannina
Committee Member: Rauber Andreas, Associate Professor, Vienna University of Technology
Committee Member: Paltoglou Georgios, Senior Lecturer, University of Wolverhampton
Department Chairman: Trahanias Panos, Professor, University of Crete
Heraklion, November 2013
To the sacred
reflexive, symmetric and transitive
relation of
a student and a teacher
The only principle that does not inhibit progress is:
anything goes.
Paul Karl Feyerabend
Against Method (1976)
The drawing on the previous page sketches the following:
i) Preferences are part of the cognitive process of decision making
ii) This dissertation takes advantage of multi-dimensional hierarchies
iii) The process that any PhD student has to face: starting from a few unrealistic aims, going through the gradual immersion
and disorientation in the ocean of the available knowledge (a difficult and frustrating situation where the help of the advisor
is appreciated), to the final gathering and integration of the contributions (see also the respective drawing on p. 175).
Image taken from the beamer XeLaTeX template available from https://github.com/drbunsen/drbunsen-beamer
Acknowledgments
The following pages cannot convey my feelings and the experiences that I gained during all these years
of my Doctoral Dissertation odyssey. The blank space was filled by black ink in just a few seconds. Only
a small odour reminds the process of their imprinting. But my imprinting was a long process. Differ-
ent typesetters wrote their words with different metal sorts in different places. Their printings have
affected my scientific, artistic and human nature, and I owe them my present. Those people I would like
to thank.
The End of this work could not have been typed without the undivided and unconditional support
of my supervisor, Assistant Professor Yannis Tzitzikas. Through all these years his academic advice
and directions were always on the spot. I am also grateful to him for his constant mentorship and
for believing in me when I had lost my confidence. Although I was his first PhD student, he managed to
accommodate to the specific particularities of my personality and stimulate my interest and enthusiasm.
What is more important though, is that as blinkers keep horses from seeing what nature meant them to
see, which is just about everything, I was taught to try and remove my mental blinkers.
I want to deeply thank Professor Dimitris Plexousakis, head of the Information Systems Laboratory (ISL),
for the time he devoted to me all these years. He has been a critic and at the same time a supportive
advisor. His insights have been really inspiring and crucial. As the head of the lab he created a highly
creative and inspiring environment for me.
I would like to express my sincere appreciation to the third member of my advisory committee, As-
sociate Professor Anthony Savidis, who was my supervisor during my MSc 3 Dimensional CRC voyage
in the Human Computer Interaction (HCI) lab. Although the initial plans of my PhD thesis changed to
unknown territories for him, he managed to understand my work and provide guidance and comments.
Furthermore I am indebted to the other members of my examination committee, Professor Emeritus
Nicolas Spyratos, Assistant Professor Panos Vassiliadis, Associate Professor Andreas Rauber and Senior
Lecturer Georgios Paltoglou, for their constructive comments and suggestions.
I was fortunate to meet Irini Fundulaki and Kostas Stefanidis, two researchers who helped me a lot
to gain self-confidence. Irini motivated a number of interesting discussions that helped me understand
my work more deeply. Kostas is the motivating example of a young, smart, capable and passionate researcher.
I wish him all the best in his career.
Moreover I would like to acknowledge the support of the Institute of Computer Science of the Foundation
for Research and Technology (FORTH-ICS), and especially the ISL, both financially and for the facilities
(the lights of the laboratory were sometimes kept on until early morning). It is a nice place to be,
with exceptional people who elicit inspiring discussions.
This research has been co-financed by the European Union (European Social Fund - ESF) and Greek
national funds through the Operational Program "Education and Lifelong Learning" of the National
Strategic Reference Framework (NSRF) - Research Funding Program: Herakleitus II. Investing in knowledge
society through the European Social Fund. Despite the above formal words that I have to write, this
financial support has been really important for me, especially during this financial crisis period.
Finally, I want to thank the following persons with whom I spent a lot of time all these years: Nikos
Tsagkarakis for all the time that we spent together, our discussions, the summits we reached, for cultivat-
ing our vineyard and drinking the raki spirit we produced, Anna-Maria Papadaki for being an earthy
human being, Aristea Papadimitriou for her gaze and our philosophical discussions, Georgia Troullinou
for taking care of me when I was for a second time an infant, Michalis Papadakis for his amanedes, Dim-
itris Robotis for cooking on Sundays, Andreas Sfakianakis for being bald, Despoina Pavlidi for the house
in Panagia, Sofia Skandali for our trips, Christina Lantzaki for our interesting discussions on graphs,
Nikos Manolis and Maria Psaraki for the times in the Lefka basements, Aksas for not listening to his
name Nikos, my colleagues in FORTH Patkos Theodore, Yannis Marketakis, Pavlos Fafalios, Nikos Arme-
natzoglou, Yannis Kitsos, Dimitris Andreou, Stella Kopidaki, and Yannis Kargakis, Corina Doerr for the
nice logo of Hippalus and Ionas for the name Hippalus, Yannis Roussakis for ping-pong, my neighbours
in the lab, George Baryannis and Chrysostomos Zeginis, as well as Ioannis Chrysakis, Dimitra Zografistou,
Roula Avgoustaki, Lida Charami, Athina Kritsotaki, Irini Maravellia and Manos Papadakis for their
patience, Maria Moutsaki for scanning and Dimitris Aggelakis for windows, George Konstantinidis for
our friendship before he left Greece, and Dimitra Makri for her understanding.
This work is a result of the constant support of my parents, Stavros and Maria, and my two sisters
Stavroula and Katerina. They always believed in me and supported me in any possible way.
Abstract
Users access large amounts of information resources (documents or data) mainly through search func-
tions, where they type a few words and the system (web search engine, query engine) returns a linear
list of hits. While this is often satisfactory for focalized search, it does not provide enough support for
recall-oriented (exploratory) information needs, which constitute the majority according to various user
studies.
The interaction model of Faceted and Dynamic Taxonomies (FDT) is a highly prevalent model for exploratory
search. It gives users an overview of the information space (e.g. search results) and
offers them various groupings of the results (based on their attributes, metadata, or other dynamically
mined information). These groupings let users restrict their focus gradually and in a simple way
(through clicks, i.e. without having to formulate queries), so that they can locate resources that would
otherwise be difficult to find (especially the low-ranked ones).
The enrichment of search mechanisms with preferences could prove useful for recall-oriented
information needs. However, the current approaches for preference-based access (mainly from the area
of databases) seem to ignore the fact that users should be acquainted with the information space and
the available choices in order to describe their preferences effectively.
In this dissertation we extend the interaction model of FDT with preference actions that allow users
to express their preferences interactively, gradually, and in a simple way.
Initially, we introduce a preference framework appropriate for information spaces comprising resources
described by attributes whose values can be hierarchically organized and/or multi-valued. We
define the language, its semantics and the required algorithms. The framework supports preference
inheritance in the hierarchies, automatic conflict resolution, as well as preference composition
(prioritization, Pareto and their combination).
Subsequently, we enrich the FDT model with preference actions and we propose logical optimiza-
tions and methods for exploiting the intrinsic characteristics of the FDT-based interaction, aiming at
making it applicable to large amounts of information. Then, we present the design and the implementa-
tion of the web-based system Hippalus, which realizes the extended interaction model.
Regarding user benefits, at first we theoretically analyze user gain in terms of the number and diffi-
culty of choices, and then we describe and analyze three user-based evaluations that we have conducted.
The first investigates the degree of effectiveness of preferences (and the effort to express them) when
users are not aware of the available choices. The results showed that only 20% of the users managed to
express effective preferences without knowing the available choices.
The second comparatively evaluates FDT and other exploratory models. The results showed that the
majority of users preferred FDT, were more satisfied with FDT and achieved higher rates of task completion
with FDT.
The last one concerns the evaluation of the preference-enriched FDT as realized by Hippalus. The
results were impressive: even on a very small dataset, with the preference-enriched FDT all users
successfully completed all tasks in 1/3 of the time and with 1/3 of the actions in comparison to the plain FDT.
Moreover, all (100%) of the users (both plain and expert) preferred the preference-enriched interface.
Keywords: Preferences, Exploratory Search, Interactive Information Retrieval, Decision Making
Supervisor: Tzitzikas Yannis
Assistant Professor
Computer Science Department
University of Crete
Contents
Acknowledgments
Abstract
Table of Contents
List of Figures
List of Tables
1 Introduction
1.1 Objective
1.2 Context, Approach and Research Questions
1.3 Contribution
1.4 Produced Publications
1.5 Thesis Outline
2 Background and Related Work
2.1 Context: Exploration through FDT
2.2 Preference Management in General
2.2.1 Various Perspectives of Preference Management
2.3 Faceted and Dynamic Taxonomies (FDT) and Preferences: Motivation
2.4 The Database World
2.5 IR and Preferences
2.6 FDT and Preferences: Past and Related Works
2.7 Motivation and Running Example
3 A Preference Framework for Multidimensional Information Spaces
3.1 Syntax of the Language
3.2 The Domain of Semantics
3.3 Syntax to Semantics
3.3.1 Flat Single-Valued Attributes
3.3.2 Set-Valued Attributes
3.3.3 Best/Worst Preferences over Hierarchically Organized Values
3.3.4 Relative Preferences over Hierarchically Organized Values
3.3.5 Preferences over Hierarchical Set-Valued Attributes
3.4 Multi-Facet Preferences
3.4.1 Prioritized Composition
3.4.2 Pareto Composition and Best Matches Only (BMO)-set
3.4.3 Combination of Priority and Pareto Compositions
3.5 A Complete Example
4 Complexity and Optimizations
4.1 Computational Complexity
4.2 Optimizations for Deriving the Preference-based Order
4.2.1 An Algorithm based on the Focal Object Set
4.2.2 Optimizations for Capturing Set-Valued Attributes and Top-K Requirements
4.3 Optimizations for Multi-Facet Preferences
4.3.1 Prioritized Composition
4.3.2 Pareto Composition
4.3.3 Combination of Priority and Pareto Compositions
5 Applicability and the System Hippalus
5.1 Application in Searching
5.1.1 Case: Web Searching
5.1.2 Case: Relational Databases
5.1.3 Case: RDF Bases
5.2 Hippalus: A Preference Enriched Faceted Exploratory System
5.2.1 Software Architecture
5.2.2 Visualization and User Interface
5.2.3 Interaction Example
6 Evaluation
6.1 Evaluation Approaches & Metrics
6.1.1 Metrics for Exploratory Search
6.1.2 Metrics Related to the Proposed Interaction Scheme
6.2 Theoretical Analysis of the Number of User Decisions and Effort in FDT
6.3 DiFEPreKO Hypothesis
6.3.1 Analytical Comparison
6.3.2 User Study
6.4 Evaluation of Various Exploration Approaches
6.5 Evaluation of Hippalus System
6.6 Evaluation Conclusion
7 Conclusion and Future Research
7.1 Synopsis of Contributions
7.2 Directions for Future Work and Research
Appendices
A Complete Syntax of Preference Language
B Binary Relations
C Acronyms
List of Figures
2.1 Dynamic Taxonomies
2.2 Finding a Hotel in the Island of Symi (FDT over booking.com)
2.3 Checking Olympus Cameras (FDT over eBay)
2.4 FDT-based GUI of the Mitos WSE
2.5 Distinctions of Preference Management Approaches
2.6 SciNet Prototype
2.7 Example Taxonomies
3.1 Hasse Diagram of Preference Relation Over E (E, R)
3.2 Example for Flat Single-Valued Attributes
3.3 Example for a DAG
3.4 Example for Flat Multi-Valued Attributes
3.5 Example of Preferences Without Exploiting Hierarchies
3.6 Hasse Diagram of Actions Refinement
3.7 Taxonomy of Manufacturers
3.8 Hasse Diagram of Scope-Based Ordering of Preference Actions
3.9 Examples of Conflicts
3.10 Relative Inherited Preferences and Conflicts Examples
3.11 Examples of Cycles of the Form e e
3.12 Hasse Diagram of the Relation R for the Manufacturer Attribute
3.13 Scope Based Ordering of Actions (Left for Best/Worst Actions, Right for Relative Preference Actions): Complete Example
3.14 Hasse Diagram for the Relation R_bw: Complete Example
3.15 Hasse Diagram for the Relation R: Complete Example
3.16 Hasse Diagram for the Relation R: Complete Example
3.17 Hasse Diagram for Ordering Multi-Valued Attributes According to MoreWins Rule: Complete Example
3.18 Hasse Diagram for Ordering Multi-Valued Attributes According to MoreGoodLessBad Rule: Complete Example
5.1 Processes of Web Searching and Exploratory Web Searching
5.2 Process of Exploratory Web Searching Enhanced with Preference Actions
5.3 Mitos GUI for Exploratory Web Searching
5.4 Facets and Zoom-Points of Running Example
5.5 Example of RDF/S
5.6 System Architecture
5.7 The RDF Knowledge Base
5.8 Hippalus: The Main Page of Hippalus
5.9 Hippalus: Value Expansion - Object Restriction
5.10 Hippalus: Expression of Relative Preference Korean European
5.11 Hippalus: (a): Expressing Preferences, (b): Object Restrictions after Preference Expression
5.12 Hippalus: Composition of Preference Actions. Manufacturer Prioritized to Price
5.13 Hippalus: Composition of Preference Actions. Price Prioritized to Manufacturer
5.14 Hippalus: Composition of Preference Actions. Default Combination Mode
5.15 Hippalus: Restricted Focus with Preferences Applied
6.1 Available IR Metrics
6.2 Evaluation Step B: Users Select a Car from the List (1st page)
6.3 Evaluation Step B: Users Select a Car from the List (2nd page)
6.4 Probabilities and Distribution Function of the Binomial Distribution
6.5 Comparative Evaluation Process
6.6 Plurality and Borda results for (a) Ease of Use, (b) Usefulness, (c) Preference and (d) Satisfaction
6.7 Average Values in Last Step of Each Task. (a) for Timings (T) and Actions (A), while (b) Depicts the Values for Recall (R), Precision (P) and Average Precision (AP)
List of Tables
2.1 Basic Notions and Notations
3.1 Scopes (Direct and Under Inheritance)
3.2 Scopes: Example for Best/Worst Preferences
3.3 Scopes: Example for Relative Preferences
3.4 Complete Example: Scopes and Active Scopes
3.5 Tuples in Database: Complete Example
4.1 PrefOrder_Opt Changes for Capturing Relative Preferences Over Hierarchically Organized Values
4.2 Complexity for Non-Optimized and Optimized Alg. PrefOrder and PrefOrder_Opt
6.1 Choices and Number of Clicks
6.2 Example of Hypothesis Evaluation Results
6.3 Results of the hypothesis evaluation
6.4 Percentages of the 30 Users that Expressed a Preference Over a Valid Attribute
6.5 Graeco-Latin Square Design
6.6 Plain and Expert Users Average, Max and Min Timings and User Actions for each Task for both UIs per each User Group
6.7 Plain and Expert Users Average, Max and Min Timings and User Actions per each Task and All Tasks for both UIs
6.8 Plain and Expert Users Recall, Precision and Average Precision Metrics per each and all Tasks for both UIs
Chapter 1
Introduction
Contents
1.1 Objective
1.2 Context, Approach and Research Questions
1.3 Contribution
1.4 Produced Publications
1.5 Thesis Outline
1.1 Objective
The general objective of this thesis is to offer users a flexible and effective method for accessing
large amounts of data, one that supports recall-oriented information needs
and decision making.
1.2 Context, Approach and Research Questions
Users access large amounts of information resources (documents or data) mainly through search func-
tions, where the user types a few words and the system (web search engine, query engine) returns a
linear list of hits. While this is often satisfactory for focalized search, where the user knows exactly
what he is looking for and can be satisfied by a single hit, it does not provide enough support for
recall-oriented (exploratory) information needs, which are the majority according to various user studies (Rose
and Levinson (2004); Crawford (2006)). Below we describe in brief the world of unstructured data and
the world of structured data.
Information Retrieval (IR) is the area of study concerned with the processes by which user queries
to information systems are matched against stored objects (in principle full-text documents), which are
finally returned to the user. Mainly, it is a system-based approach that does not take into consideration
the user, except during query formulation. However, researchers recently have started trying to under-
stand the role of users in IR, since there is a belief that we cannot design effective IR systems without
knowing how users interact with them (Kelly (2009)). This has led to the development of Interactive
Information Retrieval (IIR), where users are studied along with their interactions with systems and in-
formation.
Traditional IR abstracts human interactions and experiences out of the evaluation of a retrieval system,
and as a result leads to suboptimal IR systems. The interest of the community in a TREC-style
evaluation framework for studying interaction and users led to three recent Tracks: the Interactive
Track (TRECs 3-11), the HARD Track (TRECs 12-14) and ciQA (TRECs 15-16) [1]. Each
one of them experimented with different evaluation frameworks, but none of them succeeded in
establishing a generic evaluation framework (Kelly (2009)).
On the other hand, recent applications must cope with a wide range of data, which can be unstructured
(full-text documents), semi-structured (XML) or structured (databases, linked data). Furthermore,
a plethora of new tasks, quite different from the classical query evaluation task, are being performed:
from data mining algorithms and machine learning to collaborative recommendation and filtering. As a
result, there is interest in the integration of IR and databases, as in Papadakos et al. (2008a); Li et al.
(2011), and in the exploitation of available techniques from both scientific areas in a user-friendly way.
[1] http://trec.nist.gov/tracks.html
In the world of structured information (e.g. databases, the Semantic Web) users are offered powerful
and expressive languages to query the underlying information. However, such query languages
are not directly used by end users, since the formulation of queries is a laborious and difficult task
for them (Reisner (1981)). Consequently, efforts to exploit such languages in user-friendly
general-purpose models of exploration/navigation have emerged (e.g. Chakrabarti et al. (2004); Oren et al. (2006);
Mäkelä et al. (2006); Hildebrand et al. (2006); Becker and Bizer (2009); Le Phuoc et al. (2010); Ferré and
Hermann (2012)). For instance, the interaction scheme of FDT (Sacco and Tzitzikas (2009)) allows users to
explore the information space and is suitable for recall-oriented tasks, such as the ones found in Exploratory
Search (ES) [2] and decision making environments. By using the Faceted and Dynamic Taxonomies (FDT)
interaction scheme, users can restrict their focus (object set) without having to formulate queries,
through a small set of actions (zoom in/out/side). Each action corresponds to a query (formulated
on the fly) which can be enacted by a simple click. This approach can be applied over the results of an IR
system (simple user query with relative terms), a relational database (by using available query languages
like SQL) or data published under Semantic Web technologies such as the RDF/S or OWL data models (querying
them using SPARQL, SQWRL, etc.).
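The zoom-in action described above can be illustrated with a minimal Python sketch. The toy car data, the facet names and the function names `zoom_points` and `zoom_in` are assumptions made only for illustration; they are not taken from Hippalus or from any of the systems cited here. The sketch shows how each click restricts the focus and how the counts offered to the user are recomputed on the fly.

```python
from collections import Counter

# Hypothetical toy information space: object id -> {facet: value}.
CARS = {
    "car1": {"manufacturer": "BMW", "fuel": "diesel"},
    "car2": {"manufacturer": "Kia", "fuel": "petrol"},
    "car3": {"manufacturer": "BMW", "fuel": "petrol"},
}


def zoom_points(objects, focus):
    """For each facet, count how many objects of the current focus carry each value."""
    counts = {}
    for obj_id in focus:
        for facet, value in objects[obj_id].items():
            counts.setdefault(facet, Counter())[value] += 1
    return counts


def zoom_in(objects, focus, facet, value):
    """One click: keep only the focus objects whose `facet` equals `value`."""
    return {obj_id for obj_id in focus if objects[obj_id].get(facet) == value}


# Start from the whole collection, then click on manufacturer = BMW.
focus = set(CARS)
bmw_focus = zoom_in(CARS, focus, "manufacturer", "BMW")
bmw_counts = zoom_points(CARS, bmw_focus)
```

In this sketch the user never writes a query: the restriction is derived from the clicked facet value, and only values that still have objects in the focus receive a count, so a click cannot lead to an empty result.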
In this dissertation we investigate how we can extend these actions with preference management.
Such an extension can further ease the interaction and speed up the restriction of the focus to those
parts of the information space that the user is interested in. Such actions can be especially beneficial
for mobile devices and User Interfaces (UIs) with limited screen real estate (e.g. smartphones and tablet
computers), which need special interfaces (such as the one proposed by Neumann and Schmeier (2012)). To
this end, we extend the FDT interaction with user-specified preferences. In other words, FDT allows
constructing queries by simple navigation/exploration actions, and this work extends this set of actions in
order to offer preference-compliant exploration.
Works on preference management over structured data (e.g. Kießling (2002); Chomicki (2003);
Andréka et al. (2002); Kießling and Köstler (2002)) require either that the user formulates complex
preference queries, or that the application developer builds specialized interfaces which internally
construct such queries. Moreover, and more importantly, to formulate an effective preference
specification the user should already be acquainted with the information space and the available choices,
which could be unknown, as in the case of web databases (Stefanidis et al. (2011a)). In addition, in this
work we formulated the hypothesis that, without knowing the available choices, the declarative
expression of preferences is a tiresome and time-consuming process, and we confirmed it through a user
study.
[2] The ES initiative aims to provide users with better tools for advanced information seeking tasks such as learning,
investigation and analysis, according to Marchionini (2006).
The above observations justify the need for flexible and universal (i.e. general purpose) access meth-
ods that offer exploration services and real-time preference elicitation. The requirements for such explora-
tory environments include:
a) generality, they should be capable of capturing a wide range of information spaces and user informa-
tion needs,
b) expressiveness, it should be possible for the user to interactively specify complex preference structures
and
c) usability, the users should be able to use and understand the interaction immediately, and the resulting
interaction should be effective and desired by the users.
As a result, the main research questions that arise from the above are:
How can we gradually and flexibly specify preferences over information spaces that might be hierarchically
organized and might support multi-valued attributes, and what will their semantics be?
How can we tackle the algorithmic perspective so that the proposed preference-extended FDT in-
teraction can be applied over large information bases?
How does the preference-extended FDT interaction affect the user effort and other metrics during
exploratory tasks?
1.3 Contribution
The key points and contributions of this thesis are:
It introduces an interaction model for preference elicitation during FDT exploration. Most works on
preference management focus only on the order of objects, while this work focuses also on the
other aspects of the FDT interaction scheme (facet/zoom-points), i.e. on the order of queries
the user can select for changing his focus.
At first we introduce a preference framework appropriate for information spaces comprising re-
sources described by attributes whose values can be hierarchically valued and/or multi-valued. We
define the language, its semantics and the required algorithms. The framework supports preference
inheritance in the hierarchies, automatic conflict resolution, as well as preference composition (prioriti-
zation, Pareto and their combination).
It elaborates on the system and algorithmic perspective of the proposed approach, and introduces
methods that allow applying the approach over large information spaces.
Subsequently, we present the design and the implementation of the web-based system Hippalus
which realizes the extended interaction model.
Regarding the benefits for the users, at first we analyze theoretically the user gain in terms of
number and difficulty of choices.
Then we describe and analyze three user-based evaluations that we have conducted. The first in-
vestigates the degree of effectiveness of preferences (and the effort to express them) when the
user is not aware of the available choices. The results show that only 20% of the users managed to
express effective preferences without knowing the available choices. The second comparatively
evaluates FDT and other exploratory models. The results show that the majority of users preferred
FDT, were more satisfied by FDT, and achieved higher rates of task completion with FDT. Finally,
the last one evaluates the preference-enriched FDT as realized by Hippalus. The results
were impressive: even on a very small dataset, with the preference-enriched FDT all users
completed all tasks successfully in 1/3 of the time and with 1/3 of the actions in comparison to the
plain FDT. Moreover, all (100%) of the users (either simple users or experts) preferred the
preference-enriched interface.
To the best of our knowledge this is the first work that supports the above.
1.4 Produced Publications
The research activity related to this thesis has so far produced 2 journal, 3 conference, 1 workshop and
1 demo papers along with 2 technical reports, which are briefly described below:
Related to the application of FDT over a Web Search Engine (WSE)
DEXA08 Workshops paper Tzitzikas et al. (2008) describes FleXplorer, which is a framework for
FDT that can manage millions of objects in real-time and is used by the Mitos WSE³.
The idea of combining the interaction scheme of FDT and on-line clustering algorithms was de-
scribed in the conference paper Papadakos et al. (2009a), presented at ECDL09 and also in HDMS09
(Papadakos et al. (2009b)).
ECDL09 Doctoral Consortium workshop paper Papadakos (2009) describes the initial vision of this
Dissertation.
WISE09 conference paper Kopidaki et al. (2009) describes a snippet-based clustering algorithm
named NM-STC, which is used by the previous work.
The KAIS12 journal paper Papadakos et al. (2012a) proposes the exploitation of both static and dynamic
metadata for the FDT interaction scheme, studies an incremental way of speeding up the explo-
ration process of this approach and provides the results of an experimental user study over Mitos.
Extension of FDT with preferences
The FI12 journal paper Tzitzikas and Papadakos (2013) motivates the need for real-time preference elicitation and introduces a language for enriching the interaction scheme of FDT with preference elicitation and preference-based interaction. Key aspects of the proposed approach include the support of hierarchically organized values, the support of set-valued attributes, and the incremental preference specification mode, with the scope-based method for resolving conflicts. The proposed algorithms take advantage of the rapid reduction of the information space through the use of FDT, and are independent of the size of the information base.
A demo paper, showcasing an implementation of the above functionality over an RDF exploratory
system, is described in Papadakos et al. (2012b).
Related to IR indexing and querying
The technical report Papadakos et al. (2008b), published in CORR08 describes Mitos, a DBMS-based
WSE that provides the FDT interaction scheme.
3 Under development by the Department of Computer Science of the University of Crete and FORTH-ICS
(http://groogle.csd.uoc.gr:8080/mitos/).
The DBMS-index of Mitos is discussed in the PCI08 conference paper Papadakos et al. (2008a),
where different database representations are discussed and compared.
An extension of the previous work with one additional representation and experimental results is
provided in the technical report Papadakos et al. (2009c), published in CORR09.
Submitted and under review
An article based on the hypothesis user study described in Section 6.3 has been submitted to the International Journal of Information Technology & Decision Making. The title of the article is "Comparing the Effectiveness of Intentional Preferences versus Preferences over Specific Choices: A User Study".
A paper that describes in detail the Hippalus system and discusses the results of the evaluation in Section 6.5 has been submitted for review to the 1st International Workshop on Exploratory Search in Databases and the Web (ExploreDB 2014), with the title "Hippalus: Preference-enriched Faceted Exploration".
1.5 Thesis Outline
The rest of this thesis is organized as follows. Chapter 2 provides the required background information
on FDT and preference management, and discusses related work. Chapter 3 defines the syntax and se-
mantics of a preference specification language for multi-dimensional hierarchical information spaces.
Chapter 4 elaborates on the algorithmic perspective of the proposed approach and introduces a number
of optimizations which are crucial for the applicability of the framework. Chapter 5 examines imple-
mentation and application issues of the approach. Chapter 6 discusses user effort, checks the validity
of the hypothesis of this thesis and examines the results of a number of user-based evaluations. Finally,
Chapter 7 concludes the thesis and identifies issues that are worth further work and research.
Chapter 2
Background and Related Work
Contents
2.1 Context: Exploration through FDT
2.2 Preference Management in General
2.2.1 Various Perspectives of Preference Management
2.3 FDT and Preferences: Motivation
2.4 The Database World
2.5 IR and Preferences
2.6 FDT and Preferences: Past and Related Works
2.7 Motivation and Running Example
In this chapter we discuss the background and the related work regarding FDT and preferences.
Specifically, Section 2.1 reviews and provides notions and notations for FDT. Regarding preferences
and personalization, Section 2.2 reviews preference management in general. In Section 2.3 we motivate
the enrichment of FDT with preference actions. Section 2.4 and Section 2.5 discuss preferences under
the prism of the database and IR worlds respectively. Finally, Section 2.6 discusses related work in the
context of decision making and preferences over FDT, while Section 2.7 provides the motivating example
of this thesis.
2.1 Context: Exploration through FDT
Most Database (DB) and IR applications, as well as most Web Search Engines (WSEs), are very effective for focalized search, i.e. they make the assumption that users can accurately describe their information need using a query which is usually a small sequence of words. However, as several user studies have shown, a high percentage of search tasks are exploratory (Crawford (2006); Rose and Levinson (2004)): the user does not know his information need accurately (e.g. in WSEs users provide on average 2.4 words, as described in Inan (2006)) and he cannot be satisfied by a single hit. The information needs emerge as users iteratively seek, learn and reflect on the gathered results during the session (Byström and Järvelin (1995); Chowdhury et al. (2011)). As a result, focalized search very commonly leads to inadequate interactions and poor results. The available UIs in most cases do not aid the user in query formulation, and do not provide any exploration services. The returned answers are simple ranked lists of results, with no organization. For this reason users typically reformulate their initial query, inspect the top elements of the returned answer, and so on.
One approach to this problem is results clustering (Hearst and Pedersen (1996); Zamir and Etzioni (1998); Kopidaki et al. (2009)), which provides an overview of the search results. A survey of web clustering engines is provided in Carpineto et al. (2009). Results clustering aims at grouping the results into topics, called clusters, with predictive names (labels), aiding the user to quickly locate documents that he otherwise would not practically find, especially if these documents are low ranked (and thus not in the first result pages). The snippets, the cluster labels and their structure are one instance of what we call dynamically-mined metadata. We use this term to refer to metadata which are query-dependent, i.e. they cannot be extracted/mined a-priori. The problem with clustering is that such metadata are usually mined only from the top-K part of a query answer, because it would be unacceptably expensive (for real-time interaction) to apply these tasks on large quantities of data. In addition, the lack of predictability, the fact that a number of algorithms create clusters with common results, the difficulty of labeling the groups (at least for non snippet-based algorithms) and the counterintuitiveness of cluster sub-hierarchies make the explicit use of clustering difficult for the users (Hearst (2006)).
Other approaches to exploratory search (Meij et al. (2009); Shokouhi and Radinsky (2012); Fafalios
et al. (2012b); Shokouhi (2013)) include completions, either of a single term the user is typing in or the en-
tire query, and auto suggestions representing the full user query intent. Such completions also include
result completions (i.e. the user is presented a number of results according to the keywords he has typed).
Such approaches have been used for a while by mainstream search engines like Google¹, Freebase², commercial sites like eBay, or Evi³, which is an answer engine.
On the other hand, modern environments should guide users in exploring the information space and in expressing their information needs in a progressive manner. Systems supporting FDT offer a simple, efficient and effective way for explorative tasks and are discussed in Sacco and Tzitzikas (2009). Dynamic taxonomies (faceted or not) is an interaction framework based on a multi-dimensional classification of (possibly heterogeneous) data objects, allowing users to browse and explore the information space in a guided, yet unconstrained way through a simple visual interface. Features of this framework include: (a) display of current results in multiple categorization schemes (called facets, or just attributes), (b) display of categories (i.e. attribute values) leading to non-empty results only, (c) display of the count information of the indexed objects of each category (i.e. the number of results the user will get by selecting that category), and (d) the user can refine his focus gradually, i.e. it is a session-based interaction paradigm, in contrast to the query-and-response dialog of current WSEs, which is stateless.
Figure 2.1: Dynamic Taxonomies
An example of the idea of dynamic taxonomies assuming only one facet is shown in Figure 2.1. Figure 2.1(a) shows a taxonomy comprising 10 terms (European, Italian, Spanish, German, Fiat, Lancia, Seat⁴, VW, BMW, Audi) and 8 indexed objects (1-8). Figure 2.1(b) shows the dynamic taxonomy if we restrict our focus to the objects {4,5,6}. Notice that it comprises only 6 terms, those that lead to objects in {4,5,6}. Figure 2.1(c) sketches user interaction, based on the restriction shown in Figure 2.1(b) (e.g. at the left side bar). Notice the count number next to each term.
1 http://www.google.com
2 http://www.freebase.com/
3 http://www.evi.com/
4 Since Seat was bought by VW we assume that cars built by Seat are both Spanish and German.
User Interaction. The user explores or navigates the information space by setting and changing his focus. The notion of focus can be intensional or extensional. Specifically, any conjunction of terms (or any boolean expression of terms in general) is a possible focus. In this case we can say that the focus is defined intensionally. For example, the initial focus can be empty, or the top term of a facet. However, the user can also start from an arbitrary set of objects, and this is the common case in the context of a WSE. In that case we can say that the focus is defined extensionally. Specifically, if A is the result of a free text query q (or if A is a set of tuples returned by an SQL query q), then the interaction is based on the restriction of the faceted taxonomy on A (Figure 2.1(b) shows the restriction of a taxonomy on the objects {4,5,6}). At any point during the interaction, we compute and provide to the user the immediate zoom-in/out/side points along with count information (as shown in Figure 2.1(c)). When the user selects one of these points, the selected term is added to the focus (corresponding to a more refined query), and so on.
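The computation of immediate zoom-in points with counts over an extensional focus can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the terms follow Figure 2.1(a), but the assignment of the 8 objects to terms is assumed here (the figure itself is not reproduced), and the names `PARENTS`, `INDEX`, `model` and `zoom_in_points` are hypothetical and not part of any described system.

```python
# Hypothetical single-facet taxonomy in the spirit of Figure 2.1(a).
# PARENTS maps each term to its direct broader terms.
PARENTS = {
    "European": [],
    "Italian": ["European"], "Spanish": ["European"], "German": ["European"],
    "Fiat": ["Italian"], "Lancia": ["Italian"],
    "Seat": ["Spanish", "German"],  # Seat is assumed to be both Spanish and German
    "VW": ["German"], "BMW": ["German"], "Audi": ["German"],
}
# Assumed direct indexing of the 8 objects (illustrative data only).
INDEX = {"Fiat": {1}, "Lancia": {2, 3}, "Seat": {4},
         "VW": {5}, "BMW": {6}, "Audi": {7, 8}}


def narrowers(term):
    """Direct narrower terms of `term`."""
    return [t for t, parents in PARENTS.items() if term in parents]


def model(term):
    """Objects indexed under `term` or under any of its narrower terms."""
    objs = set(INDEX.get(term, set()))
    for child in narrowers(term):
        objs |= model(child)
    return objs


def zoom_in_points(focus_term, focus_objects):
    """Immediate zoom-in points of `focus_term` with counts, restricted to
    `focus_objects`. Terms leading to empty results are not offered."""
    points = {}
    for child in narrowers(focus_term):
        hits = model(child) & focus_objects
        if hits:
            points[child] = len(hits)
    return points


A = {4, 5, 6}  # extensional focus, e.g. the answer of a free text query
print(zoom_in_points("European", A))  # {'Spanish': 1, 'German': 3}
print(zoom_in_points("German", A))    # {'Seat': 1, 'VW': 1, 'BMW': 1}
```

Selecting one of the returned terms would then become the new focus term, and the same function would be called again on the narrowed object set.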
Notions and Notations. Table 2.1 formally defines and introduces notations for terms, terminologies, taxonomies, faceted taxonomies, interpretations, descriptions and materialized faceted taxonomies, that will be used in the sequel. The upper part of the table describes taxonomies. The middle part of the table describes materialized faceted taxonomies, which are actually the kind of information sources that we consider. In brief, Obj is a set of objects (e.g. the set of all documents indexed by a WSE), each described with respect to one or more aspects (facets), where the description of an object with respect to one facet consists of assigning to the object one or more terms from the taxonomy that corresponds to that facet. $I$ is the interpretation function, while $\bar{I}$ takes into account the semantics of the taxonomy. For example, and assuming the example of Figure 2.1(a), we have $I(\text{Lancia}) = \{2, 3\}$ and $I(\text{Italian}) = \emptyset$, while $\bar{I}(\text{Lancia}) = \{2, 3\}$ and $\bar{I}(\text{Italian}) = \{1, 2, 3\}$. The lower part of the table describes the FDT-interaction. For example, Figure 2.1(b) depicts the restriction over the set $A = \{4, 5, 6\}$, and the reduced terminology $T_A$ is the set of shown terms.
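As a minimal sketch of these notions, the following Python fragment computes an interpretation, the model it induces, the description of an object, and a reduced terminology. The terms are those of Figure 2.1(a); apart from the values quoted above for Lancia and Italian, the assignment of objects to terms is assumed for illustration, and all names (`BROADER`, `INTERPRETATION`, `broaders`, `model`, `description`, `reduced_terminology`) are hypothetical.

```python
# Direct broader terms of each term (single facet, terms of Figure 2.1(a)).
BROADER = {
    "European": set(),
    "Italian": {"European"}, "Spanish": {"European"}, "German": {"European"},
    "Fiat": {"Italian"}, "Lancia": {"Italian"},
    "Seat": {"Spanish", "German"},
    "VW": {"German"}, "BMW": {"German"}, "Audi": {"German"},
}
# Interpretation: objects directly indexed under each term (assumed data).
INTERPRETATION = {t: set() for t in BROADER}
INTERPRETATION.update({"Fiat": {1}, "Lancia": {2, 3}, "Seat": {4},
                       "VW": {5}, "BMW": {6}, "Audi": {7, 8}})


def broaders(t):
    """All terms strictly broader than t."""
    result = set()
    for b in BROADER[t]:
        result |= {b} | broaders(b)
    return result


def model(t):
    """Union of the interpretations of t and of every term narrower than t."""
    return {o for t2 in BROADER if t2 == t or t in broaders(t2)
            for o in INTERPRETATION[t2]}


def description(o):
    """Terms under which the object o is directly indexed."""
    return {t for t in BROADER if o in INTERPRETATION[t]}


def reduced_terminology(objects):
    """Terms whose model contains at least one of the given objects."""
    return {t for t in BROADER if model(t) & objects}


print(INTERPRETATION["Italian"], model("Italian"))  # set() {1, 2, 3}
print(sorted(description(4) | broaders("Seat")))
# ['European', 'German', 'Seat', 'Spanish']
print(sorted(reduced_terminology({4, 5, 6})))
# ['BMW', 'European', 'German', 'Seat', 'Spanish', 'VW']
```

With this assumed data the restriction to {4, 5, 6} yields 6 terms, in line with the count reported for Figure 2.1(b).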
Scalability. Regarding scalability we should mention that FDT can provide real-time exploration services for millions of objects using techniques like those proposed in Yee et al. (2003); Sacco (2006a); Ben-Yitzhak et al. (2008). Thorough experimental results over FleXplorer are given in Tzitzikas et al. (2008).
TAXONOMY
Name | Notation | Definition
terminology | $T$ | a set of terms (can capture categorical/numeric values)
subsumption | $\leq$ | a partial order (reflexive, transitive and antisymmetric)
taxonomy | $(T, \leq)$ | $T$ is a terminology, $\leq$ a subsumption relation over $T$
broaders of $t$ | $B^+(t)$ | $\{ t' \mid t < t' \}$
narrowers of $t$ | $N^+(t)$ | $\{ t' \mid t' < t \}$
direct broaders of $t$ | $B(t)$ | $minimal_{<}(B^+(t))$
direct narrowers of $t$ | $N(t)$ | $maximal_{<}(N^+(t))$
top element | $\top_i$ | $\top_i = maximal_{\leq}(T_i)$

MATERIALIZED FACETED TAXONOMIES
Name | Notation | Definition
faceted taxonomy | $F = \{F_1, ..., F_k\}$ | $F_i = (T_i, \leq_i)$, for $i = 1, ..., k$ and all $T_i$ are disjoint
object domain | $Obj$ | any denumerable set of objects
interpretation of $T$ | $I$ | any function $I : T \to \mathcal{P}(Obj)$
materialized faceted taxonomy | $(F, I)$ | $F$ is a faceted taxonomy $\{F_1, ..., F_k\}$ and $I$ is an interpretation of $T = \bigcup_{i=1,k} T_i$
ordering over interpretations | $I \sqsubseteq I'$ | $I(t) \subseteq I'(t)$ for each $t \in T$
model of $(T, \leq)$ induced by $I$ | $\bar{I}$ | $\bar{I}(t) = \bigcup \{ I(t') \mid t' \leq t \}$
description of $o$ wrt $I$ | $D_I(o)$ | $D_I(o) = \{ t \in T \mid o \in I(t) \}$
description of $o$ wrt $\bar{I}$ | $D_{\bar{I}}(o)$ | $\bar{D}_I(o) \equiv \{ t \in T \mid o \in \bar{I}(t) \} = \bigcup_{t \in D_I(o)} (\{t\} \cup B^+(t))$

FDT-INTERACTION: BASIC NOTIONS AND NOTATIONS
Name | Notation | Definition
intensionally specified focus | $ctx$ | any subset of $T$ such that $ctx = minimal(ctx)$
projection on $F_i$ | $ctx_i$ | $ctx \cap T_i$
Kinds of zoom points w.r.t. a facet $i$ while being at $ctx$
zoom points | $AZ_i(ctx)$ | $\{ t \in T_i \mid \bar{I}(ctx) \cap \bar{I}(t) \neq \emptyset \}$
zoom-in points | $Z^+_i(ctx)$ | $AZ_i(ctx) \cap N^+(ctx_i)$
immediate zoom-in points | $Z_i(ctx)$ | $maximal(Z^+_i(ctx)) = AZ_i(ctx) \cap N(ctx_i)$
zoom-side points | $ZR^+_i(ctx)$ | $AZ_i(ctx) \setminus \{ ctx_i \cup N^+(ctx_i) \cup B^+(ctx_i) \}$
immediate zoom-side points | $ZR_i(ctx)$ | $maximal(ZR^+_i(ctx))$
Restriction over an object set $A \subseteq Obj$ (i.e. extensional focus)
reduced interpretation | $I_A$ | $I_A(t) = I(t) \cap A$
reduced terminology | $T_A$ | $\{ t \in T \mid \bar{I}_A(t) \neq \emptyset \} = \{ t \in T \mid \bar{I}(t) \cap A \neq \emptyset \} = \bigcup_{o \in A} B^+(D_I(o))$

Table 2.1: Basic Notions and Notations
Figure 2.2: Finding a Hotel in the Island of Symi (FDT over booking.com)
As expected, the computation of zoom-in points with count information is more expensive than without:
in 1 sec we can compute the zoom-in points of around 240,000 results (i.e. |A| = 240,000) with count
information, while without count information we can compute the zoom-in points of around 540,000
results.
Applications. Examples of applications of faceted metadata-search include: e-commerce (e.g. eBay shown in Figure 2.3 or Amazon⁵), library and bibliographic portals (e.g. DBLP, ACM Digital Library), booking applications (e.g. booking.com⁶ as shown in Figure 2.2), museum portals (e.g. Hyvönen et al. (2005) and Europeana⁷), mobile phone browsers (e.g. Karlson et al. (2006)), specialized search engines and portals (e.g. Mäkelä et al. (2005); Yee et al. (2003)), Semantic Web (e.g. Hildebrand et al. (2006); Mäkelä et al. (2006)), general purpose WSEs (e.g. Mitos, Papadakos et al. (2009a)) and collaborative environments (e.g. mSpace, Schraefel et al. (2003)). Moreover, as shown in Papadakos et al. (2012a), this interaction scheme can act complementarily to the query-and-response dialogue of current WSEs, along with available dynamic metadata mined through clustering techniques (Kopidaki et al. (2009)) or entity mining (Fafalios et al. (2012a, 2013); Kitsos et al. (2013); Fafalios and Tzitzikas (2013)).
Figure 2.3: Checking Olympus Cameras (FDT over eBay)
5 http://www.amazon.com
6 http://www.booking.com
7 http://www.europeana.eu
As an application example, Figure 2.4 shows a screenshot of a WSE that supports FDT exploration. Specifically, it shows the screen after the user submitted the query "java". In that screenshot, 4 different facets are shown (at the left bar), each corresponding to one metadata attribute. The values (zoom-points or terms) of two of these facets (By date and By clustering) are hierarchically organized, while the values of the other two facets (By filetype and By domain) are flat (no hierarchical organization). The results of the current focus appear at the right frame. For more on this application the reader can refer to Papadakos et al. (2012a), while the real-time snippet-based results clustering algorithm employed is described in Kopidaki et al. (2009).
Figure 2.4: FDT-based GUI of the Mitos WSE
2.2 Preference Management in General
Preferences are a central part of our everyday lives and lead human decision making. Commonly, preferences are not hard constraints but wishes, simple or complicated ones (covering one or more aspects), which might or might not be satisfied. Such wishes might be independent, or might affect each other even in conflicting ways (Stefanidis et al. (2011a)).
Preferences have been studied in a number of fields since they are a multi-disciplinary topic. Such fields include Philosophy (Hansson (2001)), social sciences like Psychology (Scherer (2005)) and Economics (Fishburn (1999)) and Decision Making (Lichtenstein and Slovic (2006)). Furthermore, they have been thoroughly studied in a number of Computer Science areas. Specifically, they have been studied in the fields of Artificial Intelligence (AI) (Wellman and Doyle (1991)), Human Computer Interaction (HCI) (Linden et al. (1997)), and especially in Information Systems (ISs) like databases (Kießling (2002); Chomicki (2003)), XML (Kießling et al. (2001)), and OLAP (Golfarelli et al. (2011)).
A survey on representation, composition and application of preferences in DBs is given in Stefanidis et al. (2011a), while a survey of major questions and approaches for preference handling in applications such as recommender systems, personal assistant agents and personalized user interfaces is given in Peintner et al. (2008). Pu and Chen (2008) propose guidelines and report examples for product search and recommender systems. In Figure 2.5 we show some distinctions of preference management approaches from various perspectives, which are discussed below.
2.2.1 Various Perspectives of Preference Management
We can identify the following perspectives of preference management⁸:
Subject of Personalization. In general, a user can express preferences regarding the informational
contents of an application, the visualization of the contents, the services that the user has access to at
any time, or the interaction between the user and the application.
8 This categorization is based on the survey of Stefanidis et al. (2011a).
Preference Formulation. Preferences can be defined either using a qualitative approach, as in Kießling (2002), Chomicki (2003) and Georgiadis et al. (2008), or a quantitative approach, as in Agrawal and Wimmers (2000), Balke and Güntzer (2004) and Koutrika and Ioannidis (2005). According to the qualitative approach, preferences are described directly, using a preference relation $\succ_{Pref}$ (i.e. $x \succ_{Pref} y$). Preference relations may be specified using logical formulas (Chomicki (2003)), or by using special preference constructors (Kießling (2002)). In the quantitative approach, preferences are described indirectly by defining scoring functions (i.e. $Score(x) > Score(y)$). Scores may be assigned through preference functions (Agrawal and Wimmers (2000)) or through degrees of interest under specific satisfied conditions (Koutrika and Ioannidis (2004)). The qualitative approach is more powerful and expressive than the quantitative approach, since not every preference can be modeled using scoring functions, according to Chomicki (2003) and Fishburn (1970). There are also approaches that support a mixture of qualitative and quantitative preferences (Rossi et al. (2008)). This can be done by putting together a CP-net (Conditional Preference Network)⁹ and a set of constraints.
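The difference between the two formulations can be illustrated with a small Python sketch. The data, the scoring function and the preference relation below are assumptions chosen for illustration; they are not taken from any of the cited systems.

```python
# Hypothetical items described by a price and a color.
CARS = [
    {"name": "a", "price": 10, "color": "red"},
    {"name": "b", "price": 12, "color": "blue"},
    {"name": "c", "price": 15, "color": "red"},
]


def score(x):
    """Quantitative formulation: a higher score means more preferred."""
    return -x["price"]


def prefers(x, y):
    """Qualitative formulation: x is preferred to y if both have the same
    color and x is cheaper. Cars of different colors stay incomparable."""
    return x["color"] == y["color"] and x["price"] < y["price"]


def best(items, better):
    """Names of the items to which no other item is preferred."""
    return [x["name"] for x in items
            if not any(better(y, x) for y in items)]


by_score = best(CARS, lambda x, y: score(x) > score(y))
by_relation = best(CARS, prefers)
print(by_score)     # ['a']
print(by_relation)  # ['a', 'b']
```

In this example no single scoring function with "x is preferred to y iff Score(x) > Score(y)" reproduces `prefers` exactly: b is incomparable to both a and c, which would force all three scores to be equal, yet a is preferred to c.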
Certainty of Preference. The above approaches can be further specialized depending on whether the expressed preferences are crisp or fuzzy (uncertain). Uncertainty expresses the level of confidence that a particular preference holds, and can be modeled by using fuzzy set theory. Barrett and Salles (2006) review the literature on fuzzy preferences.
Sources of Preference. Preferences can be specified explicitly by the users (either through a query language that supports preferences (Kießling and Köstler (2002); Levandoski et al. (2010)), or through the mediation of an application that produces such queries (Kießling et al. (2011b))), or implicitly, by silently tracking user actions and monitoring user behaviour. The latter category includes works like Gadanho and Lhuillier (2007), Kelly and Teevan (2003) and Pound et al. (2011). In addition, preferences can be inferred based on the assumption that similar people like similar things. Such works include collaborative filtering systems (Schafer et al. (2001) and Rashid et al. (2002)). Machine learning has also been applied for learning preference value functions. For example, desJardins et al. (2006) and Wagstaff et al. (2010) present methods for learning preferences over sets of items, by taking as input a collection of positive examples.
Subject Information Space. Another criterion is the structure of the underlying information space
(unstructured information (i.e. text), relational spaces, multi-dimensional spaces with hierarchically
organized attribute domains, support of multi-valued attributes, etc).
Context. Preferences can hold unconditionally and in this case are called context-free. On the other hand, contextual or conditional preferences hold when specific conditions are met. Furthermore, contextual preferences can be divided into internal, when the context can be defined from information available in the data over which preferences are expressed, and external when not. Computing context (e.g. network connectivity), user context (e.g. profile), physical context (e.g. temperature) and time are common types of external contexts (Chen and Kotz (2000)). A model for the propagation of user preferences through contexts is described in Ciaccia and Torlone (2011), while a model for expressing contextual preferences is described in Stefanidis et al. (2011b).
9 A CP-net is a directed graph G over attributes V, whose nodes are annotated with conditional preference tables for each attribute (Boutilier et al. (2004)), and uses conditional ceteris paribus (all else being equal) semantics.
Figure 2.5: Distinctions of Preference Management Approaches
Elasticity. Preferences can be exact or elastic. Exact preferences can either be satisfied or not, while elastic ones should be satisfied as closely as possible. For example, Kießling (2002) captures elastic preferences using the AROUND preference construct and distance functions.
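The idea of an elastic preference based on a distance function can be sketched as follows. This is not the AROUND constructor of Kießling (2002) itself, only an illustration of the principle under the assumption of numeric values and absolute distance; the function names are hypothetical.

```python
def around(target):
    """Return a test 'x is better than y' based on the distance to `target`."""
    def better(x, y):
        return abs(x - target) < abs(y - target)
    return better


def best_matches(values, better):
    """Values that are not beaten by any other value."""
    return [v for v in values if not any(better(w, v) for w in values)]


prices = [80, 95, 110, 140]
print(best_matches(prices, around(100)))  # [95]
print(best_matches(prices, around(125)))  # [110, 140]
```

An exact preference for the price 100 would return nothing on this data, while the elastic version returns the closest available value; values at the same distance from the target are treated as equally good.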
Complexity. Complexity describes the degree of detail and how specific a preference is. Generic or sim-
ple preferences express preferences over a single attribute of the entities of interest while a compound
preference combines a number of simple preferences.
Completeness. The description of user preferences is usually incomplete, since it is inconvenient for users to express preferences over all pairs of objects in the domain of interest (Stefanidis et al. (2011a)). In such cases, the lack of a preference relation between some objects can be interpreted as equal preference (i.e. they are equivalent), as incomparability (i.e. these objects cannot be compared), or as a gap in our preference knowledge, which can be avoided by defining a preorder extending the given partial order (Ross (2007)).
Semantics. Preferences can use two different semantics: ceteris paribus semantics and totalitarian semantics. Ceteris paribus is Latin and means "all else being equal". An example is the preference "I prefer a square table over a round table, when all other attributes like size, wood, etc. are the same". On the other hand, totalitarian semantics mean that if I prefer an object $o_1$ over $o_2$ for a specific attribute, then I do not prefer $o_2$ over $o_1$ for any other attribute (Pareto semantics).
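The ceteris paribus reading of the table example can be made concrete with a short sketch. The attribute names and the helper `prefers_ceteris_paribus` are hypothetical, and the sketch assumes that both objects are described by the same attributes.

```python
def prefers_ceteris_paribus(x, y, attribute, better_value, worse_value):
    """x is preferred to y only if x has the better value and y the worse
    value on `attribute`, and they agree on every other attribute."""
    others_equal = all(x[k] == y[k] for k in x if k != attribute)
    return (x[attribute] == better_value
            and y[attribute] == worse_value
            and others_equal)


t1 = {"shape": "square", "size": "large", "wood": "oak"}
t2 = {"shape": "round", "size": "large", "wood": "oak"}
t3 = {"shape": "round", "size": "small", "wood": "oak"}

print(prefers_ceteris_paribus(t1, t2, "shape", "square", "round"))  # True
print(prefers_ceteris_paribus(t1, t3, "shape", "square", "round"))  # False
```

The second comparison yields no preference because the two tables also differ in size, so the "all else being equal" condition does not hold.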
Stability. Furthermore, it is difficult to assume that user preferences are stable, so frameworks that capture preferences should not assume that they are fixed. Users change their preferences even while inspecting available choices. For example, Doyle (2004) shows how easily preferences change over time. Chomicki (2007) proposes an incremental preference revision framework, where the revised preference relation is produced by composing the original preference relation with another preference relation, using preference composition methods like union, prioritized and Pareto composition. Elaborating even more, in this thesis we show that the expression of user preferences is time-consuming and results in incomplete preferences when the user does not have the ability to view and explore the existing choices. We have named this hypothesis the Difficulty of Formulating Effective Preferences without Knowing the Options (DiFEPreKO) hypothesis, which we evaluate in Section 6.3 through a user study.
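The composition methods mentioned above can be sketched over preference relations represented as Python predicates. The definitions follow one common formulation of union, prioritized and Pareto composition; the hotel data and all function names are assumptions made for illustration and do not reproduce the framework of Chomicki (2007).

```python
def union(p1, p2):
    """x is better than y if it is better under either relation."""
    def better(x, y):
        return p1(x, y) or p2(x, y)
    return better


def prioritized(p1, p2):
    """p1 decides first; p2 is consulted only when p1 is indifferent."""
    def better(x, y):
        indifferent = not p1(x, y) and not p1(y, x)
        return p1(x, y) or (indifferent and p2(x, y))
    return better


def pareto(p1, p2):
    """x is better if it wins under one relation and does not lose under the other."""
    def better(x, y):
        return (p1(x, y) and not p2(y, x)) or (p2(x, y) and not p1(y, x))
    return better


def cheaper(x, y):
    return x[0] < y[0]


def more_stars(x, y):
    return x[1] > y[1]


def best(items, better):
    """Items that no other item is better than."""
    return [x for x in items if not any(better(y, x) for y in items)]


# Hypothetical hotels described as (price, stars).
hotels = [(50, 2), (50, 3), (80, 4), (90, 3)]
print(best(hotels, prioritized(cheaper, more_stars)))  # [(50, 3)]
print(best(hotels, pareto(cheaper, more_stars)))       # [(50, 3), (80, 4)]
```

Revising a preference then amounts to composing the current relation with a new one, for example `prioritized(new_relation, old_relation)` if the most recent statement should take precedence.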
Granularity. Preferences can be expressed at different levels of granularity. For example in databases, preferences can be expressed over individual tuples, sets of tuples (i.e. where preferences do not depend only on individual tuple values but also on properties of groups of tuples, like in Brafman et al. (2006); Binshtok et al. (2007) and Zhang and Chomicki (2011)), attributes (Georgiadis et al. (2008)), relationships (Koutrika and Ioannidis (2004)) (i.e. preferences expressed over relationships between two types of entities), relations (i.e. preferences expressed on classes of entities) and finally on facts (i.e. preferences on the space of hierarchy attributes) (Golfarelli et al. (2011)). In the FDT world, preferences can be expressed over different objects, zoom-points and facets. Most of the available works focus only on objects. Recently, there are works that also affect facets, like Dash et al. (2008); Wagner et al. (2011) and Pound et al. (2011), which will be presented later in Section 2.6.
2.3 FDT and Preferences: Motivation
One main thesis of this work is that effective preference specification presupposes knowledge of the information space and of the available choices. FDT-based interaction can aid users in getting acquainted with the information space and the available choices. Therefore FDT can aid preference elicitation even if, instead of the preference actions proposed in this work, other well known approaches (e.g. Preference SQL described in Kießling and Köstler (2002) and Kießling et al. (2011a), or FlexPref described in Levandoski et al. (2010)) are employed for expressing user preferences and/or deriving the corresponding object ranking. The computation and display of zoom points reduces the need for specifying complex preference profiles, and users can explore the available choices (or the most preferred ones) non-linearly. For instance, by clicking on the zoom points the user can inspect the available choices based on the desired values. Without effective exploration services the user is obliged to explore blocks of objects linearly, and the derivation of small blocks (equivalently, many blocks) requires rich preference specifications which are cumbersome to acquire. Let us consider a set of attributes and suppose the user selects one zoom point of the first attribute. The FDT approach will show those values of the remaining attributes that are active. Such browsing can aid users in identifying, for each attribute, those values for which it is worth specifying a complex value tradeoff (e.g. by using a quantitative approach).
On a multi-dimensional space where user preferences for each dimension have been specified, the
efficient set (also called the skyline, or Pareto optimal set) is indeed very useful if the user is interested in one
hit (e.g. one car to buy, one hotel to book). In the FDT approach and with the actions that we propose
(specifically with term-scoped actions), the most preferred values from each dimension are shown as
zoom-points in decreasing order of preference. We know that all objects that have these values (i.e.
those objects that have at least one of the most preferred values of an attribute) are certainly part of
the skyline. So the preference-extended FDT interaction inherently provides partial skyline support.
However, to compute the entire skyline we need to apply a skyline algorithm (e.g. Kossmann et al.
(2002); Papadias et al. (2005)), so skyline computation can be considered a helpful complementary
service. The computed skyline can then be explored using the FDT method.
2.4 The Database World
Many different methods have been proposed in the literature for applying user preferences over relational
data. The most used ones are skylines (i.e. return the objects in a database that are not dominated
by any other object in the data) (Börzsönyi et al. (2001); Kossmann et al. (2002); Chomicki et al. (2003);
Papadias et al. (2005)) and top-K (i.e. score each object using a monotonic ranking function and return
the top-K objects) (Chaudhuri and Gravano (1999); Chang and Hwang (2002); Ilyas et al. (2004a,b)). Other methods
include k-dominance (i.e. consider only k-dimensional subspaces for dominance) (Chan et al. (2006a)),
k-frequency (i.e. rank each object based on how often it is returned in the skyline when different
numbers of dimensions are considered) (Chan et al. (2006b)), top-k dominance (i.e. rank objects based on
how many other objects they dominate and return the k objects with the highest score) (Yiu and Mamoulis
(2007)), k-representative dominance (i.e. select k skyline points so that the number of points that are
dominated by at least one of these k skyline points is maximized) (Lin et al. (2007)), hybrid multi-objective
methods (i.e. compute sets of objects that are non-dominated with respect to a set of monotonic objective
functions) (Balke and Güntzer (2004)), ranked skylines (i.e. adapt to user-specific information needs and
identify the skyline results of a user-specified retrieval size k) (Lee et al. (2009)), distance-based dominance
(i.e. a new definition of the representative skyline that minimizes the distance between a non-representative
skyline point and its nearest representative) (Tao et al. (2009)), and lastly ε-skylines (i.e. the number of
skyline objects can be increased or decreased, a built-in rank is provided for all objects and weights can be
integrated for different dimensions) (Xia et al. (2008)). Finally, user satisfaction can be further improved by
increasing the diversity of the results, as in desJardins and Wagstaff (2005) and Vee et al. (2009).
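The two most used notions above, skyline and top-K, can be illustrated with a minimal sketch. It assumes that every dimension is numeric and that lower values are better; the sample data, the function names and the weighted-sum scoring function are illustrative assumptions and are not taken from the cited works. The skyline is computed with a naive quadratic scan, not with any of the optimized algorithms cited above.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is no worse in every dimension and strictly
    better in at least one (assumption: lower values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep the points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def top_k(points: Sequence[Point], score: Callable[[Point], float], k: int) -> List[Point]:
    """Top-K: rank by a monotonic scoring function (lower score is better here)."""
    return sorted(points, key=score)[:k]


# Hypothetical cars described by (price, mileage).
cars = [(15000, 76000), (50000, 54000), (20000, 80000), (12000, 90000)]
print(skyline(cars))
print(top_k(cars, lambda c: c[0] + 0.5 * c[1], k=2))
```

Note that the skyline needs no weights but its size cannot be controlled, whereas top-K returns exactly k objects but requires the user to commit to a scoring function.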
2.5 IR and Preferences
Preference management and personalization in IR has been approached from various perspectives. The
initial step for personalizing IR systems was query reformulation through explicit relevance feedback
(Rocchio (1971); Choi et al. (2001); Bot and Wu (2004)), or pseudo-relevance feedback (Kelly and Belkin
(2001); Kelly and Teevan (2003)), which is implicit feedback inferred from user behavior (i.e. selection
of a document, time the document is open, etc). The approaches for personalization and information
filtering can be roughly classified into two categories: content-based filtering and collaborative filtering.
In the first approach, the documents are monitored and the system pushes to the user the ones that best
match his user profile. The user can provide explicit relevance feedback, updating his profile using
different retrieval models, like Boolean, VSM, probabilistic models (Robertson and Jones (1976); Yu et al.
(2004); Zigoris and Zhang (2006); Zhang and Koren (2007)), retrieval models that rank objects based on
user-defined reference points (Korfhage (1997)), inference networks (Callan (1996)), language models
(Croft and Lafferty (2003)), user feedback to improve preference learning (Cohen et al. (1999)) and
machine learning algorithms for learning ranking functions (Lewis (2001); Yang et al. (2005); Shawe-Taylor
et al. (2002); Burges et al. (2005); Zhai and Lafferty (2006); Zha et al. (2006); Liu (2011)). In the latter
approach, the system takes advantage of other similar user profiles and preferences, apart from the
content of the documents. Memory-based (which utilize the entire user-item database to generate a prediction)
and model-based (which provide item recommendations by first developing a model of user ratings) approaches
have been proposed (Breese et al. (1998); Delgado and Ishii (1999); Herlocker et al. (1999); Hofmann and
Puzicha (1999); Jin et al. (2004); Konstan et al. (1997); White et al. (2010)). Other approaches (Basu et al.
(1998); Melville et al. (2001); Wang et al. (2006); Pitkow et al. (2002)) try to combine both techniques to
provide an effective recommendation system.
A very recent and interesting approach is described in Ruotsalo et al. (2013a). This work presents the
design and study of interactive user modeling, where the features of the user model are keywords, and aims to
support exploratory tasks. Specifically, this work allows the users to perceive the state of a user model
at all times and provide feedback that directly rewards and penalizes. In addition, the users can
continuously tune the system's belief about their information needs. Feedback is provided by dragging and
dropping keywords from available documents into the exploratory view. Keywords near the center of the
exploratory view are more important than keywords near the edges. Figure 2.6 shows a snapshot of SciNet,
which is a prototype implementing the above functionality. The results show that interactive user
modeling can help users to more effectively find relevant, novel and diverse results without compromising
task execution time. The same authors in Ruotsalo et al. (2013b) introduce interactive intent
modeling, where the user directs exploratory search by providing feedback for estimates of search intents.
Estimates are visualized in an Intent Radar, where relevant intents are close to the center and similar
intents have similar angles.
Figure 2.6: SciNet Prototype
Such approaches, except Ruotsalo et al. (2013a) which also affects keywords, affect only object
ranking and do not exploit available metadata (which could be mined statically or dynamically as proposed
in Papadakos et al. (2012a); Kitsos et al. (2013)). With respect to our proposal they can be considered
complementary techniques that are based solely on the textual content of the objects. In addition,
the proposed model can incorporate IR-like rankings by exploiting a Relevance facet, which corresponds
to the score returned by the WSE. Furthermore, these approaches (again except Ruotsalo et al.
(2013a)) do not engage users in using the available personalization techniques during the search process.
2.6 FDT and Preferences: Past and Related Works
Supporting personalization in FDT is not well studied. Most FDT systems, like Flamenco (Hearst et al.
(2002)), output facets and zoom-points in lexicographical order. An alternative is to order facets and
zoom-points based on the number of indexed documents, as in Oren et al. (2006). Some other systems,
like eBay Express [10] (now merged into the main eBay portal), present only a manually chosen subset of
facets to the users, and the zoom-points are again ranked based on the number of indexed documents.
Manually selecting and maintaining a number of preferred facets can be time consuming, especially for
systems that support a great number of facets and zoom-points. In addition, in systems like eBay or
Amazon, users are able to order the available objects according to simple object ordering operations
over one specific attribute (e.g. order objects according to Price, or Price + Shipping, or Duration
of auction, in ascending or descending order).
Set-Cover Ranking
One of the first approaches for facet ranking is described in Dakka et al. (2005). This
work aims at providing automatic and scalable methods for the creation of multifaceted interfaces. In
addition, it provides methods for selecting the best portions of the generated hierarchies (considering
the limitations of screen space). Specifically, the authors introduce two approaches for facet ranking. The
first tries to maximize the number of indexed objects that are accessible from the top-k facets (set-cover
ranking). The second, named merit-based, takes into consideration the structural properties of the
sub-hierarchies under the selected facets (i.e. the structure of zoom-points). Specifically, the merit-based
method ranks higher those facets that enable users to access their contents with the smallest cost on average.
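The idea behind set-cover ranking can be sketched with the standard greedy approximation for maximum coverage. This is only an illustration under stated assumptions: the exact procedure of Dakka et al. (2005) is not reproduced here, and the function name, the facet-to-objects mapping and the object identifiers are hypothetical.

```python
from typing import Dict, List, Set


def greedy_set_cover_ranking(facet_objects: Dict[str, Set[str]], k: int) -> List[str]:
    """Pick up to k facets, each time taking the facet that makes the most
    not-yet-covered objects accessible (greedy maximum coverage)."""
    covered: Set[str] = set()
    remaining = dict(facet_objects)
    ranking: List[str] = []
    while remaining and len(ranking) < k:
        # Ties are broken alphabetically, because max() keeps the first maximum.
        best = max(sorted(remaining), key=lambda f: len(remaining[f] - covered))
        if not remaining[best] - covered:
            break  # no remaining facet adds new objects
        ranking.append(best)
        covered |= remaining.pop(best)
    return ranking


# Hypothetical index: which objects are reachable through each facet.
facets = {
    "Manufacturer": {"o1", "o2", "o3", "o4"},
    "Fuel": {"o1", "o2"},
    "Location": {"o4", "o5", "o6"},
    "Color": {"o6"},
}
print(greedy_set_cover_ranking(facets, k=2))
```

In this example Fuel is never selected, since all its objects are already reachable through Manufacturer; a ranking by plain object counts would not detect this overlap.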
Interestingness Ranking
Another approach, described in Dash et al. (2008), tries to select the list of facets that will be
displayed to the user following a query, a problem called the facet selection problem. In this method the notion
of interestingness is incorporated into the ranking. Each facet is measured based on how surprising it
is, by aggregating the interestingness of its values given a certain expectation. The authors define three
different ways for setting the expectations. The first is the natural one, where the users assume a natural
distribution in the data-set (i.e. documents uniformly distributed along each facet, or that facets are
independent). The second is the navigational one, where they assume that the user is already familiar with
the repository and the expectation is set based on how the user navigates the results. Finally, there is
an ad-hoc way, where the user sets the expectation based on an arbitrary query. However, in this
approach, users cannot explicitly define their preferences over facets and zoom-points and cannot affect
the ordering of the objects.
[10] http://www.ebay.com
Collaborative Approaches
A collaborative filtering method with explicit user ratings for designing a personalized FDT system is
proposed in Koren et al. (2008), where several algorithms are proposed and evaluated. The authors propose a
general probabilistic framework to build faceted document models and user relevance models. Users express
a preference for retrieved documents, and facet-value pairs are ranked according to their probability
of being included in a document relevant to the user. Their objective is to minimize user cost, which
is defined as the time needed for reaching an item of interest. The time is an aggregation of the times
for reading facet headings, browsing facet hierarchies and correcting browsing mistakes. Moreover, the
authors provide an evaluation methodology for personalized faceted search research, which complements
user studies by being cheap, repeatable, and controllable. In contrast to our work, this work
does not allow the user to express any facet and zoom-point preferences. Furthermore, it assumes that
each user is searching for exactly one document, and that the user has perfect knowledge of the target
document. This can be the case only for focalized search, but not for exploratory search, which is our
point of interest.
A number of collaborative approaches for the personalization of faceted search and visual graph
navigation in Semantic Web data, by content filtering based on (manually or automatically) created
ontologies, are proposed in Tvarožek (2006); Tvarožek and Bieliková (2007a,c,d,b); Tvarožek et al. (2008). These
approaches take advantage of metadata stored in an ontology to create new facet descriptions at runtime.
The set of available facets and restrictions adapts to the in-session user behavior and to the long-term
characteristics of the user and of other users stored in the user model. According to these approaches,
relevance to users is measured by calculating the distance between values in the hierarchical ontology. In
addition, they annotate search results to improve user orientation and guidance. Again, it is a collaborative
approach and there is no support for explicit preferences.
Minimum Effort and Cost Approaches
Minimum-effort driven navigational techniques for enterprise databases and warehouses are
described in Roy et al. (2008). At each step of the navigation, the system asks the user one or more questions
regarding different facets. Then, according to the user response, it dynamically fetches the next most
promising set of facets. For example, in a cars database, a very simple faceted search interface is one
where the user is prompted with an attribute (e.g. Manufacturer), to which he responds with a desired value
(e.g. Honda), after which the next appropriate attribute (e.g. Model) is suggested, to which he responds
with a desired value (e.g. Accord). The proposed approach is based on building minimal cost decision trees,
which is an NP-complete problem. As a result, the authors use a simple approximation algorithm. This algorithm
selects facets based on their ability to rapidly drill down to the most promising tuples, as well as the user's
ability to provide desired values for them. In addition, in Roy and Das (2009) the same authors investigate
opportunities to improve the performance of minimum-effort driven faceted search techniques. The
main idea is motivated by the early stopping techniques used in the TA-family of algorithms for top-K
computations. In comparison to the approaches proposed in this thesis, this work does not allow users
to express preferences, and it only concerns which facets will be displayed.
In the same manner but for zoom-points, Kashyap et al. (2010) propose a cost-based system for
faceted navigation, named FACeTOR. The user is presented with a subset of all possible facet conditions
(zoom-points), which are selected based on a probabilistic cost model of user navigation. This approach
guarantees that the overall navigation cost is minimized and that every result is reachable
by a facet condition. Since the selection of the optimal facet conditions is NP-hard, they present two
intuitive heuristics. The first is inspired by an approximation algorithm for the weighted set cover problem
and attempts to find a relatively small set of suggestions that have a high probability of being recognized
by users. The second heuristic greedily selects each facet condition assuming that all future suggestions
have identical properties. This automatic approach only concerns zoom-points and does not allow users
to express preferences.
Semantically Enriching Tweets
Abel et al. (2011) present an adaptive and personalized faceted search engine for Twitter. They
propose strategies for inferring facets and facet-values (entities and topics) from tweets and related external
Web resources, by semantically enriching tweets. Given the semantically enriched tweets, they propose
user and context modeling strategies that identify the (current) interests of a given Twitter user and allow
for contextualizing the demands of this user. As a result, they propose faceted search strategies for
content exploration on Twitter and methods that adapt to the interests and context of a user, by ranking
the facets and facet-values. Finally, they present an evaluation environment based on simulated users
to evaluate different strategies in this adaptive faceted search engine on Twitter. All the above
functionality is offered automatically, and as a result the user cannot explicitly express his preferences over
facets, values and objects or define his context.
Log Based Utility
Pound et al. (2011) model the user faceted-search behavior using the intersection of web query logs
with existing structured data, in order to capture facet and facet-value utility for a specific query. They
present an automated, scalable solution that elicits user preferences on attributes and values. They
propose different disambiguation techniques, ranging from simple keyword matching to more sophisticated
probabilistic models (based on clustering, logs or clicks), for mapping keywords to different possible
attributes. Furthermore, they present a variety of techniques that deal with disambiguating amongst
different overlapping attribute-value pairs per query (table or context dependent value selection). In
addition, they discuss how to use signals from the data, like entropy and sparseness, to discover which
attributes make better facets. As a result, facets and values are ordered according to available log
information and users are not allowed to explicitly express their preferences for their specific information
need.
Intuition Based Ranking
All of the above approaches assume a precise information need. That is, relevance, interestingness,
and user costs (for fulfilling an information need) have been employed for measuring facet importance.
On the other hand, Wagner et al. (2011) provide a browsing-oriented approach (i.e. the user has a fuzzy
information need and slowly explores an unknown collection of items) for facet ranking. They use an
aggregation function over different intuitions and metrics for facet ranking. In particular, they prefer
facets that allow users to modify the result set via small and uniform facet operations. In addition, they
group facets and their values by using a divisive hierarchical clustering algorithm, leading
to an Extended Facet Tree. Finally, they provide a task-based evaluation of their system regarding
effectiveness and efficiency. Compared to our approach, this approach tries to rank facets and facet values
according to the characteristics of the space of facets and facet values, but does not provide explicit user
preferences or ranking of objects according to preference. On the other hand, this is the first method that
targets exploratory search and shows that ranking of facets and facet values can be effective for
exploratory information needs.
Preference Search
Finally, Kießling et al. (2011b) propose the substitution of Faceted Search, which they consider a
tedious and time consuming trial and error process, with Preference Search. Preference Search replaces
lengthy user sessions by one single user request, where the user completes a search mask. The user
input is then automatically compiled into one single Preference SQL query. This query is afterwards
augmented in a context-sensitive and user-adaptive fashion by a recommender component using
sensors and friends' recommendations from a social network. It then presents to the user the BMO objects.
Excluding the recommender system, the above functionality can be easily implemented using our
proposed method, by letting the user express his preferences over the related facets. Then the system
could return to him the top objects for each facet for which the user has defined a preference (i.e. the Pareto
optimal set). Furthermore, our method is more expressive, since it allows preferences over attributes with
hierarchically organized values which are possibly set-valued. The support of hierarchies can make the
expression of preferences less time consuming, more intuitive and less cognitively demanding. One further
note is that Kießling et al. (2011b) assume that Faceted Search can return empty results, which is not true
in our case, since only categories that lead to non-empty results are displayed. Specifically, our
hypothesis is that FDT can aid exploratory search by letting the user progressively express his information
needs. In addition, since preferences are incomplete and, most importantly, change over time, the
proposed Preference Search method, with its single user request, can be successful for focalized search
but not for explorative environments.
2.7 Motivation and Running Example
Let us first motivate the benefits of FDT for decision making over our running example. Consider an in-
ternational dealer of used cars and suppose that the available cars are stored in a relational table of the
form: Car(id, Manufacturer, Model, Category, Price, Color, Power, Volume, Year, Mileage, Fuel,
Location, Comment, Accessories). An instance of the table is shown below:
Id   Manufacturer  Model        Category  Price  Color  Power  Volume  Year  Mileage  Fuel    Location   Comment    Accessories
o1   Porsche       Carrera 911  Cabrio    50000  Black  350    3600    2005  54000    Petrol  Cefalonia  Uncrashed  {ABS,AT}
o2   Alfa Romeo    164          Sedan     15000  Red    180    3000    1995  76000    Petrol  Heraklion  Crashed    {ABS}
...  ...           ...          ...       ...    ...    ...    ...     ...   ...      ...     ...        ...        ...
In addition, there are three taxonomies that have been designed in order to provide a hierarchical
organization for the values of the attributes manufacturer, fuel, and location. The leaves of these
taxonomies are the domains of the corresponding attributes, which are recorded in the tuples of the
relational table [11]. Specifically, assume the taxonomies shown in Figure 2.7.
Figure 2.7: Example Taxonomies
Example 1 Assume that somebody, call him James, wants to change his car. He is interested in a family car,
although he preferred sports cars when he was younger. His wife prefers Jeeps but he is reluctant due to the extra
parking space required and because the garage of his home is somewhat small. He believes that Japanese and
German cars are more reliable than French or Korean cars. He likes the fact that Hybrid cars consume less, are more
ecological and that the annual taxes are lower for such cars. James lives in the city of Heraklion, so cars owned by
persons that do not live on the island of Crete (where Heraklion is located) are less preferred by him (due to the
traveling time and cost) unless the case is exceptional. In addition, he cannot afford an expensive car. Ideally he would
like a Porsche with four doors (e.g. Porsche Panamera) and enough space for luggage, hybrid with consumption less
than 6lt/100Km, bigger than Panamera (to satisfy his wife) but smaller than Cayenne, with less than 10 thousand
kilometers, on sale by his favorite neighbor and at a very good price (e.g. less than 30K Euros), but this is a utopian
desire. James aims at buying one car, but it is probable that he would buy a Porsche Carrera 4s if available at a
very good price, and another decent but inexpensive family car to satisfy the remaining requirements.
[11] Our model also allows tuples that contain values which are not necessarily leaves of the corresponding taxonomy.
Although lengthy, the above description is by no means complete. There are a lot of other aspects
that would determine James's final decision (years of guarantee, grip, airbags, Euro NCAP stars, color,
trunk, GPS, CD player, trip calculator, sunroof, etc.). What we want to stress with this example is that the
specification of preferences is a laborious, cumbersome and time consuming task, and that the resulting
descriptions are in most cases incomplete. Pragmatically, decision making is based on complex
tradeoffs that involve several (certain or uncertain) attributes as well as the user's attitude towards risk
(Keeney and Raiffa (1976)). Moreover, preferences are not stable over time.
We believe that it is beneficial to provide users with an interaction method in which the preference
specification cost is paid gradually and depends on the available choices. For example, why spend time
expressing complex tradeoffs between Porsche models with 4 doors versus those with 2 doors if
no Porsche car is on sale? Therefore an effective interaction that shows users the available choices is
important for reducing the preference specification cost and for speeding up decision making [12].
In brief, the proposed preference specification actions affect the presentation order of:
• facets, i.e. the order by which facets (i.e. criteria, attributes) appear,
• terms, i.e. the order by which the zoom-in/side points (i.e. criteria values, attribute values) appear
  (which can be hierarchically organized and/or set-valued), and
• objects (of the focus), i.e. the order by which the objects (i.e. choices) appear.
Now suppose a user who (a) likes European cars, (b) does not like Italian cars, (c) likes Ferrari, and
(d) prefers low prices. According to the framework that we propose, the user can express the above
preferences straightforwardly, i.e. without having to refer to particular European countries or Italian
manufacturers for expressing (a) and (b), thanks to the hierarchically organized values and preference
inheritance. Furthermore, he does not have to express all the above in one shot. He can provide them
gradually and in any order, say (b)-(a)-(d)-(c), and there is no need to define priorities for resolving
the conflicts (e.g. the fact that he likes Ferrari but he does not like Italian cars). The priority will be
deduced automatically by a scope-based conflict resolution rule. For instance, the scope of (b), i.e. Italian
cars, is contained in the scope of (a), which is the set of European cars, so (b) prevails on Italian cars.
Analogously (c) prevails on Ferrari cars (despite the fact that Ferrari is Italian).
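The scope-based rule for statements (a), (b) and (c) can be sketched as follows. This is a simplified illustration and not the formal semantics of Chapter 3: it assumes a tree-shaped taxonomy, in which the narrowest scope containing a term is always that of its nearest ancestor (or the term itself) carrying a preference. The taxonomy fragment, the "best"/"worst"/"indifferent" labels and the function names are assumptions made for the example; statement (d) concerns a different facet and is left out.

```python
from typing import Dict, Iterator, Optional

# Hypothetical fragment of the manufacturer taxonomy (child -> parent).
PARENT: Dict[str, str] = {
    "Ferrari": "Italian", "Fiat": "Italian", "BMW": "German", "Peugeot": "French",
    "Italian": "European", "German": "European", "French": "European",
    "Toyota": "Japanese", "Japanese": "Asian",
}


def ancestors_or_self(term: str) -> Iterator[str]:
    """Yield the term, then its parent, grandparent, and so on up to the root."""
    current: Optional[str] = term
    while current is not None:
        yield current
        current = PARENT.get(current)


def effective_preference(term: str, actions: Dict[str, str]) -> str:
    """Return the preference whose scope is the narrowest one containing term."""
    for t in ancestors_or_self(term):
        if t in actions:
            return actions[t]
    return "indifferent"


# Statements given in the order (b)-(a)-(c); the order does not matter.
actions = {"Italian": "worst", "European": "best", "Ferrari": "best"}
for car in ("Ferrari", "Fiat", "BMW", "Toyota"):
    print(car, effective_preference(car, actions))
```

Ferrari is resolved by (c), Fiat by (b) and BMW by (a), without the user having assigned any explicit priorities among the three statements.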
[12] See also Section 6.3 for a user-based evaluation of the DiFEPreKO hypothesis.
Moreover the user can express more expressive statements like (e) I prefer Asian to European cars,
and (f) I prefer Italian to Korean cars. From these two statements we can deduce that the user prefers
Fiat to Kia, and prefers Toyota to Peugeot. The above are examples of just some of the functionalities
offered by the proposed approach.
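The two deductions from (e) and (f) can be reproduced with a small sketch. The actual semantics of relative preferences are given in Section 3.3.4 and may differ; the rule used here is an assumption: a statement "A is preferred to B" applies to a pair of terms when one lies under A and the other under B, and among applicable statements the one closest to the terms in the taxonomy (i.e. with the narrowest scope) wins. The taxonomy fragment and all function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical fragment of the manufacturer taxonomy (child -> parent).
PARENT: Dict[str, str] = {
    "Fiat": "Italian", "Peugeot": "French", "Kia": "Korean", "Toyota": "Japanese",
    "Italian": "European", "French": "European",
    "Korean": "Asian", "Japanese": "Asian",
}


def distance_up(term: str, ancestor: str) -> Optional[int]:
    """Steps from term up to ancestor, or None if ancestor is not above term."""
    steps = 0
    current: Optional[str] = term
    while current is not None:
        if current == ancestor:
            return steps
        current = PARENT.get(current)
        steps += 1
    return None


def preferred(x: str, y: str, statements: List[Tuple[str, str]]) -> Optional[str]:
    """Return the preferred term of the pair, or None if no statement applies.
    Each statement (better, worse) reads 'better is preferred to worse'."""
    best: Optional[Tuple[int, str]] = None
    for better, worse in statements:
        for winner, loser in ((x, y), (y, x)):
            dw, dl = distance_up(winner, better), distance_up(loser, worse)
            if dw is None or dl is None:
                continue  # the statement does not cover this pair
            # A smaller total distance means a narrower, more specific scope.
            if best is None or dw + dl < best[0]:
                best = (dw + dl, winner)
    return best[1] if best else None


statements = [("Asian", "European"), ("Italian", "Korean")]  # (e) and (f)
print(preferred("Fiat", "Kia", statements))
print(preferred("Toyota", "Peugeot", statements))
```

For Fiat versus Kia both statements apply and point in opposite directions; (f) is the more specific one and therefore decides the pair in favor of Fiat.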
With respect to the characteristics described earlier in Section 2.2.1, this work focuses on
multi-dimensional spaces with hierarchically organized attribute domains, and explicitly-specified and crisp
qualitative user preferences. We assume that these preferences hold unconditionally (i.e. they are context-free),
and are exact (although we provide support for distance functions) and simple (we assume that preference
inheritance is not a compound preference, and we also provide prioritized and Pareto composition).
We also focus on the preference elicitation process. Preference elicitation refers to the problem of
developing a decision support system capable of generating recommendations to a user, thus assisting
him in decision making. It is important for such a system to model users' preferences accurately, find
hidden preferences and avoid redundancy. A survey of preference elicitation methods is given in Chen
and Pu (2004), while a survey of preference elicitation from a computer scientist's perspective is given in
Braziunas (2006). Most of the above methods focus on the quantitative approach, i.e. on the elicitation
of multi-criteria value functions. In this thesis we use the term real-time preference elicitation because
according to our approach: (a) the system requires the user to express his preferences only for
those facets/values that are involved in the available (and restricted) set of choices (i.e. not for the
entire value space), and (b) we exploit the hierarchical organization of terms for reducing the number
of preferences that have to be explicitly specified.
To conclude, and to the best of our knowledge, this is the first work that proposes an incremental
preference elicitation mode which allows the user to define the desired preference structure gradually
and flexibly, over attributes with hierarchically organized and possibly set-valued values, and employs a
scope-based conflict resolution rule.
Chapter 3
A Preference Framework for Multidimensional Information Spaces (Syntax, Semantics and Algorithms)
Contents
3.1 Syntax of the Language
3.2 The Domain of Semantics
3.3 Syntax to Semantics
    3.3.1 Flat Single-Valued Attributes
    3.3.2 Set-Valued Attributes
    3.3.3 Best/Worst Preferences over Hierarchically Organized Values
    3.3.4 Relative Preferences over Hierarchically Organized Values
    3.3.5 Preferences over Hierarchical Set-Valued Attributes
3.4 Multi-Facet Preferences
    3.4.1 Prioritized Composition
    3.4.2 Pareto Composition and BMO-set
    3.4.3 Combination of Priority and Pareto Compositions
3.5 A Complete Example
In this chapter we extend the interaction of FDT with user actions for preference specification /
elicitation. Specifically, we introduce a preference framework appropriate for information spaces
comprising resources described by attributes whose values can be hierarchically organized and/or multi-valued.
We define the language, its semantics and the required algorithms. The framework supports preference
inheritance in the hierarchies, automatic conflict resolution, as well as preference composition (prioritized,
Pareto and their combination).
The chapter is organized as follows. Section 3.1 introduces the syntax for preference actions, Section 3.2
describes the domain of the semantics, while Section 3.3 defines the mapping from syntax to semantics for
flat, hierarchical, single-valued and set-valued attributes. Finally, Section 3.4 describes the composition
(prioritized and Pareto) of preference actions over multiple facets.
3.1 Syntax of the Language
Here we introduce a language consisting of statements that we call preference actions, which can be easily
enacted by simple input user actions (i.e. mouse selections). We consider an information space as the
one described in Section 2.7.
Specifically, each action has a scopeType and a spec, which consists of an anchor and a rankSpec. In
more detail, the scopeType (either facets order, terms order, or objects order) determines which
kind of elements it affects (facets, terms, or objects). Furthermore, each action is anchored (anchor) to
one element, which can be a facet, a term or even an object [1]. This anchor allows enacting the preference
actions through the GUI straightforwardly, as we will see in Section 5.2.2 (i.e. if the user right-clicks on
an element e, a pop-up window will show and allow the user to select the desired preference action,
where the selected action will be anchored to e). Finally, each action is associated with a rankSpec (rank
description), which can be lexicographic (for ordering strings lexicographically), count (for ordering
elements based on the number of objects that are classified to them), value (for ordering numerically-valued
facets) and indexedBy (for ordering objects according to the number of facets each object is
indexed by) [2]. The language also defines actions for supporting best / worst elements (i.e. short-cuts to
express preferred / non-preferred elements according to specific policies). Later on, we extend the language
to also capture relative preferences, Prioritized and Pareto composition, intervals, etc.
[1] Such actions would be interesting, for example, for expressing a positive or negative preference over a specific object.
[2] In addition, we could also support any other method suggested in the bibliography for automatically ranking facets and
facet-values, as discussed in Section 2.6.
Syntactically, preference actions are defined through the following grammar in a BNF variation:
stmt ::= scopeType spec
scopeType ::= facets order : | terms order : | objects order :
spec ::= anchor rankSpec
anchor ::= facet $F_i$ | term $t_j$ | object $o_k$ | $\epsilon$ // the empty string
rankSpec ::= {lexicographic | count | value | indexedBy} {min | max}
           | best | worst
           | use scoreFunction score() {min | max}
In the above grammar $F_i$, $t_j$ and $o_k$ denote names that match a facet, a term or an object respectively,
while score is the name of a real-valued function provided by the user or the application programmer
(e.g. an around operator for proximity search, which can be the edit distance for categorical values, or
the absolute value of the distance for numerical values). Some examples follow:
(1) facets order: count max
(2) facets order: facet Manufacturer best
(3) terms order: facet Year value max
(4) objects order: term Location.Cefalonia best
(5) objects order: facet Relevance value max
(6) objects order: use scoreFunction Relevance * dist(Price,20K) max
Before explaining formally their semantics, let us first describe them informally. The 1st action
specifies the order of facets to be in decreasing order with respect to their count information (i.e. max
counts are preferred). The 2nd places the facet Manufacturer at the top of the facets list. The 3rd specifies
that the terms of the facet Year are to be listed in decreasing order. The 4th places all objects classified
(directly or indirectly) under the term Cefalonia at the top of the object ordering.
Now suppose a user who starts the car seeking process by formulating a free text query which the
WSE evaluates over the attribute comment of the database. In this case the user would like to see the
objects in decreasing order with respect to their relevance. The 5th action captures this requirement
where the facet called Relevance corresponds to the score returned by the WSE. Finally, the 6th action
orders the objects based on a function over the relevance score and distance from a given price.
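To make the basic grammar more tangible, the following Python fragment is a minimal sketch of a parser for statements of the form scopeType : [anchor] rankSpec. It is not the implementation used in this work: the class name Action, the function parse_action and the regular expressions are illustrative assumptions, and only the basic grammar (without the later extensions) is covered.

```python
import re
from dataclasses import dataclass
from typing import Optional

SCOPE_TYPES = ("facets order", "terms order", "objects order")


@dataclass
class Action:
    """Hypothetical in-memory form of one preference action."""
    scope_type: str
    anchor_kind: Optional[str]    # "facet", "term", "object", or None for the empty anchor
    anchor_name: Optional[str]
    rank: str                     # a rank keyword, "best", "worst" or "scoreFunction"
    direction: Optional[str]      # "min", "max", or None for best/worst
    score_expr: Optional[str] = None


def parse_action(stmt: str) -> Action:
    """Parse one statement of the basic grammar: scopeType ':' [anchor] rankSpec."""
    scope, sep, spec = stmt.partition(":")
    scope, spec = scope.strip(), spec.strip()
    if not sep or scope not in SCOPE_TYPES:
        raise ValueError(f"unknown scopeType in {stmt!r}")

    # Optional anchor: a kind keyword followed by a single name token.
    anchor_kind = anchor_name = None
    m = re.match(r"(facet|term|object)\s+(\S+)\s+(.+)$", spec)
    if m:
        anchor_kind, anchor_name, spec = m.group(1), m.group(2), m.group(3).strip()

    if spec in ("best", "worst"):
        return Action(scope, anchor_kind, anchor_name, spec, None)
    m = re.match(r"use\s+scoreFunction\s+(.+)\s+(min|max)$", spec)
    if m:
        return Action(scope, anchor_kind, anchor_name, "scoreFunction",
                      m.group(2), m.group(1).strip())
    m = re.match(r"(lexicographic|count|value|indexedBy)\s+(min|max)$", spec)
    if m:
        return Action(scope, anchor_kind, anchor_name, m.group(1), m.group(2))
    raise ValueError(f"unrecognised rankSpec in {stmt!r}")
```

For instance, parse_action("terms order: facet Year value max") yields an action anchored to the facet Year with rank value and direction max, which corresponds to example (3) above.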
We now extend the syntax to support relative preferences over facets and terms:
stmt | facets order : prefer facet $F_i$ to $F_j$
stmt | terms order : prefer term $t_i$ to $t_j$
stmt | objects order : prefer term $t_i$ to $t_j$
Regarding object ranking, we extend the syntax to allow compositions of preferences that synthesize two
or more actions with complex preference constructors over the different facets. Such actions include
Pareto, Pareto Optimal (i.e. same ordering as the skyline), Priority and Combinational composition. The
syntax of such actions is given below:
stmt | objects order : Pareto setOfFacets
stmt | objects order : ParetoOptimal setOfFacets
stmt | objects order : Priority orderedSetOfFacets
stmt | objects order : Combinational bucketOrderedSetOfFacets
Below we introduce some possible extensions, although we do not focus on them. For example, the
compositions described above presuppose a number of object-scoped preference actions over each facet
that participates in the composition. On the other hand, in the skyline³ operator of SQL, for each attribute
participating in the skyline, a single preference is expressed along with the operator (e.g. SELECT * FROM
Cars SKYLINE OF price MIN, consumption MIN).
specList ::= $F_i$ {LOW | HIGH} | specList
stmt | objects order : skylineOf specList
³ In brief, the skylines as in Papadias et al. (2005) are the maximal (w.r.t. preference) elements, i.e. those which are not
dominated by others. This set is also called efficient set, or Pareto optimal set.
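As an illustration of what a skylineOf action asks for, the following Python sketch computes the non-dominated objects by a naive pairwise comparison. The function names, the representation of a specList as a list of (facet, LOW/HIGH) pairs and the three sample cars are assumptions made only for this example; dedicated skyline algorithms, such as those of Papadias et al. (2005), avoid the quadratic number of comparisons.

```python
def dominates(a, b, spec):
    """True if object a is at least as good as b on every facet of spec
    and strictly better on at least one (LOW: smaller is better)."""
    at_least_as_good = all(
        a[f] <= b[f] if direction == "LOW" else a[f] >= b[f]
        for f, direction in spec
    )
    strictly_better = any(
        a[f] < b[f] if direction == "LOW" else a[f] > b[f]
        for f, direction in spec
    )
    return at_least_as_good and strictly_better


def skyline(objects, spec):
    """Return the objects that are not dominated by any other object."""
    return [o for o in objects if not any(dominates(p, o, spec) for p in objects)]


# Hypothetical data: prices and consumptions are made up for the example.
cars = [
    {"id": "c1", "price": 10, "consumption": 7},
    {"id": "c2", "price": 12, "consumption": 5},
    {"id": "c3", "price": 13, "consumption": 8},   # worse than c1 on both facets
]
best = skyline(cars, [("price", "LOW"), ("consumption", "LOW")])
```

Here c3 is dominated by c1, while c1 and c2 are incomparable, so both remain in the skyline.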
Furthermore, we can extend the syntax to support interval-anchored actions and named actions
(which ease the formulation of more complex preferences):
anchor | term interval [ $t_i$ $t_j$ ]
namedStmt ::= NamedAction String : stmt
Notice that for interval actions, we only consider as the endpoints of the interval numerical values that
are values of the same facet $F_f$ (i.e. $t_i, t_j \in T_f$) such that $t_i \le t_j$. Then with term interval [ $t_i$ $t_j$ ]
we denote all $t_k \in T_f$ such that $t_i \le t_k \le t_j$. In this case, we use as anchor of a preference action
all available values between $t_i$ and $t_j$. Such actions can be used as shortcuts and can be easily defined
through simple menus.
The complete syntax of the language is given in Appendix A.
3.2 The Domain of Semantics
In general, a preference over a set of elements E can be expressed as a binary relation over the elements
of E. In the described approach, we do not assume that preference relationships are transitive. So we
hereafter assume that a preference relation is the binary relation $(E, \succ)$ (sometimes we will also use its
dual relation denoted by $\prec$). The proposed approach can also be used if we consider transitivity over the
preference relationships (as in Kießling (2002)), i.e. a preference relation is a strict partial order $(E, \succ)$,
except for set-valued attributes, since the MoreWins-Rule described later in Def. 3 is not transitive.
The actions specified by the syntax allow structuring (ordering) the materialized faceted taxonomy
according to the preferences. Independently of how many actions have been issued and what their
semantics are, the defined preference at each point in time comprises k + 2 preference relations. Specifically:
- one over the facets: $(\{F_1, \ldots, F_k\}, \succ_F)$,
- k preference relations, one for each facet $F_i$ (of the form $(T_i, \succ_i)$), and
- one preference relation for the objects $(A, \succ_{Obj})$.
Let B be the set of user actions the user has issued. We can partition this set to k + 2 subsets (where
k is the number of facets) as follows: $B_F$ holds the user actions for facets, $B_{T_i}$ holds the user actions
for the terms of each facet $F_i$ and $B_{Obj}$ holds the user actions regarding the objects preferences. So
$B = B_F \cup (\bigcup_i B_{T_i}) \cup B_{Obj}$. As each of these sets can contain more than one action, we have to specify how
the corresponding preference relation is defined, e.g. how to define the preference relation $(T_i, \succ_i)$
from the actions in $B_{T_i}$.
Let us now introduce some required notions about preference relations. Consider a set E = {Porsche,
Ferrari, Fiat} and a preference relation $R_{\succ}$ over E consisting of one relationship, specifically
$R_{\succ} = \{Porsche \succ Ferrari\}$. We shall use $dom(R_{\succ})$ to denote the elements of E that participate in $R_{\succ}$,
here $dom(R_{\succ}) = \{Porsche, Ferrari\}$, and call inactive the elements of E which are not members of
$dom(R_{\succ})$, in our case Fiat. Given a preference relation $R_{\succ}$, with $R_{\prec}$ we will denote its dual order.
Commonly, preference relations are illustrated using Hasse diagrams. In our case $(E, R_{\succ})$ can be illustrated
as shown in Figure 3.1. Given a preference relation $R_{\succ}$ and two objects $o_1$, $o_2$, with $o_1 \succ o_2$ we will denote
that $o_1$ is preferred to $o_2$ and with $o_1 \prec o_2$ the reverse.
Figure 3.1: Hasse Diagram of Preference Relation Over E $(E, R_{\succ})$
Definition 1 (Valid Preference) We consider a preference relation $R_{\succ}$ over a set of elements E to be
valid, if it is acyclic.
Given a set of objects Obj, a bucket order B on Obj with |Obj| items is the total order L defined
over the |B| sets $B_1, \ldots, B_{|B|}$, where the |B| buckets are a partition⁴ of Obj. For any two items $o_i$ and $o_j$
in Obj, if they are in the same bucket, there is no preference precedence between $o_i$ and $o_j$, and these
two items are said to be tied. If item $o_i$ belongs to $B_k$ and item $o_j$ belongs to $B_l$, we say that $o_i$ is more
preferred than $o_j$ if and only if $B_k$ precedes $B_l$ according to the total order L. A total order on Obj can be
viewed as a special case of a bucket order such that every bucket consists of only one item.
⁴ All blocks are pairwise disjoint.
Definition 2 We say that a linear or bucket order L over E respects a binary relation R over E, if $R \subseteq L$.
3.3 Syntax to Semantics

To describe formally the semantics of the syntax we have to define what the various keywords of the
syntax, like count, mean precisely.
Initially, note that the semantics of lexicographic, count, value, and indexedBy are straightforward,
and each defines a linear or bucket order. The same is true also for use scoreFunction. Note
however that count is not applicable to objects, while indexedBy is only valid for objects. For a term
t, t.count is the number of objects in A indexed by term t or by a narrower term of t. For a facet $F_i$,
$F_i$.count is the number of the elements in A which are indexed by terms of $F_i$. For example, consider
the example of Figure 2.1(a) where we have only one facet, say Manufacturer. At that point we have
Manufacturer.count = 8 while for the term Italian we have Italian.count = 3. In the restriction
on the set A = {4, 5, 6} that is shown in Figure 2.1(b), we have Manufacturer.count = 3 while
Italian is not shown, since Italian.count = 0. Formally, and using the FDT notations of Table 2.1, we have
$t.count = |\bar{I}(t) \cap A|$ and $F_i.count = |J(F_i) \cap A|$ where $J(F_i) = \{o \in Obj \mid D(o) \cap T_i \neq \emptyset\}$.
The semantics of best/worst($e_i$) and prefer $e_i$ to $e_j$ actions are
defined in an aggregated way (i.e. not in isolation) and are clarified next.
3.3.1 Flat Single-Valued Attributes
We will now define the semantics of actions that express qualitative preferences, i.e. actions of the form
best($e_i$), worst($e_i$), and prefer $e_i$ to $e_j$, starting from the case of single-valued and flat attributes.
Let B (resp. W) be the elements of E on which a best (resp. worst) action has been defined. Let $R_{\succ}$
be the relative preferences (of the form $e_i \succ e_j$) over E provided by the user. We shall now introduce
an algorithm, Alg. Apply, which takes as input these sets and derives one linear or bucket order. The
algorithm also takes a parameter Policy which determines the ordering of the inactive elements (it will be
explained later on).
Algorithm 1 Apply(E, B, W, $R_{\succ}$, Policy)
Input: the set of elements E, the set of best elements B over E, the set of worst elements W over E, a
set of relative relationships $R_{\succ}$ over E, and Policy for inactive elements
Output: a bucket order L over E that respects $R_{\succ}$
1: $R_{bw} \leftarrow \{(b, w) \mid b \in B, w \in W\}$ // each best is preferred to each worst
2: $R \leftarrow R_{bw} \cup R_{\succ}$ // add relative prefs
3: $L \leftarrow SourceRemoval(R)$ // produce blocks with boundaries
4: $I \leftarrow E \setminus (B \cup W \cup dom(R_{\succ}))$ // I contains the inactive elements
5: $L \leftarrow addInactiveElements(L, I, Policy)$
6: return L
Algorithm 2 SourceRemoval(R)
Input: a binary relation R over E
Output: a bucket order L over E that respects R
1: $L \leftarrow \emptyset$
2: repeat
3: $S \leftarrow maximal_{\succ}(R)$
4: $R \leftarrow R \setminus \{(x, y) \in R \mid x \in S\}$ // Remove maximal
5: $L \leftarrow L.append(S)$ // Append a bucket to L
6: until $S = \emptyset$
7: return L
At first the algorithm constructs a graph by connecting each best to each worst element ($(b, w)$ means
$b \succ w$). So best/worst are interpreted as "each best is preferred to each worst". Then it adds to the graph
the relationships in $R_{\succ}$. Furthermore, we should note here that the parameters B and W actually define
a set of relationships ($R_{bw}$ at line 1 of the algorithm), so they could have been expressed directly through
the $R_{\succ}$ parameter; however, we keep them separate as they constitute an easily enacted (for the user)
shorthand. In order to create $R_{bw}$ this algorithm assumes that $|B| \ge 1$ and $|W| \ge 1$. If this is not the
case, we can use different policies.
Although a linear or bucket order could be produced by traversing the graph in a breadth first search
(BFS) manner (where the first block will contain the more preferred elements, the second the next more
preferred, etc), if the transitive reduction is a DAG (Directed Acyclic Graph, i.e. not a tree), then BFS
could yield wrong results (i.e. the produced linear or bucket order would not respect the condition of
Definition 2). This will be made clear in a following example. Using instead of BFS a topological sorting
algorithm, which yields a linear ordering of the nodes of a DAG such that each node comes before all
nodes to which it has outbound edges, e.g. Alg. SourceRemoval as shown above, we can always get a
linear order that respects R. In particular, Alg. SourceRemoval is based on the source removal algorithm
described in Kahn (1962), satisfying the condition that all removed maximal nodes are inserted in the
same bucket. Initially, it finds all the maximal elements of R, moves them in a bucket, and continues
with the maximal elements of their children, and so on.
Figure 3.2: Example for Flat Single-Valued Attributes
To give an example, let B = {Ferrari}, W = {Fiat, Lancia} and $R_{\succ} = \{Porsche \succ Ferrari, Porsche \succ Fiat\}$.
Figure 3.2 shows at the left the diagram of R, and at the right the result of topological
sorting (as derived by step 3 of Apply), i.e. $L = \langle Porsche, Ferrari, \{Fiat, Lancia\} \rangle$, meaning that
the bucket order consists of three blocks (the first two are singletons).
Figure 3.3: Example for a DAG
Consider another example, where $R_{\succ} = \{Porsche \succ Ferrari, Ferrari \succ Lancia, Lancia \succ Fiat, Porsche \succ Fiat\}$.
Figure 3.3 shows the resulting total order (i.e. $L = \langle Porsche, Ferrari, Lancia, Fiat \rangle$)
derived by topological sorting. If the final order was derived using BFS, the bucket order
would be $L_{BFS} = \langle Porsche, \{Ferrari, Fiat\}, Lancia \rangle$, although Fiat is the least preferred car. As a
result $R \not\subseteq L_{BFS}$ (i.e. $L_{BFS}$ does not respect R).
Regarding inactive elements (elements not participating in any action), they can be considered as
maximal or minimal elements according to the application needs (controlled by the parameter Policy of Alg.
Apply). For example, consider a facet $F_i$ with values from a set $T_i$, and a number of actions that define
the sets $B_i$, $W_i$, $R_i$. By using $E = T_i$ and calling Alg. Apply, in line 4 we compute the set of inactive
elements $I = E \setminus (B_i \cup W_i \cup dom(R_i))$ (where $dom(R_i)$ is the set of elements of E that participate in $R_i$).
Now by using the command addInactiveElements (line 5 of Alg. Apply) and passing as parameters
the bucket order L, the inactive elements I and the policy based on the application needs, which can be
maximal (resp. minimal), we put the inactive elements at the beginning (resp. end) of L as a new block.
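The behaviour of Alg. Apply and Alg. SourceRemoval on the two examples above can be reproduced with the following Python sketch. It is an illustrative reading of the pseudocode and not the actual implementation: the function names are hypothetical, and, as an assumption of this sketch, the set of nodes is tracked explicitly (rather than only the edges of R) so that elements without outgoing edges, such as Fiat and Lancia, still end up in a bucket.

```python
def source_removal(elements, relation):
    """Layered topological sort. `relation` holds pairs (x, y) meaning
    'x is preferred to y'; each bucket contains the elements that have no
    remaining predecessor."""
    remaining = set(elements)
    edges = set(relation)
    buckets = []
    while remaining:
        dominated = {y for (_, y) in edges}
        sources = remaining - dominated
        if not sources:
            raise ValueError("cyclic relation: not a valid preference (Def. 1)")
        buckets.append(sources)
        remaining -= sources
        edges = {(x, y) for (x, y) in edges if x not in sources}
    return buckets


def apply_prefs(E, B, W, R, policy="minimal"):
    """Sketch of Alg. Apply: best/worst pairs plus relative preferences,
    followed by placement of the inactive elements according to `policy`."""
    rel = {(b, w) for b in B for w in W} | set(R)
    active = set(B) | set(W) | {e for pair in R for e in pair}
    L = source_removal(active, rel)
    inactive = set(E) - active
    if inactive:
        L = [inactive] + L if policy == "maximal" else L + [inactive]
    return L


def respects(L, relation):
    """Def. 2: every pair (x, y) of the relation has x in an earlier bucket than y."""
    pos = {e: i for i, bucket in enumerate(L) for e in bucket}
    return all(pos[x] < pos[y] for (x, y) in relation)


E = {"Porsche", "Ferrari", "Fiat", "Lancia"}
L1 = apply_prefs(E, {"Ferrari"}, {"Fiat", "Lancia"},
                 {("Porsche", "Ferrari"), ("Porsche", "Fiat")})
R2 = {("Porsche", "Ferrari"), ("Ferrari", "Lancia"),
      ("Lancia", "Fiat"), ("Porsche", "Fiat")}
L2 = apply_prefs(E, set(), set(), R2)
L_bfs = [{"Porsche"}, {"Ferrari", "Fiat"}, {"Lancia"}]   # the BFS result discussed above
```

The bucket order L_bfs fails the respects check for R2 because Lancia is preferred to Fiat but is placed in a later bucket, which is exactly the problem that the source removal strategy avoids.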
As a final note, our approach assumes totalitarian semantics regarding the attributes that do not
participate in any preference action. For example, if $Ferrari \succ Fiat$, then any car manufactured by
Ferrari is preferred to any car manufactured by Fiat. In the opposite case (i.e. if our approach
supported the ceteris paribus semantics), a Ferrari would be preferred to a Fiat car, provided that these
cars agreed regarding preference on the values of all other attributes.
3.3.2 Set-Valued Attributes
Multi-valued attributes appear in several cases (social tags, clustering, etc). In our running example
suppose that the attribute accessories of the table Car is multi-valued, taking values like ABS, ESP
(Electronic Stability Program), AT (Auto-Transmission), DVD, etc.
Definition 3 (Induced Preference over Sets: MoreWins-Rule)
If $s, s'$ are two subsets of E, with $wins(s, s')$ we will denote the number of times s beats $s'$ according
to $\succ$. Formally:
$wins(s, s') = |\{(e, e') \mid e \in s, e' \in s', e \succ e'\}|$
Any subset S of the powerset of E (i.e. $S \subseteq P(E)$) can be ordered according to a preference relation
that we will denote by $\succ_{\{\}}$, defined by the following rule:
$s \succ_{\{\}} s'$ iff $wins(s, s') > wins(s', s)$
As an example consider a set T = {ABS, ESP, AT, DVD} and three statements which define
ABS as best, ESP as worst and that $ABS \succ AT$. Now consider the following family of sets: S = {{ABS},
{ESP}, {ABS, ESP}, {AT, ABS}, {AT, ESP}, {DVD, ESP}}. The $wins(s, s')/wins(s', s)$ values
of all pairs of sets from the above family are shown in the next table (the last column shows the number
of clear winnings, not ties).
$w(s, s')/w(s', s)$ | {ABS} | {ESP} | {ABS,ESP} | {AT,ABS} | {AT,ESP} | {DVD,ESP} | all
{ABS} | 0/0 | 1/0 | 1/0 | 1/0 | 2/0 | 2/0 | 5/0
{ESP} | 0/1 | 0/0 | 0/1 | 0/2 | 0/1 | 0/1 | 0/5
{ABS,ESP} | 0/1 | 1/0 | 1/1 | 1/2 | 2/1 | 2/1 | 3/2
{AT,ABS} | 0/1 | 2/0 | 2/1 | 1/1 | 3/0 | 3/0 | 4/1
{AT,ESP} | 0/2 | 1/0 | 1/2 | 0/3 | 1/1 | 1/1 | 1/3
{DVD,ESP} | 0/2 | 1/0 | 1/2 | 0/3 | 1/1 | 1/1 | 1/3
By using Def. 3 (i.e. $\succ_{\{\}}$) and then applying topological sorting we get the following bucket order
$\langle \{ABS\}, \{AT, ABS\}, \{ABS, ESP\}, \{\{AT, ESP\}, \{DVD, ESP\}\}, \{ESP\} \rangle$, as shown in Fig. 3.4.
Figure 3.4: Example for Flat Multi-Valued Attributes
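The values of the table and the resulting bucket order can be checked with the short Python sketch below. It rests on one assumption that is not spelled out above: the table entries are reproduced when the relation over the atomic values is read as ABS preferred to every other value and every other value preferred to ESP (that is, $ABS \succ AT$, $ABS \succ DVD$, $ABS \succ ESP$, $AT \succ ESP$, $DVD \succ ESP$). The function names are illustrative.

```python
from itertools import permutations


def wins(s, t, pref):
    """Number of pairs (e, f) with e in s, f in t and e preferred to f."""
    return sum(1 for e in s for f in t if (e, f) in pref)


def more_wins_relation(family, pref):
    """Pairs (s, t) of the family such that s beats t under the MoreWins-Rule."""
    return {(s, t) for s, t in permutations(family, 2)
            if wins(s, t, pref) > wins(t, s, pref)}


def layered_sort(items, relation):
    """Bucket order by repeated removal of the undominated items."""
    remaining, edges, buckets = set(items), set(relation), []
    while remaining:
        sources = remaining - {t for (_, t) in edges}
        if not sources:
            raise ValueError("the induced relation contains a cycle")
        buckets.append(sources)
        remaining -= sources
        edges = {(s, t) for (s, t) in edges if s not in sources}
    return buckets


# Assumed reading of the three statements (see the lead-in).
pref = {("ABS", "AT"), ("ABS", "DVD"), ("ABS", "ESP"), ("AT", "ESP"), ("DVD", "ESP")}
fs = frozenset
family = [fs({"ABS"}), fs({"ESP"}), fs({"ABS", "ESP"}),
          fs({"AT", "ABS"}), fs({"AT", "ESP"}), fs({"DVD", "ESP"})]
order = layered_sort(family, more_wins_relation(family, pref))
```

Under this reading, {AT, ESP} and {DVD, ESP} tie with one win each and therefore share a bucket, in agreement with the bucket order stated above.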
Now suppose that both ABS and ESP are defined as best elements, and that both AT and DVD are
defined as worst. In that case it holds:
wins({ABS}, {ABS, ESP}) = wins({ABS, ESP}, {ABS}) = 0
wins({AT}, {AT, DVD}) = wins({AT, DVD}, {AT}) = 0
This means that wins yields 0 whenever sets containing only best elements are compared with each other,
and likewise for sets containing only worst elements. If we would like to break such ties we could adopt a
MoreGoodLessBad-rule (the more best elements the better and the fewer worst elements the better). To define
it formally, we first have to introduce some notations. Given an element e we use sup(e) to denote the number of
elements that e dominates, minus 1. Formally, $sup(e) = |\{e' \in E \mid e \succ e'\}| - 1$. Notice that each
worst element takes a negative value. Given a set of values s we define the support of s, denoted by
Support(s), by summing up the support of its terms, i.e. $Support(s) = \sum_{e \in s} sup(e)$. Note that since a
worst value takes -1 we can discriminate an s having one worst term from an $s'$ having 10 worst terms
($Support(s) = -1$, while $Support(s') = -10$). We can now proceed and define:
Definition 4 (Breaking ties: MoreGoodLessBad-rule)
If $wins(s, s') = wins(s', s)$ and $Support(s) > Support(s')$ then $s \succ_{\{\}} s'$.
In our case: Support({ABS, ESP}) = 2 > Support({ABS}) = 1 > Support({AT}) = -1 > Support({AT, DVD}) = -2,
and the induced ordering, i.e. $\langle \{ABS, ESP\}, \{ABS\}, \{AT\}, \{AT, DVD\} \rangle$,
satisfies the MoreGoodLessBad-rule.
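The support values of this example follow directly from the definitions, as the next Python sketch shows. The helper names sup, support and prefers are illustrative, and the preference relation is simply the best/worst product (each of ABS, ESP preferred to each of AT, DVD).

```python
def wins(s, t, pref):
    """Number of pairs (e, f) with e in s, f in t and e preferred to f."""
    return sum(1 for e in s for f in t if (e, f) in pref)


def sup(e, E, pref):
    """Number of elements that e dominates, minus 1."""
    return sum(1 for f in E if (e, f) in pref) - 1


def support(s, E, pref):
    """Sum of the supports of the members of s."""
    return sum(sup(e, E, pref) for e in s)


def prefers(s, t, E, pref):
    """MoreWins-Rule, with ties broken by the MoreGoodLessBad-rule (Def. 4)."""
    ws, wt = wins(s, t, pref), wins(t, s, pref)
    if ws != wt:
        return ws > wt
    return support(s, E, pref) > support(t, E, pref)


E = {"ABS", "ESP", "AT", "DVD"}
best, worst = {"ABS", "ESP"}, {"AT", "DVD"}
pref = {(b, w) for b in best for w in worst}

supports = {name: support(s, E, pref) for name, s in {
    "ABS,ESP": {"ABS", "ESP"}, "ABS": {"ABS"}, "AT": {"AT"}, "AT,DVD": {"AT", "DVD"},
}.items()}
```

With these values, prefers({"ABS", "ESP"}, {"ABS"}, E, pref) holds although both wins counts are 0, since the tie is resolved by the larger support.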
To conclude, in case we have preferences over atomic values but the information space has set-valued
attributes, it is enough to use Alg. Apply with a small modification. Initially, we follow the first two
steps of Alg. Apply, in order to compute the relation of the atomic values. We should stress here that
to correctly compute wins we have to take into account the transitive closure of the preference relation.
For example, if $a \succ b$ and $b \succ c$ and we want to compute wins({a, e}, {c, e}) we should consider that
$a \succ c$. In other words, we should anticipate the topological sorting of Apply over individual values before
computing wins over sets. Then we compute the wins (and the Support to break ties), to define $\succ_{\{\}}$.
Afterwards we continue with the next steps of Alg. Apply, i.e. topological sorting and so on, eventually
yielding the final bucket order of the sets. The steps are given in more detail in Alg. 3.
Algorithm 3 ApplyOverFamiliesOfSets(E, B, W, $R_{\succ}$, Policy)
Input: the set of elements E (here each element of E is a set), the set of best elements B, the set of worst
elements W, a set of relative relationships $R_{\succ}$, and Policy for inactive elements
Output: a bucket order L over E
1: $R_{bw} \leftarrow \{(b, w) \mid b \in B, w \in W\}$
2: $R \leftarrow R_{bw} \cup R_{\succ}$
3: $R \leftarrow Closure_{transitivity}(R)$ // Addition of the transitively induced links
4: for each $e, e' \in E$, s.t. $e \neq e'$ do
5: if $wins(e, e') > wins(e', e)$ then
6: set $e \succ_{\{\}} e'$
7: else if $wins(e, e') < wins(e', e)$ then
8: set $e' \succ_{\{\}} e$
9: else if $wins(e, e') = wins(e', e)$ then
10: resolve the tie by computing the support(e) and support($e'$)
11: $L \leftarrow SourceRemoval(\succ_{\{\}})$
12: $I \leftarrow E \setminus dom(\succ_{\{\}})$ // I is the set of inactive elements
13: $L \leftarrow addInactiveElements(L, I, Policy)$
14: return L
3.3.3 Best/Worst Preferences over Hierarchically Organized Values
So far we have considered single-valued and set-valued attributes over flat (non hierarchically
organized) value domains. Let us now consider hierarchically organized values. As an example, if the user
is interested in Italian cars and marks them as best, then it is reasonable to apply best also to its
narrower terms, i.e. to Ferrari, Fiat, etc. It is not hard to see that the approach described in the previous
section is not adequate for terms which are not leaves. For example, suppose the following set of actions
(using an informal syntax): B = {Best(European), Worst(Italian), Best(Ferrari)}, which define
the sets B = {European, Ferrari}, W = {Italian}. If we apply Alg. Apply without taking into
account the taxonomy we would get the bucket order shown in Figure 3.5, which does not make much
sense, nor helps us to derive the intended ordering of cars.
Figure 3.5: Example of Preferences Without Exploiting Hierarchies
It follows that without proper exploitation of the subsumption relation, the user would have to issue
a high number of actions, all anchored to leaf terms. To tackle this problem, below we introduce a form
of preference inheritance where preferences are inherited to the narrower terms. Let b be an action in
B. We shall use scope(b) to denote the scope of the action b, which is the set of elements (either facets,
terms, or objects) that are affected by this action. To capture inheritance we will redefine the scope of
actions which are anchored to terms of a taxonomy.
Definition 5 (Scope and Inheritance) Let b be an action $b = \langle e, rs \rangle$ where e is its anchor and rs the
other part of the action. The scope of b is defined as:
$scope(b) = scope(\langle e, rs \rangle) = \bigcup_{e' \in N^{*}(e)} scope(\langle e', rs \rangle)$
where $N^{*}(e)$ stands for e and the narrower elements of e, formally $N^{*}(e) = \{e\} \cup N^{+}(e) = \{e' \mid e' \le e\}$.
In other words, the scope of b is the union of the scopes of the actions obtained by replacing the
anchor e with a narrower term of e. Table 3.1 defines exactly the scope for each action, while the scopes
of our example according to Def. 5 are shown in the first two columns of Table 3.2.
Table 3.1: Scopes (Direct and Under Inheritance)
scopeType | anchor | (D)irect scope | (I)nherited scope
 | facet $F_i$ | $T_i$ | $T_i$
terms order | term $t_j$ | $\{t_j\}$ | $N^{*}(t_j)$
objects order | term $t_j$ | $I(t_j)$ | $\bar{I}(t_j)$
Table 3.2: Scopes: Example for Best/Worst Preferences
action | scope | active scope
$b_1$: Best(European) | {European, German, Audi, BMW, Porsche, French, Citroen, Peugeot, Italian, Lancia, Ferrari, Fiat, Lamborghini} | $scope(b_1) \setminus scope(b_2)$
$b_2$: Worst(Italian) | {Italian, Lancia, Ferrari, Fiat, Lamborghini} | $scope(b_2) \setminus scope(b_3)$
$b_3$: Best(Ferrari) | {Ferrari} | $scope(b_3)$
Note that the set of actions $B = \{b_1, b_2, b_3\}$ defines a valid preference, i.e. no cycles are formed (recall
Def. 1). However, if we unfold each $b \in B$, based on its scope, then we will get a $B'$ that does not define
a valid preference, e.g. Ferrari will be both best and worst and this forms a cycle. To tackle this problem,
and to provide an intuitive interpretation of users' actions, we will introduce what we call active scope,
after first introducing some required definitions.
Definition 6 We say that an action b is equally or more refined than an action $b'$, denoted by $b \sqsubseteq b'$, if
$scope(b) \subseteq scope(b')$.
In this way a preorder (reflexive and transitive) relation over B, denoted by $(B, \sqsubseteq)$, is defined. In the
case of our example, the Hasse diagram of $\sqsubseteq$ is shown in Figure 3.6.
Figure 3.6: Hasse Diagram of Actions Refinement
The objective is to use $(B, \sqsubseteq)$ for resolving the conflicts incurred due to inheritance. This can be
done by assuming that more specific preferences prevail over less specific ones (specificity). Particularly,
we introduce the following rule:
Definition 7 (Scope-based Dominance Rule)
If $A \subseteq scope(b) \subseteq scope(b')$ then $b'$ is dominated by b on A, and thus action $b'$ should not determine the
ordering of A.
We can now define the active scope of each action, by excluding from its scope the scopes of its direct
children with respect to $\sqsubseteq$. Specifically, we can define active scope as:
Definition 8 (Active Scope)
If C(b) denotes the direct children of b with respect to $\sqsubseteq$, then the active scope of b, denoted by aScope(b),
is defined as: $aScope(b) = scope(b) \setminus (\bigcup_{b' \in C(b)} scope(b'))$
In our example, the active scopes are shown in Table 3.2. From these we obtain $B = aScope(b_1) \cup aScope(b_3)$,
and $W = aScope(b_2)$, which define a valid preference. Specifically, Alg. Apply will return
(assuming inactive elements go at the end) the following bucket order:
$\langle$ {European, Ferrari, German, Audi, BMW, Porsche, French, Citroen, Peugeot},
{Lancia, Fiat, Lamborghini},
{Asian, Japanese, Toyota, Korean, Kia, American, U.S.A., Chrysler} $\rangle$
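The computation of scopes (Def. 5) and active scopes (Def. 8) for term-scoped best/worst actions can be sketched in Python as follows. The sketch only covers the European subtree as it can be read off Table 3.2; the dictionary layout and the function names are assumptions made for illustration. The active scopes are computed literally from the definitions, so every term of a scope, including the anchor itself, ends up in some active scope.

```python
# Narrower terms of each term (European subtree, as read off Table 3.2).
children = {
    "European": ["German", "French", "Italian"],
    "German": ["Audi", "BMW", "Porsche"],
    "French": ["Citroen", "Peugeot"],
    "Italian": ["Lancia", "Ferrari", "Fiat", "Lamborghini"],
}


def narrower_or_self(term):
    """N*(term): the term together with all its narrower terms."""
    result = {term}
    for child in children.get(term, ()):
        result |= narrower_or_self(child)
    return result


def active_scopes(actions):
    """actions maps an action name to (kind, anchor); returns name -> aScope."""
    scope = {b: narrower_or_self(anchor) for b, (_, anchor) in actions.items()}
    result = {}
    for b in actions:
        refined = [c for c in actions if c != b and scope[c] < scope[b]]
        # Direct children: strictly more refined actions that are not
        # themselves below another strictly more refined action.
        direct = [c for c in refined
                  if not any(scope[c] < scope[d] for d in refined)]
        result[b] = scope[b] - set().union(*(scope[c] for c in direct))
    return result


actions = {"b1": ("best", "European"), "b2": ("worst", "Italian"), "b3": ("best", "Ferrari")}
a_scope = active_scopes(actions)
B = set().union(*(a_scope[b] for b, (kind, _) in actions.items() if kind == "best"))
W = set().union(*(a_scope[b] for b, (kind, _) in actions.items() if kind == "worst"))
```

The resulting sets B and W are disjoint, so passing them to Alg. Apply cannot produce a cycle between a best and a worst element.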
Now, consider the same set of actions B but suppose that they are object-scoped instead of term-scoped,
and assume that the table Cars contains the following tuples:
Id | Manuf | ...
P | Porsche |
L | Lancia |
Fi | Fiat |
Fe | Ferrari |
T | Toyota |
The (plain and active) scopes in this case are:
action | scope | active scope
$b_1$: Best(European) | {P, L, Fi, Fe} | {P}
$b_2$: Worst(Italian) | {L, Fi, Fe} | {L, Fi}
$b_3$: Best(Ferrari) | {Fe} | {Fe}
The sets B and W of the active scopes are: B = {P, Fe} and W = {L, Fi}. With these parameters
Alg. Apply will yield the ordering: $\langle \{P, Fe\}, \{L, Fi\}, \{T\} \rangle$.
The algorithm that supports inherited preferences and scope-based resolution of conflicts is Alg.
PrefOrder. It starts by computing the scopes of each action $b \in B$ (line 2) using Def. 5 in order to
compute the preorder relation $(B, \sqsubseteq)$ (line 3). Afterwards, it computes the active scopes using Def. 8
(line 5), and expands the original set of actions B to a new set of actions $B'$, by including the new actions
computed by the active scopes (line 6). Then, it parses the new action set $B'$ in order to get B, W
and $R_{\succ}$ (line 8). Finally, it calls Alg. Apply (line 9).
Algorithm 4 PrefOrder(E, B, Policy)
Input: the set of elements E, the set of actions B, and Policy for inactive elements
Output: a bucket order L over E
1: // Part (i): Computation of $(B, \sqsubseteq)$
2: Compute the scopes of the actions in B
3: Form $(B, \sqsubseteq)$
4: // Part (ii): Efficient Computation of Active Scopes
5: Use $(B, \sqsubseteq)$ to compute the active scopes of the actions in B
6: Use the active scopes to expand the set B to a set $B'$
7: // Part (iii): Derivation of the final bucket order
8: $(B, W, R_{\succ}) \leftarrow Parse(B')$
9: return Apply(E, B, W, $R_{\succ}$, Policy) // call to Alg. 1
Let us now discuss a number of propositions.
Prop. 1 If $B \cap W = \emptyset$ and $(T, \le)$ is a tree, then in the expanded (through active scopes) actions, a term
cannot be both Best and Worst.
Proof:
Since $(T, \le)$ is a tree, for each term t there is one and only one path starting from t and
ending at the root of the tree. The term t will be in the active scope of the closest action, i.e. in
the active scope of an action anchored on t, or on its father, or on the father of its father,
and so on. Therefore it can only be in the active scope of an action anchored to its closest (in the
path) term. Since $B \cap W = \emptyset$ that anchor can be either in B or in W (not both), therefore
t cannot be both Best and Worst.
This means that the inheritance of preferences over tree-structured facets cannot create any
ambiguity. However if $(T, \le)$ is a DAG (Directed Acyclic Graph), then Prop. 1 does not always hold, e.g.
consider a term having two direct fathers, one defined as best, the other as worst. Such actions do not
define a valid preference and below we show how we can detect such cases. Let:
$effAnchors(t) = minimal\{ t' \mid t \le t'$ and $t'$ is anchor of one preference action $\}$
Prop. 2 If $B \cap W = \emptyset$, then there is not any ambiguity about a term t iff the actions in effAnchors(t)
are either all Best or all Worst.
Proof:
It is a straightforward consequence of the definitions that a term t will be in the active scopes of
the actions anchored in the terms that belong to the set $effAnchors(t) = minimal\{ t' \mid t \le t'$
and $t'$ is anchor of one preference action $\}$. If all such actions are Best (resp. Worst)
statements, then t will be Best (resp. Worst) in the expanded statements. If however some of
these actions are Best and some are Worst, then (since t will be in the active scopes of all
of them) t will be both Best and Worst, and thus the expansion will create ambiguities (and
hence an invalid preference).
Note that Prop. 1 is a special case of Prop. 2, since in trees for each term t it holds:
$|effAnchors(t)| \le 1$
Algorithmically we can check whether the actions defined over a DAG-structured facet create an
ambiguity by checking the condition of Prop. 2 only for those terms which have more than one direct father.
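The check suggested by Prop. 2 can be sketched in Python as shown below. The taxonomy is given as a map from each term to its direct fathers, and the anchored terms as a map to the kind of action (best or worst); both representations, the function names and the term Luxury used to obtain a DAG are hypothetical and serve only the illustration.

```python
def broader_or_self(term, fathers):
    """The term together with all its (direct and indirect) broader terms."""
    result = {term}
    for father in fathers.get(term, ()):
        result |= broader_or_self(father, fathers)
    return result


def eff_anchors(term, fathers, anchored):
    """Minimal (closest) anchored terms among the term and its broader terms."""
    candidates = broader_or_self(term, fathers) & set(anchored)
    return {a for a in candidates
            if not any(c != a and a in broader_or_self(c, fathers) for c in candidates)}


def ambiguous_terms(terms, fathers, anchored):
    """Terms whose effective anchors mix Best and Worst actions (Prop. 2)."""
    return {t for t in terms
            if len({anchored[a] for a in eff_anchors(t, fathers, anchored)}) > 1}


# Hypothetical DAG: Lexus has two direct fathers with contradicting actions.
fathers = {"Lexus": ["Japanese", "Luxury"], "Toyota": ["Japanese"], "Japanese": ["Asian"]}
anchored = {"Japanese": "best", "Luxury": "worst", "Asian": "worst"}
conflicts = ambiguous_terms(["Lexus", "Toyota", "Japanese"], fathers, anchored)
```

In this example Toyota is unambiguous because its only effective anchor is Japanese (the action on Asian is further away on the same path), whereas Lexus inherits both a best and a worst action.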
Prop. 3 Alg. PrefOrder respects the scope-based dominance rule (Def. 7).
Proof:
Suppose the opposite, i.e. suppose that $\exists\, b, b'$ and $A \subseteq Obj$ s.t. $A \subseteq scope(b) \subseteq scope(b')$
and that PrefOrder orders the elements of A on the basis of action $b'$. This cannot be true,
since according to Def. 8, the active scope of $b'$ will not contain A. Notice that although in
the definition of active scopes (Def. 8) only the direct children are used, the scope (defined as
in Def. 5) is based on $N^{*}(e)$ so it takes into account all children wrt $\le$. For this reason it is
enough at Def. 8 (and actually more efficient at implementation level) to consider only the
direct children.
3.3.4 Relative Preferences over Hierarchically Organized Values
To complete the expressive power of the proposed actions, here we study the case of relative (qualitative)
preferences over hierarchically organized values. Specifically, our objective is to support sets of
preferences of the form:
($b_1$): $Asian \succ European$
($b_2$): $European \succ Kia$
($b_3$): $BMW \succ Asian$
($b_4$): $Kia \succ Fiat$
($b_5$): $Toyota \succ Kia$
whose semantics take into account inheritance, and where conflicts are resolved in an intuitive manner. To
this end we will define the scope and the expansion of such preferences.
Definition 9 (Scope of Relative Preferences)
The scope of a preference relationship $e_i \succ e_j$, denoted by $scope(e_i \succ e_j)$, is defined as:
$scope(e_i \succ e_j) = (N^{*}(e_i) \times N^{*}(e_j)) \cup (N^{*}(e_j) \times N^{*}(e_i))$
Definition 10 (Expansion of Relative Preferences)
The expansion of a preference relationship $e_i \succ e_j$, denoted by $expansion(e_i \succ e_j)$, is defined as:
$expansion(e_i \succ e_j) = \{e'_i \succ e'_j \mid e'_i \in N^{*}(e_i), e'_j \in N^{*}(e_j)\}$
This means that $expansion(e_i \succ e_j)$ actually unfolds the preference relationship $e_i \succ e_j$ on the
basis of the subsumption relationships, while $scope(e_i \succ e_j)$ does not contain any preference
relationship (it is used for resolving conflicts as we shall see below). The scope-based ordering of such actions
is defined as before (Def. 6), i.e. $b \sqsubseteq b'$ iff $scope(b) \subseteq scope(b')$. We can now define the active scope of a
preference $e_i \succ e_j$ by excluding from its expansion all relationships $e'_i \succ e'_j$ such that $(e'_i, e'_j)$ belongs
to the scope of a child (w.r.t. $\sqsubseteq$) action.
Definition 11 (Active Scope of Relative Preferences)
The active scope of a preference action b, in the context of a set of preference actions B, is defined as:
$aScope(b) = \{e'_i \succ e'_j \in expansion(b) \mid \nexists\, b' \in B$ s.t. $b' \sqsubset b$ and $(e'_i, e'_j) \in scope(b')\}$
which is equivalent to:
$aScope(b) = expansion(b) \setminus (\bigcup_{b' \sqsubset b} \{e'_i \succ e'_j \mid (e'_i, e'_j) \in scope(b')\})$
Assume that the taxonomy of manufacturers has the form shown in Figure 3.7.
Figure 3.7: Taxonomy of Manufacturers
Then the scope-based ordering of preferences $b_1$-$b_5$ is that shown in Figure 3.8, while Table 3.3 shows
the scopes, expansion and active scopes of the actions.
Figure 3.8: Hasse Diagram of Scope-Based Ordering of Preference Actions
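Table 3.3: Scopes: Example for Relative Preferences
preference | expansion | active scope
$b_1$: $Asian \succ European$ | $Asian \succ European$, $Asian \succ BMW$, $Asian \succ Fiat$, $Kia \succ European$, $Kia \succ BMW$, $Kia \succ Fiat$, $Toyota \succ European$, $Toyota \succ BMW$, $Toyota \succ Fiat$, $Lexus \succ European$, $Lexus \succ BMW$, $Lexus \succ Fiat$ | $Asian \succ European$, $Asian \succ Fiat$, $Toyota \succ European$, $Toyota \succ Fiat$, $Lexus \succ European$, $Lexus \succ Fiat$
$b_2$: $European \succ Kia$ | $European \succ Kia$, $BMW \succ Kia$, $Fiat \succ Kia$ | $European \succ Kia$, $BMW \succ Kia$
$b_3$: $BMW \succ Asian$ | $BMW \succ Asian$, $BMW \succ Kia$, $BMW \succ Toyota$, $BMW \succ Lexus$ | $BMW \succ Asian$, $BMW \succ Kia$, $BMW \succ Toyota$, $BMW \succ Lexus$
$b_4$: $Kia \succ Fiat$ | $Kia \succ Fiat$ | $Kia \succ Fiat$
$b_5$: $Toyota \succ Kia$ | $Toyota \succ Kia$ | $Toyota \succ Kia$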
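The entries of Table 3.3 can be recomputed from Def. 9, 10 and 11 with the Python sketch below. Since Figure 3.7 is not reproduced here, the taxonomy is an assumption inferred from the expansions listed in the table (Asian with narrower terms Kia, Toyota and Lexus, and European with narrower terms BMW and Fiat); the function names are illustrative, and a pair (x, y) stands for the relationship "x is preferred to y".

```python
from itertools import product

# Assumed taxonomy, inferred from the expansions of Table 3.3.
children = {"Asian": ["Kia", "Toyota", "Lexus"], "European": ["BMW", "Fiat"]}


def n_star(term):
    """The term together with its narrower terms."""
    result = {term}
    for child in children.get(term, ()):
        result |= n_star(child)
    return result


def scope(pref):
    """Def. 9: all pairs between the two subtrees, in both directions."""
    a, b = pref
    return set(product(n_star(a), n_star(b))) | set(product(n_star(b), n_star(a)))


def expansion(pref):
    """Def. 10: the preference unfolded over the narrower terms."""
    a, b = pref
    return set(product(n_star(a), n_star(b)))


def active_scope(pref, prefs):
    """Def. 11: drop the pairs covered by the scope of a strictly more refined action."""
    refined = [p for p in prefs if p != pref and scope(p) < scope(pref)]
    covered = set().union(*(scope(p) for p in refined))
    return expansion(pref) - covered


b1, b2, b3 = ("Asian", "European"), ("European", "Kia"), ("BMW", "Asian")
b4, b5 = ("Kia", "Fiat"), ("Toyota", "Kia")
prefs = [b1, b2, b3, b4, b5]
active = {p: active_scope(p, prefs) for p in prefs}
```

For example, the relationship "Kia is preferred to BMW" is dropped from the active scope of $b_1$, because the pair (Kia, BMW) lies in the scopes of the more refined actions $b_2$ and $b_3$, which state the opposite.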
As in the case of Best/Worst preferences (and Prop. 1 and 2), here we have to examine whether the
expansion of relative preferences creates ambiguities (conflicts), apart from those which are resolved
by the scope-based rule, and how we can identify such cases.
Let B be a set of relative preference actions, which define a valid preference relation $R_{\succ}$ over a $(T, \le)$.
We will examine whether a preference relationship between two terms e and $e'$ of T (either $e \succ e'$ or
$e' \succ e$) can be in the active scope of more than one action in B. If this holds, then both $e \succ e'$
and $e' \succ e$ could belong to the expanded (through the active scopes) preference relation, and thus that
preference relation would be invalid.
Let us make the hypothesis that a relationship $e \succ e'$ belongs to the active scope of two actions $b_i$ and $b_j$ such that $b_i \neq b_j$. Suppose that $b_i : t_i \succ t_i'$ and $b_j : t_j \succ t_j'$. Certainly $e \succ e'$ should belong to the expansions of both $b_i$ and $b_j$. Membership in the expansion of $b_i$ means: $e \leq t_i$ and $e' \leq t_i'$. Membership in the expansion of $b_j$ means: $e \leq t_j$ and $e' \leq t_j'$. We can identify the following cases:
(i) if $t_i \leq t_j$ and $t_i' \leq t_j'$ then $b_i \sqsubseteq b_j$ holds and thus $e \succ e'$ can belong to the active scope of $b_i$ only (and not of $b_j$).
(ii) if $t_j \leq t_i$ and $t_j' \leq t_i'$ then $b_j \sqsubseteq b_i$ holds and thus $e \succ e'$ can belong to the active scope of $b_j$ only (and not of $b_i$).
(iii) if $t_i \leq t_j$ and $t_j' \leq t_i'$, or $t_j \leq t_i$ and $t_i' \leq t_j'$, then neither $b_i \sqsubseteq b_j$ nor $b_j \sqsubseteq b_i$ holds. This means that in such cases it could belong to the active scopes of both. An example is shown in Figure 3.9 (left).
(iv) if $t_i \| t_j$ and/or $t_j' \| t_i'$, again we have $b_i \not\sqsubseteq b_j$ and $b_j \not\sqsubseteq b_i$, meaning that $e \succ e'$ would belong to the active scopes of both. Note that the case $t_i \| t_j$ and $t_j' \| t_i'$ can occur in DAGs, and an example is shown in Figure 3.9 (right). For the case of trees we cannot have $t_i \| t_j$, since we know that $e \leq t_i$ and $e \leq t_j$ (all three relationships cannot hold together). For the same reason, $t_j' \| t_i'$ cannot hold in trees.
[Figure 3.9, two panels. One panel uses the terms A, P, B, R, I, J: I > J due to A > B, and I < J due to R > P. The other panel uses the terms European, German, BMW, Asian, Korean, KIA: BMW > KIA due to German > Asian, and BMW < KIA due to Korean > European.]
Figure 3.9: Examples of Conflicts
The cases (iii) and (iv) are indicative situations when conflict can occur. Note that case (iii) can occur
both in trees and DAGs, while case (iv) only in DAGs.
It follows from the above that we need methods for detecting the cases where inheritance causes
invalidities. One method to do so, is to compute the expansion and then check for cycles. This means
that a classical cycle detection algorithm (e.g. topological sort) is enough for detecting such cases.
We could also avoid the expansion step in some cases. Below we elaborate on a method that could be applied for the case of tree-structured taxonomies. To begin with, let $R^{e}_{\succ}$ denote the expanded (through the notion of active scopes) preference relation of $R_{\succ}$ (obviously, $R_{\succ} \subseteq R^{e}_{\succ}$).
Prop. 4 (Relative Inherited Preferences and Conflicts)
For tree-structured taxonomies, the expansion through active scopes of a valid preference relation $R_{\succ}$ (yielding a preference relation $R^{e}_{\succ}$) can create a conflict iff (if and only if) there are two actions in $R_{\succ}$ (not necessarily different) of the form $a \succ b$ and $c \succ d$ such that either:
(i) $a \leq d$ and $c \leq b$ hold, or
(ii) $b \leq c$ and $d \leq a$ hold.
If these actions are the same, meaning that $a = c$ and $d = b$, the formulation of the proposition becomes: $R^{e}_{\succ}$ has a conflict iff there is an action $a \succ b$ and either $a \leq b$ or $b \leq a$.
Proof:
(Direction: if the conditions of the proposition hold then $R^{e}_{\succ}$ has a conflict)
As we can see from Figure 3.10 (i), if the conditions of the proposition hold, then $R^{e}_{\succ}$ contains a conflict (either between $a$ and $c$, or between $d$ and $b$). Regarding the special case (where the two actions are the same), note that if $b \leq a$ then we get the cycle $b \succ b$ (see Figure 3.10 (ii-left)). If $a \leq b$ then we get the cycle $a \succ a$ (see Figure 3.10 (ii-middle)). Note that non-trivial cycles (i.e. not self-cycles) can also occur, e.g. if $c \leq b \leq a$, with the expansion we will get $c \succ b$ and $b \succ c$ (see Figure 3.10 (ii-right)).
(Direction: if $R^{e}_{\succ}$ has a conflict then the conditions of the proposition hold)
Trivial Cycle
Suppose that $R^{e}_{\succ}$ has a trivial cycle of the form $a \succ a$. Since this relationship cannot belong to $R_{\succ}$ (which is acyclic by assumption), it should be the result of an inherited action. Therefore $a$ should have a superclass, say $sp$, for which there is an action $sp \succ sb$, and for this action to be inheritable to $a$, it should also be $a \leq sb$. Therefore it should hold $a \leq sb$ and $a \leq sp$. However, since $\leq$ is a tree, $sb$ and $sp$ cannot be incomparable (i.e. it cannot be $sb \| sp$), therefore it should either be $a \leq sb \leq sp$ or $a \leq sp \leq sb$. We reached the conclusion that there exists an action $sp \succ sb$ and either $sb \leq sp$ or $sp \leq sb$. This is exactly what the proposition states.
Figure 3.10: Relative Inherited Preferences and Conflicts Examples (panels (i), (ii), (iii))
Cycle of the form $e \succ e' \succ e$
A relationship $e \succ e'$ can belong to $R^{e}_{\succ}$ either because it belongs to $R_{\succ}$, or due to an action $a \succ b$ to whose active scope the relationship $e \succ e'$ belongs. In the latter case it should be $e \leq a$ and $e' \leq b$.
Analogously, a relationship $e' \succ e$ can belong to $R^{e}_{\succ}$ either because it belongs to $R_{\succ}$, or due to an action $c \succ d$ to whose active scope the relationship $e' \succ e$ belongs. In the latter case, it should be $e' \leq c$ and $e \leq d$ (illustrated in Figure 3.10 (iii)).
However, since $\leq$ is a tree it cannot be $a \| d$ nor $b \| c$. Therefore we can have one of the following four cases (also illustrated in Figure 3.11).
(i) $a \leq d$ and $b \leq c$
(ii) $a \leq d$ and $c \leq b$
(iii) $d \leq a$ and $b \leq c$
(iv) $d \leq a$ and $c \leq b$
We cannot be in case (i) because in that case $e' \succ e$ would not be in the active scope of $c \succ d$ (that would contradict one of our hypotheses). Similarly, we cannot be in case (iv) because in that case $e \succ e'$ would not be in the active scope of $a \succ b$. So only (ii) and (iii) can hold. Notice that we reached the exact conditions that the proposition states.
Figure 3.11: Examples of Cycles of the Form $e \succ e' \succ e$ (panels (i), (ii), (iii), (iv))
Based on the above proposition, below we describe an algorithmic method for identifying such problems without having to expand $R_{\succ}$, i.e. without having to compute $R^{e}_{\succ}$. For each pair of statements (i.e. for each pair of relationships in $R_{\succ}$) we check whether the condition of Proposition 4 holds. This means that we need to check the proposition $|R_{\succ}|(|R_{\succ}| - 1)/2$ times. To check the proposition once, we have to check whether four relationships hold. If the transitive closure of $\leq$ is stored then this can be checked fast (one scan, or even faster if indexes exist). If the transitively induced relationships are not stored, then we can check whether $t \leq t'$ by applying the reachability algorithm with cost analogous to the average depth of the taxonomy. If however the taxonomy has been labeled (e.g. using Agrawal et al. (1989)), then we can check whether $t \leq t'$ in $O(1)$.
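The pairwise check can be sketched as follows for a tree-structured taxonomy. This is an illustrative sketch, not the implementation of the thesis: it uses a plain depth-first interval labeling (in the spirit of, but not identical to, the scheme of Agrawal et al. (1989)) so that $t \leq t'$ becomes an interval containment test, and it adds a hypothetical root term `Manufacturer` so that the taxonomy is a single tree. The function names `label_tree`, `leq` and `conflicts` are assumptions.

```python
def label_tree(children, root):
    """Give every term a (start, end) interval via a depth-first traversal."""
    labels = {}
    counter = 0

    def visit(node):
        nonlocal counter
        start = counter
        counter += 1
        for child in children.get(node, []):
            visit(child)
        labels[node] = (start, counter)

    visit(root)
    return labels

def leq(labels, t, u):
    """t <= u iff the interval of t lies inside the interval of u (O(1))."""
    (ts, te), (us, ue) = labels[t], labels[u]
    return us <= ts and te <= ue

def conflicts(actions, labels):
    """Return the pairs of actions that satisfy a condition of Prop. 4.

    Each action is a (better, worse) pair; an action is also checked
    against itself, which covers the special case of the proposition.
    """
    found = []
    for i, (a, b) in enumerate(actions):
        for c, d in actions[i:]:
            case_i = leq(labels, a, d) and leq(labels, c, b)
            case_ii = leq(labels, b, c) and leq(labels, d, a)
            if case_i or case_ii:
                found.append(((a, b), (c, d)))
    return found

# Terms of the BMW/KIA example of Figure 3.9, under a hypothetical root.
CHILDREN = {
    "Manufacturer": ["European", "Asian"],
    "European": ["German"], "German": ["BMW"],
    "Asian": ["Korean"], "Korean": ["KIA"],
}
LABELS = label_tree(CHILDREN, "Manufacturer")

if __name__ == "__main__":
    print(conflicts([("German", "Asian"), ("Korean", "European")], LABELS))
```

For the two actions German $\succ$ Asian and Korean $\succ$ European the sketch reports one conflicting pair, since German $\leq$ European and Korean $\leq$ Asian (condition (i)).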
At application level, the detected invalidities can be managed in various ways. For instance, we can
inform the user and ask him to revise his preferences or to resolve the ambiguity. Alternatively one
could consider the preference invalid and thus ignore it, or cut the inheritance at some points (e.g. at
the points of conflicts), or employ other conflict resolution rules (e.g. the closer in hierarchy prevails,
or the more recent action prevails, etc). All these are application-specific issues that go beyond the focus
of this thesis.
Obviously $(B, \sqsubseteq)$ contains relationships between preferences of the same kind (i.e. Best/Worst and Relative). Therefore, when we are in the first step of the algorithm where we compute $(B, \sqsubseteq)$, first we calculate the actions refinement preorder for Best/Worst preference actions, then for Relative preference actions, and finally we return the union of these relationships.
Returning to preference-based order and the actions $b_1$-$b_5$ given at the beginning of this section, we can apply Alg. PrefOrder as it is (assuming the scope defined as in this section). Specifically, to produce the induced ordering we have to pass to Alg. Apply, through $R_{\succ}$, all active scopes of the actions in $B$.
Figure 3.12 shows the transitive reduction (i.e. the Hasse Diagram) of the relation $R_{\succ}$ for the preferences over the Manufacturer attribute. The bucket order derived by Alg. PrefOrder in our example is $\langle \{BMW\}, \{Asian, Toyota, Lexus\}, \{European\}, \{Kia\}, \{Fiat\} \rangle$, and its restriction on the leaves of the taxonomy is $\langle \{BMW\}, \{Toyota, Lexus\}, \{Kia\}, \{Fiat\} \rangle$, which captures the intuition.
Figure 3.12: Hasse Diagram of the Relation $R_{\succ}$ for the Manufacturer Attribute
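The bucket order above can be reproduced by repeatedly removing the sources of the relation formed by the active scopes of Table 3.3. The sketch below is a simple quadratic illustration of that idea, not Alg. Apply itself; the function name `source_removal` and the way the relation is encoded as (better, worse) pairs are assumptions. A relation with a cycle leaves no source at some step, which is how the sketch signals an invalid preference relation.

```python
def source_removal(relation):
    """Layer a relation of (better, worse) pairs into a bucket order.

    Each bucket holds the elements that no remaining element is preferred to.
    Raises ValueError if the relation contains a cycle.
    """
    nodes = {x for pair in relation for x in pair}
    remaining = set(relation)
    buckets = []
    while nodes:
        targets = {worse for _, worse in remaining}
        sources = nodes - targets
        if not sources:
            raise ValueError("the relation contains a cycle")
        buckets.append(sources)
        nodes = nodes - sources
        remaining = {(b, w) for b, w in remaining if b not in sources}
    return buckets

# Union of the active scopes listed in Table 3.3.
R = {
    ("Asian", "European"), ("Asian", "Fiat"),
    ("Toyota", "European"), ("Toyota", "Fiat"),
    ("Lexus", "European"), ("Lexus", "Fiat"),
    ("European", "Kia"), ("BMW", "Kia"),
    ("BMW", "Asian"), ("BMW", "Toyota"), ("BMW", "Lexus"),
    ("Kia", "Fiat"), ("Toyota", "Kia"),
}

if __name__ == "__main__":
    order = source_removal(R)
    print(order)
    # Restriction on the leaves of the taxonomy: drop the inner terms.
    inner = {"Asian", "European"}
    print([b - inner for b in order if b - inner])
```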
3.3.5 Preferences over Hierarchical Set-Valued Attributes
In case we have set-valued attributes over hierarchically organized value domains, we can again exploit inheritance to order the sets. In particular, consider the scope and active scope as defined earlier, in a way that captures relative preferences. We can apply Alg. PrefOrder up to line 8 (i.e. just before calling the algorithm Apply), and then apply the algorithm described in Section 3.3.2 (based on the relation $\succ_{\{\}}$), to derive the final bucket order. The steps are sketched in more detail in Alg. 5.
3.4 Multi-Facet Preferences
Here we describe the case where we have actions that concern more than one facet. The user can define a preference for each facet separately (using one or more actions) and then compose them using the Priority or Pareto operators (Pareto Optimal is a subcase of Pareto), or a composition of these operators.
Algorithm 5 PrefOrderSetValued(E, B, Policy)
Input: the set of elements E (E is a family of sets), the set of actions B, and Policy for inactive elements
Output: a bucket order L
over E
1: // As in Alg. 4:
2: Compute the scopes of the actions in B and form (B, )
3: Use (B, ) to compute the active scopes of the actions in B
5: (B, W, R
) Parse(B
)
6: // As in Alg. 3:
7: R
bw
{(b, w) | b B, w W}
8: R R
bw
R
9: R Closure
transitivity
(R) // Addition of the transitively induced links
10: Compute
{}
based on wins and support as in Alg. 3
11: L SourceRemoval(
{}
)
12: I E \ dom(
{}
) // I is the set of inactive elements
13: L
14: return L
3.4.1 Prioritized Composition

Prioritized composition (Kießling (2002)) of two preference relations P1 and P2, denoted by $P1 \triangleright P2$, meaning that P1 has more priority than P2, is defined as:
$$x \succ_{P1 \triangleright P2} y \text{ iff } x_1 \succ_{P1} y_1 \lor (x_1 = y_1 \land x_2 \succ_{P2} y_2)$$
Let $B_i$ and $B_j$ be two sets of object-scoped actions. Suppose the user has defined $B_i \triangleright B_j$, and let $A$ be the current object set (the focus). The ordering of $A$ with respect to $B_i \triangleright B_j$ is derived by ordering each block defined by the preference $B_i$, using the preferences in $B_j$. The exact steps are given in Alg. MFPriority. At Step 1 we derive the blocks defined by the preference $B_i$. At Step 2 we order the elements of each block derived from the first step, using the actions in $B_j$. Finally, at Step 3 we just put these blocks in the order specified by Step 1.
Let us now denote with $o_1 \sim o_2$ that two objects are indifferent based on the relation $R_{\succ}$, i.e. that neither $o_1 \succ o_2$ nor $o_2 \succ o_1$ holds. A refinement of the indifference relation associated to a preference relation $R_{\succ}$ is to consider objects $o_1, o_2$ as equivalent, $o_1 \approx o_2$ (see footnote 5), if $o_1 \sim o_2$ and for all $o \in Obj$ such that $o_1 \succ o$ or $o \succ o_1$, it is $o_2 \succ o$ or $o \succ o_2$ respectively, and vice versa. If $o_1 \sim o_2$ and $o_1 \not\approx o_2$ we say that objects $o_1$ and $o_2$ are incomparable (Ciaccia and Torlone (2011)).
5 Another symbol used for equivalence in the bibliography is .
Algorithm 6 MFPriority($A$, $B_i$, $B_j$)
Input: the objects of current focus $A$, the actions $B_i$ for facet $F_i$, and the actions $B_j$ for facet $F_j$
Output: a bucket order $L$ of $A$ corresponding to $B_i \triangleright B_j$
1: We call Alg. PrefOrder($A$, $B_i$) and let $L = \langle A_1, \ldots, A_M \rangle$ be the produced bucket order, where $M$ is the number of blocks returned.
2: For each block $A_m$ of $L$ ($1 \leq m \leq M$) where $|A_m| > 1$, we call PrefOrder($A_m$, $B_j$), returning a bucket order $L_m = \langle A_{m1}, \ldots, A_{mz} \rangle$.
3: We replace each block $A_m$ of $L$ with its bucket order $L_m$ and this yields the final bucket order $L = \langle L_1, \ldots, L_M \rangle$.
It is easy to see that the produced bucket order interprets prioritized composition ($\triangleright$) as:
$$x \succ_{P1 \triangleright P2} y \text{ iff } x_1 \succ_{P1} y_1 \lor (x_1 \sim_{P1} y_1 \land x_2 \succ_{P2} y_2)$$
where $x_1 \sim_{P1} y_1$ means that $x_1$ and $y_1$ are in the same block in the bucket order produced by P1. This means that the relative ordering of the blocks defined by P1 is preserved, and this policy is aligned with what the user expects to see. This is the prioritized composition described in Chomicki (2003), which is referred to as triangle composition in Ross (2007).
A refinement of the above is to use equivalence ($\approx$) instead of indifference ($\sim$) (see Section 3.2). This refinement can be made, since in our case if $o_1 \sim o_2$, then for all $o \in Obj$ such that $o_1 \succ o$ or $o \succ o_1$, it is $o_2 \succ o$ or $o \succ o_2$ respectively, and vice versa (our algorithms provide a bucket order for all elements, since they also consider inactive elements). As a result, the produced bucket order interprets prioritized composition ($\triangleright$) as follows:
$$x \succ_{P1 \triangleright P2} y \text{ iff } x_1 \succ_{P1} y_1 \lor (x_1 \approx_{P1} y_1 \land x_2 \succ_{P2} y_2)$$
The above algorithm can be straightforwardly generalized to more than two facets. For example, assume that the user has defined:
$$B_{Loc} \triangleright B_{Manuf} \triangleright B_{Price}$$
Moreover, assume the actions $B_{Loc}$ = {Best(Crete), Worst(Chania)}, $B_{Manuf}$ = {Best(European), Worst(Italian), Best(Ferrari)} and $B_{Price}$ = {price min}, and suppose that the current focus $A$ consists of the following tuples:
Id | Location | Manuf. | Price | ...
L | Heraklion | Lancia | 10 | ...
B | Chania | BMW | 20 | ...
$A_1$ | Athens | Audi | 20 | ...
$A_2$ | Athens | Audi | 21 | ...
$F_1$ | Heraklion | Ferrari | 100 | ...
$F_2$ | Rethymno | Ferrari | 80 | ...
The constituent and final bucket orders are shown below (for the composed preferences we use nesting to make clear how each block was derived):
$L_{B_{Loc}} = \langle \{L, F_1, F_2\}, \{B\}, \{A_1, A_2\} \rangle$
$L_{B_{Manuf}} = \langle \{B, A_1, A_2, F_1, F_2\}, \{L\} \rangle$
$L_{B_{Price}} = \langle \{L\}, \{B, A_1\}, \{A_2\}, \{F_2\}, \{F_1\} \rangle$
$L_{B_{Loc} \triangleright B_{Manuf}} = \langle \langle \{F_1, F_2\}, \{L\} \rangle, \{B\}, \{A_1, A_2\} \rangle$
$L_{B_{Loc} \triangleright B_{Manuf} \triangleright B_{Price}} = \langle \langle \langle \{F_2\}, \{F_1\} \rangle, \{L\} \rangle, \{B\}, \langle \{A_1\}, \{A_2\} \rangle \rangle = \langle F_2, F_1, L, B, A_1, A_2 \rangle$
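The block-by-block refinement of Alg. MFPriority can be illustrated on the three bucket orders above. The sketch makes one simplifying assumption that should be stated clearly: instead of calling PrefOrder again on each block, it restricts the bucket order already computed for the lower-priority facet over the whole focus to that block. The function name `refine` and the string identifiers of the objects are hypothetical.

```python
def refine(order, by):
    """Split every block of `order` according to the bucket order `by`.

    Both arguments are lists of sets. The relative order of the blocks of
    `order` is preserved; inside each block, objects are regrouped by the
    position of their block in `by`.
    """
    rank = {obj: i for i, block in enumerate(by) for obj in block}
    result = []
    for block in order:
        groups = {}
        for obj in block:
            groups.setdefault(rank[obj], set()).add(obj)
        result.extend(groups[r] for r in sorted(groups))
    return result

# Constituent bucket orders of the example (most preferred block first).
LOC = [{"L", "F1", "F2"}, {"B"}, {"A1", "A2"}]
MANUF = [{"B", "A1", "A2", "F1", "F2"}, {"L"}]
PRICE = [{"L"}, {"B", "A1"}, {"A2"}, {"F2"}, {"F1"}]

if __name__ == "__main__":
    loc_manuf = refine(LOC, MANUF)
    print(loc_manuf)                 # Loc prioritized over Manuf
    print(refine(loc_manuf, PRICE))  # ... and then over Price
```

With these inputs the two calls give the flattened forms of the two composed bucket orders of the example.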
Note that the above specified prioritized composition method (and algorithm) does not adopt the ceteris paribus semantics, since it does not require equality of values. Higher priority implies preference over all other attributes, and therefore it adopts the totalitarian semantics. Totalitarian semantics are too strong and can lead to cyclic preferences when several comparative preference statements are dealt with (Neves and Kaci (2010)). In our framework we always compose facets either by Priority or Pareto composition or a combination of them to avoid this kind of cycles. We can even assume a default behaviour of automatic facet priority driftage, based on the interaction of the user with the facets. The second prioritized facet assumes totalitarian semantics for each block of the bucket order returned by ordering the elements based on the most prioritized facet, the third prioritized facet assumes totalitarian semantics per sub-block of the previous bucket order, etc. For example in the previous case, since the Location facet is prioritized over the Manufacturer facet, and Heraklion $\succ$ Chania, the Lancia will be preferred to the BMW, although Italian cars are not preferred over other European cars.
3.4.2 Pareto Composition and BMO-set
The Pareto composition (Kießling (2002)) assumes that the preferences expressed over different facets are equally important. Typically, the Pareto composition of two preference relations P1 and P2, denoted by $P1 \otimes P2$, is defined as:
$$x \succ_{P1 \otimes P2} y \text{ iff } (x_1 \succ_{P1} y_1 \land (x_2 = y_2 \lor x_2 \succ_{P2} y_2)) \lor (x_2 \succ_{P2} y_2 \land (x_1 = y_1 \lor x_1 \succ_{P1} y_1))$$
The winnow (Chomicki (2003)) operator, or Pareto optimal or Best operator (Torlone and Ciaccia (2002)), selects the maximal elements of the preference order defined using the Pareto composition (i.e. the BMO-set). There are many algorithms for the winnow operator, like BNL described in Börzsönyi et al. (2001) or SFS described in Chomicki et al. (2003). The winnow operator is also implicit in skyline queries, which support only LOWEST and HIGHEST preferences based on the Preference Algebra described in Kießling (2002). Methods for calculating skylines over partially ordered data have also started to emerge, as in Zhang et al. (2010).
Let $B_i$ and $B_j$ be two sets of object-scoped actions. Suppose the user has defined $B_i \otimes B_j$, and let $A$ be the current object set (the focus). Let us denote with $A_{BMO}$ the BMO-set of the focus $A$. The exact steps for computing the Pareto composition are given in Alg. MFPareto.
Algorithm 7 MFPareto($A$, $B_i$, $B_j$)
Input: the objects of current focus $A$, the actions $B_i$ for facet $F_i$, and the actions $B_j$ for facet $F_j$
Output: a bucket order $L$ of $A$ corresponding to $B_i \otimes B_j$
1: We call Alg. PrefOrder($A$, $B_i$) and Alg. PrefOrder($A$, $B_j$) for facets $F_i$ and $F_j$, and let $L_i = \langle A_{i1}, \ldots, A_{im} \rangle$ be the produced bucket order for facet $F_i$ and $L_j = \langle A_{j1}, \ldots, A_{jn} \rangle$ the one for facet $F_j$, where $m$ and $n$ are the numbers of blocks returned for each facet resp.
2: while the bucket orders $L_i$ and $L_j$ are not empty do
3: Get the maximal elements of each bucket order, i.e. $A_{imax}$ and $A_{jmax}$
4: Check which objects in $A_{imax}$ and $A_{jmax}$ are not dominated by other objects in the bucket orders $L_j$ and $L_i$ respectively. These objects belong to the current BMO-set $A_{BMOcurrent}$
5: Append $A_{BMOcurrent}$ to the returned bucket order $L$
6: Remove the objects in $A_{BMOcurrent}$ from the bucket orders $L_i$ and $L_j$
7: return bucket order $L$
Initially, we derive the bucket orders defined by the preference actions $B_i$ and $B_j$. Then we get the maximal elements from each bucket order (i.e. the objects of the current BMO-set are included in them) and test which objects are not dominated by others (by checking the bucket order of the other preference action). These objects are part of the current BMO-set, and are removed from the bucket orders of $B_i$ and $B_j$. Then we continue by computing the next BMO-set of the remaining objects. Notice that if we are interested only in the Pareto optimal, i.e. the winnow operator, we need to find the BMO-set only once.
One can easily see that the produced bucket order interprets Pareto composition ($\otimes$) as:
$$x \succ_{P1 \otimes P2} y \text{ iff } (x_1 \succ_{P1} y_1 \land (x_2 \sim_{P2} y_2 \lor x_2 \succ_{P2} y_2)) \lor (x_2 \succ_{P2} y_2 \land (x_1 \sim_{P1} y_1 \lor x_1 \succ_{P1} y_1))$$
where $x_1 \sim_{P1} y_1$ means that $x_1$ and $y_1$ are in the same block in the bucket order produced by P1. Again, in our case indifference means equivalence, so the finally produced bucket order is interpreted as:
$$x \succ_{P1 \otimes P2} y \text{ iff } (x_1 \succ_{P1} y_1 \land (x_2 \approx_{P2} y_2 \lor x_2 \succ_{P2} y_2)) \lor (x_2 \succ_{P2} y_2 \land (x_1 \approx_{P1} y_1 \lor x_1 \succ_{P1} y_1))$$
The above algorithm can be straightforwardly generalized to more than two facets. For example, assume that the user has defined:
$$B_{Loc} \otimes B_{Manuf} \otimes B_{Price}$$
Moreover, assume the actions $B_{Loc}$ = {Best(Crete), Worst(Chania)}, $B_{Manuf}$ = {Best(European), Worst(Italian), Best(Ferrari)} and $B_{Price}$ = {price min}, and suppose that the current focus $A$ consists of the following tuples:
Id | Location | Manuf. | Price | ...
L | Heraklion | Lancia | 10 | ...
B | Chania | BMW | 20 | ...
$A_1$ | Athens | Audi | 20 | ...
$A_2$ | Athens | Audi | 21 | ...
$F_1$ | Heraklion | Ferrari | 100 | ...
$F_2$ | Rethymno | Ferrari | 80 | ...
The constituent bucket orders are shown below:
$L_{B_{Loc}} = \langle \{L, F_1, F_2\}, \{B\}, \{A_1, A_2\} \rangle$
$L_{B_{Manuf}} = \langle \{B, A_1, A_2, F_1, F_2\}, \{L\} \rangle$
$L_{B_{Price}} = \langle \{L\}, \{B, A_1\}, \{A_2\}, \{F_2\}, \{F_1\} \rangle$
From the above we can see that $A_1$ dominates $A_2$, since $A_1$ is less expensive and has the same preference regarding Location and Manufacturer. In addition, $A_1$ is dominated by $B$, since they have the same price and the manufacturer is equally preferred, but $B$ is located in Chania, which is preferred over the inactive Athens. Finally, $F_1$ is dominated by $F_2$, since $F_2$ is less expensive. Then, the final bucket order returned by the algorithm is:
$L_{B_{Loc} \otimes B_{Manuf} \otimes B_{Price}} = \langle \{L, F_2\}, \{F_1, B\}, \{A_1\}, \{A_2\} \rangle$
The Pareto optimal set (i.e. the result of the winnow operator or skyline operator) is $A_{BMO} = \{L, F_2\}$, which is the maximal element of the above bucket order.
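The loop of Alg. MFPareto can be sketched directly on bucket orders. The sketch rests on two assumptions that are not spelled out in the text: an object is considered dominated when some remaining object is at least as good in both bucket orders and strictly better in one of them, and more than two facets are handled by nesting the two-facet composition from left to right. The names `ranks` and `pareto` and the string identifiers of the objects are hypothetical.

```python
def ranks(order):
    """Map every object to the index of its block (0 = most preferred)."""
    return {obj: i for i, block in enumerate(order) for obj in block}

def pareto(order_i, order_j):
    """Two-facet Pareto composition of two bucket orders (lists of sets)."""
    ri, rj = ranks(order_i), ranks(order_j)
    remaining = set(ri)
    result = []
    while remaining:
        # Step 3: maximal remaining elements of each bucket order.
        best_i = min(ri[o] for o in remaining)
        best_j = min(rj[o] for o in remaining)
        candidates = {o for o in remaining
                      if ri[o] == best_i or rj[o] == best_j}

        # Step 4: keep the candidates that no remaining object dominates.
        def dominated(x):
            return any(ri[y] <= ri[x] and rj[y] <= rj[x]
                       and (ri[y] < ri[x] or rj[y] < rj[x])
                       for y in remaining)

        bmo = {x for x in candidates if not dominated(x)}
        result.append(bmo)   # Step 5
        remaining -= bmo     # Step 6
    return result

LOC = [{"L", "F1", "F2"}, {"B"}, {"A1", "A2"}]
MANUF = [{"B", "A1", "A2", "F1", "F2"}, {"L"}]
PRICE = [{"L"}, {"B", "A1"}, {"A2"}, {"F2"}, {"F1"}]

if __name__ == "__main__":
    # Assumed nesting for three facets: (Loc, Manuf) first, then Price.
    print(pareto(pareto(LOC, MANUF), PRICE))
```

Under the nesting assumption the sketch returns the final bucket order of the example, with {L, F2} as the first block. Each iteration scans the remaining objects for every candidate, so the worst case is cubic in the number of objects.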
Pareto composition assumes ceteris paribus semantics. Recall that ceteris paribus semantics means that if $o_1 \succ o_2$ for a specific attribute, then $o_1$ is preferred to $o_2$ provided that $o_1$ and $o_2$ are equal for all the other attributes (i.e. the objects are equivalent). Furthermore, we expand the ceteris paribus semantics by accepting that if $o_1 \succ o_2$ for a specific attribute $attr_1$, then $o_1$ and $o_2$ are at least equal for all other attributes. So we also accept $o_1$ to be preferred to $o_2$ for another attribute $attr_N$ instead of being equal. Here we assume that "all else equal" is captured by the $o_1 \approx o_2$ (equivalence) operator (i.e. $o_1$ and $o_2$ are in the same bucket for a specific attribute).
For example in the previous case, A1 is preferred to A2 since it is less expensive and all the rest
attributes are the same. Furthermore, L is preferred to F2, since Heraklion Rethymnon, and L is
less expensive than F2.
3.4.3 Combination of Priority and Pareto Compositions
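In addition, we can provide combinations of the Priority and Pareto compositions. For example, assume that $L_1$ is the bucket order returned by Alg. MFPriority (i.e. we have a composition of type $P11 \triangleright P12 \triangleright \ldots \triangleright P1k$) and that $L_2$ is the bucket order returned by Alg. MFPareto (i.e. a composition of type $P21 \otimes P22 \otimes \ldots \otimes P2l$). Then, we can combine the previous bucket orders, using either Priority or Pareto composition, by calling Alg. MFPriority or MFPareto resp. (combining also their semantics). There are works like the one described in Neves and Kaci (2010) that provide a combination of Priority and ceteris paribus semantics. In this case, instead of calculating the bucket orders (the first step of each algorithm) we can pass the already computed bucket orders $L_1$ and $L_2$ to the appropriate algorithm.
In this way we can calculate Priority compositions of the type:
$(P11 \triangleright P12 \triangleright \ldots \triangleright P1k) \triangleright (P21 \otimes P22 \otimes \ldots \otimes P2l)$
$(P11 \otimes P12 \otimes \ldots \otimes P1k) \triangleright (P21 \triangleright P22 \triangleright \ldots \triangleright P2l)$
$(P11 \otimes P12 \otimes \ldots \otimes P1k) \triangleright (P21 \otimes P22 \otimes \ldots \otimes P2l)$
Respectively, we can calculate Pareto compositions of the type:
$(P11 \triangleright P12 \triangleright \ldots \triangleright P1k) \otimes (P21 \otimes P22 \otimes \ldots \otimes P2l)$
$(P11 \otimes P12 \otimes \ldots \otimes P1k) \otimes (P21 \triangleright P22 \triangleright \ldots \triangleright P2l)$
$(P11 \triangleright P12 \triangleright \ldots \triangleright P1k) \otimes (P21 \triangleright P22 \triangleright \ldots \triangleright P2l)$
Compositions of the type:
$(P11 \triangleright P12 \triangleright \ldots \triangleright P1k) \triangleright (P21 \triangleright P22 \triangleright \ldots \triangleright P2l)$
$(P11 \otimes P12 \otimes \ldots \otimes P1k) \otimes (P21 \otimes P22 \otimes \ldots \otimes P2l)$
would return equivalent results as the compositions:
$(P11 \triangleright P12 \triangleright \ldots \triangleright P1k \triangleright P21 \triangleright P22 \triangleright \ldots \triangleright P2l)$
$(P11 \otimes P12 \otimes \ldots \otimes P1k \otimes P21 \otimes P22 \otimes \ldots \otimes P2l)$
respectively, since according to Kießling (2002) Priority and Pareto compositions are associative.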
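As an example, assume that the user has defined:
$$(B_{Loc} \triangleright B_{Manuf}) \otimes B_{Price}$$
Moreover, assume the actions $B_{Loc}$ = {Best(Crete), Worst(Chania)}, $B_{Manuf}$ = {Best(European), Worst(Italian), Best(Ferrari)} and $B_{Price}$ = {price min}. Finally, suppose that the current focus $A$ consists again of the same tuples:
Id | Location | Manuf. | Price | ...
L | Heraklion | Lancia | 10 | ...
B | Chania | BMW | 20 | ...
A1 | Athens | Audi | 20 | ...
A2 | Athens | Audi | 21 | ...
F1 | Heraklion | Ferrari | 100 | ...
F2 | Rethymno | Ferrari | 80 | ...
The constituent and final bucket orders are shown below.
$L_{B_{Loc}} = \langle \{L, F1, F2\}, \{B\}, \{A1, A2\} \rangle$
$L_{B_{Manuf}} = \langle \{B, A1, A2, F1, F2\}, \{L\} \rangle$
$L_{B_{Price}} = \langle \{L\}, \{B, A1\}, \{A2\}, \{F2\}, \{F1\} \rangle$
$L_{B_{Loc} \triangleright B_{Manuf}} = \langle \langle \{F1, F2\}, \{L\} \rangle, \{B\}, \{A1, A2\} \rangle$
$L_{(B_{Loc} \triangleright B_{Manuf}) \otimes B_{Price}} = \langle \{L, F2\}, \{F1, B\}, \{A1\}, \{A2\} \rangle$
In case the user had defined the opposite combination:
$$(B_{Loc} \otimes B_{Manuf}) \triangleright B_{Price}$$
then the constituent and final bucket orders would be:
$L_{B_{Loc} \otimes B_{Manuf}} = \langle \{F1, F2\}, \{B\}, \{A2, A1\}, \{L\} \rangle$
$L_{B_{Price}} = \langle \{L\}, \{B, A1\}, \{A2\}, \{F2\}, \{F1\} \rangle$
$L_{(B_{Loc} \otimes B_{Manuf}) \triangleright B_{Price}} = \langle \{F2\}, \{F1\}, \{B\}, \{A1\}, \{A2\}, \{L\} \rangle$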
3.5 A Complete Example
This section provides a complete example that makes the semantics of preference statements clearer. Consider the following set of preference actions:
$b_1$: Best(Europe)
$b_2$: Worst(Italian)
$b_3$: Porsche $\succ$ Ferrari
$b_4$: Fiat $\succ$ Korean
$b_5$: Japanese $\succ$ French
The scope-based ordering of the actions is shown in Fig. 3.13, where the left diagram concerns the Best/Worst actions, while the right one concerns the relative preference actions. The scopes and active scopes of the actions are shown in Table 3.4.
Figure 3.13: Scope Based Ordering of Actions (Left for Best/Worst Actions, Right for Relative Preference Actions): Complete Example
It follows that Alg. Apply will be called with the following parameters:
Param | Param value
B | European, German, Audi, BMW, Porsche, French, Citroen, Peugeot
W | Italian, Lancia, Ferrari, Fiat, Lamborghini
$R_{\succ}$ | Porsche $\succ$ Ferrari, Fiat $\succ$ Korean, Fiat $\succ$ Kia, Japanese $\succ$ French, Japanese $\succ$ Citroen, Japanese $\succ$ Peugeot, Toyota $\succ$ French, Toyota $\succ$ Citroen, Toyota $\succ$ Peugeot, Lexus $\succ$ French, Lexus $\succ$ Citroen, Lexus $\succ$ Peugeot
The diagram of $R_{bw}$ is shown in Figure 3.14, while the diagram of $R_{\succ}$ is shown in Figure 3.15. The final diagram of $R$ is shown in Figure 3.16. For reasons of space names are abbreviated.
The returned bucket order, assuming all these actions are term-scoped, is:
$\langle \{E, G, A, B, Po, J, T, Lx\}, \{Fr, C, Pe\}, \{I, Lmb, Fe, Fi, La\}, \{Ko, Ki\} \rangle$
The bucket order over the leaves of the taxonomy (i.e. car manufacturers) is:
$\langle \{A, B, Po, T, Lx\}, \{C, Pe\}, \{Lmb, Fe, Fi, La\}, \{Ki\} \rangle$
Table 3.4: Complete Example: Scopes and Active Scopes
action | scope / expansion | active scope
$b_1$ | European, German, Audi, BMW, Porsche, French, Citroen, Peugeot, Italian, Lancia, Ferrari, Fiat, Lamborghini | European, German, Audi, BMW, Porsche, French, Citroen, Peugeot
$b_2$ | Italian, Lancia, Ferrari, Fiat, Lamborghini | Italian, Lancia, Ferrari, Fiat, Lamborghini
$b_3$ | Porsche $\succ$ Ferrari | Porsche $\succ$ Ferrari
$b_4$ | Fiat $\succ$ Korean, Fiat $\succ$ Kia | Fiat $\succ$ Korean, Fiat $\succ$ Kia
$b_5$ | Japanese $\succ$ French, Japanese $\succ$ Citroen, Japanese $\succ$ Peugeot, Toyota $\succ$ French, Toyota $\succ$ Citroen, Toyota $\succ$ Peugeot, Lexus $\succ$ French, Lexus $\succ$ Citroen, Lexus $\succ$ Peugeot | Japanese $\succ$ French, Japanese $\succ$ Citroen, Japanese $\succ$ Peugeot, Toyota $\succ$ French, Toyota $\succ$ Citroen, Toyota $\succ$ Peugeot, Lexus $\succ$ French, Lexus $\succ$ Citroen, Lexus $\succ$ Peugeot
Figure 3.14: Hasse Diagram for the Relation $R_{bw}$: Complete Example
Figure 3.15: Hasse Diagram for the Relation $R_{\succ}$: Complete Example
Now suppose an object-relational database (i.e. a database that supports multi-valued attributes) containing the tuples shown in Table 3.5.
Figure 3.16: Hasse Diagram for the Relation R: Complete Example
Table 3.5: Tuples in Database: Complete Example
Id | Manufacturer | Price | Accessories
C | Citroen | 10 | {DVD}
B | BMW | 20 | {ABS, AT}
$A_1$ | Audi | 20 | {ABS, MT, DVD}
$A_2$ | Audi | 21 | {ABS}
$F_1$ | Ferrari | 100 | {ESP, MT}
$F_2$ | Ferrari | 80 | {ESP, ABS, MT}
P | Porsche | 150 | {ESP}
$F_3$ | Fiat | 5 | {}
K | Kia | 12 | {DVD}
T | Toyota | 20 | {ABS, AT, ESP, DVD}
Then, if we apply the manufacturers ordering to the specific objects in the table, we get:
$L_{B_{Manuf.}} = \langle \{B, A_1, A_2, P_1, T\}, \{C\}, \{F_1, F_2, F_3\}, \{K\} \rangle$
Now consider the following three preference actions over the attribute Accessories:
$b_1$: Best(ABS)
$b_2$: Worst(DVD)
$b_3$: AT $\succ$ MT
These actions define the following preference relation R:
ABS AT
| |
DVD MT
Suppose that we have to order the values that appear in the attribute Accessories of the tuples in Table 3.5 according to preference, i.e. we want to order the set:
{ {}, {DVD}, {ABS}, {ESP}, {ABS, AT}, {ESP, MT}, {ABS, MT, DVD}, {ESP, ABS, MT}, {ABS, AT, ESP, DVD} }
w(s, s')/w(s', s) { } {ABS} {DVD} {ESP} {ABS, AT} {ESP, MT} {ABS,MT,DVD} {ESP, ABS, MT} {ABS, AT, ESP, DVD} all
{} 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0
{ABS} 0/0 0/0 1/0 0/0 0/0 0/0 1/0 0/0 1/0 3/0
{DVD} 0/0 0/1 0/0 0/0 0/1 0/0 0/1 0/1 0/1 0/5
{ESP} 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0
{ABS,AT} 0/0 0/0 1/0 0/0 0/0 1/0 2/0 1/0 1/0 5/0
{ESP,MT} 0/0 0/0 0/0 0/0 0/1 0/0 0/0 0/0 0/1 0/2
{ABS, MT, DVD} 0/0 0/1 1/0 0/0 0/2 0/0 1/1 0/1 1/2 1/4
{ESP,ABS,MT} 0/0 0/0 1/0 0/0 0/1 0/0 1/0 0/0 2/1 4/2
{ABS,AT,ESP,DVD} 0/0 0/1 1/0 0/0 0/1 1/0 2/1 1/2 1/1 3/3
The ordering of these values according to the MoreWins rule (i.e. Def. 3 in Section 3.3.2) is shown in the Hasse diagram of Figure 3.17. We can resolve the ties by using the MoreGoodLessBad rule (i.e. Def. 4). Specifically, Support({}) = 1, Support({ABS}) = 0, Support({DVD}) = 1, Support({ESP}) = 1, Support({ABS, AT}) = 0, Support({ESP, MT}) = 2, Support({ABS, MT, DVD}) = 2, Support({ESP, ABS, MT}) = 2, and finally Support({ABS, AT, ESP, DVD}) = 2.
As a result, for the empty set {} we have {} $\succ$ {ESP, MT}, {} $\succ$ {ABS, MT, DVD}, {} $\succ$ {ESP, ABS, MT}, {} $\succ$ {ABS, AT, ESP, DVD}. For {ABS} we have {ABS} $\succ$ {}, {ABS} $\succ$ {ESP}, {ABS} $\succ$ {ESP, MT} and {ABS} $\succ$ {ESP, ABS, MT}, while for {DVD}, {DVD} $\succ$ {ESP, MT}. Finally, regarding {ABS, AT}, {ABS, AT} $\succ$ {ESP}, while for {ESP}, {ESP} $\succ$ {ESP, MT}, {ESP} $\succ$ {ABS, MT, DVD}, {ESP} $\succ$ {ESP, ABS, MT} and finally {ESP} $\succ$ {ABS, AT, ESP, DVD}.
Figure 3.17: Hasse Diagram for Ordering Multi-Valued Attributes According to MoreWins Rule: Complete Example
The ordering of these values according to the MoreGoodLessBad rule (i.e. Def. 4) is shown in the Hasse diagram of Figure 3.18.
Figure 3.18: Hasse Diagram for Ordering Multi-Valued Attributes According to MoreGoodLessBad Rule: Complete Example
After running topological sorting we get the following final bucket order over the sets:
$\langle$ {{ABS}, {ABS, AT}}, {{ESP}, {}}, {{ABS, AT, ESP, DVD}, {ESP, ABS, MT}}, {ABS, MT, DVD}, {DVD}, {ESP, MT} $\rangle$
If we assume the tuples of Table 3.5 and that the expressed preference actions are object-scoped, then the final bucket ordering is:
$L_{B_{Access.}} = \langle \{A_2, B\}, \{P_1, F_3\}, \{T, F_2\}, \{A_1\}, \{K, C\}, \{F_1\} \rangle$
Suppose that we also want cars to be sorted according to their price in ascending order, i.e. the order of the cars in Table 3.5 is
$L_{B_{Price}} = \langle \{F_3\}, \{C\}, \{K\}, \{T, B, A_1\}, \{A_2\}, \{F_2\}, \{F_1\}, \{P_1\} \rangle$
Now consider that $(B_{Manufacturer} \otimes B_{Price}) \triangleright B_{Accessories}$. As a result, according to the previous bucket orders we have:
$L_{B_{Manuf.} \otimes B_{Price}} = \langle \{F_3, B, A_1, T\}, \{C, A_2\}, \{K, P_1\}, \{F_2\}, \{F_1\} \rangle$
and
$L_{(B_{Manuf.} \otimes B_{Price}) \triangleright B_{Accessories}} = \langle \langle \{B\}, \{F_3\}, \{T\}, \{A_1\} \rangle, \langle \{A_2\}, \{C\} \rangle, \langle \{P_1\}, \{K\} \rangle, \{F_2\}, \{F_1\} \rangle$
Chapter 4
Complexity and Optimizations
Contents
4.1 Computational Complexity
4.2 Optimizations for Deriving the Preference-based Order
4.2.1 An Algorithm based on the Focal Object Set
4.2.2 Optimizations for Capturing Set-Valued Attributes and Top-K Requirements
4.3 Optimizations for Multi-Facet Preferences
4.3.1 Prioritized Composition
4.3.2 Pareto Composition
4.3.3 Combination of Priority and Pareto Compositions
At first (Section 4.1) we discuss the computational complexity of the algorithms presented in the
previous sections. Then at Section 4.2 we introduce more efficient algorithms for object-ordering, while
at Section 4.2.2 we focus on algorithms for set-valued facets which can be used also for evaluating the
top-K elements of the object order. Finally, at Section 4.3 we discuss some optimizations for multi-facet
preferences.
4.1 Computational Complexity
Alg. 1 (Apply)
In the worst case all elements of $E$ are involved and the most expensive task is that of topological sorting. The topological sorting is in $O(|E| + |R|)$, thus w.r.t. $E$ we can say that it is in $O(|E|^2)$. If the actions are object-scoped, i.e. $E$ corresponds to $Obj$, then the complexity of Apply is in $O(|Obj|^2)$.
Alg. 3 (ApplyOverFamiliesOfSets)
Suppose $E$ is a set of terms over $|T_i|$. The computation of the closure at line (3) is in $O(|T_i|^3)$. Then we have to perform $|E|^2$ computations of wins (between all pairs of elements of $E$). Since to compute $wins(s, s')$ we need $O(|s|^2)$ steps, the cost for computing all wins is in $O(|E|^2 \cdot avgSetSize^2)$, where $avgSetSize$ is the average size of the sets in $E$. Note that for some of the pairs we may have to compute the support of the involved sets. Since for computing the support of one atomic element the cost is $|T_i|$, the computation of $Support(s)$ is in $O(|s||E|)$. Altogether, the computation of wins and support is in $O(|T_i|^3 + |E|^2 \cdot avgSetSize^2)$.
As regards the size of $E$, for a facet with $|T_i|$ values we can have at most $2^{|T_i|}$ sets, therefore $|E| \leq 2^{|T_i|}$. However $|E|$ cannot be bigger than $|A|$, therefore we can write $|E| \leq \min(2^{|T_i|}, |A|)$.
Alg. 4 (PrefOrder)
Let us now elaborate on the computational complexity of PrefOrder and suppose that all actions in $B$ are object-scoped, i.e. $E$ corresponds to $Obj$. Line 2 requires computing the scopes of all actions in $B$. The computation of the scope of an action depends on $|Obj|$, and the size of the scope can be $|Obj|$ in size, i.e. it is in $O(|B| \cdot |Obj|)$. Line 3 requires $|B|^2$ comparisons of sets, where each set can be $|Obj|$ in size, i.e. it is in $O(|B|^2 |Obj|)$. Line 5 requires computing the active scopes and this depends on $|B|$ and $|Obj|$, i.e. it is in $O(|B||Obj|)$. Line 6 requires firstly to compute the parameters $B$, $W$ and then to run Alg. Apply. The cost of the latter is in $O(|Obj|^2)$ as discussed earlier. It follows that the overall cost of Alg. PrefOrder is in $O(|Obj|(|Obj| + |B|^2))$.
Alg. 6 (MFPriority)
Consider the algorithm MFPriority of Section 3.4.1. Assume that we have k facets, i.e. we have to order
the elements according to a prioritized composition of actions over each facet ($B_1, B_2, \ldots, B_k$). Let us
describe the complexity for only two facets. In that case we have to apply Alg. PrefOrder with cost
$O(|Obj|(|Obj| + |B_i|^2))$, and then for each block of the produced bucket order to call Alg. PrefOrder
with actions $B_j$. The cost of each such call is in $O(|Obj|(|Obj| + |B_j|^2))$. Overall we can say that the
cost is $O(|Obj|(|Obj| + |B|^2))$, where $|B| = |B_1| + \ldots + |B_k|$. Now the cost of the algorithm for k facets
is in $O(|Obj|(k|Obj| + |B|^2))$.
Alg. 7 (MFPareto)
Consider the algorithm MFPareto of Section 3.4.2. Assume that we have k facets, i.e. we have to order
the elements according to a Pareto composition of actions over each facet ($B_1, B_2, \ldots, B_k$). Let us
describe the complexity for only two facets. In that case we have to apply Alg. PrefOrder twice with
cost $O(|Obj|(|Obj| + |B_i|^2) + |Obj|(|Obj| + |B_j|^2))$. The set of objects in the two maximal buckets
in the worst case can be the whole set of objects (i.e. |Obj|). Then we have to check, for each object in
the maximal blocks of the two returned bucket orders, whether it is dominated for any of the two criteria
(as described by preference actions ordering objects). This can be done by running existing skyline
algorithms like BNL (Börzsönyi et al. (2001)), which has a cost of $O(|Obj|^2)$. In the worst case (i.e. only one
element is not dominated in each run) we have to repeat this for |Obj| objects, i.e. the cost for finding
the Pareto order is in $O(|Obj|^3)$. Overall we can say that the cost is $O(|Obj|(|Obj|^2 + |B|^2))$. Now the cost
of the algorithm for k facets is in $O(|Obj|(k|Obj|^2 + |B|^2))$, where $|B| = |B_1| + \ldots + |B_k|$. If we only
calculate the Pareto optimal set (i.e. the skyline) then the cost is in $O(|Obj|(k|Obj| + |B|^2))$.
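The repeated skyline computation described above can be sketched as follows. This is a naive nested-loop skyline rather than BNL proper (no window management), and it assumes that each object is summarized by a tuple of block indexes, one per facet, where a smaller index means a more preferred block. The names `dominates`, `skyline` and `pareto_blocks` are hypothetical.

```python
def dominates(p, q):
    """p dominates q: at least as good on every criterion, strictly better on one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(ranks):
    """Objects not dominated by any other object (quadratic nested loop)."""
    return {o for o, r in ranks.items()
            if not any(dominates(r2, r) for o2, r2 in ranks.items() if o2 != o)}


def pareto_blocks(ranks):
    """Peel off skylines repeatedly to obtain the whole Pareto bucket order."""
    remaining = dict(ranks)
    blocks = []
    while remaining:
        top = skyline(remaining)      # non-empty, since dominance is a strict order
        blocks.append(top)
        for o in top:
            del remaining[o]
    return blocks


# Block index of each object in the bucket orders of two facets (illustrative data).
ranks = {"o1": (0, 1), "o2": (1, 0), "o3": (1, 1), "o4": (2, 2)}
layers = pareto_blocks(ranks)
```

When every skyline contains a single object, the loop runs |Obj| times over a quadratic step, which is the cubic worst case mentioned in the text.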
Combination of Pareto and Priority
Regarding the combination of Pareto and Priority as described in Section 3.4.3, the complexity will be in the
worst case in $O(|Obj|(k|Obj|^2 + |B|^2))$, which is the complexity of the Pareto composition (i.e. the most
expensive composition).
4.2 Optimizations for Deriving the Preference-based Order
Facet and Zoom Point Ordering.
Since the set of facets $F = \{F_1, \ldots, F_k\}$ is usually small, the computation of the preference-based order
of F is not expected to be expensive and we can use the proposed algorithms straightforwardly. The same is
true for ordering the zoom points of each facet (as $|T_i|$ is usually small). Also note that we do not have to
order the entire $T_i$ but only the active terms (i.e. the zoom points $Z_i(ctx)$ and $ZR_i(ctx)$ as defined in
Table 2.1), which are subsets of $T_i$.
Object Ordering.
Let us now focus on object ordering. If |Obj| (and thus all |A|s) is small, then we could again apply the
proposed algorithm straightforwardly. If |Obj| is high then |A| can be high too.
In such cases we propose exploiting the benefits of adopting the FDT approach, i.e. the fast convergence
to small result sets with a few clicks. Convergence is discussed in detail (and it is quantified) in Section
6.2. This means that an acceptable and feasible policy is to order according to preference the set A only
if |A| is below a given threshold (say a few hundreds). For these reasons, below we present an algorithm,
Alg. PrefOrder$_{Opt}$, which is an optimized version of Alg. PrefOrder, and whose complexity does not
depend on |Obj|, but only on |A| and |B|, so it can be applied to large information bases. We could call
this algorithm focus-based.
An alternative algorithm which can be beneficial in cases where A is large is given in Section 4.2.2.
4.2.1 An Algorithm based on the Focal Object Set
Alg. PrefOrder$_{Opt}$ takes as input the set A to be ranked, which we can assume is not big (due to the
fast convergence of FDT). First we present some auxiliary functions and the main idea (ignoring the case
of relative preferences and set-valued attributes), and then the full algorithm.
We can start by the observation that if we have a function that checks whether $b_1 \sqsubseteq b_2$ holds, where
$b_1$ and $b_2$ are actions, then we can form the relation $\sqsubseteq$. An algorithm that implements such a function,
denoted by CheckSubScopeOf($b_1$, $b_2$), is given below. The key point is that we can decide whether $b_1 \sqsubseteq b_2$
holds without having to compute the scopes of these actions. Instead we can base our approach on the
definition of action scopes (Table 3.1). Specifically, if the anchor of $b_1$ is not empty, while the one of $b_2$ is
empty (e.g. order all facets lexicographically), then it returns True. The remaining cases follow the general
rule: terms are more refined than facets. In the case of two term-anchored actions whose terms are $\leq$-related,
the actions are $\sqsubseteq$-related too (see line 6). If furthermore labeling is used (e.g. Agrawal et al. (1989)),
which is a good choice in such applications, then the cost of this function is always in $O(1)$.
1: function CheckSubScopeOf($b_1$, $b_2$): Boolean
2:   if ($b_1$.anchor $\neq \emptyset$) $\wedge$ ($b_2$.anchor $= \emptyset$) then
3:     return True
4:   if ($b_1$.anchor $= t_i$) $\wedge$ ($b_2$.anchor $= F_j$) then
5:     return True
6:   if ($b_1$.anchor $= t_i$) $\wedge$ ($b_2$.anchor $= t_j$) $\wedge$ ($t_i \leq t_j$) then
7:     return True
8:   return False
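The function above, together with a constant-time subsumption test, can be sketched as follows. The sketch assumes that the taxonomy is a tree and labels it with depth-first intervals; this is a simplification of interval-based labeling, not the scheme of Agrawal et al. (1989), which also handles DAGs. The `Action` class and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


def label_tree(children, root):
    """Assign a (start, end) interval to every term of a tree-shaped taxonomy."""
    labels = {}
    counter = 0

    def visit(term):
        nonlocal counter
        start = counter
        counter += 1
        for child in children.get(term, ()):
            visit(child)
        labels[term] = (start, counter - 1)

    visit(root)
    return labels


def is_narrower(labels, t, u):
    """True if t <= u, i.e. t equals u or is a descendant of u (interval containment)."""
    return labels[u][0] <= labels[t][0] and labels[t][1] <= labels[u][1]


@dataclass(frozen=True)
class Action:
    anchor_kind: Optional[str]        # None (no anchor), "facet" or "term"
    anchor: Optional[str] = None


def check_sub_scope_of(b1, b2, labels):
    if b1.anchor_kind is not None and b2.anchor_kind is None:      # line 2
        return True
    if b1.anchor_kind == "term" and b2.anchor_kind == "facet":     # line 4
        return True
    if b1.anchor_kind == "term" and b2.anchor_kind == "term":      # line 6
        return is_narrower(labels, b1.anchor, b2.anchor)
    return False


# Illustrative taxonomy; the terms are made up for this example.
labels = label_tree({"Europe": ["Greece", "Italy"], "Greece": ["Crete"]}, "Europe")
crete, europe = Action("term", "Crete"), Action("term", "Europe")
```

Once the labels are computed, each call performs a fixed number of comparisons, which is the $O(1)$ behaviour the text attributes to labeling.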
For defining the intended algorithm we also need a boolean function IsInScope(o, b) that returns
True if o belongs to the scope of b. This function can be implemented as follows.
1: function IsInScope(o, b): Boolean
2:   if b.scopeType = objects order then
3:     if b.anchor = facet $F_i$ then
4:       if $D(o) \cap T_i \neq \emptyset$ then return True
5:       else return False
6:     else if b.anchor = term $t_j$ then
7:       if $t'_j \leq t_j$ for some $t'_j \in D(o)$ then return True
8:       else return False
9:   return False
The main cost of IsInScope(o, b) is the cost required to check whether a term is narrower than
another (line 7 requires checking if $t_j$ is broader than a term assigned to o, i.e. if $t'_j \leq t_j$ where $t'_j \in D(o)$),
so its cost is $O(|R_\leq|)$ where $|R_\leq|$ denotes the number of relationships of a taxonomic relation. If labeling
is used (e.g. Agrawal et al. (1989)) then this cost is $O(1)$. Assume now that the average number of terms
that are directly assigned to an object o is denoted with $avgD$. Then the final cost of IsInScope(o, b)
is in $O(avgD)$.
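A minimal sketch of IsInScope follows. It assumes that an action is a triple (scope type, anchor kind, anchor) and that a precomputed map of strictly broader terms stands in for the labeling scheme; both representations, and the sample terms, are assumptions made only for this illustration.

```python
def is_in_scope(d_o, action, facet_terms, ancestors):
    """d_o: D(o), the terms directly assigned to the object.
    action: (scope_type, anchor_kind, anchor).
    facet_terms: facet name -> set of its terms T_i.
    ancestors: term -> set of its strictly broader terms.
    """
    scope_type, anchor_kind, anchor = action
    if scope_type != "objects":
        return False
    if anchor_kind == "facet":
        # the object is in scope if it carries at least one term of that facet
        return bool(d_o & facet_terms[anchor])
    if anchor_kind == "term":
        # the anchor must equal, or be broader than, a term assigned to the object
        return any(anchor == t or anchor in ancestors.get(t, ()) for t in d_o)
    return False


facet_terms = {"Location": {"Europe", "Greece", "Crete", "Asia"}}
ancestors = {"Crete": {"Greece", "Europe"}, "Greece": {"Europe"}}
d_o = {"Crete"}
```

The term case loops over the terms of the object, so with constant-time subsumption checks its cost grows with the number of terms directly assigned to the object.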
We can now present the optimized version of Alg. PrefOrder, which is Alg. PrefOrder$_{Opt}$ shown
below. It takes as input two parameters, an object set A, and a set of actions B (the latter is one of the k+2
sets of actions). Part (1) includes the optimized version of lines (2-3) of PrefOrder, and Part (2) includes
the optimized version of lines (5-6) of PrefOrder. We can see that the algorithm never computes the
scope of any action and this is the key point for applying it in large information bases (in the sense that its
computational complexity does not depend on |Obj|). Instead, it checks whether elements of E (recall
that E has been reduced through clicks) belong to the scopes of actions.
Algorithm 8 PrefOrder$_{Opt}$(E, B, Policy)
Input: the set of elements E, the set of actions B, and Policy for inactive elements
Output: a bucket order over E
1: /** Part (1): Computation of (B, $\sqsubseteq$) */
2: Visited $\leftarrow \emptyset$
3: $R_\sqsubseteq \leftarrow \emptyset$   // $R_\sqsubseteq$ corresponds to $\sqsubseteq$
4: for each $b \in B$ do
5:   for each $b' \in B \setminus$ Visited do
6:     if CheckSubScopeOf($b$, $b'$) then
7:       $R_\sqsubseteq \leftarrow R_\sqsubseteq \cup \{(b \sqsubseteq b')\}$
8:     else if CheckSubScopeOf($b'$, $b$) then
9:       $R_\sqsubseteq \leftarrow R_\sqsubseteq \cup \{(b' \sqsubseteq b)\}$
10:    Visited $\leftarrow$ Visited $\cup \{b\}$
11:  endfor
12: endfor
13:
14: /** Part (2): Efficient Computation of Act. Scopes */
15: for each $b \in B$ do
16:   $C(b) \leftarrow$ direct children of b wrt $R_\sqsubseteq$
17:   ActiveScope[b] $\leftarrow \{e \in E \mid$ IsInScope(e, b) $\wedge$
18:       ($\forall c \in C(b)$ it holds IsInScope(e, c) = False)$\}$
19: endfor
20:
22: /** Part (3): Derivation of the final bucket order */
23: (B, W, R) $\leftarrow$ Parse(B)
24: return Apply(E, B, W, R, Policy) // call to Alg. 1
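Parts (1) and (2) of the algorithm can be sketched as below. The two predicates are passed in as callables that stand in for CheckSubScopeOf and IsInScope, the refinement relation is assumed to be a strict order over the actions, and the function name `active_scopes` is hypothetical. Part (3) is omitted.

```python
def active_scopes(elements, actions, sub_scope_of, in_scope):
    """Return action -> set of elements in its active scope."""
    # Part (1): all pairs (b, b2) where b is more refined than b2.
    refines = {(b, b2) for b in actions for b2 in actions
               if b != b2 and sub_scope_of(b, b2)}

    def children(b):
        """More refined actions of b with no other action in between."""
        below = [c for c in actions if (c, b) in refines]
        return [c for c in below
                if not any((c, m) in refines and (m, b) in refines
                           for m in below if m != c)]

    # Part (2): an element is active for b if it is in the scope of b
    # but in the scope of none of the direct children of b.
    result = {}
    for b in actions:
        kids = children(b)
        result[b] = {e for e in elements
                     if in_scope(e, b) and not any(in_scope(e, c) for c in kids)}
    return result


# Toy stand-ins: explicit scopes are used here only to drive the two predicates.
scopes = {"all": {1, 2, 3, 4}, "europe": {1, 2, 3}, "greece": {1}}
active = active_scopes(
    elements={1, 2, 3, 4},
    actions=["all", "europe", "greece"],
    sub_scope_of=lambda b1, b2: scopes[b1] < scopes[b2],
    in_scope=lambda e, b: e in scopes[b],
)
```

In the toy run the explicit `scopes` dictionary only feeds the predicates; in the setting of the text those predicates would be answered from anchors and labels, so no scope over Obj is ever materialized.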

Regarding its complexity, suppose that the taxonomy of each facet is labeled. The cost of the first
part of the algorithm is in $O(|B|^2)$. Note that as long as the user is not submitting a new action, (B, $\sqsubseteq$) can
be preserved and reused when the user is changing his focus (so $O(|B|^2)$ is paid once). The second part
of the algorithm has |B| iterations. Assuming labeling, the cost of each iteration is $avgD \cdot |A| \cdot (1 + avgC)$,
where $avgC$ is the average number of direct children of an action w.r.t. $\sqsubseteq$. It follows that the
cost of the second part is $avgD \cdot |B| \cdot (|A| \cdot (1 + avgC)) = avgD \cdot |B| \cdot |A| + avgD \cdot |A| \cdot (|B| \cdot avgC)
= avgD \cdot |A| \cdot (|B| + |\sqsubseteq|)$. It holds that $|\sqsubseteq| \leq |B|^2$, and as a result the cost of
the second part is $avgD \cdot |A| \cdot |B|^2$. The last part of the algorithm is the cost of Alg. Apply, which
in our context is expressed as $O(|A|^2)$.

CheckSubScopeOf, line 6:
  Let $b_1.anchor = (e_i, e_j)$ and $b_2.anchor = (e'_i, e'_j)$. We have to write:
  $((e_i \leq e'_i) \wedge (e_j \leq e'_j)) \vee ((e_j \leq e'_i) \wedge (e_i \leq e'_j))$
Alg. PrefOrder$_{Opt}$, lines (17-18):
  ActiveScope[b] $\leftarrow \{(e, e') \in E \times E \mid$ IsInScope(e, e', b) $\wedge$
  ($\forall c \in C(b)$ it holds IsInScope(e, e', c) = False)$\}$
IsInScope(o, o', b):
  Let $b.anchor = (t_i, t_j)$. We have to write:
  $((\exists t \in D(o): t \leq t_i) \wedge (\exists t' \in D(o'): t' \leq t_j)) \vee ((\exists t \in D(o): t \leq t_j) \wedge (\exists t' \in D(o'): t' \leq t_i))$
Table 4.1: PrefOrder$_{Opt}$ Changes for Capturing Relative Preferences Over Hierarchically Organized Values
Changes for Capturing also Relative Preferences
The optimized algorithm Alg. PrefOrder$_{Opt}$ can be easily adapted so as to also handle relative
preferences over hierarchically organized values (as defined in Section 3.3.4). Specifically we just have to
make the changes shown in Table 4.1.
Table 4.2: Complexity for Non-Optimized and Optimized Alg. PrefOrder and PrefOrder$_{Opt}$
Part     Alg. PrefOrder                Alg. PrefOrder$_{Opt}$            Alg. PrefOrder$_{Opt}$ Relative
Part 1   $O(|Obj|(|Obj| + |B|^2))$     $O(|B|^2)$                        $O(|B|^2)$
Part 2   $O(|Obj||B|)$                 $O(|A||B|^2 \, avgD)$             $O(|A|^2|B|^2 \, avgD)$
Part 3   $O(|Obj|^2)$                  $O(|A|^2)$                        $O(|A|^2)$
Total    $O(|Obj|(|Obj| + |B|^2))$     $O(|A|(|A| + |B|^2 \, avgD))$     $O(|A|^2|B|^2 \, avgD)$
Regarding complexity, if labeling is available, then CheckSubScopeOf remains in $O(1)$ and as a
result part one remains in $O(|B|^2)$. The function IsInScope(o, o', b) requires 4 checks of the form used
in line 7 of IsInScope (i.e. whether $t_j$ is equal to or broader than a term of $D(o')$). Again, if $avgD$ denotes
the average number of terms that are directly assigned to an object $o \in Obj$, then these checks cost
$O(avgD)$ time. In the revised lines 17-18 of Alg. PrefOrder$_{Opt}$ the cost of each iteration is higher (in
place of |A| we now have $|A|^2$). Therefore the cost of the second part of the algorithm is now in
$O(|A|^2|B|^2 \, avgD)$, as also listed in Table 4.2.
Synopsis. Table 4.2 summarizes the complexities for the 3 different parts of the algorithm, for the non-
optimized and optimized version of the algorithm. The key point is that the complexity of the optimized
algorithm is independent of |Obj|.
4.2.2 Optimizations for Capturing Set-Valued Attributes and Top-K Requirements
Here we provide an optimized algorithm for ordering a set of objects A for the case where (i) we have
relative preferences over a facet whose values are hierarchically organized, and (ii) the object descrip-
tions according to that facet are set-valued. The reason for describing this case separately is because
IsInScope was defined without considering set-valued attributes (however note that a plain vanilla
algorithm was given in Sect. 3.3.5).
Let $F_i$ be the facet whose terms are hierarchically organized and suppose that the object descriptions
are set-valued at that facet. We start by assuming that the relation $\succ_{\{\}}$ over sets of terms of facet $F_i$ has
been computed, and obviously this includes inheritance resolution (computation of active scopes), and
computation of the wins and Support if needed (as we have described in Section 3.3.2). Now the idea of
the algorithm is the following:
(a) for the objects in A collect their descriptions w.r.t. $F_i$ (let Z be this set),
(b) compute the restriction of $\succ_{\{\}}$ on Z,
(c) apply topological sorting on Z based on $\succ_{\{\}}$, and
(d) from the blocks of Z derive the blocks of the objects.
The exact steps of the algorithm are given in Alg. 9.
Algorithm 9 PrefOrderSetValuedOpt(A, $\succ_{\{\}}$)
Input: A, an order $\succ_{\{\}}$ over a set-valued attribute with values in $F_i$.
Output: Ordering of A w.r.t. $\succ_{\{\}}$
1: $Z = \{D_i(o) \mid o \in A\}$
2: $R = \succ_{\{\}}|_Z$   // restriction of $\succ_{\{\}}$ on Z
3: $L \leftarrow$ SourceRemoval(R)
4: $OL \leftarrow \emptyset$
5: for each block b of L do
6:   for each term set s in b do
7:     $ob = I(s) \cap A$   // ob is the corresponding object block
8:     append ob to OL
9:   append to OL a block separator
10: return OL
In line (1) we compute Z, which is the family of sets of terms of $F_i$ that occur in A. As stated earlier,
we assume that the relation $\succ_{\{\}}$ over all values that occur in $F_i$ has been defined (as in lines 1-10 of Alg.
3). Now line (2) sets R to be the restriction of $\succ_{\{\}}$ on Z. Subsequently we apply topological sorting and
we obtain L. Next, we start consuming L starting from the first block. Note that a block can contain
one or more term sets. For each such set s we scan A and let ob be the objects that have this value. The
elements in ob are appended to OL, which is the order of objects. This continues until all blocks of L
have been consumed.
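The steps of Alg. 9 can be sketched as follows. The sketch assumes that the relation over term sets is given as a transitively closed set of (preferred, less preferred) pairs of frozensets, so that restricting it to Z loses no orderings, and it scans A to obtain each object block. The function name and the sample data are hypothetical.

```python
def pref_order_set_valued(A, D_i, prefers):
    """A: objects of the focus; D_i: object -> frozenset of terms of facet F_i;
    prefers: pairs (s, s2) of frozensets, s preferred to s2 (transitively closed).
    Returns a list of object blocks, most preferred first."""
    A = list(A)
    Z = {D_i[o] for o in A}                                   # line 1
    R = {(s, t) for (s, t) in prefers if s in Z and t in Z}   # line 2

    # line 3: topological sorting of Z by repeated source removal
    indegree = {s: 0 for s in Z}
    for _, t in R:
        indegree[t] += 1
    remaining = set(Z)
    OL = []
    while remaining:
        block = {s for s in remaining if indegree[s] == 0}
        if not block:
            raise ValueError("the relation over term sets contains a cycle")
        # lines 5-9: scan A for the objects whose description is in this block
        OL.append([o for o in A if D_i[o] in block])
        for s, t in R:
            if s in block:
                indegree[t] -= 1
        remaining -= block
    return OL


a, ab, b = frozenset({"a"}), frozenset({"a", "b"}), frozenset({"b"})
D_i = {"o1": a, "o2": ab, "o3": b, "o4": a}
prefers = {(a, ab), (ab, b), (a, b)}
order = pref_order_set_valued(["o1", "o2", "o3", "o4"], D_i, prefers)
```

Restricting the focus only shrinks Z, so the same stored relation can be reused for every focus, for instance `pref_order_set_valued(["o3", "o4"], D_i, prefers)`.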
To compute $\succ_{\{\}}$ we can follow lines (1)-(10) of Alg. 3 and, according to Section 4.1, the complexity for
this is in $O(|T_i|^3 + |E|^2 \, avgSetSize^2)$. As regards the size of E, for a facet with $|T_i|$ values we can have
at most $2^{|T_i|}$ sets (i.e. the size of $\mathcal{P}(T_i)$), therefore $|E| \leq 2^{|T_i|}$. However |E| cannot be bigger than |A|,
therefore we can write $|E| \leq \min(2^{|T_i|}, |A|)$. One policy is to compute $\succ_{\{\}}|_Z$ when needed. Another is
to compute $\succ_{\{\}}$ over all distinct sets that occur for that facet (and to update it each time the user issues
a preference action that concerns that order), to avoid recomputing it while the user restricts the set A.
At run-time we just have to take its restriction on Z. We favor this policy in the given algorithm.
Generalization and Top-K Algorithm
Note that Alg. 9 essentially corresponds to the following approach: first order the terms and from their
ordering derive the object ordering. Now suppose that we are not in the context of a set-valued attribute. If
instead of passing the parameter $\succ_{\{\}}$ we pass an ordering over the values of $T_i$, then a question that arises
is whether we could use this algorithm, instead of Alg. 8, to produce the object order, and in what cases
that algorithm would be beneficial.
Let us approach this question from the computational complexity perspective. Suppose the case of
relative preferences over a $T_i$. Instead of $\succ_{\{\}}$, we have to pass as parameter the ordering over the values of
$T_i$. To compute this ordering we can use Alg. 8 where instead of having to order the objects in A we order
the terms of $T_i$. In this case, and according to Table 4.2, the cost of this step is in $O(|T_i|^2|B|^2 \, avgD)$.
Note that it is not necessary to compute Z or the restriction of the preference relation on Z (lines 1-2 of
Alg. 9), in the sense that the final answer will be correct in any case due to the intersections with A at
line 7. The computation of Z can be beneficial if Z is much smaller than $T_i$ (in that case we will have to
compute I(s) for fewer sets s). Also note that the way A is defined can be exploited for further
optimizations. For instance, if A has been defined intensionally (by one query), then we may already know the
set Z without having to scan the set A.
Line (3) requires topological sorting whose cost is in $O(|T_i|^2)$. The subsequent loop will have at most
$|T_i|$ iterations (in particular |Z|), and the cost of each iteration is that of the operation $I(s) \cap A$. The
operation $I(s) \cap A$ can be implemented in various ways, based on the sizes of the operands (and the
data structures that are in place). E.g. if A is small it is better to scan A to select those objects whose $D_i$
description equals s, and in this case the cost of an operation $I(s) \cap A$ is in $O(|A| \, avgD)$. On the other
hand if A is big, and I(s) is small, then it is better to compute and scan I(s) and then delete from this
set those elements which are not in A. In this case the cost of an operation $I(s) \cap A$, if we assume direct
access to the elements of I(s) and the ability to perform binary search for lookups in A, is $O(|I(s)| \log |A|)$. It
follows that the cost of the loop is in $O(|T_i| \min(|A| \, avgD, |I(s)| \log |A|))$.
Overall, the cost of Alg. 9 for single-valued facets, including the cost for computing the preference
relation to be passed to this algorithm, is in $O(|T_i|^2|B|^2 + |T_i| \min(|A| \, avgD, |I(s)| \log |A|))$.
Recall that the cost of Alg. 8 (according to Table 4.2) is in $O(|A|^2|B|^2 \, avgD)$.
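To get a feeling for the two bounds just stated, they can be evaluated for sample sizes. The sketch below simply plugs numbers into the two expressions with all hidden constants set to 1, so the results are rough operation counts, not running times; the function names and the sample sizes are assumptions.

```python
import math


def cost_alg8(a, b, avg_d):
    """|A|^2 * |B|^2 * avgD, the bound quoted for Alg. 8."""
    return a ** 2 * b ** 2 * avg_d


def cost_alg9(t, a, b, avg_d, ext):
    """|T_i|^2 * |B|^2 + |T_i| * min(|A| * avgD, |I(s)| * log|A|); ext plays the role of |I(s)|."""
    loop = t * min(a * avg_d, ext * math.log2(a))
    return t ** 2 * b ** 2 + loop


# Large focus, small facet: a case where the bound of Alg. 9 is far smaller.
large_focus = (cost_alg9(t=20, a=100_000, b=5, avg_d=2, ext=5_000),
               cost_alg8(a=100_000, b=5, avg_d=2))

# Small focus, thesaurus-sized facet: a case where the bound of Alg. 8 is far smaller.
small_focus = (cost_alg9(t=50_000, a=30, b=5, avg_d=2, ext=1),
               cost_alg8(a=30, b=5, avg_d=2))
```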
One benefit of Alg. 9 is that it can be more efficient than Alg. 8 if A is large and $T_i$ is small. This
is evident also from their complexities; Alg. 9 will have the cost $O(|T_i|^2|B|^2 + |T_i||I(s)| \log |A|)$, while
Alg. 8 will have the cost $O(|A|^2|B|^2 \, avgD)$. Note that cases where |A| can be very large may occur
at application level. For instance, consider the case of a user who has expressed a number of
object-scoped actions, and instead of restricting A, he would like to directly get the most preferred objects. The
user wants to bypass the information thinning process probably because he believes that his preference
actions are enough for bringing the most desired objects to the top positions of the returned answer. It
is not hard to see that in this scenario both the plain algorithm (Alg. 4) and Alg. 8 are prohibitively expensive.
Alg. 9 will be more efficient, but it will still order the entire Obj. Although in our opinion the
assumption that the user has expressed a detailed and complete description of his preferences is not
very realistic (recall the discussion at the end of Section 2 and the DiFEPreKO Hypothesis that will be
discussed in Section 6.3), if we want to support such scenarios then a possible direction is to devise an
appropriate top-k algorithm. Top-k algorithms for preference-aware queries have been proposed (e.g.
Georgiadis et al. (2008); Stefanidis et al. (2010); Spyratos et al. (2011)), however they are appropriate for
plain relational sources, meaning that hierarchically organized values or set-valued attributes are not
supported. However notice that Alg. 9 can be slightly changed to become a top-K algorithm. Specifically
we consume blocks of L until OL has reached K objects. With this we complete the discussion of the
main cases where the adoption of Alg. 9 is beneficial.
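The top-K change can be sketched as follows, assuming that the ordered blocks L of values are already available and that an index from each value s to its extension I(s) exists. Whole blocks are returned, so the answer may hold slightly more than K objects. The function name and the sample data are hypothetical.

```python
def top_k_blocks(L, extension, A, k):
    """L: blocks of values (or term sets), most preferred first.
    extension: value -> set of objects I(s); A: the focus; k: requested objects.
    Consumes blocks of L until at least k objects have been collected."""
    A = set(A)
    OL, count = [], 0
    for block in L:
        ob = set()
        for s in block:
            ob |= extension.get(s, set()) & A
        if ob:
            OL.append(ob)
            count += len(ob)
        if count >= k:
            break          # the remaining blocks of L are never touched
    return OL


L = [["red"], ["blue", "green"], ["black"]]
extension = {"red": {"o1", "o2"}, "blue": {"o3"}, "green": {"o4", "o5"}, "black": {"o6"}}
focus = {"o1", "o2", "o3", "o4", "o5", "o6"}
top3 = top_k_blocks(L, extension, focus, 3)
```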
On the other hand, Alg. 8 can be faster than Alg. 9 if $T_i$ is large in comparison to A (e.g. suppose
$T_i$ is a thesaurus and A is a set of a few tens of objects). Also note that another merit of Alg. 8 is that it can
be straightforwardly extended to accommodate object-anchored preference actions, or other multi-facet
preferences, due to its scope-based approach.
4.3 Optimizations for Multi-Facet Preferences
4.3.1 Prioritized Composition
According to Section 4.1, the cost of MFPriority (presented in Section 3.4) is in $O(|Obj|(k|Obj| + |B|^2))$
for k facets. Let us now suppose that in algorithm MFPriority we use PrefOrder$_{Opt}$ instead
of PrefOrder. The cost of MFPriority in that case is in $O(|A|(k|A| + |B|^2))$. Analogously, one could
adopt Alg. 9 and calculate accordingly the complexity of MFPriority.
Now we will introduce an alternative approach for supporting what we call efficient priority driftage.
We refer to the scenario where the user changes priorities with one click, and we want the new ordering
of objects to appear instantly. To begin with, as long as the user does not submit an action, each $(B_i, \sqsubseteq)$
can be kept stored and reused. Now suppose the user is inspecting an answer set A and he changes
facets (i.e. he clicks on one facet) just for changing the priorities. Specifically, suppose the user has
already specified $B_i \triangleright B_j$, meaning that both $L_{B_i}$ and $L_{B_i \triangleright B_j}$ have already been computed (according to
MFPriority). Suppose that the user now clicks on $F_j$ just for changing the prioritized multi-facet
preference to $B_j \triangleright B_i$. According to the approach presented in Section 3.4, the application of MFPriority
will first compute $L_{B_j}$ and finally it will produce $L_{B_j \triangleright B_i}$. The key idea of the alternative algorithm is
that we can avoid calling Alg. PrefOrder for each block of $L_{B_j}$ at Step 2 of Alg. MFPriority.
Specifically we will show that from $L_{B_i}$ and $L_{B_j}$ we can compute $L_{B_j \triangleright B_i}$. It can be easily proved that the
first blocks of $L_{B_j \triangleright B_i}$ are the restriction of $L_{B_i}$ on the objects of the first block, say $A_{j1}$, of $L_{B_j}$. This
means that the first blocks of $L_{B_j \triangleright B_i}$ can be obtained by scanning $L_{B_i}$ once and deleting (skipping)
each object encountered that does not belong to $A_{j1}$.
In the example of Section 3.4, the first two blocks of $L_{B_{Loc} \triangleright B_{Manuf}}$ (i.e. the blocks $\langle\{F1, F2\}, \{L\}\rangle$)
can be obtained by replacing the first block of $L_{B_{Loc}}$ (i.e. $\{L, F1, F2\}$) by what is left after scanning
$L_{B_{Manuf}}$ and ignoring the elements that do not belong to $\{L, F1, F2\}$. Since $L_{B_{Manuf}} = \langle\{B, A1, A2,
F1, F2\}, \{L\}\rangle$ we will get $\langle\{F1, F2\}, \{L\}\rangle$.
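The refinement just illustrated can be sketched in a few lines. The sketch assumes that a bucket order is stored as a list of sets; the function name `refine_by` is hypothetical, and only the first block of the higher-priority order is taken from the example, since the remaining blocks are not listed here.

```python
def refine_by(high_blocks, low_order):
    """Split each block of the higher-priority order according to the stored
    lower-priority bucket order, skipping objects outside the block."""
    result = []
    for block in high_blocks:
        for low_block in low_order:
            kept = low_block & block      # objects of this block, in low-priority order
            if kept:
                result.append(kept)
    return result


first_block_loc = [{"L", "F1", "F2"}]
order_manuf = [{"B", "A1", "A2", "F1", "F2"}, {"L"}]
refined = refine_by(first_block_loc, order_manuf)
```

No preference action is inspected during the refinement; only the two stored bucket orders are read.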
The cost of this approach, assuming two facets, is in $O(|A|^2)$, since we have to scan |A| elements
and for each one of them to perform a lookup in a set that consists of at most |A| elements. If we have
k facets then the cost is in $O((k - 1)|A|^2)$. Notice that its cost is independent of the number of user
preference actions $B_i$ for each facet, assuming that the user does not submit new actions. However this
approach requires keeping in memory $L_{B_1}, \ldots, L_{B_k}$. Each has at most |A| objects (according to the
suggested scenario), therefore the main memory cost is $k|A|$ where k is the number of facets.
To summarize, an alternative policy to Alg. MFPriority is to compute and keep stored the bucket
order $L_{B_i}$ for each $1 \leq i \leq k$. Then any prioritized composition of these sets of actions can be obtained
by the method just described. The cost of priority driftage in this case does not depend on the number
of preference actions but requires hosting $k|A|$ objects in main memory.
Top-K Prioritized Composition. Now suppose that the user wants (or the user screen has place for)
only the top-P hits where P is a positive integer. We can exploit this constraint to speed up the process.
In particular, from the bucket order of the first in priority $B_i$, we can get the minimum number of blocks
whose summed cardinality is greater than or equal to P (if this is possible, i.e. if $P \leq |A|$). For instance, if
P = 4 in our example then we will get only the first 2 blocks of $L_{B_{Loc}}$.
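Selecting the blocks that are enough for the top-P hits amounts to taking the shortest prefix of the bucket order whose sizes sum to at least P. A minimal sketch follows; the function name is hypothetical and the blocks after the first one are made-up placeholders.

```python
def blocks_for_top(bucket_order, p):
    """Shortest prefix of blocks whose sizes sum to at least p
    (all blocks if the order holds fewer than p objects)."""
    prefix, total = [], 0
    for block in bucket_order:
        if total >= p:
            break
        prefix.append(block)
        total += len(block)
    return prefix


# First block as in the running example; the other blocks are illustrative only.
order_loc = [{"L", "F1", "F2"}, {"x"}, {"y", "z"}]
needed = blocks_for_top(order_loc, 4)
```

Only the returned blocks then have to be refined by the lower-priority facets.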
4.3.2 Pareto Composition
According to Section 4.1, the cost of MFPareto (presented in Section 3.4) is in $O(|Obj|(k|Obj|^2 + |B|^2))$
for k facets. Let us now suppose that in algorithm MFPareto we use PrefOrder$_{Opt}$ instead of PrefOrder.
The cost of MFPareto in that case is in $O(|A|(k|A|^2 + |B|^2))$. Analogously, one could adopt Alg. 9 and
calculate accordingly the complexity of MFPareto.
4.3.3 Combination of Priority and Pareto Compositions
Using the optimizations described in Sections 4.3.1 and 4.3.2, the cost of the combination of the two
algorithms is in $O(|A|(k|A|^2 + |B|^2))$.
Chapter 5
Applicability and the System Hippalus
Contents
5.1 Application in Searching
  5.1.1 Case: Web Searching
  5.1.2 Case: Relational Databases
  5.1.3 Case: RDF Bases
5.2 Hippalus: A Preference Enriched Faceted Exploratory System
  5.2.1 Software Architecture
  5.2.2 Visualization and User Interface
  5.2.3 Interaction Example
The objective of this chapter is to elaborate on the feasibility of the proposed interaction and
preference framework over different application domains. Furthermore it presents the design and
implementation of Hippalus, a prototype system that realises the preference enriched FDT. In more detail,
Section 5.1 elaborates on the applicability of the proposed approach over FDT-based WSEs, relational
databases and RDF/S respectively. Finally, Section 5.2 describes the Hippalus system.
5.1 Application in Searching
Searching is a process that can be applied over a number of different application domains. Here we
elaborate on the feasibility of the proposed interaction and preference framework over WSEs, relational
databases and RDF/S respectively.
Figure 5.1: Processes of Web Searching and Exploratory Web Searching
The left part of Figure 5.1 shows (with a traditional WSE in mind) how search is performed today.
On the other hand, the right part of Figure 5.1 and Figure 5.2 showcase the proposed approaches for
exploratory and preference based searching. The same processes can be applied over the relational
databases and the Semantic Web domains, by submitting instead of a free text query the appropriate
SQL or SPARQL queries and applying FDT and the proposed preference framework over the results.
Figure 5.2: Process of Exploratory Web Searching Enhanced with Preference Actions
5.1.1 Case: Web Searching
One application domain of the proposed approach is that of Web searching. Commonly, the various static
metadata that are available to a search engine (e.g. domain, language, date, filetype, etc.) are exploited
only through the advanced (form-based) search facilities that some WSEs offer (and users rarely use). An
approach that exploits such metadata by adopting the interaction scheme of FDT exploration was first
proposed and analyzed in Papadakos et al. (2009a). The proposed process for exploratory web searching
is sketched in the right column of Figure 5.1. Specifically the process consists of the following steps:
• The user submits a free text query which he assumes corresponds to his specific information need
• The system computes a ranked set of pages, documents, items
• Available static metadata are then loaded to the system for these items (i.e. date, language, filetype,
  domain, etc.)
• Small excerpts (i.e. snippets) of the top-K items, which can be produced in real-time, are then computed
• Based on the previously computed top-K snippets, we can mine dynamic metadata, by using a
  clustering or entity mining algorithm
• Then the FDT interaction scheme is applied, by calculating for each facet the corresponding values
  and count numbers (i.e. static and dynamic metadata and their values)
• The system visualizes the available information regarding the facets, terms and objects
• The user can explore the information space by restricting his focus
• The user can explore the next top-K results, by dynamically mining metadata from the next top-K
  snippets
• Finally, if the user is not satisfied with the results returned by the initial query, he submits a new
  query
The previous process can be enriched with preference expression over the facets, zoom-points and
objects. Figure 5.2 depicts the process in more detail. The difference with the previous process is that
now the user can express preferences over the visualized facets, zoom-points and objects. Then the
system computes and presents to the user the most preferred facets, zoom-points and objects, according to
the expressed user preferences.
Since the first two steps of the above process correspond to the query-and-answer interaction that
current WSEs offer (which is stateless), what we propose essentially extends and completes the current
interaction.
Note that the FDT interaction scheme has already been implemented over the Mitos WSE¹. Figure
5.3 shows the GUI of that engine. This figure shows 5 facets and their values and counts for the user
submitted query "library". Specifically, one facet is dynamically mined (i.e. By clustering) while the remaining
4 facets are based on static metadata (By domain, By date, By filetype, By language). To the best of our
knowledge, there were no other WSEs that offered the same kind of information/interaction at that time.
A complete presentation that also includes an incremental algorithm for speeding up the interaction
and the results of a user study is available in Papadakos et al. (2012a).
Figure 5.3: Mitos GUI for Exploratory Web Searching (annotated parts: facets based on static metadata; facet
based on dynamic metadata extracted from the top-k resources; zoom points; A, the objects of focus)
¹ Under development by the Department of Computer Science of the University of Crete and FORTH-ICS
(http://groogle.csd.uoc.gr:8080/mitos/).
Regarding preferences, the default operation mode of Mitos (and of most FDT search engines) is
captured by the following actions:
facets order: lexicographic min
terms order: count max
objects order: Relevance value max
This indicates that the language presented in Section 3 can capture the default behaviour of various
systems. In order to extend the exploratory web searching process with preferences we have to add the
two additional steps of the process depicted in Figure 5.2. This functionality can be provided in the same
way as described later on in Section 5.2.2.
5.1.2 Case: Relational Databases
The interaction scheme of FDT can also be applied over relational databases (i.e. over a single table or
over the results of a query defined by using the query language (SQL)). If we want to explore data that
are not stored in one table, then we can exploit the view mechanism that relational databases provide: a
view is actually a named SQL query, over which other queries can be formulated as if it was a table of
the database. Specifically, we can define a view comprising attributes from different relations (tables)
and its definition may include joins, and various other descriptions. Subsequently, we can apply the FDT
interaction scheme over the contents of this view (i.e. over its answer), by assuming that each attribute
of the view is a facet, and the set of its distinct values that appear in that attribute correspond to the
terms of this facet. The tuples in the answer of that view are the objects. This is the straightforward
approach to apply FDTs and preference-based browsing over relational sources.
Let us now compare this method with Preference SQL at a syntactical level, i.e. the usability of our
method in comparison to using directly preference SQL. Suppose the following table Car:
Id Manufacturer Color
o1 VW Silver
o2 Ferrari Red
o3 Fiat Yellow
o4 BMW Silver
o5 Kia Silver
o6 Lexus Silver
o7 Toyota Silver
o8 Kia Silver
o9 Fiat Red
Consider now a user that wants to buy a car and assume that he prefers European cars to any other
cars. Also he likes Lexus equally to European cars. Finally Fiat and Kia are his least preferred brands.
The above preferences can easily be expressed using our preference language. Specifically, they can
be expressed as preference actions that are anchored to terms, with objects as their scopeType. Such a
preference could also be expressed using Preference SQL (Kießling et al. (2011a)). Preference SQL returns the
BMO (Best Matches Only) set, i.e. the Pareto optimal set. The query could have the following format, assuming
that the user knows all the distinct manufacturers that can be stored in the database:
SELECT * FROM CAR PREFERRING
Manufacturer EXPLICIT ('Kia' < 'Toyota',
'Kia' < 'Ferrari','Kia' < 'Lancia', 'Kia' < 'Citroen',
'Kia' < 'Peugeot', 'Kia' < 'BMW', 'Kia' < 'VW',
'Kia' < 'Lexus',
'Fiat' < 'Toyota', 'Fiat' < 'Honda', 'Fiat' < 'Ferrari',
'Fiat' < 'Lancia', 'Fiat' < 'Citroen', 'Fiat' < 'Peugeot',
'Fiat' < 'BMW', 'Fiat' < 'VW', 'Fiat' < 'Lexus',
'Toyota' < 'Ferrari', 'Toyota' < 'Lancia', 'Toyota' < 'Citroen',
'Toyota' < 'Peugeot', 'Toyota' < 'BMW', 'Toyota' < 'VW',
'Toyota' < 'Lexus')
In this query the user explicitly defines the preference relation over all available manufacturers. An
alternative and simpler query is to provide a layered form of the user preferences:
SELECT * FROM CAR PREFERRING
Manufacturer LAYERED(('Ferrari', 'Lancia', 'Peugeot',
'Citroen', 'BMW', 'VW', 'Lexus'),
('Toyota'), ('Kia', 'Fiat'))
These queries will return the following bucket order: {o1, o2, o4, o6}. The first query is too complex
for a plain user and presupposes knowledge of the schema and the values stored in the database.
The second query is simpler, but again presupposes that the user is able to construct the appropriate
layers of the preference relation and that he knows the available stored values and database schema.
Furthermore, such a query must be given in one shot. If the user changes his preferences over time or
by exploring the available objects, then he must submit a reformulated query.
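To make the effect of the layered preference concrete, the following Python sketch emulates it outside the database over the Car table. It is only an illustration under stated assumptions: the function layered_buckets is hypothetical, it is not how Preference SQL is implemented, and placing values that appear in no layer into a final bucket is an assumption of this sketch.

```python
def layered_buckets(objects, attribute, layers):
    """Group objects into buckets, most preferred first, following a layered preference.

    `layers` is a list of sets of values; values in earlier layers are preferred.
    Objects whose value appears in no layer go to a final bucket (an assumption).
    """
    rank = {value: level for level, layer in enumerate(layers) for value in layer}
    buckets = [[] for _ in range(len(layers) + 1)]
    for obj_id, description in objects.items():
        buckets[rank.get(description[attribute], len(layers))].append(obj_id)
    return [bucket for bucket in buckets if bucket]

# The Car table of the running example.
cars = {
    "o1": {"Manufacturer": "VW", "Color": "Silver"},
    "o2": {"Manufacturer": "Ferrari", "Color": "Red"},
    "o3": {"Manufacturer": "Fiat", "Color": "Yellow"},
    "o4": {"Manufacturer": "BMW", "Color": "Silver"},
    "o5": {"Manufacturer": "Kia", "Color": "Silver"},
    "o6": {"Manufacturer": "Lexus", "Color": "Silver"},
    "o7": {"Manufacturer": "Toyota", "Color": "Silver"},
    "o8": {"Manufacturer": "Kia", "Color": "Silver"},
    "o9": {"Manufacturer": "Fiat", "Color": "Red"},
}

# The three layers of the LAYERED query, most preferred first.
layers = [
    {"Ferrari", "Lancia", "Peugeot", "Citroen", "BMW", "VW", "Lexus"},
    {"Toyota"},
    {"Kia", "Fiat"},
]

buckets = layered_buckets(cars, "Manufacturer", layers)
print(buckets[0])  # the most preferred bucket
```

The first bucket corresponds to the set of best matches; the remaining buckets show how the rest of the objects would be ordered if the user kept browsing.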
If the user were exploring this table using the interaction scheme of FDT, he would get the facets and
zoom-points shown in Figure 5.4. Subsequently, he would be able to express his preferences interactively
(by clicking on values and selecting the desired action). Furthermore, he would express his preferences
gradually, until he gets a list of results that satisfies him.
Figure 5.4: Facets and Zoom-Points of Running Example
5.1.3 Case: RDF Bases
Recently, the amount of data published on the public Semantic Web has exploded, especially in the form
of Linked Data² (Bizer et al. (2009)). Specifically, by September 2011 available datasets had grown to 31
billion RDF triples, interlinked by around 504 million RDF links³. The interaction scheme of FDT can also
be applied over the Semantic Web and there are already several browsers that provide FDT over RDF.
Examples include /facet (Hildebrand et al. (2006)), Ontogator (Mäkelä et al. (2006)) and BrowseRDF (Oren
et al. (2006)).
RDF/S sources actually follow a structurally object-oriented model. The structuring of information
assumed by FDTs is simpler: objects described by attributes whose values may be hierarchically orga-
nized, meaning that associations between objects of the same or different types are not assumed. This
implies that for applying FDT over RDF/S sources one has to decide the part (in its original native form
or transformed) of the RDF/S source that should be explored according to FDT.
One way for this is to follow the approach described for relational sources. Specifically, we can spec-
ify the desired part (or the desired transformation) by a SPARQL query. Then we can apply FDT over the
results of this query. Since the structure of the results is actually a relational table, we can apply FDT
exploration as in relational tables.
² Wikipedia defines Linked Data as a term used to describe a recommended best practice for exposing, sharing, and
connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.
³ http://en.wikipedia.org/wiki/Linked_data
An alternative way, which does not use a query but instead specifies the part of the source that should
be explored, is described below. Moreover, this method can exploit the subClassOf relationships. Specifically,
subClassOf relationships are treated as hierarchically organized values. In this case, the objects of
interest (i.e. the set Obj) can be defined by selecting one class of the source: all direct and indirect
instances of this class constitute the set Obj. For instance, assuming the case of Figure 5.5, we can define that
the objects of interest are the instances of the class Vehicle. As facets we can consider the properties
that start from or point to the above class. Moreover, the class hierarchies (of Vehicle, Location, Manufacturer)
are exploited. Specifically, in this example it is like having three facets: type, whose values are the
hierarchy of Vehicle; madeBy, whose values are the instances of Manufacturer, organized hierarchically through
the subclasses of that class; and locatedIn, whose values are the instances of Location, organized
hierarchically through the subclasses of that class.
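The definition of Obj as all direct and indirect instances of a selected class can be sketched in Python as follows. The dictionaries sub_class_of and type_of, the class names below Vehicle and the instance identifiers are hypothetical stand-ins for the triples of an RDF/S source; a real implementation would obtain them from an RDF store.

```python
def subclasses_of(cls, sub_class_of):
    """Return `cls` together with all its direct and indirect subclasses."""
    found = {cls}
    frontier = [cls]
    while frontier:
        parent = frontier.pop()
        for child, parents in sub_class_of.items():
            if parent in parents and child not in found:
                found.add(child)
                frontier.append(child)
    return found

def instances_of(cls, sub_class_of, type_of):
    """Direct and indirect instances of `cls`, i.e. the set Obj for that class."""
    classes = subclasses_of(cls, sub_class_of)
    return {obj for obj, types in type_of.items() if classes & types}

# Hypothetical toy source: child class -> set of parent classes, instance -> set of types.
sub_class_of = {"Car": {"Vehicle"}, "Van": {"Vehicle"}, "Jeep": {"Car"}}
type_of = {"v1": {"Jeep"}, "v2": {"Van"}, "m1": {"Manufacturer"}}

obj = instances_of("Vehicle", sub_class_of, type_of)
print(sorted(obj))
```

The same closure over subClassOf can be reused to organize the values of a facet hierarchically, since the values of madeBy or locatedIn are grouped by the subclasses of their range class.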
More expressive exploration models which exploit the full structuring of RDF/S sources (even its
fuzzy extensions (Manolis and Tzitzikas (2011))), go beyond the scope of this work.
Figure 5.5: Example of RDF/S
5.2 Hippalus: A Preference Enriched Faceted Exploratory System
To demonstrate the feasibility of our approach and to identify possible difficulties or other issues
related to implementation and application, we have designed and implemented a proof of concept
prototype, named Hippalus. The logo of Hippalus is an ancient Greek boat with the symbol of preferences
as its sail⁴. This system was used for the user study described in Section 6.5.
5.2.1 Software Architecture
Instead of starting from scratch, we have decided to design and build Hippalus over RDF/S sources
and RDF/S managing software. Specifically, we have implemented the proposed preference framework
over a prototype for browsing and exploring RDF sources⁵ based on the model described in Manolis and
Tzitzikas (2011), apart from the aspect of fuzziness. Hippalus uses Jena⁶, which is a Java framework
for building Semantic Web applications. The architecture of the system and its components is given in
Figure 5.6.
The user submits his preferences through HTML5 context menus, which are then translated to
statements of the preference language described in Section 3.1. These statements are then sent to the
servlet-based server through HTTP requests. The server checks the validity of the received requests and analyzes
them using a parser of the preference language described in Chapter sec:IPS. If the action is valid, it is
passed as input to the appropriate preference algorithm (as described in Chapters 3 and 4). To query
the underlying RDF information base, we use Jena through a Data Manager component, which abstracts
the details of this particular framework. Finally, the computed preference relation, and therefore the
preference bucket order, is sent to the State component, which in turn updates the UI through an HTTP
response.
⁴ Hippalus was a Greek navigator and merchant who probably lived in the 1st century BCE. He is credited with having discovered
the direct route from the Red Sea to India over the Indian Ocean by plotting the scheme of the sea and the correct location of
the trade ports along the Indian coast, and by taking advantage of the monsoon wind.
⁵ The information base that feeds Hippalus is represented in RDF/S, using a schema adequate for representing objects
described according to dimensions with hierarchically organized values.
⁶ http://jena.apache.org/
Figure 5.6: System Architecture
Figure 5.7: The RDF Knowledge Base
5.2.2 Visualization and User Interface
Regarding visualization and FDT, one has to decide where and how to visualize: the focus (current object
set), the facets, the zoom points (and their count information), the intentional description of the current
state, and finally the information related to preferences. These are the main decisions.
The most widely adopted approach or policy (evidenced by the UI design of global systems like
booking.com) is to use a left bar for the facets and the corresponding zoom points, the right area for the
scrollable list of objects in the focus, and a small top area for the description of the current state. For
each of these elements, various visual elements can be used. A thorough description is available in
Chapter 4 of the book by Sacco and Tzitzikas (2009).
In our case we have to decide where to show the preference-related information and actions, since
this has not been supported by any system so far. Regarding preference actions, one approach is to
provide the preference-related actions through right-click activated pop-up menus. This policy does
not require allocating permanent screen space for these actions. However, the user should be aware
that these options exist. The design of the preference actions includes actions that are anchored to
one element, and this makes the right-click activated actions straightforward. However, the proposed
preference-based framework also supports actions that concern two elements (i.e. relative preferences
like German ≻ Italian).
Regarding the way the description of the current state is shown to the user, the user should be able to
view not only the intentional description of his current state, but also the accumulated preferences that
he has formulated. Finally, the user should be able to store and load his preferences, since exploration
is a time depth process.
Based on the above requirements, we have designed a Web application that offers exploration
services for a set of objects described using several dimensions, where all this information is represented
in RDF. In this case we map facets and terms from FDT to classes and subclasses respectively. The
preference actions are offered through HTML5 context menus⁷ and AJAX, which are enacted by right-clicking
in the browser window. The user is able to order classes, subclasses and objects using best, worst,
prefer to actions (i.e. relative preferences), around to actions (over a specific value), or actions that order
them lexicographically, or based on their values or count values. Furthermore, he can compose
object-related preferences, using Priority, Pareto, Pareto Optimal, and Combination⁸ compositions, by selecting the
appropriate composition mode and selecting classes through the classes' context menus. The default
composition is Combination. Regarding objects, since their number can be very large, the user is able to
define a threshold, so that preferences are applied only when the number of objects is reduced under this
threshold⁹. Options and parameters regarding the system functionality can be set through a drop-down
menu (e.g. simple or full support of preference menus, threshold, evaluation parameters, load-save etc.).
Figure 5.8: Hippalus: The Main Page of Hippalus. (a) Shows the Area where Facets and
Terms are Displayed, (b) the Ranked Objects Area, (c) the Preference Actions History
and Composition Tool, (d) Interesting Objects Tool (i.e. Like a Shopping
Cart) and (e) the Object Restriction History
⁷ Available only in Firefox 8 and later.
⁸ Order according to priorities if defined. The remaining actions use Pareto composition and are the least prioritized.
⁹ The user can reduce the number of objects (simple menus support only actions affecting objects), by navigating over the
classes, subclasses, and objects and restricting his focus.
5.2.3 Interaction Example
For demonstration purposes as well as for the needs of the user study (described in detail in Section 6.5)
we have constructed an information base about cars. Each car is described using classes like Manufacturer
and Drive_System, which are hierarchically organized, while the rest, like Vehicle_Type, are flat (as
shown in Figure 5.7). In this figure, continuous arrows denote subClassOf relationships while dashed arrows
denote typeOf relationships. The information base contains 50 cars, indexed under 23 classes and 85
subclasses.
Here we describe a more complete scenario demonstrating how hard and soft constraints can be
specified by the user, in an easy and gradual manner. It also aims at making clear the merits of the
underlying preference framework (preference inheritance and scope-based conflict resolution). A video
showcasing this scenario is available online¹⁰.
Figure 5.8 shows the main page of Hippalus over the collection of 50 cars. Specifically, part (a)
shows the attributes, their values (which can be hierarchically organized), accompanied by the number
of their occurrences, where the user can restrict his focus or express preferences anchored to them.
Part (b) depicts the objects area, which is ranked according to preference, part (c) shows the preference
actions history and composition tool, part (d) displays the Interesting Objects tool (i.e. like a shopping
cart) and finally, part (e) the object restriction history.
Figure 5.9 shows that one can expand broad values, like Asian (from the attribute Manufacturer),
and that by clicking on the value Korean the focus is restricted to three Korean cars. Notice that the left
bar has been updated, i.e. only the values that appear in the restricted set are presented (all attributes
have counts up to 3). With additional clicks the user can further reduce the focus, e.g. from the attribute
Fuel Type we can see that one of the cars consumes Diesel and two cars Gasoline. By clicking on
Gasoline we see these two cars, and by hovering the mouse over one of them the user gets its Object Card showing
all attributes of that car. At the bottom right frame the user can see the history of his clicks and can
undo any click.
Preferences are activated through right-click menus. Suppose we cancel all clicks and assume that
we want to express that we prefer Korean cars to European ones. This means that we do not want to see
only Korean cars; we just want to get them ranked higher than European ones. This is shown in Figure 5.11.a (top),
where we see that the user now gets a linear list of blocks of equally preferred objects: the
first block contains Korean cars and the next one European cars (thanks to inheritance the user does not have to say
anything about German, Italian, French, etc.).
¹⁰ http://www.youtube.com/watch?v=Cah-z7KmlXc
Figure 5.9: Hippalus: Value Expansion - Object Restriction
Figure 5.10: Hippalus: Expression of Relative Preference Korean ≻ European
It is important that preferences can be expressed incrementally, and at any point during the
interaction. For example, suppose that the user also prefers prices around 12,000. He can use the action around 12,090
as shown in Figure 5.11.a (bottom). We can see that the object order now becomes more refined (the
figure shows 14 blocks). Notice that the first block contains one Korean car (Hyundai) and one Fiat. This
happens because both of his preference actions have the same priority (and the Fiat is closer to 12,090). If the
user wants to give higher priority to one preference, he can use the right frame dedicated to this. Figure
5.11(b) shows the object order obtained after expressing that the preferences on manufacturers have
higher priority than the preferences over prices.
At any time the user can click on a value from a facet to restrict the current focus, which is now a
preference-based list of cars. For instance, if the user wants to see only cars having two doors, he can
click on 2 in the attribute Doors. He then gets only 8 cars, which are ranked according
to the preferences he has expressed so far. The user could cancel this extra restriction from the object restriction history.
In general the user can combine object restriction (or relaxation) actions and preference actions in
any order. Figure 5.15 shows the ranked list of objects, after restricting our focus only to cars that have
2 doors. The two previous preference actions (i.e. Korean ≻ European and price around 12090) are
used for the final preference ranking of objects.
The composition of preference actions is shown in Figure 5.12. Specifically, we have created two
priority levels by pressing the Add Priority Level button. Then we defined the desired priority order
by dragging and dropping facets to the appropriate priority levels. As the user changes the priority levels, for
example by placing Manufacturer in Level 1 priorities and Price in Level 2 priorities, the system calculates
on the fly the new order of objects. Notice how refined the object ranking is, because of the second
preference action that orders the cars around the price 12090. If we reverse the priorities, the object
order changes (Figure 5.13). The default composition mode is the Combination mode. This mode, shown in
Figure 5.14, behaves like Pareto if no priorities are defined.
Figure 5.11: Hippalus: (a) Expressing Preferences, (b) Object Restrictions after Preference
Expression
Figure 5.12: Hippalus: Composition of Preference Actions. Manufacturer Prioritized to Price
Figure 5.13: Hippalus: Composition of Preference Actions. Price Prioritized to Manufacturer
Figure 5.14: Hippalus: Composition of Preference Actions. Default Combination Mode
Figure 5.15: Hippalus: Restricted Focus with Preferences Applied
Chapter 6
Evaluation
Contents
6.1 Evaluation Approaches & Metrics
    6.1.1 Metrics for Exploratory Search
    6.1.2 Metrics Related to the Proposed Interaction Scheme
6.2 Theoretical Analysis of the Number of User Decisions and Effort in FDT
6.3 DiFEPreKO Hypothesis
    6.3.1 Analytical Comparison
    6.3.2 User Study
6.4 Evaluation of Various Exploration Approaches
6.5 Evaluation of Hippalus System
6.6 Evaluation Conclusion
The objective of this chapter is to elaborate on how the proposed preference-based interaction scheme
could be evaluated. Specifically, Section 6.1 reviews the related work, identifies the various metrics and
evaluation approaches that have been proposed or used and are related to our interaction scheme, and
proposes new metrics for decision making. Subsequently, Section 6.2 studies theoretically the
convergence of FDT-based UIs and the required user effort with or without preference actions. Afterwards,
Section 6.3 introduces and evaluates, through a simple experiment, a hypothesis stating that without
the ability to explore the existing choices, the expression of preferences can be time-consuming and
result in incomplete preferences. Furthermore, Sections 6.4 and 6.5 discuss two user-based evaluations.
The first one shows the effectiveness of FDT for exploratory tasks, while the second one evaluates the
proposed preference-based scheme over Hippalus and discusses the results of the evaluation. Finally,
Section 6.6 concludes this chapter.
6.1 Evaluation Approaches & Metrics
Here we discuss a number of exploratory search metrics and we identify metrics that are relevant to our
proposed preference-based interaction scheme. Furthermore, we study theoretically the convergence
of FDT-based UIs and the required user effort with or without preference actions.
6.1.1 Metrics for Exploratory Search
One characteristic of any ES approach is that it is session-based. With session-based we refer to a dialogue
between the user and the system such that the response of the system (e.g. answer, branch shown)
does not depend only on the current user request (e.g. query, click) but also on his previous request
and session history in general. Furthermore, according to Marchionini (2006) ES is recall-oriented. As a
result, standard single-query metrics like the traditional Precision and Recall metrics, or even Instance Recall (Over
(1997)), which allows multiple queries per session (rewarding the number of distinct relevant answers
identified in a session of a given length), are inadequate for evaluating session-based information tasks.
The evaluation of systems offering session-based IIR is difficult, because of the complexity of data
relationships, the diversity of displayed data, and the interactive nature of exploratory search, along with the
perceptual and cognitive abilities offered. Such systems rely heavily on users' ability to identify and act on exploration
opportunities, as described in White et al. (2007).
Discussions of available methods and metrics for evaluating experimental UIs for web searching are
provided in the works of Käki and Aula (2008) and Kelly et al. (2009). Kanoulas et al. (2011b) give an
overview of data collections and metrics for the evaluation of session-based IR¹. Finally, a recent survey
regarding the evaluation of web retrieval effectiveness is provided in Carterette et al. (2012). According
to it, we can group evaluation methods and metrics in three different categories. The first one includes
traditional metrics that do not make any assumption regarding the user. The second one uses
simple user models and, finally, the third uses advanced user models. Figure 6.1 shows the different
groups of metrics, which are described below.
¹ The 2012 session can be visited at http://ir.cis.udel.edu/sessions/index.html
First Group of Metrics: No User Model
The first group of metrics assumes binary relevance (i.e. a document is either relevant or not) and is
based on sets of documents and not ranked lists. Specifically, it includes the traditional metrics of
Precision and Recall from the Cranfield studies and their combinations, like F-measure and Average Precision.
Average Precision is the most widely used metric in IR.
Second Group of Metrics: Simple User Model
The second group includes metrics that make simple assumptions about user behaviour. For example,
the Expected Search Length metric described in Cooper (1968) assumes that the user walks down a ranked
list of documents and observes every document until a stopping point. This is the point where he
satisfies his need. This metric uses a cost function for each visited document, based on the relevance of
the document. In addition, Robertson (2008) demonstrates that Precision and Recall can be redefined
using the above user model, by defining the utility of each document. In this case, Precision becomes a
measure of utility and Average Precision becomes an expectation of utility over a number of browsing
decisions. Furthermore, Robertson et al. (2010) proposed the Graded Average Precision (GAP), a new measure
that redefines Average Precision by taking into consideration different relevance grades. Specifically, this
metric assumes that the user regards as relevant the documents that have a relevance value over a specific
threshold.
The Rank Biased Precision (RBP) measure, described in Moffat and Zobel (2008), tries to incorporate
the user's persistence in examining a certain number of documents in the results list (e.g. the user looks
only at the first result, or at the top ten results). However, this measure does not take into account the quality
of the answer. Another related metric is the Discounted Cumulated Gain (DCG) and its variations. The
best known is the normalized discounted cumulative gain (nDCG), described in Järvelin and Kekäläinen
(2002), which uses a graded relevance scale of documents and measures the usefulness or gain of
a document, based on its position in the result list. The gain is accumulated from the top of the list to
the bottom, with the gain of each result discounted at lower ranks. The result is normalized by dividing
by the DCG of the ideal ranked answer set. Schuth and Marx (2011) suggest an adaptation of nDCG
for FDT-based information systems. Specifically, they are interested in which facet-value pairs will be
presented to the user. They also propose nrDCG, which is a recursive version of DCG.
Figure 6.1: Available IR Metrics
In the same manner, Expected Reciprocal Rank (ERR), described in Chapelle et al. (2009), tries to
overcome a problem of DCG and RBP, namely that a document at a specific position always has the same gain, by
taking into account the quality of the response of the system. It is a popular measure for tasks that
return a single relevant document and is based on the cascade user model. This model assumes a user that
accumulates utility by stepping down the ranked list and decides whether to continue browsing based
on the accumulated utility. Yilmaz et al. (2010) propose the EBU metric, a metric similar to ERR,
which uses the same cascade user model, but in addition takes into consideration the effect of document
snippets.
Third Group of Metrics: Advanced User Model
The third category includes metrics that use more advanced user models. We can consider two different
subfamilies for these metrics. The first includes metrics that take into consideration novelty and
diversity. Examples include subtopic generalizations of recall and precision as described in Zhai et al. (2003),
where the user gets utility from each different topic that was retrieved. In addition, the intent-aware
family of measures described in Agrawal et al. (2009) assumes that there is a probability distribution over
subtopics. In Clarke et al. (2008), the α-nDCG metric takes duplicate text into account by penalizing
it. Finally, Chapelle et al. (2011) describe an intent-aware ERR, computed as a weighted
average of ERR over intents.
The second subfamily includes the metrics that are session-based. This subfamily includes nsDCG, a
variant of nDCG for sessions, which is described in Järvelin et al. (2008) and incorporates a cost for each
query reformulation. Furthermore, the work described in Yang and Lad (2009) proposes a theoretical
probabilistic framework that takes into consideration the user interactions over multi-session ordered
lists, in order to evaluate and optimize information distillation². The associated user model is a user that
steps down a list until a point where he reformulates his query and begins again from the new ranked list.
Finally, recent work of Kanoulas et al. (2011a) generalizes the traditional measures of IR such as Precision,
Recall and Average Precision to a session. These metrics assume that a user steps down a ranked list
until a point where he either reformulates his query or abandons the search.
² Information Distillation is an emerging area of research, which focuses on the effective combination of ad-hoc IR, novelty
detection and adaptive filtering.
Other Evaluation Approaches
In addition to all the above, the work of Kules et al. (2009) examined the interaction with a faceted online
library catalogue and found that facets are very important in exploratory processes. Azzopardi (2009)
represents the usage of an ES as a stream of documents and studies the performance of such systems
based on time and usage. Kules and Capra (2008) discuss ways to create exploratory tasks for faceted
search UIs. Wilson and Schraefel (2007) propose a method for evaluating exploratory search by
blending IR frameworks with HCI design. Works that use statistical methods such as factor analysis (FA) and
structural equation models (SEM) in order to examine the interrelationships between multiple
evaluation criteria are described in Toms et al. (2005) and O'Brien et al. (2008) respectively. Finally, Carterette
et al. (2011) simulate user behaviour by using click data and a Bayes procedure.
6.1.2 Metrics Related to the Proposed Interaction Scheme
Regarding the evaluation of our preference-based interaction scheme, we will consider both non session-
based and session-based metrics, which can be measured at each step of the interaction. We include both
non session-based and session-based metrics, so that we can conclude how each user action affects the
results set and the user task respectively.
Non-session based metrics
The following non-session based metrics could be beneficial for the evaluation of our approach:
Average Precision. It is one of the most commonly used metrics, since it takes into consideration
both precision and recall. It is calculated by the following formula:
\[ AP = \sum_{i=1}^{n} p(i) \, \Delta r(i) \tag{6.1} \]
where $p(i)$ is the precision at position i of the search results, $\Delta r(i)$ is the difference in recall
from the document in position $i - 1$ to the document in position i, and n is the number of objects in the result
set.
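As a worked illustration of Equation 6.1, the following Python sketch computes Average Precision for a ranked list with binary relevance. It assumes that all relevant documents appear in the result list, so that each relevant document increases recall by 1 divided by the number of relevant documents; the function name average_precision is hypothetical.

```python
def average_precision(relevant):
    """AP = sum_i p(i) * delta_r(i) over a ranked list of binary relevance flags.

    Assumes every relevant document appears somewhere in `relevant`.
    """
    total_relevant = sum(relevant)
    if total_relevant == 0:
        return 0.0
    hits = 0
    ap = 0.0
    for i, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            precision_at_i = hits / i          # p(i)
            delta_recall = 1 / total_relevant  # recall gained at position i
            ap += precision_at_i * delta_recall
        # Non-relevant positions contribute nothing, since recall does not change.
    return ap

# Relevant documents at positions 1 and 3: AP = 1 * 1/2 + (2/3) * 1/2 = 5/6.
print(average_precision([1, 0, 1, 0]))
```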
normalized Discounted Cumulative Gain - nDCG. Discounted Cumulative Gain is a metric
described in Järvelin and Kekäläinen (2002), which promotes systems that return relevant documents near
the top of the answer set and penalizes systems that return relevant documents at the bottom of the
answer set. It is calculated by the following formula for position k:
\[ DCG_k = \sum_{i=1}^{k} \frac{2^{rel_i} - 1}{\log_2(i + 1)} \tag{6.2} \]
where $rel_i$ is the relevance of document i and $rel_i \in [0, 1]$. The normalized DCG, i.e. nDCG, at position
r is calculated by dividing $DCG_r$ by the $IDCG_r$ value, which is the ideal $DCG_r$ value (i.e. the documents were
returned in the optimal order). Specifically,
\[ nDCG_r = \frac{DCG_r}{IDCG_r} \tag{6.3} \]
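Equations 6.2 and 6.3 can be sketched in Python as follows. The sketch assumes that the ideal ordering is obtained by sorting the same relevance values in decreasing order; the function names dcg and ndcg are hypothetical.

```python
import math

def dcg(relevances, k):
    """DCG_k = sum_{i=1..k} (2^rel_i - 1) / log2(i + 1)."""
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(relevances[:k], start=1))

def ndcg(relevances, k):
    """nDCG_k = DCG_k / IDCG_k, with IDCG_k computed on the ideally ordered list."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

# A relevant document ranked second instead of first is discounted by log2(3).
print(ndcg([0.0, 1.0], 2))
print(ndcg([1.0, 0.5, 0.0], 3))  # already ideally ordered
```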
normalized Discounted Cumulative Gain - nDCG for FDT. An adaptation of nDCG for FDT-based
information systems, which takes into consideration the facet-value pairs, is described in detail in Schuth
and Marx (2011). This metric focuses on two aspects: (a) prefer facet-values that would return a lot
of relevant documents high in the result list and (b) prefer facet-values that would return relevant
documents not already seen through earlier facet-values.
normalized recursive Discounted Cumulative Gain - nrDCG for FDT. A recursive version of
nDCG for FDT is also proposed in Schuth and Marx (2011). Such a metric could be very useful for
suggesting the top-K most valuable facet-values to the user, when the display area is limited (e.g. mobile
devices). Furthermore, these metrics could provide the default ordering of facets and their values (in
addition to the lexicographic, value and count based ordering).
normalized Expected Reciprocal Rank - nERR. This is a metric that takes into consideration the
usability of the documents in the answer set and is described in detail in Chapelle et al. (2009). This
metric is calculated by the following equation:
\[ ERR = \sum_{r=1}^{n} \frac{1}{r} \, P(\text{user stops at position } r) \tag{6.4} \]
where P(user stops at position r) is the probability that the user stops searching after the document in position r. This
probability is calculated by the following equation:
\[ P(\text{user stops at position } r) = \prod_{i=1}^{r-1} \bigl(1 - R(rel_i)\bigr) \, R(rel_r) \tag{6.5} \]
where $R(rel_i)$ is the probability that document i satisfies the user. In more detail, $R(rel)$ is calculated
by the following equation:
\[ R(rel) = \frac{2^{rel} - 1}{2^{max\,rel}} \tag{6.6} \]
where max rel is the maximum relevance score. While there is no justification for using this formula
(as is also the case for the gain function of DCG), values could be inferred from logged user data.
ERR can be normalized (nERR) by dividing by the maximum ERR for a specific query.
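Equations 6.4 to 6.6 can be sketched in Python as follows. The sketch assumes integer relevance grades between 0 and max_rel, and it assumes that the maximum ERR for a query is obtained by ranking the documents in decreasing relevance; the function names satisfaction, err and nerr are hypothetical.

```python
def satisfaction(rel, max_rel):
    """R(rel) = (2^rel - 1) / 2^max_rel: probability that a document satisfies the user."""
    return (2 ** rel - 1) / 2 ** max_rel

def err(relevances, max_rel):
    """ERR = sum_r (1/r) * prod_{i<r} (1 - R(rel_i)) * R(rel_r)."""
    score = 0.0
    p_reach = 1.0  # probability that the user reaches position r still unsatisfied
    for r, rel in enumerate(relevances, start=1):
        p_satisfied = satisfaction(rel, max_rel)
        score += p_reach * p_satisfied / r
        p_reach *= 1 - p_satisfied
    return score

def nerr(relevances, max_rel):
    """Normalize by the ERR of the ranking sorted by decreasing relevance (assumed ideal)."""
    best = err(sorted(relevances, reverse=True), max_rel)
    return err(relevances, max_rel) / best if best > 0 else 0.0

print(err([3, 0], 3))   # highly relevant document first
print(err([0, 3], 3))   # same document one position lower
print(nerr([0, 3], 3))
```

The running product p_reach avoids recomputing the product of Equation 6.5 at every position.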
Session-based Metrics
Session-based metrics include:
Session-based Precision, Recall and Average Precision. These metrics extend the classic preci-
sion, recall and average precision metrics for sessions. They are described in detail in Kanoulas et al.
(2011a).
normalized session Discounted Cumulative Gain - nsDCG. Järvelin et al. (2008) extend DCG and
nDCG to a session. This specific metric also takes into consideration the number of queries. The bigger
the number of queries, or the number of interactions in the case of exploratory systems, the smaller
the value of the metric. Specifically, nsDCG is calculated by:
\[ nsDCG(q) = (1 + \log_{bq} q)^{-1} \, \frac{DCG}{IDCG} \tag{6.7} \]
where q is the number of the query or user interaction and 1 < bq < 1000.
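The session discount of Equation 6.7 can be illustrated with the short Python sketch below. The function names are hypothetical, and the default base bq = 4 is an arbitrary choice made only for the example; the equation only requires 1 < bq < 1000.

```python
import math

def session_discount(q, bq=4.0):
    """(1 + log_bq q)^-1: the penalty applied to the q-th query (or interaction)."""
    return 1.0 / (1.0 + math.log(q, bq))

def nsdcg(q, dcg_value, idcg_value, bq=4.0):
    """nsDCG(q) = (1 + log_bq q)^-1 * DCG / IDCG."""
    return session_discount(q, bq) * dcg_value / idcg_value

print(session_discount(1))      # the first query is not penalized
print(session_discount(4))      # with bq = 4 the fourth query counts half
print(nsdcg(4, 3.0, 6.0))
```

A smaller base bq penalizes later queries more strongly, which models a less patient user.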
6.2 Theoretical Analysis of the Number of User Decisions and Effort in FDT
Here we try to measure the number of choices that a user has to make in order to reach (through
exploratory browsing) the desired object, assuming that all objects are described by one or more hierarchically
organized attributes. Specifically, we theoretically discuss the convergence of FDT-based UIs and the
required user effort with and without preference actions.
Convergence of FDT Exploration
The algorithm presented in Section 4.2 (Alg. $PrefOrder_{Opt}$) is based on the assumption that the focus A
can be reduced very fast in an FDT-based interaction. In this section we report an analysis for justifying
this claim.
Consider one taxonomy having the form of a complete and balanced tree of depth $d$ and degree $b$. Let $n$ be the number of objects in the information base (which are indexed by that tree). In that case $b \cdot d$ is the number of choices a user has to see in order to reach (select) a particular leaf (i.e. the number of terms whose label the user has to read if he starts from the root of the tree), and $d$ is the number of decisions (i.e. clicks) he has to make. If we want each object to have a distinct description (assuming that each object is classified to one leaf of the taxonomy), then this means that:
\[ b \cdot d = b \log_b n = \sqrt[d]{n} \cdot d \qquad (6.8) \]
The real-valued degree $b$ that minimizes the product $b \cdot d$ is Euler's number $e$, so let us assume that $b = 3$ is the most beneficial degree. If each leaf should index 10 objects, then for $n = 10^{11}$ objects we need $10^{10}$ descriptions. Assuming $b = 3$ we get $b \cdot d = 3 \cdot \log_3 10^{10} = 3 \cdot 19 = 57$ choices, and $d = 19$ clicks.
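The claim that $e$ minimizes the product $b \cdot d$ can be checked with a short derivation, sketched here under the assumption that $b$ is treated as a continuous variable and $n$ is fixed:

```latex
\[
f(b) = b \log_b n = \frac{b \ln n}{\ln b},
\qquad
f'(b) = \ln n \cdot \frac{\ln b - 1}{(\ln b)^2} = 0
\iff \ln b = 1 \iff b = e .
\]
\[
\frac{f(2)}{\ln n} = \frac{2}{\ln 2} \approx 2.885,
\qquad
\frac{f(3)}{\ln n} = \frac{3}{\ln 3} \approx 2.731,
\qquad
\frac{f(4)}{\ln n} = \frac{4}{\ln 4} \approx 2.885 .
\]
```

Since $f'(b) < 0$ for $1 < b < e$ and $f'(b) > 0$ for $b > e$, the stationary point is a minimum, and among integer degrees $b = 3$ gives the smallest value.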
Now suppose that we have $k$ facets. Finding the desired description requires selecting one leaf from each $T_i$. As there are $k$ facets, and we must select one leaf from each one of them, the overall displayed choices are obtained by multiplying by $k$. Since we have $k$ facets, we can obtain the $n$ distinct descriptions if each facet has $\sqrt[k]{n}$ leaves (since their cartesian product yields $n$ distinct descriptions). In this case, the depth $d$ of a facet equals $\log_b \sqrt[k]{n}$, and the degree $b$ is $\sqrt[d]{\sqrt[k]{n}} = \sqrt[dk]{n}$. It follows that the overall displayed choices are:
\[
\begin{aligned}
b \cdot d \cdot k &= b \log_b(\sqrt[k]{n}) \cdot k = b \log_b(n^{1/k}) \cdot k = b \log_b(n^{k/k}) = b \log_b n && \text{(independent of } d\text{)} \\
b \cdot d \cdot k &= \sqrt[dk]{n} \cdot d \cdot k && \text{(independent of } b\text{)}
\end{aligned}
\]
and the number of clicks required is $k \cdot d$. Some indicative values of these parameters are shown in Table 6.1. From the last row we can see that, in order to select the desired 10 objects from a peta-sized ($\approx 10^{15}$) information base, the user has to make 30 clicks.
Table 6.1: Choices and Number of Clicks
n/10 | k | b | d | Num. of Choices (b·d·k) | Num. of Clicks (k·d)
531.441 (≈ 10^6) | 3 | 3 | 4 | 36 | 12
3.486.784.401 (≈ 10^11) | 5 | 3 | 4 | 60 | 20
10^15 | 10 | 3 | 3 | 90 | 30
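The rows of Table 6.1 follow directly from the formulas above. The following Python sketch recomputes them for given k, b and d; the function name and the dictionary keys are illustrative, and the number of distinct descriptions is taken to be b^(d·k), i.e. the product of the leaves of the k facets.

```python
def faceted_effort(k, b, d):
    """Effort for k facets, each a complete balanced tree of degree b and depth d."""
    return {
        "descriptions": b ** (d * k),  # leaves per facet (b^d) raised to the k facets
        "choices": b * d * k,          # terms the user has to read
        "clicks": k * d,               # decisions the user has to make
    }


for k, b, d in [(3, 3, 4), (5, 3, 4), (10, 3, 3)]:
    print(k, b, d, faceted_effort(k, b, d))
```

The first two parameter sets give 531441 and 3486784401 descriptions respectively, which are the exact values listed in the first column of the table.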
In the previous analysis we have considered plain faceted taxonomies, not dynamic ones. According
to FDT, during the interaction process the only displayed terms are those whose addition to the current
selection yields a conjunction having a non empty extension. So, although the number of clicks will not
be reduced, the number of choices (i.e. the number of terms the user has to read) will be less, since each
displayed term will not have all of its b children active.
From this small example we can realize the potential of FDT for rapidly reducing very big information spaces. We should also mention that the analysis in Sacco (2006b) shows that 3 zoom operations on leaf terms are sufficient to reduce an information base of $10^7$ objects described by a taxonomy with $10^3$ terms to an average of 10 objects.
Plain FDT versus FDT with Preferences w.r.t. User Effort
Note that term-scoped preferences (i.e. those that order the terms of a facet according to the user's preferences) make the aforementioned choices less laborious, since the more desired options are shown first. Specifically, if we assume that a preference relation has been defined for each facet, and that the most preferred choice from each facet is prompted first (and is unique), then the cost of the required decisions is not $b \cdot d \cdot k$ but $1 \cdot d \cdot k$, since the user just clicks on the first choice without having to look at the remaining choices.
Returning to the context of the car selection use case, if we assume that each of the 7 billion people living on this planet sells one car, then for $n = 10^9$ objects we need $10^8$ descriptions if we want to reach a block comprising 10 cars. Assuming $k = 10$ and degree $b = 3$ we get that $b \cdot d \cdot k = b \log_3 10^8 = 3 \cdot 15 = 45$ choices have to be displayed (using plain faceted taxonomies), and certainly fewer than 45 using dynamic taxonomies. If we assume that preferences have been defined for each of the $k = 10$ facets, then the choices are reduced to 15, which is equal to the number of required clicks.
6.3 Difficulty of Formulating Effective Preferences without Knowing the
Options (DiFEPreKO) Hypothesis Evaluation Through a User Study
In this section we introduce the Difficulty of Formulating Effective Preferences without Knowing the
Options (DiFEPreKO) hypothesis:
Hypothesis Without the ability to view and explore the existing choices, the expression of preferences is time-consuming and in most cases results in incomplete preferences (i.e. preferences that are not sufficient for selecting the most desired option from a particular set of choices).
Initially, we provide an analytical comparison between extensional and intentional preferences regarding effort, completeness and correctness. Afterwards, we describe the user study conducted for evaluating the DiFEPreKO hypothesis. We present the results and conduct a statistical significance test to check whether our results could be due to chance.
6.3.1 Analytical Comparison
Let $A_1, \ldots, A_k$ be the $k$ attributes that are used for describing the choices, and let $dom(A_i)$ denote the set of values that $A_i$ can take (for each $i = 1..k$). Let $V$ be the cartesian product of the domains of the attributes, i.e. $V = dom(A_1) \times \ldots \times dom(A_k)$.
We can consider that a complete (over $V$) intentionally specified preference aims at defining a linear order of the elements of $V$. Let $ip$ denote an intentionally specified preference and let $\succ_{ip}$ denote the linear order of $V$ that $ip$ defines.
Now consider a specific set of choices $S$ ($S \subseteq V$). We can consider that a complete extensional preference over a set of choices $S$ aims at defining a linear order of the elements of $S$. Let $ep$ denote the preference specification and let $\succ_{ep}$ denote the linear order of $S$ that $ep$ defines.
Completeness and Correctness of Intentional Specified Preferences
Consider that we have an $S$, the user has defined an $\succ_{ep}$, and suppose that we consider $\succ_{ep}$ correct. We could say that an $ip$ is correct and complete with respect to $\succ_{ep}$ if the restriction of $\succ_{ip}$ on $S$ is equal to $\succ_{ep}$.
However, note that since in decision tasks humans mainly have to select the most desired element (the hotel to book, the car to buy, the place for holidays), and not to order the entire list of available options, in our user study (described afterwards) we will consider and compare only the first and the second most preferred elements, i.e. only the two most preferred elements according to $\succ_{ep}$ and $\succ_{ip}$.
Effort Required for Expressing Complete Preferences
We could quantify, in a very rough way, the effort required for expressing complete intentional preferences with the amount $|V|$. Similarly, we could quantify, in a very rough way, the effort required for expressing extensional preferences with the amount $|S|$.
For instance, if we have only one attribute that takes two values, and only two objects, then $|V| = |S| = 2$, meaning that it is equally laborious to express preferences intentionally or extensionally. If on the other hand we have 10 attributes, each having 10 possible values, and two objects, then $|S| = 2$ while $|V| = 10^{10}$, indicating that it is much more laborious to express preferences intentionally than extensionally in this case.
The costs specified above do not aim at being accurate; they aim to capture the main point. One could easily refine the costs according to various aspects, for instance according to the type of the attribute values. Specifically, for an attribute $A_i$ we can define $Cost(A_i) = |dom(A_i)|$ if it is categorical; otherwise (i.e. if the domain is arithmetic) we can define $Cost(A_i) = 1$. The latter is because in arithmetic attributes (e.g. horsePower, fuelConsumption, price) the user commonly just has to express whether he prefers the highest values, the lowest values, or those around a specific value, hence he does not have to inspect the available values. In contrast, in categorical attributes (e.g. bodyType, brand, color), the user has to express his preference on the specific values of the attributes. Based on the above perspective, the cost for specifying complete intentional preferences could then be defined as $Cost(A) = Cost(A_1) \times \ldots \times Cost(A_k)$ (note that $Cost(A) \leq |V|$).
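The refined cost can be sketched in a few lines of Python. Attributes are represented here as hypothetical (domain size, is categorical) pairs; this representation and the function names are assumptions made for the illustration only.

```python
from math import prod


def attribute_cost(domain_size, categorical):
    """Cost(A_i) = |dom(A_i)| for categorical attributes, 1 for arithmetic ones."""
    return domain_size if categorical else 1


def intentional_cost(attributes):
    """Cost(A) = Cost(A_1) x ... x Cost(A_k)."""
    return prod(attribute_cost(size, cat) for size, cat in attributes)


def value_space_size(attributes):
    """|V| = |dom(A_1)| x ... x |dom(A_k)|."""
    return prod(size for size, _ in attributes)


# Hypothetical car attributes: brand (categorical), price (arithmetic), color (categorical).
car_attributes = [(5, True), (100, False), (3, True)]
print(intentional_cost(car_attributes), value_space_size(car_attributes))
```

Because every arithmetic attribute contributes a factor of 1 instead of its domain size, the refined cost can never exceed |V|.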
6.3.2 User Study
In this user study 30 persons participated, 18 male and 12 female, from 7 countries and aged between 22 and 75 years. All of the participants had at least secondary education, while most of them had an MSc or PhD degree³. The experiment had two steps, Step 1 and Step 2.
Step 1
In the first step, all participants were asked to express their preferences, according to the following:
Suppose that you have (it is obligatory) to change your car. You have to select and buy a new one, which you
will use for the next 5 years, and of course you will have to pay it. Please express your preferences on paper. This
paper will be handed to a different person who has at his disposal a limited collection of available cars. This person
will select one for you based on the available cars and the preferences that you expressed.
You have 30 minutes at most to express your preferences. You are free to express them in any form you like,
e.g. in natural language text (e.g. I prefer a car with an engine volume between 1200 and 1400 cc), by providing
an ordering of the firms according to your preference (e.g. Japanese, European or BMW, Audi), by specifying the
preferred (ideal) price, etc. Other characteristics could include year, body type, engine volume, power, max speed,
acceleration, fuel consumption, weight, fuel type, price, trunk, etc.
Please measure how much time you spent on this exercise and give us the paper.
Step 2
Immediately after completing the first step, participants continued with the second step, so that their preferences would not change in the meantime. In this step, users were given a list of cars and were asked to identify the car that was ideal for them to buy. In total, the list consisted of 50 cars and is shown in Fig. 6.2 and Fig. 6.3. Again, users were asked to measure how much time they spent on finding the ideal car.
Results
Subsequently, we checked if the paper-written preferences of Step 1 would allow someone to obtain the car selected in Step 2. In case the answer is YES then it means that the preference expression on paper was complete and sufficient in order to select the ideal car. If the answer is NO, then the conclusion would be that they did not manage to express their preferences in a sufficient way, in order to get the most desired car from the small list of available cars. In addition, we compared the times users spent in both steps.
3 Nine of the persons that participated in this evaluation were participants of the First MUMIA Training Summer School "Building Next Generation Search Systems" (http://www.mumia-network.eu/index.php/training-school-2012), Olympiada, Chalkidiki, Greece.
Figure 6.2: Evaluation Step B: Users Select a Car from the List (1st page)
Figure 6.3: Evaluation Step B: Users Select a Car from the List (2nd page)
In order to check the results, a broker was given the forms of Step 1, with the expressed preferences of each one of the participants. Subsequently, he was asked to select the ideal and the second ideal car from the list of Step 2, based on each participant's preferences.
Users' preferences were divided into two categories: Specific and General. Specific preferences are preferences that use specific values of the attributes' domains (e.g. "I prefer red to yellow cars" or "I want a car with a displacement between 1200cc and 1400cc"). General preferences are preferences that do not use specific values (e.g. "I want a cheap car" or "I want a car that does not pollute the environment").
The broker used a number of criteria in order to select the ideal car for a specific user, based on the user's preferences expressed in Step 1. The following criteria, ordered according to significance, were used for ranking the objects of our collection of cars:
1. Specific Preferences Criterion (SPC) Initially we only consider the Specific preferences applicable to our collection. If the expressed preferences are prioritized (e.g. "Firstly I prefer a car that costs less than 10000 Euros, secondly a car with a displacement between 1200cc and 1400cc", etc.), then cars are ranked according to the most prioritized preference, then according to the second most prioritized preference, etc. This is the Prioritized composition we discussed in Section 3.4.1. In the case that the user did not provide any priority order, preferences were considered equal. When a number of preferences have the same priority, Pareto composition is used, as described in Section 3.4.2. Lastly, in case of ties, the final bucket order is derived by ordering the cars of each bucket according to the number of wins per preference, like the rules described in Section 3.3.2.
2. General Preferences Criterion (GPC) If there are still ties when deriving the ideal and second ideal car based on the ordering of cars created by the previous step, we take advantage of the General preferences, in case they can be applied to our collection. Specifically, the broker transformed each General preference to an ordering of cars. For example, the preference "I want a cheap car" means ordering the cars of each bucket according to their price. Again the same criteria (Prioritized composition, Pareto composition and the wins rule) were used to derive the ideal and the second ideal car.
3. Broker Assumption Criterion (BAC) Finally, in the very few cases of a second tie, the broker was free to use his own assumptions, like preferring most of the times the cheaper car, or relying on his own opinion about manufacturers' reliability. The above assumptions were used in our evaluation and were sufficient for the small number of cases in which we had a second tie (i.e. a tie in the General preferences).
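The cascade of the three criteria can be sketched as follows. This is only an illustration of the tie-breaking order SPC, GPC, BAC, not the broker's actual procedure: the SPC score used here is a plain count of satisfied Specific preferences, which stands in for the Prioritized and Pareto compositions of Section 3.4, and the cars, preferences and function names are all hypothetical.

```python
def top_candidates(candidates, score):
    """Keep only the candidates that reach the highest score."""
    best = max(score(c) for c in candidates)
    return [c for c in candidates if score(c) == best]


def select_ideal(cars, criteria):
    """Apply the criteria in order of significance until a single car remains.

    cars:     dict car_id -> attribute dict
    criteria: ordered list of (name, score_fn); a higher score is better
    Returns (car_id, name of the criterion that resolved the choice).
    """
    candidates = sorted(cars)
    for name, score in criteria:
        candidates = top_candidates(candidates, lambda cid: score(cars[cid]))
        if len(candidates) == 1:
            return candidates[0], name
    # Still tied after the last criterion: fall back to the lowest car id.
    return candidates[0], criteria[-1][0]


# Hypothetical collection and preferences.
cars = {
    1: {"price": 9000, "cc": 1300},
    2: {"price": 9500, "cc": 1300},
    3: {"price": 12000, "cc": 1600},
}
specific = [lambda c: c["price"] < 10000, lambda c: 1200 <= c["cc"] <= 1400]
criteria = [
    ("SPC", lambda c: sum(p(c) for p in specific)),  # simplified: count satisfied preferences
    ("GPC", lambda c: -c["price"]),                  # "I want a cheap car"
    ("BAC", lambda c: 0),                            # placeholder for the broker's own judgement
]
ideal, decided_in = select_ideal(cars, criteria)
```

In this toy collection cars 1 and 2 tie on the Specific preferences, so the choice is resolved by the General preference for a cheap car, which corresponds to a GPC entry in the results tables.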
An indicative example of the results table is shown in Table 6.2. In the first columns, the table stores information regarding the user. The Step 1 column holds information for Step 1, specifically the number of Specific preferences (SP), the number of General preferences (GP), and the total number of preferences (TP), which are 3, 8 and 11 respectively in our case. Furthermore, in this specific example, the user spent 20 minutes to express his preferences (Time). The Step 2 column holds information regarding the second step of the evaluation. Specifically, according to our example, the user selected from the list of cars the car with id 46 (CID), after searching the list of cars for only 3 min. (Time).
Additionally, there is a grade for this specific car. This grade indicates the priority of the Specific preferences expressed by the user and is based on the number of Specific preferences this car satisfies. Specifically, the preference grade $PG_{iz}$ of a preference $P_i$ for car $c_z$ can take the following values:
- a mark denoting that the preference $P_i$ is satisfied for this car;
- a mark denoting that the preference $P_i$ is not satisfied for this car;
- a mark denoting that the preference $P_i$ is not applicable for this car. These are the inactive elements, which in this case we have considered as worse than elements that satisfy this preference, but better than elements that do not satisfy this preference;
- a number, if this car satisfies the corresponding value from an ordered set of preference values (i.e. 1 for the first, 2 for the second, etc.). For example, if Fiat is preferred to Audi and Audi is preferred to Mercedes, then a car made by Fiat would have a value of 1 for this preference, a car made by Audi would have a value of 2, etc., and the rest of the cars, made by other manufacturers, would have a value of 0; these are inactive elements and are considered as the worst in this case.
When there was no preference priority, all preferences were considered equal, i.e. the grades would be $\{PG_{1z}, \ldots, PG_{nz}\}$ for a specific car $c_z$. On the other hand, if $P_1$ is prioritized over $P_2$, which is prioritized over all the other preferences for car $c_z$, then the grade would be $\{\{PG_{1z}\}, \{PG_{2z}\}, \{PG_{3z}, \ldots, PG_{nz}\}\}$. In our example, $P_1$ is prioritized over $P_2$ and $P_3$, which have equal priority. Notice that for space reasons, we provide only grades regarding the SPC criterion.
Table 6.2: Example of Hypothesis Evaluation Results
User
Id | Age | Gen. | Educ. | Country | SP | GP | TP | Time (Step 1) | CID (Step 2) | Grade | Time (Step 2)
16 | 23 | F | MSc | Greece | 3 | 8 | 11 | 20 m. | 46 | {{0}, {}} | 3 m.
Broker
Ideal CID | Ideal Grade | 2nd Ideal CID | 2nd Ideal Grade | 1st vs 2nd wins in
43 | {{1}, {}} | 28 | {{2}, {}} | SPC
Results
Br vs Us wins in | NIIP | IIPR | NUP | UPR | S1 | S2
SPC | 0 | 0% | 5 | 45.4% | |
The next columns concern the broker, and specifically the first and second ideal cars that he proposes. Again we store the car ids (CID) and their corresponding grades. The next column (1st vs 2nd) describes in which step of the broker's criteria process the ideal car was preferred over the 2nd ideal car, and takes a value from {SPC, GPC, BAC}.
Finally, the last seven columns give us an overview of the results. The first, Br vs Us, describes in which step of the broker's criteria process his ideal car was preferred over the user's selected ideal car, and again takes a value from {SPC, GPC, BAC}. The next column, Number of Intentional Inconsistent Preferences (NIIP), holds the number of preferences, expressed in Step 1 by the user, that were intentionally overridden when the user selected the ideal car from the collection (e.g. choosing a car with a displacement of 1500cc when he had expressed a preference for an engine of less than 1400cc). Intentional Inconsistent Preferences Percentage (IIPR) holds the percentage of preferences that were overridden over the number of Specific preferences. Number of Unused Preferences (NUP) holds the number of preferences that were not used, since they were not applicable in our collection, and Unused Preferences Percentage (UPR) holds the percentage of preferences that were not used over the whole number of preferences that the user expressed. Finally, $S_1$ marks if the user selected the same car as the ideal car proposed by the broker and $S_2$ marks if the user selected the same car as the second ideal car proposed by the broker.
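The derived columns can be recomputed from the raw counts, as the following Python sketch shows for the example user of Table 6.2. The function and parameter names are illustrative; the inputs (3 Specific preferences, 11 preferences in total, 0 overridden, 5 unused, user car 46, broker cars 43 and 28) are those of the example row.

```python
def result_columns(sp, tp, niip, nup, user_cid, ideal_cid, second_cid):
    """Derive the percentage and match columns of the results table for one user."""
    return {
        # overridden preferences over the number of Specific preferences
        "overridden_pct": 100.0 * niip / sp if sp else 0.0,
        # unused preferences over the total number of expressed preferences
        "unused_pct": 100.0 * nup / tp if tp else 0.0,
        "S1": user_cid == ideal_cid,    # user picked the broker's ideal car
        "S2": user_cid == second_cid,   # user picked the broker's second ideal car
    }


example = result_columns(sp=3, tp=11, niip=0, nup=5,
                         user_cid=46, ideal_cid=43, second_cid=28)
```

For this user the share of unused preferences is 5 out of 11, i.e. roughly 45%, and neither of the broker's two proposals matches the car the user selected.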
The results of the evaluation are shown in Table 6.3. From the results we can see that only 6 out of the 30 participants (20%) selected in Step 2 the car that was ideal according to the preferences they expressed in Step 1. This supports our initial hypothesis that without exploring the existing choices, the expression of preferences results in incomplete preferences. In addition, when the user did not select the ideal car according to his preferences, the broker's ideal car beat the user-selected car, according to the preferences expressed in Step 1, in 79.16% of the cases during the Specific preferences criterion phase (SPC), in 16.67% during the General preferences criterion phase (GPC), and in only 4.16% during the broker assumption phase (BAC).
If we also take into consideration the 2nd ideal car proposed by the broker, then the number of participants rises to 10 (33.3%). Notice though that 75% of the 2nd ideal cars lost to the ideal car during the Specific preferences criterion phase (SPC). This means that the ideal car is clearly preferred to the 2nd ideal one, according to the Specific preferences expressed by the user. The remaining 25% of these cars lost during the broker assumption phase (BAC). Regarding the (1st vs 2nd) column, we can see that in 40% of the cases the broker was able to discriminate the ideal from the 2nd ideal car during the SPC phase, in 36.6% during the GPC phase and in 23.3% during the broker-biased BAC phase.
Furthermore, we can see that participants spent on average 10 minutes in Step 1 (the worst case was a user that used the whole 30-minute time slot). Users tried to take into consideration every aspect they could imagine, since a test collection with the viable choices was not available. As a result the process of preference expression was time-consuming and also led to a number of inconsistent preferences. Specifically, each participant expressed on average 9.73 preferences, of which 6.7 were Specific preferences and the remaining 3.06 were General preferences. Of the 6.7 Specific preferences, 1.53 were inconsistent (i.e. the user finally selected an ideal car that did not satisfy this preference). 3.2 preferences per user were not applicable to our collection, and as a result were not used. This result shows that users spend a lot of time expressing preferences that are either inconsistent with their final decision or not applicable for making the ideal selection.
On the other hand, participants spent on average 4 minutes to find the ideal car from the list of 50 cars in Step 2. We can argue that in the case of a list of thousands of cars, participants would have to spend a lot more time in order to find the ideal car. In this case we could exploit available information thinning approaches and the proposed preference framework.
Another important conclusion is that only 2 out of the 30 participants provided a prioritized list of preferences. This might explain the differences between the ideal car users selected and the ideal car that was picked by the broker, since for example the price is one of the most important factors when purchasing a car.
Finally, we can conclude that even though most of the participants have a high educational level, they were not able to provide the appropriate preferences that could lead to the ideal car for them. They spent on average 10 minutes in order to provide preferences that were either not applicable or overridden when they chose the ideal car from the test collection.
Statistical Significance Test
We also conducted a statistical significance test to check whether our results could be due to chance. In our evaluation test we have dichotomous data, where each individual in the sample is classified in one of two categories. The first category is the individuals who expressed preferences that can lead to the ideal car for them ($C_{Ideal}$) and the second category is the individuals who expressed preferences that could not lead to the ideal car ($C_{NonIdeal}$). A suitable statistical test in our case is a one-tailed (lower-tailed) binomial significance test, since we have dichotomous data, observations are independent from each other, probabilities of success and failure are constant across trials and the critical region falls at one end of the possible values (Griffiths (2009)).
Our null hypothesis $H_0$ is:
Null Hypothesis ($H_0$) More than half of the users expressed their preferences without exploring available cars, and were returned the ideal car for them from a car collection.
Then the alternative hypothesis $H_1$ is:
Alternative Hypothesis ($H_1$) Less than half of the users expressed their preferences without exploring available cars, and were returned the ideal car for them from a car collection.
In our case, we want the user selected car id ($cid_u$) (i.e. the ideal car from the car collection for the user) and the first ideal car id selected by the broker ($cid_b$) to be the same ($cid_u = cid_b$), for more than half of the cases.
Table 6.3: Results of the hypothesis evaluation
Id | Age | Gen. | Educ. | Country | SP | GP | TP | Time (Step 1) | CID (Step 2) | Grade | Time (Step 2) | Ideal CID | Ideal Grade | 2nd Ideal CID | 2nd Ideal Grade | 1st vs 2nd wins in | Br vs Us wins in | NIIP | IIPR | NUP | UPR | S1 | S2
1 | 32 | M | MSc | Greece | 6 | 3 | 9 | 4 m. | 40 | {1} | 2 m. | 40 | {1} | 4 | {3} | SPC | - | 0 | 0% | 0 | 0% | ✓ |
2 | 25 | F | MSc | Greece | 4 | 5 | 9 | 5 m. | 46 | {} | 3 m. | 46 | {} | 40 | {} | GPC | - | 0 | 0% | 5 | 55.5% | ✓ |
3 | 26 | F | MSc | Greece | 7 | 0 | 7 | 7 m. | 31 | {} | 1 m. | 30 | {} | 38 | {} | SPC | SPC | 1 | 14.2% | 2 | 28.5% | |
4 | 27 | M | MSc | Greece | 3 | 4 | 7 | 5 m. | 46 | {} | 3 m. | 46 | {} | 30 | {} | GPC | - | 0 | 0% | 0 | 0% | ✓ |
5 | 26 | M | PhD | Greece | 5 | 1 | 6 | 6 m. | 46 | {} | 2 m. | 15 | {} | 16 | {} | BAC | SPC | 3 | 60% | 0 | 0% | |
6 | 30 | F | PhD | France | 6 | 1 | 7 | 10 m. | 47 | {} | 5 m. | 10 | {} | 32 | {} | BAC | SPC | 3 | 50% | 1 | 14.2% | |
7 | 32 | M | Univers. | Austria | 8 | 1 | 9 | 10 m. | 16 | {} | 3 m. | 31 | {} | 34 | {} | BAC | SPC | 3 | 42.8% | 0 | 0% | |
8 | 33 | F | PhD | Estonia | 3 | 5 | 8 | 7 m. | 48 | {} | 1 m. | 48 | {} | 4 | {} | BAC | - | 1 | 33.3% | 5 | 62.5% | ✓ |
9 | 33 | M | MSc | Norway | 4 | 7 | 11 | 10 m. | 45 | {} | 3 m. | 41 | {} | 42 | {} | BAC | BAC | 1 | 25% | 7 | 63.5% | |
10 | 31 | F | PhD | Bulgaria | 4 | 6 | 10 | 15 m. | 34 | {} | 15 m. | 30 | {} | 35 | {} | GPC | GPC | 0 | 0% | 5 | 50% | |
11 | 32 | M | PhD | India | 10 | 0 | 10 | 7 m. | 4 | {{}, {}} | 5 m. | 6 | {{}, {}} | 4 | {{}, {}} | SPC | SPC | 2 | 20% | 3 | 30% | | ✓
12 | 31 | M | PhD | Greece | 7 | 3 | 10 | 5 m. | 40 | {{}, {1}} | 2 m. | 40 | {{}, {1}} | 42 | {{}, {2}} | SPC | - | 0 | 0% | 0 | 0% | ✓ |
13 | 29 | M | MSc | Greece | 11 | 0 | 11 | 10 m. | 34 | {} | 5 m. | 11 | {} | 16 | {} | SPC | SPC | 1 | 9.09% | 6 | 54.5% | |
14 | 28 | M | MSc | Greece | 10 | 0 | 10 | 10 m. | 14 | {} | 3 m. | 26 | {} | 43 | {} | SPC | SPC | 1 | 10% | 1 | 10% | |
15 | 37 | F | MSc | Greece | 15 | 6 | 21 | 13 m. | 49 | {} | 4 m. | 33 | {} | 50 | {} | SPC | SPC | 2 | 13.3% | 14 | 82.3% | |
16 | 23 | F | MSc | Greece | 3 | 8 | 11 | 20 m. | 46 | {} | 3 m. | 43 | {} | 28 | {} | GPC | SPC | 0 | 9% | 5 | 45.5% | |
17 | 76 | M | High Sch. | Greece | 14 | 1 | 15 | 15 m. | 46 | {} | 5 m. | 31 | {} | 46 | {} | SPC | SPC | 3 | 21.4% | 7 | 46.6% | | ✓
18 | 44 | M | PhD | Austria | 5 | 5 | 10 | 30 m. | 17 | {} | 30 s. | 37 | {} | 17 | {} | SPC | SPC | 2 | 40% | 4 | 40% | | ✓
19 | 25 | M | Master | Greece | 5 | 6 | 11 | 5 m. | 46 | {} | 2 m. | 15 | {} | 14 | {} | GPC | GPC | 2 | 40% | 2 | 18.8% | |
20 | 42 | M | Master | Greece | 10 | 4 | 14 | 14 m. | 46 | {1} | 18 m. | 20 | {1} | 17 | {1} | SPC | SPC | 1 | 20% | 7 | 50% | |
21 | 47 | F | Univers. | Greece | 3 | 2 | 5 | 15 m. | 15 | {} | 1 m. | 20 | {} | 17 | {} | SPC | SPC | 2 | 66.6% | 0 | 0% | |
22 | 46 | M | Univers. | Greece | 8 | 0 | 8 | 5 m. | 30 | {} | 1 m. | 30 | {} | 46 | {} | SPC | - | 1 | 12.5% | 1 | 12.5% | ✓ |
23 | 22 | M | Univers. | Greece | 9 | 0 | 9 | 8 m. | 21 | {} | 1 m. | 47 | {} | 5 | {} | GPC | SPC | 3 | 33.3% | 0 | 0% | |
24 | 26 | F | MSc | Greece | 4 | 0 | 4 | 5 m. | 30 | {} | 5 m. | 46 | {} | 30 | {} | GPC | GPC | 0 | 0% | 1 | 25% | | ✓
25 | 30 | F | PhD | Greece | 5 | 12 | 17 | 10 m. | 36 | {} | 5 m. | 30 | {} | 16 | {} | GPC | SPC | 1 | 20% | 10 | 58.8% | |
26 | 26 | F | MSc | Greece | 9 | 0 | 9 | 7 m. | 31 | {} | 1 m. | 16 | {} | 30 | {} | BAC | SPC | 3 | 37.5% | 0 | 0% | |
27 | 25 | M | MSc | Greece | 5 | 4 | 9 | 15 m. | 46 | {} | 5 m. | 16 | {} | 32 | {} | GPC | SPC | 3 | 60% | 3 | 33.3% | |
28 | 32 | M | MSc | Greece | 9 | 0 | 9 | 12 m. | 32 | {} | 5 m. | 30 | {} | 39 | {} | BAC | SPC | 4 | 44.4% | 3 | 33.3% | |
29 | 26 | F | MSc | Greece | 6 | 4 | 10 | 10 m. | 48 | {} | 5 m. | 14 | {} | 39 | {} | GPC | SPC | 3 | 50% | 3 | 30% | |
30 | 27 | M | MSc | Greece | 3 | 3 | 6 | 5 m. | 4 | {} | 5 m. | 49 | {} | 30 | {} | GPC | GPC | 0 | 0% | 1 | 16.6% | |
Average Values | | | | | 6.7 | 3.06 | 9.73 | 10.1 m | | | 4.01 m | | | | | | | 1.53 | 24.4% | 3.2 | 28.71% | |
Total Number | | | | | | | | | | | | | | | | | | | | | | 6 | 4
If $Y$ is the number of successes in $n$ trials, then the probability of getting $y$ successes in $n$ trials is given by the binomial distribution (Griffiths (2009)):
\[ P(Y = y) = \binom{n}{y}\, p^{y} (1 - p)^{n - y} = \frac{n!}{y!\,(n - y)!}\, p^{y} q^{n - y} \]
where $p$ is the probability of success and $q = 1 - p$ the probability of failure. In our case $n = 30$, $p = 0.5$ and $q = 0.5$. We want to check what is the probability that 6 or fewer participants are successful in finding the appropriate car for them, so we want to calculate:
\[ P(Y \leq 6) = \sum_{i=0}^{6} P(Y = i) \]
which will provide the p-value (the probability of obtaining a test statistic at least as extreme as the one that was actually observed). Figure 6.4 shows the probabilities of the binomial distribution for different numbers of successes, the cumulative distribution function and the Type I error area (i.e. falsely rejecting a true null hypothesis).
Regarding the significance level, according to Wasserman (2004) an $\alpha$ value of 0.05, which is commonly used in the bibliography, provides strong evidence against $H_0$, while a value of less than 0.01 provides very strong evidence against $H_0$. In our case we used an $\alpha$ value of 0.01. The $\alpha$ value determines the risk of a Type I error.
We used the R language⁴ in order to calculate the above probability. Specifically, we executed the command:
binom.test(6, 30, 0.5, alternative="less")
which returned a p-value $= 0.0007155 < \alpha = 0.01$. As a result we have very strong evidence against $H_0$. So we can reject the null hypothesis $H_0$ and we can conclude that: Less than half of the users can express their preferences without exploring available cars, in such a way that they can be returned the ideal car for them from a car collection.
4 R is an open source programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. (http://www.r-project.org/)
Figure 6.4: Probabilities and Distribution Function of the Binomial Distribution
Furthermore, if we also consider the second ideal cars, where the number of successes is 10, then we can execute the command:
binom.test(10, 30, 0.5, alternative="less")
which returned p-value $= 0.04957 > \alpha = 0.01$. As a result, with a significance level $\alpha = 0.01$ we cannot reject $H_0$. But if we relax our significance level to $\alpha = 0.05$, then p-value $= 0.04957 < \alpha = 0.05$. As a result we have strong evidence (instead of the very strong evidence that is provided by $\alpha = 0.01$) to reject $H_0$ and accept $H_1$.
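For readers without R at hand, the lower-tailed p-value can also be obtained by summing the binomial probabilities directly. The following Python sketch does this with the standard library only; the function name is illustrative, and the computation corresponds to the lower tail $P(Y \leq y)$ that a one-sided "less" binomial test reports.

```python
from math import comb


def binomial_lower_tail(successes, n, p=0.5):
    """P(Y <= successes) for Y ~ Binomial(n, p), i.e. the lower-tailed p-value."""
    return sum(
        comb(n, i) * p ** i * (1 - p) ** (n - i)
        for i in range(successes + 1)
    )


p_first = binomial_lower_tail(6, 30)    # only the broker's ideal car counts as success
p_second = binomial_lower_tail(10, 30)  # ideal or second ideal car counts as success
print(p_first < 0.01, p_second < 0.01, p_second < 0.05)
```

The three printed comparisons mirror the decisions taken in the text: rejection of $H_0$ at $\alpha = 0.01$ for 6 successes, and rejection only at the relaxed level $\alpha = 0.05$ for 10 successes.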
6.4 Evaluation of Various Exploration Approaches
We conducted a comparative evaluation between a) an interface providing FDT over static metadata, b) an interface providing a clustering algorithm and c) a combination of FDT with both static and dynamically mined metadata (i.e. clustering). The purpose of this evaluation was to prove the effectiveness of the FDT scheme over other exploratory schemes like clustering.
Thirteen users participated in the evaluation, with ages ranging from 20 to 30, 61.5% males and 38.5% females. We can distinguish two groups: the advanced group consisting of 3 users and the regular one consisting of 10 users. The advanced users had prior experience in using clustering and multidimensional browsing services, while the regular ones had not.
We specified four information needs (or tasks) of exploratory nature, and for the first three we specified three variations each. According to Lindgaard and Chattratichart (2007), a large number of tasks can improve usability test performance. All tasks were refined using the task refinement steps described in Kules and Capra (2008):
- The task descriptions should include words or semantically close terms that are values of a facet.
- By using keywords of the task description, the user should not be able to complete the task by using the first 10 results of the answer set (otherwise the task would be too easy and would not require exploratory search).
- The facets should be useful without having to click the "show more" link of a facet.
In the described comparative approach the idea is to let users compare a number of different systems and rank them according to the following different criteria:
Log Data Analysis
During the evaluation we logged and counted for each user: (a) the number of submitted queries and (b)
the number of clicked zoom points (by facet).
Task Completeness
We measured the average percentage of the correct URLs that users found (both regular and advanced users) for the evaluation tasks in each user interface, out of the total number of correct URLs in our testbed.
User Preference
To identify the most preferred interface (for regular and advanced users) we aggregated the preference
rankings for each task using Plurality Ranking (i.e. we count the first positions), and Borda ranking Borda
(1781) (i.e. we summed the positions of each interface in all rankings). In a Plurality column, the higher
6.4. Evaluation of Various Exploration Approaches 131
a value is, the better (i.e. the more first positions it got), while in a Borda column the lower a value is, the
better. The rows marked with All show the sum of the values of all tasks. The best values for each task
are marked in bold.
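The two aggregation schemes described above can be sketched as follows. This is a minimal illustration, assuming each user ranking is a list of interface labels ordered from most to least preferred; the function names and the sample rankings are hypothetical and are not taken from the study data.

```python
from collections import Counter

def plurality(rankings):
    """Count how many times each interface is ranked first (higher is better)."""
    counts = Counter({item: 0 for ranking in rankings for item in ranking})
    for ranking in rankings:
        counts[ranking[0]] += 1
    return dict(counts)

def borda_positions(rankings):
    """Sum the 1-based positions of each interface over all rankings (lower is better)."""
    totals = Counter()
    for ranking in rankings:
        for position, item in enumerate(ranking, start=1):
            totals[item] += position
    return dict(totals)

# Hypothetical rankings given by four users over three interfaces a, b and c
rankings = [["c", "a", "b"], ["a", "c", "b"], ["c", "b", "a"], ["a", "c", "b"]]
plurality_scores = plurality(rankings)
borda_scores = borda_positions(rankings)
```

In this made-up sample, Plurality produces a tie between a and c, while the position sums separate them, which shows why reporting both aggregations can be informative.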
User Satisfaction
Users ranked the interfaces based on their satisfaction, and we aggregated the satisfaction rankings for
each task again using Plurality Ranking and Borda ranking.
User Friendliness
Users ranked the interfaces based on their user friendliness, and we aggregated the rankings for each
task again using Plurality Ranking and Borda ranking.
The results, which are described in detail in Papadakos et al. (2012a), showed that the FDT-based
approaches were the most preferred. Specifically, users, both advanced and regular ones, were able to
achieve a significantly higher degree of task completeness with the FDT-based approaches than with
the plain clustering one. Furthermore, they submitted the fewest queries with the FDT interfaces
(advanced users made more than 50% fewer queries with FDT). The plain clustering interface was the least
preferred for 58.3% of the advanced users and for 65% of the regular ones, while the (c) interface was the
most preferred one for the advanced users. For regular users there was a tie between (a) and (c). Finally,
regarding satisfaction, 55% of the advanced users were highly satisfied with (c), while 50% of the regular
users were satisfied with (a). Only 16.6% of the advanced users and 12.5% of the regular users were highly
satisfied with the plain clustering interface.
Furthermore a statistical analysis was conducted, where the upper and lower limits with 95% confi-
dence were computed. For regular users, we made the following observations:
Only 5% of the regular users with a 9.2 error were not satisfied by the FDT interface (A)
Only 5% of the regular users with a 15.91 error have low preference for the combined interface
(C)
Only 12.5% of the regular users, with a 20 error, find the clustering interface highly satisfactory
Only 12.5% of the regular users, with a 20 error, have low satisfaction for the combined interface
(C)
For the advanced users, we did not come up with a clear conclusion due to the big errors.
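As an illustration of how upper and lower limits at 95% confidence can be obtained for an observed percentage, the sketch below uses the normal-approximation (Wald) interval. This is only one common choice and an assumption on our side: the text does not state which procedure was used, so the error values listed above need not be reproduced by it. The sample of 1 response out of 20 is hypothetical.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    z = 1.96 corresponds to a 95% confidence level.
    Returns (lower, upper, margin), with the limits clipped to [0, 1].
    """
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - margin), min(1.0, p + margin), margin

# Hypothetical sample: 1 out of 20 responses falls in the category of interest
lower, upper, margin = wald_interval(1, 20)
```

With small samples the margin quickly becomes larger than the observed proportion itself, which is consistent with the remark that no clear conclusion could be drawn for the small group of advanced users.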
Summarizing the results of the evaluation, the UI providing the FDT interaction scheme over static
metadata was the most preferred UI for regular users. On the other hand, advanced users preferred by
a small margin an FDT UI which in addition provided dynamic metadata through a clustering algorithm.
In any case, the browsing interaction scheme of FDT, with or without dynamic metadata, was preferred,
provided better user satisfaction, and resulted in a higher task completeness degree with fewer queries
than other browsing interaction schemes (i.e. plain clustering).
6.5 Evaluation of Hippalus System
We evaluated the Hippalus system over two different user groups, plain users and expert users. These
two groups were asked to complete a number of tasks over two different UIs:
a) UI1: Hippalus system with exploration and browsing capabilities only (preference functionality
was disabled)
b) UI2: Hippalus system with exploration and browsing capabilities and preference functionality
enabled
We compared the above interfaces with respect to ease of use, ease of learning, usefulness, user preference,
user satisfaction, and task accomplishment. Furthermore, we wanted to examine how users really used the
above interfaces and for this reason we conducted a log analysis (usage-based evaluation). Finally, for
each user action we calculated a number of the metrics that were described in Section 6.1.2, so that we
could evaluate each user action. Figure 6.5 depicts the steps of our evaluation process.
Participants
In this study, 26 persons [5], males and females of varying age (i.e. between 23 and 43 years) and expertise
(i.e. from tertiary education to PhD level), participated. We formed two groups. The first group, named plain
users, consisted of 20 regular users, while the second one, expert users, consisted of 6 people with
prior experience in using multi-dimensional services and preferences. Before starting the evaluation,
all participants were given a simple tutorial of 15 minutes [6]. Specifically,
users were initially given a description of the information base (domain, attributes). In the next five
minutes the interactive process of information thinning was described to them, and finally the rest of the
tutorial demonstrated the preference actions by showing specific examples. Users were allowed to get
acquainted with the UI and complete a number of simple tasks.
[5] According to Faulkner (2003), 10 evaluators are enough for getting more than 82% of the usability problems of a user
interface (at least in their experiments).
[6] A video of the tutorial is available at http://www.youtube.com/watch?v=Cah-z7KmlXc
Figure 6.5: Comparative Evaluation Process
Attribute Users percentage Attribute Users percentage
Price 90% (27/30) Trunk 40% (12/30)
Manufacturer 90% (27/30) Year 36.7% (11/30)
Engine Volume 80% (24/30) Number of doors 33.3% (10/30)
Body Type 73.3% (22/30) Max Speed 16.7% (5/30)
Fuel type 53.3% (16/30) Torque 13.3% (4/30)
Power 43.3% (13/30) Acceleration 13.3% (4/30)
Consumption city 43.3% (13/30) Drive System 6.7% (2/30)
Consumption national 43.3% (13/30)
Table 6.4: Percentages of the 30 Users that Expressed a Preference Over a Valid Attribute
Information Base
We used an information base of 50 cars, indexed under a large number of classes and subclasses. Specifically,
there is a total of 23 classes and 85 subclasses. Some of them are hierarchically organized, like
Manufacturer, and some others are flat, like Vehicle Type (an example is shown in Figure 5.5).
Tasks
A question during the design of the user tasks was which attributes to use. So, in order to provide repre-
sentative tasks, we designed them on top of attributes for which real users expressed their preferences
(these user preferences were collected in the evaluation described previously in Section 6.3). Specifi-
cally, Table 6.4 shows the percentages of the 30 users that participated in the evaluation described in 6.3,
who expressed a preference over an attribute which is valid in our information base. Users expressed
preferences for a total of 15 attributes that appear in our information base (and for a number of other
attributes valid in our collection base, like color, ABS, etc.). In order to create the tasks of this evaluation
we identified the most important attributes for which users expressed their preferences. Specifically,
we only considered those with a percentage greater than 50% (i.e. price, manufacturer, engine volume,
body type and fuel type) for the design of the tasks. Notice that for the hierarchically organized values of the
attribute Manufacturer, a number of users expressed preferences of the form Audi is better than BMW, while
others expressed preferences like Japanese are better than European which are better than American and Korean, so
we tried to capture both of them in our task descriptions. Finally, in this specific evaluation we make the
assumption that the user does not change his preference criteria as he is exploring the available choices.
We created two variations of equal [7] tasks for the plain users evaluation and for the expert users evaluation.
In each task, the first subtask used prioritized preference actions, while the second one used
Pareto composition. Tasks for plain users were designed on top of only 3 criteria. The tasks for
the expert users were more difficult and complicated, since they used 6 different criteria. Specifically,
the tasks that users completed were the following:
[7] In our context, equal tasks are tasks that consist of the same kind of preference actions and criteria.
Plain User-Based Evaluation
Task A You are supposed to buy a new car, which you will select through the Hippalus system. In
order to identify the best or the set of best cars, you have to consider the following criteria: a) Engine
Volume: You would like a car with an engine volume around 1200cc . b) Price: You are willing to
pay around 10000 Euros. c) Manufacturers: Generally, you prefer European to Korean, and German
manufacturers to other European. You consider Japanese cars better than Korean ones.
Subtask 1: Which are the best cars according to the above description, if you consider that a) and
b) are equally important and the most important criteria for you, followed by the criterion c).
Subtask 2: Which are the best cars according to the above description, if you consider that all of
the 3 criteria are equally important?
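Task B You are supposed to buy a new car, which you will select through the Hippalus system. In
order to identify the best or the set of best cars, you have to consider the following criteria: a) Engine
Volume: You would like a car with an engine volume around 1600cc. b) Price: You are willing to pay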
around 14000 Euros. c) Manufacturers: Generally, you prefer Asian manufacturers to European. From
European you prefer German. Finally, European are better than American.
Subtask 1: Which are the best cars according to the above description, if you consider that a) and
b) are equally important and the most important criteria for you, followed by c).
Subtask 2: Which are the best cars according to the above description, if you consider that all of
the 3 criteria are equally important?
Expert-Based Evaluation
Task A You are supposed to buy a new car, which you will select through the Hippalus system. In
[...] around 10000 Euros. c) Manufacturers: Generally, you prefer European manufacturers to American,
and German to other European manufacturers. You consider Japanese cars better than Korean ones. d)
Body Type: You want a car with a hatchback body type. Finally, e) Fuel type: fuel should be gasoline
and not diesel and f) Year: you prefer a modern car, i.e. a car from a recent year.
Subtask A.1: Which are the best cars according to the above description, if you consider that a), b)
and c) are equally important and the most important criteria for you, while d) is more important
than e) which is more important than f).
Subtask A.2: Which are the best cars according to the above description, if you consider that all
of the 6 criteria are equally important?
Task B You are supposed to buy a new car, which you will select through the Hippalus system. In
[...] around 14000 Euros. c) Manufacturers: Generally, you prefer European manufacturers to Asian and
German to other European. Finally, you prefer Japanese to Korean. d) Body Type: You do not want a car
with a body type of a minivan. Finally, e) Fuel type: fuel type should be diesel and f) Doors: you prefer
a car with 5 doors instead of 3.
Subtask B.1: Which are the best cars according to the above description, if you consider that a), b)
and c) are equally important and the most important criteria for you, while d) is more important
than e) which is more important than f).
Subtask B.2: Which are the best cars according to the above description, if you consider that all
of the 6 criteria are equally important?
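The difference between the two kinds of subtasks can be illustrated with a small sketch for plain Task A. It uses a common textbook formulation of Pareto and prioritized composition and is not the Hippalus implementation. Several simplifications are assumptions made only for this example: every criterion is reduced to a numeric score (lower is better), the manufacturer preference is flattened to the hypothetical ranking in RANK, and the four cars are invented.

```python
# Sketch only: score-based criteria (lower score = more preferred) are an assumption.
RANK = {"German": 0, "European": 1, "Japanese": 1, "Korean": 2}  # hypothetical flattening

def engine(car):
    return abs(car["cc"] - 1200)       # distance from the desired engine volume

def price(car):
    return abs(car["price"] - 10000)   # distance from the desired price

def maker(car):
    return RANK[car["origin"]]

def pareto(criteria):
    """x is better than y if it is at least as good on all criteria and strictly better on one."""
    def better(x, y):
        pairs = [(c(x), c(y)) for c in criteria]
        return all(a <= b for a, b in pairs) and any(a < b for a, b in pairs)
    return better

def prioritized(first, second):
    """Use the more important relation first; fall back to the second one only on a tie."""
    def better(x, y):
        if first(x, y):
            return True
        if first(y, x):
            return False
        return second(x, y)
    return better

def best(cars, better):
    """Names of the cars that no other car is better than."""
    return [x["name"] for x in cars if not any(better(y, x) for y in cars)]

# Invented cars, not part of the evaluated information base
CARS = [
    {"name": "car1", "cc": 1200, "price": 10500, "origin": "Korean"},
    {"name": "car2", "cc": 1400, "price": 10000, "origin": "German"},
    {"name": "car3", "cc": 1300, "price": 11000, "origin": "German"},
    {"name": "car4", "cc": 1600, "price": 14000, "origin": "Japanese"},
]

# Subtask 1: a) and b) equally important, both more important than c)
subtask1 = best(CARS, prioritized(pareto([engine, price]), pareto([maker])))
# Subtask 2: all three criteria equally important
subtask2 = best(CARS, pareto([engine, price, maker]))
```

On this invented data the prioritized variant keeps a single car, while the Pareto variant keeps three, which hints at why the two subtasks can lead to different sets of best cars for the same criteria.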
We used rotation and counterbalancing in order to control for order effects and to increase the
chance that results can be attributed to the experimental treatments and conditions (Kelly 2009). Specifically,
we used a Graeco-Latin Square Design, rotating both the order of tasks and the order in which
subjects experience the interfaces. We created 4 user groups, UGP1, UGP2, UGP3, and UGP4 [8]. Each group
completed the tasks as shown in Table 6.5, where column headings represent points in time and order
and the rows represent subjects [9].
[8] Unfortunately we only formed two groups of expert users, UGE1 and UGE2, since only 6 experts were available.
Users Time 1 Time 2
UGP1 UI1: TaskA1, TaskB2 UI2: TaskA2, TaskB1
UGP2 UI1: TaskA2, TaskB1 UI2: TaskA1, TaskB2
Table 6.5: Graeco-Latin Square Design
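A rotation of this kind can be generated mechanically. The sketch below crosses the two interface orders with the two orders of the subtask sets. The first two generated groups coincide with the rows of Table 6.5; the assignments for UGP3 and UGP4 are an assumption, since they are not listed in the table.

```python
from itertools import product

INTERFACES = ("UI1", "UI2")
TASK_SETS = (("A1", "B2"), ("A2", "B1"))

def both_orders(pair):
    """Return a pair in its original and in its reversed order."""
    return [pair, pair[::-1]]

# Cross every interface order with every task-set order: 2 x 2 = 4 groups.
design = {}
combos = product(both_orders(INTERFACES), both_orders(TASK_SETS))
for index, (ui_order, task_order) in enumerate(combos, start=1):
    # Each entry pairs the interface used at a time slot with the subtasks done there.
    design[f"UGP{index}"] = list(zip(ui_order, task_order))

for group, slots in design.items():
    print(group, slots)
```

Every generated group uses both interfaces and completes all four subtasks exactly once, so interface order and task order are balanced across the groups.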
Evaluation
Users were asked to evaluate two different UIs over the Hippalus system, using the previously described
tasks. In the first UI (UI1), preference actions were disabled. As a result, in order to complete the aforementioned
tasks, they browsed the car collection and used the available information thinning functionality
(selection of appropriate facets and terms to restrict their focus). The second UI (UI2), in addition
to the information thinning functionality described previously, provided the preference actions proposed in
this thesis through context menus. For both UIs, the users provided the set of cars
which they believed fulfilled the needs of each task.
For each task, an expert user provided the ordering of the collection according to preference. The
order was a bucket order, meaning that two cars can be incomparable (i.e. equally preferred).
Users provided scores for the two exploratory systems regarding Ease of use, Ease of learning,
Usefulness, Preference and Satisfaction, using a psychometric Likert scale. We calculated Effectiveness
(Task completeness) and Efficiency (Time to complete a task) using the logged data.
Main Results
We gathered a number of interesting results from this evaluation. The main results can be summarized
as follows:
All plain users preferred the preferences UI instead of the non-preferences UI. Specifically 75%
of the 20 plain users preferred the preference UI very strongly, 20% strongly and only 5% strongly
enough. In addition all 6 expert users preferred the preference UI, 50% of them very strongly and
the other 50% strongly.
[9] For expert users, due to their low number, we only rotated the interfaces.
The preference-enabled UI allowed the users to complete all the tasks successfully, on average in
less than a third of the time and with a third of the user interactions compared to the plain FDT UI.
None of the users was able to successfully complete both of the tasks with the plain UI (only 1
expert user and 2 plain users successfully completed one of the two tasks they were assigned using
UI1).
As a result we verify the conclusions of the theoretical user effort analysis, since the preference-based
UI helped the users to find the desired results in less time and with fewer actions and fewer
decisions.
Fine Grained Results
Here we discuss in more detail the results of this evaluation.
Specifically, Figure 6.6 (a) depicts the aggregated results according to Plurality (i.e. how many times
each UI was ranked first) and Borda (i.e. the total score each UI gathered from all users and tasks)
regarding Ease of Use, while Figure 6.6 (b) does so for Usefulness, Figure 6.6 (c) for Preference and Figure 6.6 (d) for
Satisfaction. Scores are given for both plain and expert users.
It is easy to see that for each one of the above criteria, UI2 (i.e. the UI with preferences) was ranked
almost always first for both expert and plain users. There were a number of ties between the two UIs,
especially in the case of plain users (e.g. 14 ties regarding Satisfaction). Notice though that the fewest ties
(i.e. the most wins for UI2) occur for the Preference criterion, where UI2 is a clear winner.
Regarding the total scores of each UI according to Borda, for plain users UI2 scored on average
almost always 1/3 more than UI1. Specifically, UI2 reached almost 9/10 of the top score (200 in the
case of plain users) a system could score. On the other hand, expert users gave somewhat lower rankings to
UI2, which in this case reached 3/4 of the top score (60 in the case of expert users). Again, UI2 was a
clear winner over all criteria, while UI1 reached 1/2 of the top score.
Table 6.6 reports the average, max and min timings and actions for each user group of both plain
and expert users. From the results it is obvious that the timings and number of user actions of UI2
(i.e. the UI with preferences) are much smaller than the ones gathered using UI1 (i.e. the UI without
preferences). Furthermore, we can see that the deviations of the min and max actions and timings of UI2
from the average ones are also much smaller than the respective deviations of UI1 [10]. In addition, the
timings and interactions for UI2 are bigger for expert users than for plain users, since the expert users had to
express a lot more preference actions.
In more detail, Table 6.7 reports the average, max and min timings and actions over all tasks and for each
task, for both UIs. Let us first discuss the average timings and user actions for all tasks, for both plain and
expert users. It is obvious that UI2 is much more efficient in terms of timings and interactions for both
user groups. Specifically, plain users on average were almost 3 times more efficient with UI2 than with
UI1 for both timings and user actions. On the other hand, expert users were on average more than 3.3
times faster and made half the interactions with UI2 compared to UI1 [11].
[10] Notice that since a number of users were checking the correctness of the preferred cars returned by Hippalus for UI2,
the timings and numbers of user actions reported here should be bigger than the results that would be gathered from users
that are confident about Hippalus.
[11] It seems that expert users were more conservative regarding their interactions with the system.
Figure 6.6: Plurality and Borda results for (a) Ease of Use, (b) Usefulness, (c) Preference and (d) Satisfaction.
Plain UGP1 UI1 A1 (sec) A1 (act.) B2 (sec) B2 (act.) UI2 A2 (sec) A2 (act.) B1 (sec) B1 (act.)
Average 688.71 108.2 681.00 86.2 379.40 43 223.74 38.8
Max 1071.53 176 1104.38 122 803.70 58 409.92 49
Min 270.75 71 347.41 50 146.19 31 130.22 25
Average 220.06 41.8 226.52 39.2 596.15 100.2 705.22 100.8
Max 357.92 75 367.29 61 1438.11 165 1744.77 167
Min 136.49 32 151.04 31 294.34 53 178.76 43
Average 284.58 36.2 154.47 33.2 878.51 136.6 554.29 87.8
Max 347.59 45 189.43 42 1431.88 297 881.39 110
Min 202.60 29 121.29 25 500.75 73 256.45 51
Average 949.86 116.4 631.56 125.2 243.77 37.8 133.99 33
Max 1824.65 166 1062.11 205 391.87 52 188.46 36
Min 480.35 77 119.93 19 146.22 33 75.82 27
Expert UGE1 UI1 A1 (sec) A1 (act.) B2 (sec) B2 (act.) UI2 A2 (sec) A2 (act.) B1 (sec) B1 (act.)
Average 862.46 142.33 1020.00 125.33 192.47 47.33 308.85 57.33
Max 1416.10 246 1636.02 157 260.92 58 355.73 64
Min 441.72 59 394.92 70 61.67 27 282.89 50
Expert UGE2 UI2 B2 (sec) B2 (act.) A1 (sec) A1 (act.) UI1 A2 (sec) A2 (act.) B1 (sec) B1 (act.)
Average 280.99 69.33 365.20 70.66 1083.08 148.33 842.79 92.66
Max 346.04 75 580.59 85 1530.93 194 1447.35 100
Min 185.60 58 246.26 42 853.40 57 434.22 78
Table 6.6: Plain and Expert Users Average, Max and Min Timings and User Actions for each
Task for both UIs per each User Group
For plain users, the task that benefited the most from the preference interaction seems to be Task
B2, since the speedup for timings was 4.80x. On the contrary, the speedup for user actions was almost the
same for all tasks. Furthermore, notice that for Task B1 there was a plain user that completed the task
quickly and with only 19 interactions [12]. On the other hand, regarding expert users, Task A2 benefited
the most from the preference interaction, since the speedup was 5.63x for timings and 3.13x for
user actions. On the other tasks the speedup for user actions was much smaller. The above results are
summarized in Figure 6.7 (a).
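The speedups discussed above are simply the ratio of the UI1 value to the UI2 value. As a small worked example, the sketch below recomputes the timing speedups of the plain users from the average timings reported in Table 6.7; the variable and function names are ours. The recomputed values may differ from the table in the last digit, since the table appears to truncate rather than round.

```python
def speedup(baseline, improved):
    """How many times faster (or cheaper) the improved setting is than the baseline."""
    return baseline / improved

# Average timings in seconds of the plain users, (UI1, UI2), as reported in Table 6.7
avg_seconds = {
    "A1": (642.43, 264.17),
    "A2": (914.18, 299.73),
    "B1": (592.93, 225.13),
    "B2": (693.11, 144.23),
}

speedups = {task: round(speedup(ui1, ui2), 2) for task, (ui1, ui2) in avg_seconds.items()}
best_task = max(speedups, key=speedups.get)  # task with the largest timing speedup
```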
Finally, the preference-based approach gave better average values for each metric used during the
session of each exploratory task. Specifically, none of the users was able to successfully complete both
of the tasks with the plain UI (UI1). On the contrary, all the users successfully completed all the tasks with
UI2, a result that highlights the user-friendliness and efficiency of the proposed interaction scheme.
[12] The correct answer of this task included 4 cars and this user found only one of them.
Plain All Tasks UI1 (sec) UI2 (sec) Speedup UI1 (act.) UI2 (act.) Speedup
Average 710.66 233.32 3.04x 107.67 37.87 2.84x
Plain Task A1
Average 642.43 264.17 2.43x 104.2 37 2.81x
Max 1438.11 391.87 3.66x 176 52 3.38x
Min 270.75 146.22 1.85x 53 29 1.82x
Plain Task A2
Average 914.18 299.73 3.04x 126.5 42.4 2.98x
Max 1824.65 803.702 2.27x 297 75 3.96x
Min 480.35 136.49 3.51x 73 31 2.35x
Plain Task B1
Average 592.93 225.13 2.63x 106.5 39 2.73x
Max 1062.11 409.92 2.59x 205 39.2 5.22x
Min 119.93 130.22 0.92x 19 38.8 0.48x
Plain Task B2
Average 693.11 144.23 4.80x 93.5 33.1 2.82x
Max 1744.77 189.43 9.21x 167 42 3.97x
Min 178.76 75.82 2.35x 43 25 1.72x
Expert All Tasks UI1 (sec) UI2 (sec) Speedup UI1 (act.) UI2 (act.) Speedup
Average 952.08 286.88 3.32x 127.17 61.17 2.08x
Expert Task A1
Average 862.46 365.20 2.36x 142.33 70.66 2.01x
Max 1416.11 580.60 2.44x 246 85 2.89x
Min 441.72 246.27 1.79x 59 42 1.40x
Expert Task A2
Average 1083.08 192.47 5.63x 148.33 47.33 3.13x
Max 1530.94 260.92 5.87x 194 58 3.35x
Min 853.41 61.68 13.84x 57 27 2.11x
Expert Task B1
Average 842.79 308.85 2.73x 92.67 57.33 1.62x
Max 1447.35 355.73 4.07x 100 64 1.56x
Min 434.22 282.90 1.53x 78 50 1.56x
Expert Task B2
Average 1020.00 281.0 3.00x 125.33 69.33 1.34x
Max 1636.03 346.05 4.73x 157 75 1.33x
Min 394.92 185.61 2.13x 70 58 1.34x
Table 6.7: Plain and Expert Users Average, Max and Min Timings and User Actions per each
Task and All Tasks for both UIs
Figure 6.7: Average Values in Last Step of Each Task. (a) for Timings (T) and Actions (A), while
(b) Depicts the Values for Recall (R), Precision (P) and Average Precision (AP)
Notice that 1 expert user and 2 plain users managed to successfully complete one of the two tasks they
were assigned using UI1. The above are reflected in the calculated Recall, Precision, and Average Precision
metrics, which are reported in Table 6.8. Notice that on average for all tasks, there is a 2.30x and
3.49x improvement regarding average precision for plain and expert users respectively. Task B2 seems
to be the most difficult task for both plain and expert users, since the biggest gains in all three metrics
were observed here (i.e. regarding average precision, more than 3.62x improvement for plain and 6.36x
improvement for expert users). Furthermore, notice that there were higher improvements in each
metric for expert users with UI2, since their tasks were much more complicated and the number of criteria
was bigger than in the tasks of the plain users. As a result, although experts, these users achieved
lower scores with UI1 for almost all metrics. The above results are summarized in Figure 6.7 (b). Since
the results of the two approaches show such significant differences for the basic metrics of Recall, Precision
and Average Precision, we did not consider evaluating the other more refined metrics described in
Section 6.1.
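For reference, the three basic metrics can be computed as in the following sketch, which uses the standard set-based definitions of Recall and Precision and the usual rank-based definition of Average Precision. The exact variants used in the thesis (for instance with respect to the bucket order of the expert's answer) are defined in Section 6.1 and may differ. The car identifiers and the user answer are hypothetical.

```python
def recall(returned, relevant):
    """Fraction of the relevant items that the user returned."""
    return len(set(returned) & set(relevant)) / len(relevant)

def precision(returned, relevant):
    """Fraction of the returned items that are relevant."""
    return len(set(returned) & set(relevant)) / len(returned)

def average_precision(ranked, relevant):
    """Mean of the precision values at the ranks where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

# Hypothetical task: four cars form the correct answer, the user returned three cars
relevant = {"car1", "car2", "car3", "car4"}
answer = ["car1", "car7", "car3"]

r = recall(answer, relevant)
p = precision(answer, relevant)
ap = average_precision(answer, relevant)
```

A user who returns exactly the correct set, as all users did with UI2, obtains a value of 1 for all three metrics.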
6.6 Evaluation Conclusion
In this chapter we have discussed a number of evaluation metrics and approaches for exploratory search
and we selected those that could apply in our preference-based approach.
In addition, the provided theoretical analysis of user effort in FDT interaction schemes described in
Section 6.2, shows the benefits of the FDT interaction (i.e. small number of interactions and decisions).
Specifically, the section provided an example where a user could find the desired 10 objects in a peta-size
collection with only 30 clicks (number of decisions is 90). The extension of this study to the proposed
preference-enriched scheme, assuming that a user has expressed a preference relation for each facet
and that the most preferred choice is prompted first, shows that the number of decisions is reduced
to the number of clicks (i.e. to 30 for a peta-sized collection).
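The figures of this example can be reproduced with a short calculation under one particular set of assumptions, which are ours and are not necessarily the parameters used in Section 6.2: every click divides the focus by a factor of 3, and without preferences the user inspects 3 options before each click, whereas with preferences the most preferred option is shown first and costs a single decision.

```python
def clicks_needed(collection_size, target_size, branching):
    """Number of zoom-in clicks until the focus holds at most target_size objects."""
    clicks, focus = 0, collection_size
    while focus > target_size:
        focus = -(-focus // branching)  # ceiling division: focus shrinks by the branching factor
        clicks += 1
    return clicks

PETA = 10 ** 15   # size of the collection in the example
BRANCHING = 3     # assumed number of options per click (hypothetical parameter)

clicks = clicks_needed(PETA, target_size=10, branching=BRANCHING)
plain_decisions = clicks * BRANCHING   # every option is inspected before each click
preference_decisions = clicks          # the preferred option is prompted first
```

Under these assumptions the computation yields 30 clicks, 90 decisions for the plain scheme and 30 decisions for the preference-enriched one, in line with the numbers quoted above.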
Metric Plain Users Expert Users
All Tasks UI1 UI2 Improv. UI1 UI2 Improv.
Recall 0.56 1 1.7x 0.52 1 1.92x
Precision 0.61 1 1.62x 0.43 1 2.31x
Average Precision 0.433 1 2.30x 0.28 1 3.49x
Task A1 UI1 UI2 Improv. UI1 UI2 Improv.
Recall 0.63 1 1.57x 0.66 1 1.5x
Precision 0.48 1 2.06x 0.44 1 2.25x
Average Precision 0.42 1 2.33x 0.44 1 2.25x
Task A2 UI1 UI2 Improv. UI1 UI2 Improv.
Recall 0.55 1 1.81x 0.33 1 3x
Precision 0.80 1 1.24x 0.53 1 1.87x
Task B1 UI1 UI2 Improv. UI1 UI2 Improv.
Recall 0.61 1 1.62x 0.83 1 1.2x
Precision 0.77 1 1.28x 0.38 1 2.57x
Task B2 UI1 UI2 Improv. UI1 UI2 Improv.
Recall 0.46 1 2.14x 0.25 1 4x
Precision 0.39 1 2.53x 0.36 1 2.76x
Table 6.8: Plain and Expert Users Recall, Precision and Average Precision Metrics per each
and all Tasks for both UIs
Subsequently, we formulated a hypothesis expressing the Difficulty of Formulating Effective Preferences
without Knowing the Options (DiFEPreKO), and the conducted user study showed that without
the ability to explore the existing choices, the expression of preferences is time-consuming and in most
cases results in incomplete preferences. Specifically, we found that only 20% of the users were able to
identify the ideal car from a list of cars according to their previously expressed preferences (the percentage
rises to 33% if we also consider the second ideal car). Furthermore, users expressed preferences
that were inconsistent with their final decision (23% of the preferences). The statistical analysis over
the results provides strong evidence against the formulated null hypothesis and we can conclude that
without exploring available cars fewer than half of the users can express their preferences in
a way sufficient for returning the ideal car for them from a car collection.
We also conducted two comparative user studies, one for evaluating the FDT per se and another
for the proposed preference-enriched FDT interaction. The first one was conducted over the Mitos
WSE and evaluated a number of different exploratory interfaces. This evaluation showed that the UI
that supported the FDT interaction scheme over static metadata was the most preferred UI among the
regular users. On the other hand, advanced users preferred by a small margin the FDT UI which in
addition to the static metadata, also uses dynamic metadata through a clustering algorithm. In any case,
the browsing interaction scheme of FDT, with or without dynamic metadata, was preferred, provided
better user satisfaction, and resulted in a higher task completeness degree with fewer queries than
the other browsing interaction schemes (i.e. plain clustering).
Finally, the second user-based comparative evaluation, which we conducted over the Hippalus system,
showed that 100% of the users (both expert and plain ones) preferred the preference-based UI, a
result that was supported by each distinct qualitative result. The preference-enabled UI allowed users
to complete all the tasks successfully, on average in less than a third of the time and with a third of the user
interactions compared to the plain FDT UI. Furthermore, none of the users was able to successfully
complete both of the tasks with the plain UI (1 expert user and 2 plain users successfully completed one
of the two tasks they were assigned using UI1). As a result we verify the conclusions of the theoretical user
effort analysis, since the preference-based UI helps users to find the desired results in less time and with
fewer actions and fewer decisions. Finally, the preference-based approach gave better average values for
each metric used during the session of each exploratory task.
Chapter 7
Conclusion and Future Research
Contents
7.1 Synopsis of Contributions
7.2 Directions for Future Work and Research
7.1 Synopsis of Contributions
In this thesis we motivated the need for real-time preference elicitation and we introduced a language
(including its syntax, semantics and GUI-level exploitation methods) for enriching the interaction scheme
of Faceted and Dynamic Taxonomies (FDT) with preference elicitation and preference-based interaction. Key
aspects of the proposed approach include the support of hierarchically organized values, the support of
set-valued attributes, and the incremental preference specification mode with the scope-based method for resolving
conflicts. In addition, the rapid reduction of the information space that is possible with FDT makes
preference-based ordering feasible on large information bases, since the introduced algorithms for producing
the preference order are independent of the size of the information base; they depend on the size of
the focus and the number of the preference actions enacted by the user. Furthermore, we provided a
top-k variation of the algorithm suitable for the case where the size of the focus is big.
To demonstrate the feasibility of our approach and to identify possible difficulties or other issues
related to implementation and application, we have designed and implemented a proof of concept
prototype, the Hippalus system. This system provides exploration services over RDF information bases
and supports the introduced preference framework through HTML 5 context menus. Specifically, the user
is able to order classes, subclasses and objects and can compose object related preferences, using Priority,
Pareto and Pareto Optimal compositions.
We provided a theoretical analysis of user effort in FDT interaction schemes, plain and preference-enabled
ones, that suggests the effectiveness of the proposed approach with respect to the interaction and
decision cost. In addition, we formulated the Difficulty of Formulating Effective Preferences without
Knowing the Options (DiFEPreKO) hypothesis and the conducted user study showed that without the
ability to explore the existing choices, the expression of preferences is time-consuming and in most
cases results in incomplete preferences. Finally, we conducted two comparative user studies, one for
evaluating the FDT per se and another for the proposed preference-enriched FDT interaction. The first
one was conducted over the Mitos WSE and suggested that the browsing interaction scheme of FDT, with
or without dynamic metadata, was preferred, provided better user satisfaction, and resulted in a higher task
completeness degree with fewer queries than the other browsing interaction schemes (i.e. plain clustering).
The second one, conducted over the Hippalus system, showed that 100% of the users (both expert and
plain ones) preferred the preference-based UI. In more detail, this UI allowed users to complete all the tasks
successfully, on average in less than a third of the time and with a third of the user interactions compared to the plain
FDT UI (see Section 6.5). Furthermore, none of the users was able to successfully complete both of the tasks with the plain
UI. As a result we verify the conclusions of the theoretical user effort analysis, since the preference-based
UI helps users to find the desired results in less time and with fewer actions and fewer decisions.
7.2 Directions for Future Work and Research
There are several issues that are worth further work and research.
As regards applicability, it is worth developing wrappers that can be used for feeding (synchronously
or asynchronously) Hippalus with the results of queries from web search engines (e.g. at least those
which are OpenSearch compatible), database sources, SPARQL queries, etc. The availability of such wrap-
pers can lead to a generic client of search services that can bring the benefits of Hippalus system to a
plethora of users. Furthermore, up to now Hippalus does not support multi-valued attributes.
Regarding the interaction model, we have not identified any substantial need for change or advancement.
This is also supported by the results of the user study.
As far as the algorithmic part is concerned, in this thesis we strongly suggest a process that contains
both information thinning and preference actions, since apart from giving users the required overview
for decision making, it also significantly reduces the computational effort for deriving the preference-
order. But it is still interesting to investigate optimizations for the case where the current answer is very
big, i.e. to further research the direction described in Section 4.
Finally, regarding the structure of the information space (either of the information corpus or of the
search results), i.e. objects described according to a multidimensional space with hierarchically
organized values, one possible future direction is to consider more complex structures. For example,
objects could be described with values accompanied by numbers expressing various quality aspects, such as
accuracy (Powley and Dale 2007), specificity (Tzitzikas et al. 2013), certainty (Webber et al. 2012), or trust,
authority and popularity (Kazai and Milic-Frayling 2008). We could then investigate the required advancements
of both the interaction model and the preference framework.
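As a purely illustrative sketch of such a richer structure, each attribute value could carry a quality score, and a restriction action could then take a threshold on that score into account. The names `ScoredValue`, `Item` and `restrict`, the facet `topic` and the threshold are hypothetical and are not part of Hippalus.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ScoredValue:
    value: str        # a term of a facet (possibly taken from a hierarchy)
    certainty: float  # assumed quality score in [0, 1]


@dataclass
class Item:
    name: str
    attrs: Dict[str, ScoredValue]


def restrict(items: List[Item], facet: str, value: str,
             min_certainty: float) -> List[Item]:
    """Keep items whose value for `facet` matches and is certain enough."""
    kept = []
    for it in items:
        sv = it.attrs.get(facet)
        if sv is not None and sv.value == value and sv.certainty >= min_certainty:
            kept.append(it)
    return kept


items = [
    Item("doc1", {"topic": ScoredValue("Databases", 0.95)}),
    Item("doc2", {"topic": ScoredValue("Databases", 0.40)}),
    Item("doc3", {"topic": ScoredValue("Retrieval", 0.90)}),
]

confident = restrict(items, "topic", "Databases", min_certainty=0.8)
print([it.name for it in confident])
```

A preference framework over such objects would additionally have to decide how the score interacts with the preference order, which is exactly the kind of advancement left open here.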
References
Abel, F., Celik, I., and Siehndel, P. 2011. Towards a Framework for Adaptive Faceted Search on Twitter.
In Procs of the International Workshop on Dynamic and Adaptive Hypertext (DAH'11), ACM Hypertext, Eindhoven,
The Netherlands.
Agrawal, R., Borgida, A., and Jagadish, H. 1989. Efficient Management of Transitive Relationships in
Large Data and Knowledge Bases. ACM SIGMOD Record 18, 2, 253–262.
Agrawal, R., Gollapudi, S., Halverson, A., and Ieong, S. 2009. Diversifying Search Results. In Procs of the
Second ACM International Conference on Web Search and Data Mining (WSDM'09). ACM, New York, NY, USA,
5–14.
Agrawal, R. and Wimmers, E. L. 2000. A Framework for Expressing and Combining Preferences. In Procs
of the 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD '00). ACM, New York, NY,
USA, 297–306.
Andreka, H., Ryan, M., and Schobbens, P.-Y. 2002. Operators and Laws for Combining Preference
Relations. Journal of Logic and Computation 12, 1, 13–53.
Azzopardi, L. 2009. Usage Based Effectiveness Measures: Monitoring Application Performance in
Information Retrieval. In Procs of the 18th ACM Conference on Information and Knowledge Management (CIKM'09).
ACM, New York, NY, USA, 631–640.
Balke, W.-T. and Güntzer, U. 2004. Multi-Objective Query Processing for Database Systems. In Procs of
the Thirtieth International Conference on Very Large Data Bases (VLDB'04). VLDB Endowment, 936–947.
Barrett, R. and Salles, M. 2006. Social Choice with Fuzzy Preferences. Economics Working Paper
Archive (University of Rennes 1 & University of Caen), Center for Research in Economics and
Management (CREM), University of Rennes 1, University of Caen and CNRS.
Basu, C., Hirsh, H., and Cohen, W. W. 1998. Recommendation as Classification: Using Social and
Content-Based Information in Recommendation. In Procs of the Fifteenth National Conference on Artificial
Intelligence (AAAI/IAAI'98). 714–720.
Becker, C. and Bizer, C. 2009. Exploring the Geospatial Semantic Web with DBpedia Mobile. Web
Semantics: Science, Services and Agents on the World Wide Web 7, 4, 278–286.
Ben-Yitzhak, O., Golbandi, N., Har'El, N., Lempel, R., Neumann, A., Ofek-Koifman, S., Sheinwald, D.,
Shekita, E., Sznajder, B., and Yogev, S. 2008. Beyond Basic Faceted Search. In Procs of the
International Conference on Web Search and Web Data Mining (WSDM'08). Palo Alto, California, USA, 33–44.
Binshtok, M., Brafman, R. I., Shimony, S. E., Martin, A., and Boutilier, C. 2007. Computing Optimal
Subsets. In Procs of the 22nd National Conference on Artificial Intelligence - Volume 2 (AAAI'07). AAAI Press,
1231–1236.
Bizer, C., Heath, T., and Berners-Lee, T. 2009. Linked Data - The Story So Far. International Journal of
Semantic Web Information Systems 5, 3, 1–22.
Borda, J. C. 1781. Mémoire sur les Élections au Scrutin. Histoire de l'Académie Royale des Sciences,
Paris.
Bot, R. S. and Wu, Y. B. 2004. Improving Document Representations Using Relevance Feedback: The RFA
Algorithm. In Procs of the 13th ACM International Conference on Information and Knowledge Management
(CIKM'04). Washington, USA.
Boutilier, C., Brafman, R. I., Domshlak, C., Hoos, H. H., and Poole, D. 2004. CP-nets: A Tool for
Representing and Reasoning with Conditional Ceteris Paribus Preference Statements. Journal of Artificial
Intelligence Research 21, 135–191.
Brafman, R. I., Domshlak, C., Shimony, S. E., and Silver, Y. 2006. Preferences Over Sets. In Procs of the
21st National Conference on Artificial Intelligence - Volume 2 (AAAI'06). AAAI Press, 1101–1106.
Braziunas, D. 2006. Computational Approaches to Preference Elicitation. Tech. rep., Department of
Computer Science, University of Toronto.
Breese, J., Heckerman, D., and Kadie, C. 1998. Empirical Analysis of Predictive Algorithms for
Collaborative Filtering. In Procs of the 14th Annual Conference on Uncertainty in Artificial Intelligence (UAI-98).
Morgan Kaufmann, San Francisco, CA, 43–52.
Burges, C. J. C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., and Hullender, G. N. 2005.
Learning to Rank Using Gradient Descent. In Procs of the 22nd International Conference on Machine
Learning (ICML'05). 89–96.
Byström, K. and Järvelin, K. 1995. Task Complexity Affects Information Seeking and Use. In Information
Processing and Management. 191–213.
Börzsönyi, S., Kossmann, D., and Stocker, K. 2001. The Skyline Operator. In Procs of the 17th International
Conference on Data Engineering (ICDE'01). 421–430.
Callan, J. 1996. Document Filtering with Inference Networks. In Procs of the 19th Annual International
Conference on Research and Development in Information Retrieval (SIGIR'96). New York, NY, USA, 262–269.
Carpineto, C., Osiński, S., Romano, G., and Weiss, D. 2009. A Survey of Web Clustering Engines. ACM
Computing Surveys 41, 3, 17:1–17:38.
Carterette, B., Kanoulas, E., and Yilmaz, E. 2011. Simulating Simple User Behavior for System
Effectiveness Evaluation. In Procs of the 20th ACM International Conference on Information and Knowledge
Management (CIKM'11). ACM, New York, NY, USA, 611–620.
Carterette, B., Kanoulas, E., and Yilmaz, E. 2012. Evaluating Web Retrieval Effectiveness. In Web Search
Engine Research, D. Lewandowski, Ed. Emerald Books, 105–137.
Chakrabarti, K., Chaudhuri, S., and Hwang, S. 2004. Automatic Categorization of Query Results. Procs
of the 2004 ACM SIGMOD International Conference on Management of Data (SIGMOD'04), 755–766.
Chan, C.-Y., Jagadish, H. V., Tan, K.-L., Tung, A. K. H., and Zhang, Z. 2006a. Finding k-Dominant Skylines
in High Dimensional Space. In Procs of the 2006 ACM SIGMOD International Conference on Management of
Data (SIGMOD'06). ACM, New York, NY, USA, 503–514.
Chan, C.-Y., Jagadish, H. V., Tan, K.-L., Tung, A. K. H., and Zhang, Z. 2006b. On High Dimensional Skylines.
In Procs of the 10th International Conference on Advances in Database Technology (EDBT'06). Springer-Verlag,
Berlin, Heidelberg, 478–495.
Chang, K. C. and Hwang, S. 2002. Minimal Probing: Supporting Expensive Predicates for Top-k Queries.
In Procs of the 2002 ACM SIGMOD International Conference on Management of Data (SIGMOD'02). 346–357.
Chapelle, O., Ji, S., Liao, C., Velipasaoglu, E., Lai, L., and Wu, S.-L. 2011. Intent-Based Diversification of
Web Search Results: Metrics and Algorithms. Information Retrieval 14, 6, 572–592.
Chapelle, O., Metzler, D., Zhang, Y., and Grinspan, P. 2009. Expected Reciprocal Rank for Graded
Relevance. In Procs of the 18th ACM Conference on Information and Knowledge Management (CIKM'09). 621–630.
Chaudhuri, S. and Gravano, L. 1999. Evaluating Top-k Selection Queries. In Procs of 25th International
Conference on Very Large Data Bases (VLDB'99). 397–410.
Chen, G. and Kotz, D. 2000. A Survey of Context-Aware Mobile Computing Research. Tech. rep.,
Hanover, NH, USA.
Chen, L. and Pu, P. 2004. Survey of Preference Elicitation Methods. Tech. rep., Swiss Federal Institute
of Technology in Lausanne (EPFL).
Choi, J., Kim, M., and Raghavan, V. V. 2001. Adaptive Feedback Methods in an Extended Boolean Model.
In ACM SIGIR Workshop on Mathematical/Formal Methods in Information Retrieval. New Orleans, LA.
Chomicki, J. 2003. Preference Formulas in Relational Queries. ACM Transactions on Database Systems 28, 4,
427–466.
Chomicki, J. 2007. Database Querying Under Changing Preferences. Annals of Mathematics and Artificial
Intelligence 50, 1-2, 79–109.
Chomicki, J., Godfrey, P., Gryz, J., and Liang, D. 2003. Skyline with Presorting. Procs of Data Engineering,
International Conference (ICDE'03), 717–719.
Chowdhury, S., Gibb, F., and Landoni, M. 2011. Uncertainty in Information Seeking and Retrieval: A
Study in an Academic Environment. Information Processing & Management 47, 2, 157–175.
Ciaccia, P. and Torlone, R. 2011. Modeling the Propagation of User Preferences. In Procs of the 30th
International Conference on Conceptual Modeling (ER'11). 304–317.
Clarke, C. L., Kolla, M., Cormack, G. V., Vechtomova, O., Ashkan, A., Büttcher, S., and MacKinnon, I. 2008.
Novelty and Diversity in Information Retrieval Evaluation. In Procs of the 31st Annual International ACM
SIGIR Conference on Research and Development in Information Retrieval (SIGIR'08). ACM, New York, NY, USA,
659–666.
Cohen, W. W., Schapire, R. E., and Singer, Y. 1999. Learning to Order Things. Journal of Artificial
Intelligence Research 10, 243–270.
Cooper, W. S. 1968. Expected Search Length: A Single Measure of Retrieval Effectiveness Based on the
Weak Ordering Action of Retrieval Systems. In American Documentation. 30–41.
Crawford, D. E. 2006. Supporting Exploratory Search. Communications of the ACM 49, 4.
Croft, B. W. and Lafferty, J., Eds. 2003. Language Modeling for Information Retrieval. The Information
Retrieval Series, vol. 13. Springer.
Dakka, W., Ipeirotis, P., and Wood, K. R. 2005. Automatic Construction of Multifaceted Browsing
Interfaces. In Procs of the 14th ACM International Conference on Information and Knowledge Management
(CIKM '05). New York, NY, USA, 768–775.
Dash, D., Rao, J., Megiddo, N., Ailamaki, A., and Lohman, G. 2008. Dynamic Faceted Search for
Discovery-Driven Analysis. In Procs of CIKM.
Delgado, J. and Ishii, N. 1999. Memory-Based Weighted-Majority Prediction for Recommender Systems.
desJardins, M., Eaton, E., and Wagstaff, K. L. 2006. Learning User Preferences for Sets of Objects. In Procs
of the 23rd International Conference on Machine Learning (ICML '06). ACM, New York, NY, USA, 273–280.
desJardins, M. and Wagstaff, K. 2005. DD-PREF: A Language for Expressing Preferences Over Sets. In
Procs of the 20th National Conference on Artificial Intelligence (AAAI'05). 620–626.
Doyle, J. 2004. Prospects for Preferences. Computational Intelligence 20, 2, 111–136.
Fafalios, P., Kitsos, I., Marketakis, Y., Baldassarre, C., Salampasis, M., and Tzitzikas, Y. 2012a. Web
Searching with Entity Mining at Query Time. In Procs of the 5th Information Retrieval Facility Conference (IRFC05).
Vienna, Austria.
Fafalios, P., Kitsos, I., and Tzitzikas, Y. 2012b. Scalable, Flexible and Generic Instant Overview Search.
In Procs of the 21st International Conference Companion on World Wide Web. ACM, 333–336.
Fafalios, P., Salampasis, M., and Tzitzikas, Y. 2013. Exploratory Patent Search with Faceted Search and
Configurable Entity Mining. In Procs of the 1st International Workshop on Integrating IR technologies for
Professional Search (ECIR 2013 Workshop). Moscow, Russia.
Fafalios, P. and Tzitzikas, Y. 2013. X-ENS: Semantic Enrichment of Web Search Results at Real-Time.
In Procs of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval
(SIGIR'13 Demo paper). Dublin, Ireland.
Faulkner, L. 2003. Beyond the Five-User Assumption: Benefits of Increased Sample Sizes in Usability
Testing. Behavior Research Methods, Instruments & Computers 35, 3, 379–383.
Ferré, S. and Hermann, A. 2012. Reconciling Faceted Search and Query Languages for the Semantic
Web. International Journal of Metadata, Semantics and Ontologies 7, 1, 37–54.
Fishburn, P. 1970. Utility Theory for Decision Making. Wiley, New York.
Fishburn, P. 1999. Preference Structures and their Numerical Representations. Theoretical Computer
Science 217, 359–383.
Gadanho, S. C. and Lhuillier, N. 2007. Addressing Uncertainty in Implicit Preferences. In Procs of the
2007 ACM Conference on Recommender Systems (RecSys '07). ACM, New York, NY, USA, 97–104.
Georgiadis, P., Kapantaidakis, I., Christophides, V., Nguer, E. M., and Spyratos, N. 2008. Efficient
Rewriting Algorithms for Preference Queries. In Procs of the 24th International Conference on Data Engineering
(ICDE'08).
Golfarelli, M., Rizzi, S., and Biondi, P. 2011. myOLAP: An Approach to Express and Evaluate OLAP
Preferences. IEEE Transactions on Knowledge and Data Engineering 23, 7, 1050–1064.
Griffiths, D. 2009. Head First Statistics. Head first. O'Reilly, Sebastopol, CA.
Hansson, S. O. 2001. Preference Logic. In Handbook of Philosophical Logic, D. Gabbay and F. Guenthner,
Eds. Vol. 4. Kluwer, Chapter 4, 319–393.
Hearst, M., Elliott, A., English, J., Sinha, R., Swearingen, K., and Yee, K.-P. 2002. Finding the Flow in Web
Site Search. Communications of the ACM 45, 9, 42–49.
Hearst, M. and Pedersen, J. 1996. Reexamining the Cluster Hypothesis: Scatter/Gather on Retrieval
Results. In Procs of the 19th Annual International ACM Conference on Research and Development in Information
Retrieval (SIGIR'96). Zurich, Switzerland, 76–84.
Hearst, M. A. 2006. Clustering versus Faceted Categories for Information Exploration. Communications
of the ACM 49, 4, 59–61.
Herlocker, J. L., Konstan, J. A., Borchers, A., and Riedl, J. 1999. An Algorithmic Framework for Performing
Collaborative Filtering. In Procs of the 22nd Annual International ACM Conference on Research and
Development in Information Retrieval (SIGIR'99). ACM Press, New York, NY, USA, 230–237.
Hildebrand, M., van Ossenbruggen, J., and Hardman, L. 2006. /facet: A Browser for Heterogeneous
Semantic Web Repositories. In Procs of International Semantic Web Conference (ISWC'06). Athens, GA,
USA, 272–285.
Hofmann, T. and Puzicha, J. 1999. Latent Class Models for Collaborative Filtering. In Procs of the Sixteenth
International Joint Conference on Artificial Intelligence (IJCAI'99). Morgan Kaufmann Publishers Inc., San
Francisco, CA, USA, 688–693.
Hyvönen, E., Mäkelä, E., Salminen, M., Valo, A., Viljanen, K., Saarela, S., Junnila, M., and Kettula, S. 2005.
MuseumFinland - Finnish Museums on the Semantic Web. Journal of Web Semantics 3, 2, 25.
Ilyas, I. F., Aref, W. G., and Elmagarmid, A. K. 2004a. Supporting Top-k Join Queries in Relational
Databases. VLDB Journal 13, 3, 207–221.
Ilyas, I. F., Shah, R., Aref, W. G., Vitter, J. S., and Elmagarmid, A. K. 2004b. Rank-Aware Query
Optimization. In Procs of the 2004 ACM SIGMOD International Conference on Management of Data (SIGMOD '04).
203–214.
Inan, H. 2006. Search Analytics: A Guide to Analyzing and Optimizing Website Search Engines. Book Surge
Publishing.
Järvelin, K. and Kekäläinen, J. 2002. Cumulated Gain-Based Evaluation of IR Techniques. ACM
Transactions on Information Systems 20, 4, 422–446.
Järvelin, K., Price, S. L., Delcambre, L. M. L., and Nielsen, M. L. 2008. Discounted Cumulated Gain Based
Evaluation of Multiple-Query IR Sessions. In European Conference on Information Retrieval (ECIR'08). 4–15.
Jin, R., Chai, J. Y., and Si, L. 2004. An Automatic Weighting Scheme for Collaborative Filtering. In Procs of
the 27th Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR'04).
ACM Press, 337–344.
Kahn, A. B. 1962. Topological Sorting of Large Networks. Communications of the ACM 5, 11, 558–562.
Käki, M. and Aula, A. 2008. Controlling the Complexity in Comparing Search User Interfaces via User
Studies. Information Processing & Management 44, 1, 82–91.
Kanoulas, E., Carterette, B., Clough, P., and Sanderson, M. 2011a. Evaluating Multi-Query Sessions. In
Procs of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval
(SIGIR'11). 1053–1062.
Kanoulas, E., Carterette, B., Clough, P., and Sanderson, M. 2011b. Session Track 2011 Overview. In Procs
of the Twentieth Text REtrieval Conference (TREC 2011). National Institute of Standards and
Technology.
Karlson, A. K., Robertson, G. G., Robbins, D. C., Czerwinski, M. P., and Smith, G. R. 2006. FaThumb:
a Facet-Based Interface for Mobile Search. In Procs of the Conference on Human Factors in Computing
Systems (CHI'06). New York, NY, USA, 711–720.
Kashyap, A., Hristidis, V., and Petropoulos, M. 2010. FACeTOR: Cost-Driven Exploration of Faceted Query
Results. In Procs of the 19th ACM International Conference on Information and Knowledge Management
(CIKM '10). ACM, New York, NY, USA, 719–728.
Kazai, G. and Milic-Frayling, N. 2008. Trust, Authority and Popularity in Social Information Retrieval.
In Procs of the 17th ACM Conference on Information and Knowledge Management (CIKM'08). ACM, New York,
NY, USA, 1503–1504.
Keeney, R. L. and Raiffa, H. 1976. Decisions with Multiple Objectives: Preferences and Value Tradeoffs. John
Wiley & Sons.
Kelly, D. 2009. Methods for Evaluating Interactive Information Retrieval Systems with Users.
Foundations and Trends in Information Retrieval 3, 1-2, 1–224.
Kelly, D. and Belkin, N. J. 2001. Reading Time, Scrolling and Interaction: Exploring Implicit Sources of
User Preferences for Relevance Feedback. In Procs of the 24th Annual International ACM SIGIR Conference
on Research and Development in Information Retrieval (SIGIR'01). ACM, New York, NY, USA, 408–409.
Kelly, D., Dumais, S., and Pedersen, J. O. 2009. Evaluation Challenges and Directions for
Information-Seeking Support Systems. Computer 42, 3, 60–66.
Kelly, D. and Teevan, J. 2003. Implicit Feedback for Inferring User Preference: a Bibliography. SIGIR
Forum 37, 2, 18–28.
Kießling, W. 2002. Foundations of Preferences in Database Systems. In Procs of the 28th International
Conference on Very Large Data Bases (VLDB'02). VLDB Endowment, 311–322.
Kießling, W., Endres, M., and Wenzel, F. 2011a. The Preference SQL System - An Overview. IEEE Data
Engineering Bulletin 34, 2, 11–18.
Kießling, W., Hafenrichter, B., Fischer, S., and Holland, S. 2001. Preference XPATH: A Query Language for
E-Commerce. In Wirtschaftsinformatik, H. U. Buhl, A. Huther, and B. Reitwiesner, Eds. Physica Verlag /
Springer, 32.
Kießling, W. and Köstler, G. 2002. Preference SQL - Design, Implementation, Experiences. In Procs of
the 28th International Conference on Very Large Data Bases (VLDB'02). Hong Kong, China, 990–1001.
Kießling, W., Soutschek, M., Huhn, A., Roocks, P., Endres, M., Mandl, S., Wenzel, F., and Zelend, A. 2011b.
Context-Aware Preference Search for Outdoor Activity Platforms. Tech. rep., Institut für Informatik,
Universität Augsburg, Augsburg, Germany. November.
Kitsos, I., Magoutis, K., and Tzitzikas, Y. 2013. Scalable Entity-Based Summarization of Web Search
Results Using MapReduce. Distributed and Parallel Databases.
Konstan, J. A., Miller, B. N., Maltz, D., Herlocker, J. L., Gordon, L. R., and Riedl, J. 1997. GroupLens:
Applying Collaborative Filtering to Usenet News. Communications of the ACM 40, 3, 77–87.
Kopidaki, S., Papadakos, P., and Tzitzikas, Y. 2009. STC+ and NM-STC: Two Novel Online Results
Clustering Methods for Web Searching. In Procs of the 10th International Conference on Web Information Systems
Engineering (WISE'09).
Koren, J., Zhang, Y., and Liu, X. 2008. Personalized Interactive Faceted Search. In Procs of the 17th
International Conference on World Wide Web (WWW'08). WWW, 477–486.
Korfhage, R. R. 1997. Information Storage and Retrieval. John Wiley & Sons.
Kossmann, D., Ramsak, F., and Rost, S. 2002. Shooting Stars in the Sky: An Online Algorithm for Skyline
Queries. In Procs of the 28th International Conference on Very Large Data Bases (VLDB'02). 275–286.
Koutrika, G. and Ioannidis, Y. 2005. Personalized Queries under a Generalized Preference Model. In
Procs of the 21st International Conference on Data Engineering (ICDE '05). IEEE Computer Society, Washington,
DC, USA, 841–852.
Koutrika, G. and Ioannidis, Y. E. 2004. Personalization of Queries in Database Systems. In Procs of the
20th International Conference on Data Engineering (ICDE '04). 597–608.
Kules, B. and Capra, R. 2008. Creating Exploratory Tasks for a Faceted Search Interface. In Workshop on
Computer Interaction and Information Retrieval (HCIR'08 Workshop). 18–21.
Kules, B., Capra, R., Banta, M., and Sierra, T. 2009. What do Exploratory Searchers Look at in a Faceted
Search Interface? In Procs of the 9th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'09). 313–322.
Le Phuoc, D., Parreira, J. X., Reynolds, V., and Hauswirth, M. 2010. RDF On the Go: RDF Storage and
Query Processor for Mobile Devices. In Procs of the 9th International Semantic Web Conference (ISWC'10
Posters&Demos).
Lee, J., You, G.-w., and Hwang, S.-w. 2009. Personalized Top-k Skyline Queries in High-Dimensional
Space. Information Systems 34, 1, 45–61.
Levandoski, J. J., Mokbel, M. F., and Khalefa, M. E. 2010. FlexPref: A Framework for Extensible Preference
Evaluation in Database Systems. In Procs of the 26th International Conference on Data Engineering (ICDE'10).
828–839.
Lewis, D. D. 2001. Applying Support Vector Machines to the TREC-2001 Batch Filtering and Routing
Tasks. In Text Retrieval Conference (TREC-10). 286–292.
Li, G., Feng, J., Zhou, X., and Wang, J. 2011. Providing Built-In Keyword Search Capabilities in RDBMS.
The VLDB Journal 20, 1–19.
Lichtenstein, S. and Slovic, P. 2006. The Construction of Preference Thirteenth Ed. Cambridge University
Press.
Lin, X., Yuan, Y., Zhang, Q., and Zhang, Y. 2007. Selecting Stars: the k Most Representative Skyline
Operator. In Procs of the 23rd International Conference on Data Engineering (ICDE'07).
Linden, G., Hanks, S., and Lesh, N. 1997. Interactive Assessment of User Preference Models: The
Automated Travel Assistant. In Procs of the Sixth International Conference of User Modeling (UM'97), C. P. A.
Jameson and C. Tasso, Eds. Springer Wien, 67–78.
Lindgaard, G. and Chattratichart, J. 2007. Usability Testing: What Have we Overlooked? In Procs of the
SIGCHI Conference on Human Factors in Computing Systems (CHI'07). ACM, New York, NY, USA, 1415–1424.
Liu, T.-Y. 2011. Learning to Rank for Information Retrieval. Springer.
Mäkelä, E., Hyvönen, E., and Saarela, S. 2006. Ontogator - A Semantic View-Based Search Engine Service
for Web Applications. In Procs of International Semantic Web Conference (ISWC'06). Athens, GA, USA,
847–860.
Mäkelä, E., Viljanen, K., Lindgren, P., Laukkanen, M., and Hyvönen, E. 2005. Semantic Yellow Page
Service Discovery: The Veturi Portal. Poster paper at International Semantic Web Conference (ISWC'05),
Galway, Ireland.
Manolis, N. and Tzitzikas, Y. 2011. Interactive Exploration of Fuzzy RDF Knowledge Bases. In Procs of
the 8th Extended Semantic Web Conference (ESWC'11). Heraklion, Greece.
Marchionini, G. 2006. Exploratory Search: From Finding to Understanding. Communications of the
ACM 49, 4, 41–46.
Meij, E., Mika, P., and Zaragoza, H. 2009. An Evaluation of Entity and Frequency Based Query
Completion Methods. In Procs of the 32nd International ACM SIGIR Conference on Research and Development in
Information Retrieval (SIGIR'09). ACM, 678–679.
Melville, P., Mooney, R. J., and Nagarajan, R. 2001. Content-Boosted Collaborative Filtering. In Procs of
the 2001 SIGIR Workshop on Recommender Systems (SIGIR'01 Workshop).
Moffat, A. and Zobel, J. 2008. Rank-Biased Precision for Measurement of Retrieval Effectiveness. ACM
Transactions on Information Systems 27, 1, 2:1–2:27.
Neumann, G. and Schmeier, S. 2012. Exploratory Search on the Mobile Web. In 4th International
Conference on Agents and Artificial Intelligence (ICAART 2012). SciTePress, 110–119.
Neves, R. D. S. and Kaci, S. 2010. Combining Totalitarian and Ceteris Paribus Semantics in Database
Preference Queries. Logic Journal of the IGPL 18, 3, 464–483.
O'Brien, H. L., Toms, E. G., Kelloway, K., and Kelly, E. 2008. Developing and Evaluating a Reliable Measure
of User Engagement. 45, 1, 1–10.
Oren, E., Delbru, R., and Decker, S. 2006. Extending Faceted Navigation for RDF Data. In Procs of the 5th
International Semantic Web Conference (ISWC'06). Athens, GA, USA, 559–572.
Over, P. 1997. TREC-7 Interactive Track Report. In Procs of Text REtrieval Conference (TREC'97). 57–64.
Papadakos, P. 2009. Exploratory Web Searching with Dynamic Taxonomies, Results Clustering and
Visualization. In Procs of the 13th European Conference on Digital Libraries Doctoral Consortium (ECDL'09 DC).
Corfu, Greece. http://www.ieee-tcdl.org/Bulletin/v6n1/Papadakos/papadakos.html.
Papadakos, P., Armenatzoglou, N., Kopidaki, S., and Tzitzikas, Y. 2012a. On Exploiting Static and
Dynamically Mined Metadata for Exploratory Web Searching. Knowledge and Information Systems 30, 3,
493–525.
Papadakos, P., Kopidaki, S., Armenatzoglou, N., and Tzitzikas, Y. 2009a. Exploratory Web Searching with
Dynamic Taxonomies and Results Clustering. In Procs of the 13th European Conference on Digital Libraries
(ECDL'09).
Papadakos, P., Kopidaki, S., Armenatzoglou, N., and Tzitzikas, Y. 2009b. Exploratory Web Searching with
Dynamic Taxonomies and Results Clustering. In Procs of the 8th Hellenic Data Management Symposium
(HDMS'09).
Papadakos, P., Theoharis, Y., Marketakis, Y., Armenatzoglou, N., and Tzitzikas, Y. 2008a. Mitos: Design
and Evaluation of a DBMS-Based Web Search Engine. In Procs of the 12th Pan-Hellenic Conference on
Informatics (PCI'08). Greece.
Papadakos, P., Theoharis, Y., Marketakis, Y., Armenatzoglou, N., and Tzitzikas, Y. 2009c.
Object-Relational Database Representations for Text Indexing. CoRR abs/0906.3112.
Papadakos, P., Tzitzikas, Y., and Zafeiri, D. 2012b. An Interactive Exploratory System with Real-Time
Preference Elicitation. In Procs of the 13th International Conference on Web Information Systems Engineering
(WISE'12 Demo Paper).
Papadakos, P., Vasiliadis, G., Theoharis, Y., Armenatzoglou, N., Kopidaki, S., Marketakis, Y., Daskalakis,
M., Karamaroudis, K., Linardakis, G., Makrydakis, G., Papathanasiou, V., Sardis, L., Tsialiamanis, P.,
Troullinou, G., Vandikas, K., Velegrakis, D., and Tzitzikas, Y. 2008b. The Anatomy of Mitos Web Search
Engine. CoRR, Information Retrieval abs/0803.2220. Available at http://arxiv.org/abs/0803.2220.
Papadias, D., Tao, Y., Fu, G., and Seeger, B. 2005. Progressive Skyline Computation in Database Systems.
ACM Transactions on Database Systems 30, 1, 41–82.
Peintner, B., Viappiani, P., and Yorke-Smith, N. 2008. Preferences in Interactive Systems: Technical
Challenges and Case Studies. AI Magazine 29, 4, 13–24.
Pitkow, J., Schütze, H., Cass, T., Cooley, R., Turnbull, D., Edmonds, A., Adar, E., and Breuel, T. 2002.
Personalized Search: A Contextual Computing Approach may Prove a Breakthrough in Personalized Search
Efficiency. Communications of the ACM 45, 9, 50–55.
Pound, J., Paparizos, S., and Tsaparas, P. 2011. Facet Discovery for Structured Web Search: a Query-Log
Mining Approach. In Procs of the 2011 International Conference on Management of Data (SIGMOD'11). ACM,
New York, NY, USA, 169–180.
Powley, B. and Dale, R. 2007. Evidence-Based Information Extraction for High-Accuracy Citation
Extraction and Author Name Recognition. In Procs of the 8th RIAO International Conference on Large-Scale
Semantic Access to Content.
Pu, P. and Chen, L. 2008. User-Involved Preference Elicitation for Product Search and Recommender
Systems. AI Magazine 29, 4, 93–103.
Rashid, A. M., Albert, I., Cosley, D., Lam, S. K., McNee, S. M., Konstan, J. A., and Riedl, J. 2002. Getting to
Know You: Learning New User Preferences in Recommender Systems. In Procs of the 7th International
Conference on Intelligent User Interfaces (IUI'02). ACM Press, New York, NY, USA, 127–134.
Reisner, P. 1981. Human Factors Studies of Database Query Languages: A Survey and Assessment. ACM
Computing Surveys 13, 1, 13–31.
Robertson, S. 2008. A New Interpretation of Average Precision. In Procs of the 31st Annual International
ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'08). ACM, New York, NY,
USA, 689–690.
Robertson, S. E. and Jones, S. K. 1976. Relevance Weighting of Search Terms. Journal of the American
Society for Information Science 27, 3, 129–146.
Robertson, S. E., Kanoulas, E., and Yilmaz, E. 2010. Extending Average Precision to Graded Relevance
Judgments. In Procs of the 33rd International ACM SIGIR Conference on Research and Development in
Information Retrieval (SIGIR'10). ACM, New York, NY, USA, 603–610.
Rocchio, J. 1971. Relevance Feedback in Information Retrieval. In The SMART Retrieval System, G. Salton,
Ed. Prentice Hall, Englewood Cliffs, NJ, 313–323.
Rose, D. E. and Levinson, D. 2004. Understanding User Goals in Web Search. In Procs of the 13th
International Conference on World Wide Web (WWW'04). ACM, New York, NY, USA, 13–19.
Ross, K. A. 2007. On the Adequacy of Partial Orders for Preference Composition. Tech. rep., In DBRank
Workshop.
Rossi, F., Venable, K. B., and Walsh, T. 2008. Preferences in constraint satisfaction and optimization.
AI Magazine 28, 4.
Roy, S. B. and Das, G. 2009. TRANS: Top-k Implementation Techniques of Minimum Effort Driven Faceted
Search For Databases. In Procs of the 15th International Conference on Management of Data (COMAD'09),
S. Chawla, K. Karlapalem, and V. Pudi, Eds. Computer Society of India.
Roy, S. B., Wang, H., Das, G., Nambiar, U., and Mohania, M. 2008. Minimum-Effort Driven Dynamic
Faceted Search in Structured Databases. In Procs of the 17th ACM Conference on Information and Knowledge
Management (CIKM'08). New York, NY, USA, 13–22.
Ruotsalo, T., Athukorala, K., Głowacka, D., Konyushkova, K., Oulasvirta, A., Kaipiainen, S., Kaski, S., and
Jacucci, G. 2013a. Supporting Exploratory Search Tasks with Interactive User Modelling. In Procs of
ASIST 2013, the 76th ASIS&T Annual Meeting.
Ruotsalo, T., Peltonen, J., Eugster, M. J., Głowacka, D., Konyushkova, K., Athukorala, K., Kosunen, I.,
Reijonen, A., Myllymäki, P., Jacucci, G., et al. 2013b. Directing Exploratory Search with Interactive Intent
Modeling. In Procs of the 22nd ACM International Conference on Information and Knowledge Management
(CIKM'13).
Sacco, G. 2006a. Some Research Results in Dynamic Taxonomy and Faceted Search Systems. In Procs
of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Workshop on Faceted Search (SIGIR'06).
Sacco, G. M. 2006b. Analysis and Validation of Information Access Through Mono, Multidimensional
and Dynamic Taxonomies. In Flexible Query Answering Systems, 7th International Conference (FQAS'06).
659–670.
Sacco, G. M. and Tzitzikas, Y., Eds. 2009. Dynamic Taxonomies and Faceted Search: Theory, Practice and
Experience. Springer.
Schafer, J. B., Konstan, J. A., and Riedl, J. 2001. E-Commerce Recommendation Applications. Data Mining
and Knowledge Discovery 5, 1-2, 115–153.
Scherer, K. R. 2005. What are Emotions? And How can They be Measured? Social Science Information 44,
695–729.
Schraefel, M. C., Karam, M., and Zhao, S. 2003. mSpace: Interaction Design for User-Determined,
Adaptable Domain Exploration in Hypermedia. In Procs of Workshop on Adaptive Hypermedia and Adaptive Web
Based Systems (AH'03). Nottingham, UK, 217–235.
Schuth, A. and Marx, M. 2011. Evaluation Methods for Rankings of Facetvalues for Faceted Search.
In Procs of the Second International Conference on Multilingual and Multimodal Information Access Evaluation
(CLEF'11). Springer-Verlag, Berlin, Heidelberg, 131–136.
Shawe-Taylor, J., Cancedda, N., Cesa-Bianchi, N., Conconi, A., Gentile, C., Goutte, C., Graepel, T., Li, Y., and
Renders, J.-M. 2002. Kernel Methods for Document Filtering. In The Eleventh Text Retrieval Conference
(TREC 2002), E. Voorhees and L. P. Buckland, Eds. Vol. NIST Special Publication 500-251. Department of
Commerce, National Institute of Standards and Technology.
Shokouhi, M. 2013. Learning to Personalize Query Auto-Completion. In Procs of the 36th International
ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'13). ACM, New York, NY,
USA, 103–112.
Shokouhi, M. and Radinsky, K. 2012. Time-Sensitive Query Auto-Completion. In Procs of the 35th
International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'12). ACM, New
York, NY, USA, 601–610.
Spyratos, N., Sugibuchi, T., and Yang, J. 2011. Personalizing Queries over Large Data Tables. In Procs
of the 15th East-European Conference on Advances in Databases and Information Systems (ADBIS 2011). Vienna,
Austria.
Stefanidis, K., Drosou, M., and Pitoura, E. 2010. PerK: Personalized Keyword Search in Relational
Databases through Preferences. In Procs of the 14th International Conference on Advances in Database
Technology (EDBT'10). 585–596.
Stefanidis, K., Koutrika, G., and Pitoura, E. 2011a. A Survey on Representation, Composition and
Application of Preferences in Database Systems. ACM Transactions on Database Systems 36, 19:1–19:45.
References 165
Stefanidis, K., Pitoura, E., and Vassiliadis, P. 2011b. Managing Contextual Preferences. Information
Systems 36, 8, 1158 1180.
Tao, Y., Ding, L., Lin, X., and Pei, J. 2009. Distance-Based Representative Skyline. In Procs of the 2009
IEEE International Conference on Data Engineering (ICDE09). IEEE Computer Society, Washington, DC, USA,
892903.
Toms, E. G., OBrien, H. L., Kopak, R. W., and Freund, L. 2005. Searching for Relevance in the Relevance
of Search. In Procs of the 5th International Conference on Conceptions of Library and Information Sciences
(CoLIS05). 5978.
Torlone, R. and Ciaccia, P. 2002. Which are my Preferred Items?. In Workshop on Recommendation and
Personalization in eCommerce, RPEC-2002. Malaga, Spain, 217225.
Tvaroek, M. 2006. Personalized Navigation in the Semantic Web.. In Procs of the 4th International Confer-
ence on Adaptive Hypermedia and Adaptive Web-Based Systems (AH06) (2006-06-27), V. P. Wade, H. Ashman,
and B. Smyth, Eds. Lecture Notes in Computer Science Series, vol. 4018. Springer, 467472.
Tvaroek, M., Barla, M., Frivolt, G., Toma, M., and Bielikov, M. 2008. Improving Semantic Search Via
Integrated Personalized Faceted and Visual Graph Navigation.. In Procs of the 34th Conference on Current
Trends in Theory and Practice of Computer Science (SOFSEM08) (2008-01-09). Lecture Notes in Computer
Science Series, vol. 4910. Springer, 778789.
Tvaroek, M. and Bielikov, M. 2007a. Adaptive Faceted Browser for Navigation in Open Information
Spaces. In Procs of the 16th International Conference on World Wide Web (WWW07). ACM, New York, NY,
USA, 13111312.
Tvaroek, M. and Bielikov, M. 2007b. Personalized Faceted Browsing for Digital Libraries. In Procs of
the 11th European Conference on Digital Libraries (ECDL07). 485488.
Tvaroek, M. and Bielikov, M. 2007c. Personalized Faceted Navigation for Multimedia Collections. In
Procs of the Second International Workshop on Semantic Media Adaptation and Personalization (SMAP07). IEEE
Computer Society, Washington, DC, USA, 104109.
Tvaroek, M. and Bielikov, M. 2007d. Personalized Faceted Navigation in the Semantic Web. Web
Engineering, 511515.
166 References
Tzitzikas, Y., Armenatzoglou, N., andPapadakos, P. Sept. 3, 2008. FleXplorer: AFrameworkfor Providing
Faceted and Dynamic Taxonomy-based Information Exploration. In Procs of 20th International Database
and Expert Systems Application Workshop FIND2008 (DEXA08 FIND Workshop). Torino, Italy, 212216.
Tzitzikas, Y., Kampouraki, M., and Analyti, A. 2013. Curating the Specificity of Ontological Descriptions
under Ontology Evolution. Journal on Data Semantics, 132.
Tzitzikas, Y. and Papadakos, P. 2013. Interactive Exploration of Multi-Dimensional and Hierarchical
Information Spaces with Real-Time Preference Elicitation. Fundamenta Informaticae 122, 4, 357399.
Vee, E., Shanmugasundaram, J., and Amer-Yahia, S. 2009. Efficient Computation of Diverse Query Re-
sults. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering 32, 4, 5764.
Wagner, A., Ladwig, G., and Tran, T. 2011. Browsing-Oriented Semantic Faceted Search. In Procs of 22th
International Conference on the Database and Expert Systems Applicatione (DEXA11). 303319.
Wagstaff, K. L., desJardins, M., and Eaton, E. 2010. Modelling and Learning User Preferences Over Sets.
Journal of Experimental & Theoretical Artificial Intelligence 22, 237268.
Wang, J., de Vries, A. P., and Reinders, M. J. T. 2006. Unifying User-Based and Item-Based Collaborative
Filtering Approaches by Similarity Fusion. In Procs of the 29th Annual International ACM Conference on
Research and Development in Information Retrieval (SIGIR06). ACM Press, New York, NY, USA, 501508.
Wasserman, L. 2004. All of Statistics : A Concise Course in Statistical Inference.
Webber, W., Chandar, P., and Carterette, B. 2012. Alternative Assessor Disagreement and Retrieval
Depth. InProcs of the 21st ACMinternational conference onInformationandknowledge management (CIKM12).
ACM, New York, NY, USA, 125134.
Wellman, M. P. and Doyle, J. 1991. Preferential Semantics for Goals. In Procs of the 9th National Conference
on Artificial Intelligence (AAAI91). 698703.
White, R. W., Bennett, P. N., and Dumais, S. T. 2010. Predicting Short-Term Interests Using Activity-
Based Search Context. In Procs of the 19th ACM International Conference on Information and Knowledge
Management (CIKM10). ACM, New York, NY, USA, 10091018.
References 167
White, R. W., Drucker, S. M., Marchionini, G., Hearst, M. A., and Schraefel, M. C. 2007. Exploratory Search
and HCI: Designing and Evaluating Interfaces to Support Exploratory Search Interaction. In Procs of
the Extended Abstracts on Human Factors in Computing Systems (CHI07 EA), M. B. Rosson and D. J. Gilmore,
Eds. ACM, 28772880.
Wilson, M. L. and Schraefel, M. C. 2007. Bridging the Gap: Using IR Models for Evaluating Exploratory
Search Interfaces. In Workshop on Exploratory Search and HCI (SIGCHI2007). ACM.
Xia, T., Zhang, D., and Tao, Y. 2008. On Skylining with Flexible Dominance Relation. In Procs of the 2008
IEEE 24th International Conference on Data Engineering (ICDE08). IEEE Computer Society, Washington, DC,
USA, 13971399.
Yang, Y. and Lad, A. 2009. Modeling Expected Utility of Multi-session Information Distillation. In
Procs of the 2nd International Conference on Theory of Information Retrieval: Advances in Information Retrieval
Theory (ICTIR09). Springer-Verlag, Berlin, Heidelberg, 164175.
Yang, Y., Yoo, S., Zhang, J., and Kisiel, B. 2005. Robustness of Adaptive Filtering Methods in a Cross-
Benchmark Evaluation. In Procs of the 28th Annual International ACM Conference on Research and Develop-
ment in Information Retrieval (SIGIR05). ACM, New York, NY, USA, 98105.
Yee, K., Swearingen, K., Li, K., and Hearst, M. 2003. Faceted Metadata for Image Search and Browsing.
Procs of the SIGCHI Conference on Human Factors in Computing Systems (CHI03), 401408.
Yilmaz, E., Shokouhi, M., Craswell, N., and Robertson, S. 2010. Expected Browsing Utility for Web Search
Evaluation. In Procs of the 19th ACM international conference on Information and knowledge management
(CIKM10). 15611564.
Yiu, M. L. and Mamoulis, N. 2007. Efficient Processing of Top-k Dominating Queries on Multi-
Dimensional Data. In Procs of the 33rd International Conference on Very Large Data Bases (VLDB07). VLDB
Endowment, 483494.
Yu, K., Tresp, V., and Yu, S. 2004. A Non-Parametric Hierarchical Bayesian Framework for Information
Filtering. In Procs of the 27th annual International ACM Conference on Research and Development in Informa-
tion Retrieval (SIGIR04). ACM, New York, NY, USA, 353360.
168 References
Zamir, O. and Etzioni, O. 1998. Web Document Clustering: A Feasibility Demonstration. In Procs of the
21th Annual International ACM Conference on Research and Development in Information Retrieval, (SIGIR98).
Melbourne, Australia, 4654.
Zha, H., Zheng, Z., Fu, H., and Sun, G. 2006. Incorporating Query Difference for Learning Retrieval
Functions in World Wide Web Search. In Procs of the 15th ACM International Conference on Information
and Knowledge Management (CIKM06). ACM, New York, NY, USA, 307316.
Zhai, C. and Lafferty, J. 2006. A Risk Minimization Framework for Information Retrieval. Information
Processing and Management 42, 3155.
Zhai, C. X., Cohen, W. W., and Lafferty, J. 2003. Beyond Independent Relevance: Methods and Evaluation
Metrics for Subtopic Retrieval. In Procs of the 26th Annual International ACM SIGIR Conference on Research
and Development in Informaion Retrieval (SIGIR03). ACM, New York, NY, USA, 1017.
Zhang, S., Mamoulis, N., Cheung, D. W., and Kao, B. 2010. Efficient Skyline Evaluation Over Partially
Ordered Domains. Procs of VLDB Endowment 3, 1-2, 12551266.
Zhang, X. and Chomicki, J. 2011. Preference Queries Over Sets. InProcs of the 27th International Conference
on Data Engineering (ICDE11). 10191030.
Zhang, Y. and Koren, J. 2007. Efficient Bayesian Hierarchical User Modeling for Recommendation Sys-
tem. In Procs of the 30th Annual International ACM Conference on Research and Development in Information
Retrieval (SIGIR07). ACM, New York, NY, USA, 4754.
Zigoris, P. and Zhang, Y. 2006. Bayesian Adaptive User Profiling with Explicit & Implicit Feedback. In
Procs of the 15th ACM international Conference on Information and Knowledge Management (CIKM06). ACM,
New York, NY, USA, 397404.
Appendix A
Complete Syntax of Preference Language
In this section we give the complete syntax of the language described in Section 3.1.
stmt ::= scopeType spec
       | facets order : prefer facet F_i to F_j
       | terms order : prefer term t_i to t_j
       | objects order : prefer term t_i to t_j
       | objects order : Pareto setOfFacets
       | objects order : ParetoOptimal setOfFacets
       | objects order : Priority orderOfFacets
       | objects order : Combinational bucketOrderOfFacets
scopeType ::= facets order : | terms order : | objects order :
spec ::= anchor rankSpec
anchor ::= facet F_i
         | term t_j
         | object o_k
         | ε // the empty string
rankSpec ::= {lexicographic | count | value | indexedBy} {min|max}
           | best | worst
           | use scoreFunction score() {min|max}
nonEmptyFacetElems ::= F_i {, F_j}
setOfFacets ::= {nonEmptyFacetElems}
orderOfFacets ::= < nonEmptyFacetElems >
bucketOrderOfFacets ::= < setOfFacets >
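To illustrate how statements of this language might be recognised, the following is a minimal sketch of a parser for a subset of the grammar: the three relative preference productions (prefer ... to ...) and the anchor/rankSpec form without the scoreFunction alternative. It is not the implementation used in this work. It assumes that facet, term and object names are single whitespace-free tokens. The names parse, Relative and Ranked, as well as the facet name Manufacturer in the usage note, are hypothetical.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Scope/kind pairs allowed by the three "prefer ... to ..." productions.
_RELATIVE_KINDS = {("facets", "facet"), ("terms", "term"), ("objects", "term")}

_RELATIVE = re.compile(
    r"^(facets|terms|objects) order : prefer (facet|term) (\S+) to (\S+)$"
)
_RANKED = re.compile(
    r"^(facets|terms|objects) order :"
    r"(?: (facet|term|object) (\S+))?"  # optional anchor (may be empty)
    r" (?:(lexicographic|count|value|indexedBy) (min|max)|(best|worst))$"
)


class Relative(NamedTuple):
    """A relative preference: `preferred` is preferred to `other`."""
    scope: str
    kind: str
    preferred: str
    other: str


class Ranked(NamedTuple):
    """A scope with an optional anchor and a rank specification."""
    scope: str
    anchor: Optional[Tuple[str, str]]  # (kind, name), or None if empty
    rank: Tuple[str, ...]              # ("count", "max") or ("best",)


def parse(stmt: str):
    """Parse one statement of the supported subset, or raise ValueError."""
    # Normalise whitespace so that "order:" and "order :" are both accepted.
    text = " ".join(stmt.replace(":", " : ").split())
    m = _RELATIVE.match(text)
    if m:
        scope, kind, preferred, other = m.groups()
        if (scope, kind) not in _RELATIVE_KINDS:
            raise ValueError(f"scope {scope!r} cannot compare {kind}s")
        return Relative(scope, kind, preferred, other)
    m = _RANKED.match(text)
    if m:
        scope, kind, name, key, direction, extreme = m.groups()
        anchor = (kind, name) if kind else None
        rank = (key, direction) if key else (extreme,)
        return Ranked(scope, anchor, rank)
    raise ValueError(f"not a recognised statement: {stmt!r}")
```

For example, parse("terms order : prefer term Fiat to Kia") yields a Relative record, while parse("terms order : facet Manufacturer count max") yields a Ranked record whose anchor is the facet Manufacturer. The Pareto, Priority and Combinational productions are deliberately left out, since their concrete list syntax would need further assumptions.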
Appendix B
Binary Relations
Here we list several typical properties of binary relations. A binary relation $R$ over a set $S$ is called:
reflexive, if $\forall a \in S,\ aRa$
irreflexive, if $\forall a \in S,\ \neg(aRa)$
symmetric, if $\forall a, b \in S,\ aRb \Rightarrow bRa$
asymmetric, if $\forall a, b \in S,\ aRb \Rightarrow \neg(bRa)$
antisymmetric, if $\forall a, b \in S,\ (aRb \wedge bRa) \Rightarrow a = b$
transitive, if $\forall a, b, c \in S,\ (aRb \wedge bRc) \Rightarrow aRc$
negatively transitive, if $\forall a, b, c \in S,\ (\neg(aRb) \wedge \neg(bRc)) \Rightarrow \neg(aRc)$
connected (strongly complete or total), if $\forall a, b \in S,\ (aRb) \vee (bRa) \vee (a = b)$
The above properties are not independent. Asymmetry implies irreflexivity, while irreflexivity and
transitivity imply asymmetry.
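Both implications follow directly from the definitions. A short derivation, included here for completeness, can be written as:

```latex
\textbf{Asymmetry $\Rightarrow$ irreflexivity.}
Let $R$ be asymmetric and suppose $aRa$ for some $a \in S$.
Taking $b = a$ in $aRb \Rightarrow \neg(bRa)$ gives $\neg(aRa)$, a contradiction.
Hence $\forall a \in S,\ \neg(aRa)$.

\textbf{Irreflexivity and transitivity $\Rightarrow$ asymmetry.}
Let $R$ be irreflexive and transitive and suppose $aRb$ and $bRa$.
Transitivity (with $c = a$) gives $aRa$, contradicting irreflexivity.
Hence $\forall a, b \in S,\ aRb \Rightarrow \neg(bRa)$.
```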
Based on its properties a binary relation is characterized as follows:
A binary relation is a preorder or quasi-order, if it is reflexive and transitive. If it is in addition
antisymmetric, then it is a partial order.
A binary relation is a strict partial order (or irreflexive partial order) if it is irreflexive, asymmetric and
transitive.
A binary relation is a total order, if it is a strict partial order and it is also connected.
A binary relation is a weak order, if it is a negatively transitive strict partial order.
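For a finite set, the properties and characterizations of this appendix can be checked mechanically. The following is a minimal sketch, assuming the relation is given as a set of pairs over S. The function names properties and classify are hypothetical and not part of this work.

```python
from itertools import product


def properties(S, R):
    """Return the names of the listed properties that R satisfies on S.

    S is a finite collection of elements; R is a collection of pairs
    (a, b), meaning aRb, and is assumed to be a subset of S x S.
    """
    S, R = set(S), set(R)

    def holds(a, b):
        return (a, b) in R

    checks = {
        "reflexive": all(holds(a, a) for a in S),
        "irreflexive": all(not holds(a, a) for a in S),
        "symmetric": all(holds(b, a) for a, b in R),
        "asymmetric": all(not holds(b, a) for a, b in R),
        "antisymmetric": all(a == b or not holds(b, a) for a, b in R),
        "transitive": all(
            holds(a, c) for a, b in R for b2, c in R if b == b2
        ),
        # (not aRb and not bRc) => not aRc, i.e. aRb or bRc or not aRc
        "negatively transitive": all(
            holds(a, b) or holds(b, c) or not holds(a, c)
            for a, b, c in product(S, repeat=3)
        ),
        "connected": all(
            holds(a, b) or holds(b, a) or a == b
            for a, b in product(S, repeat=2)
        ),
    }
    return {name for name, ok in checks.items() if ok}


def classify(S, R):
    """Return the characterizations of this appendix that apply to R."""
    p = properties(S, R)
    kinds = set()
    if {"reflexive", "transitive"} <= p:
        kinds.add("preorder")
        if "antisymmetric" in p:
            kinds.add("partial order")
    if {"irreflexive", "asymmetric", "transitive"} <= p:
        kinds.add("strict partial order")
        if "connected" in p:
            kinds.add("total order")
        if "negatively transitive" in p:
            kinds.add("weak order")
    return kinds
```

As a usage example, the relation < on {1, 2, 3} is classified as a strict partial order, a total order and a weak order, whereas the relation {(1, 3), (2, 3)} on the same set is a weak order that is not total, because 1 and 2 are not related in either direction.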
Appendix C
Acronyms
AI Artificial Intelligence
DB Database
BMO Best Matches Only
DiFEPreKO Difficulty of Formulating Effective Preferences without Knowing the Options
ES Exploratory Search
FDT Faceted and Dynamic Taxonomies
HCI Human Computer Interaction
IIR Interactive Information Retrieval
IIPP Intentional Inconsistent Preferences Percentage
IR Information Retrieval
IS Information System
NIIP Number of Intentional Inconsistent Preferences
NUP Number of Unused Preferences
UI User Interface
UPP Unused Preferences Percentage
WSE Web Search Engine
