ImageCLEF / LifeCLEF - Multimedia Retrieval in CLEF

Handwritten Scanned Document Retrieval Task 2016



Welcome to the website of the ImageCLEF 2016 Handwritten Scanned Document Retrieval Task!

Schedule

15.11.2015: registration opens (registration instructions)
16.12.2015: release of development data (dataset description)
15.03.2016: release of test data
01.05.2016: deadline for submission by the participants
15.05.2016: release of processed results
25.05.2016: deadline for submission of working notes papers by the participants
17.06.2016: notification of acceptance of the working notes papers
01.07.2016: camera ready working notes papers
05.-08.09.2016: CLEF 2016, Évora, Portugal

Overview

Motivation

In recent years there has been an increasing interest in digitizing the vast amounts of pre-digital age books and documents that exist throughout the world. Many of the emerging digitization initiatives are aimed at dealing with huge collections of handwritten documents, for which automatic recognition is not yet as mature as Optical Character Recognition (OCR) for printed text. Thus, there is a need to develop reliable and scalable indexing techniques for manuscripts, targeting their particular challenges. Users of this technology could be libraries with fragile historical books, which are being scanned for preservation to make them available to the public without the risk of further deterioration. Apart from making the scanned pages available, there is also interest in providing search facilities, so that people consulting these collections have the information access tools they are already accustomed to. The archaic approach is to manually transcribe the documents and then use standard text retrieval; however, this becomes too expensive for large collections. Alternatively, handwritten text recognition (HTR) techniques can be used for automatic indexing, which requires transcribing only a small part of the document for learning the models, or reusing models obtained from similar manuscripts, thus requiring the least human effort.

Targeted participants

This task aims to allow easy participation from different research communities (by providing prepared data for each), with the goal of creating synergies between these communities and bringing different ideas and solutions to the problems being addressed. In particular, we hope to have participation at least from groups working on handwritten text recognition, keyword spotting and (digital) text retrieval.

Task description

The task is aimed at evaluating the performance of indexing and retrieval systems for scanned handwritten documents. Specifically, the task targets the scenario of free text search in a document, in which the user wants to find sections of the document for a given multi-word textual query. The result of the search is not whole pages, but smaller regions (such as a paragraph), which could even include the end of one page and the start of the next. The system should also be able to handle words broken due to hyphenation and words that were not seen in the data used for learning the recognition models.

Figure 1. Example of a search result in a handwritten document.


Since the detection of paragraphs is in itself a difficult problem, to simplify things the segments to retrieve are defined simply as the concatenation of 6 consecutive lines (from top to bottom, and left to right if there are columns), ignoring the type of line (e.g. title, inserted word, etc.). More precisely, the segments are defined by a sliding window that moves one line at a time (thus neighbouring segments overlap by 5 lines), traversing all the pages in the document, so there are segments that include lines at the end of one page and at the beginning of the next, and the total number of segments is #lines_in_collection - 5.
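For illustration, a minimal sketch of how these segments could be enumerated, assuming the collection's text lines are available as a single ordered list (the function name and data layout are illustrative, not part of the official tools):

    def enumerate_segments(lines, window=6):
        """Yield (segment_id, segment_lines) over a sliding window of lines.

        `lines` is the ordered list of text lines of the whole collection
        (pages concatenated, top to bottom and left to right). Neighbouring
        segments overlap by window - 1 lines, so the number of segments is
        len(lines) - window + 1, i.e. #lines_in_collection - 5 for window=6.
        """
        for start in range(len(lines) - window + 1):
            yield start, lines[start:start + window]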
The queries are one or more words that have to be searched for in the collection, and a segment is considered relevant if all the query words appear in it in the given order. Due to the overlap of the segments, for a given query several consecutive segments will be relevant. In a real application these consecutive segments would be fused into one result; however, to simplify the evaluation, the submissions must include all relevant segments. For each query, participants are expected to submit, only for the segments considered relevant, a relevance score and the bounding boxes of all appearances of the query words within the segment, irrespective of whether a given appearance is the instance of the word that made the segment relevant. The queries have been selected such that some key challenges are included: words broken due to end-of-line hyphenation, queries with words not seen in the training data, queries with a repeated word, and queries with zero relevant results. An optional additional challenge will be proximity search, the difference being that a segment will be considered relevant depending on an edit distance to the query. More details on this will be included later.
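To make the relevance criterion concrete, a minimal sketch that checks whether all query words appear in order within a segment's transcript, assuming the segment is given as a list of tokenized lines (names are illustrative only, and hyphenation handling is omitted here):

    def segment_is_relevant(segment_lines, query_words):
        """Return True if all query words appear in the segment in the given order.

        `segment_lines` is a list of word lists (one per line of the 6-line
        segment); `query_words` is the list of words in the query. A real
        system must additionally deal with words hyphenated across lines.
        """
        words = [w.lower() for line in segment_lines for w in line]
        pos = 0
        for qw in (q.lower() for q in query_words):
            try:
                pos = words.index(qw, pos) + 1  # next word must appear after this match
            except ValueError:
                return False
        return True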
Since the evaluation will measure the impact of words not seen in the training set, the use of external data for learning a language model is prohibited. The use of external data for learning the optical models is allowed, but with the condition that results are also submitted using the same techniques and only the provided data for learning the optical models. The development set should not be used for training.

The task will not require the detection of the text lines; they will be assumed to be already detected.

Registering for the task and accessing the data

To participate in this task, please register by following the instructions found in the main ImageCLEF 2016 web page. The data is available at www.prhlt.upv.es/contests/iclef16, although the access details will be available only after registering. The user name and password can be found in the ImageCLEF system by going to Collections and then Detail of the c_ic16_handwritten_retrieval collection.

Dataset description

The dataset used in this task is a subset of pages from unpublished manuscripts written by the philosopher and reformer Jeremy Bentham, which have been digitised and transcribed under the Transcribe Bentham project [Causer 2012]. The data is divided into three sets: training (363 pages), development (433 pages) and test (details will be given later). For all three sets the original scanned page colour images are provided, each with an XML file in Page format that includes some metadata.

The training set XMLs include the transcripts, manually checked polygons surrounding every line, and word polygons that were automatically obtained (these would be used by the groups working on query-by-example word spotting). The development set additionally has transcripts, manually checked line and word polygons, a list of 510 queries and the respective retrieval ground truth (obtained using the word polygons). The test set will have baselines instead of the polygons surrounding the lines, and the list of queries will be different.

To ease participation, the text lines are also provided as extracted images. Additionally, for each line, n-best recognition lists are given (including log-likelihoods and bounding boxes), with the aim that researchers who work in related fields (such as text retrieval) but don't normally work with images or HTR can easily participate.

[Causer 2012] T. Causer and V. Wallace, Building a Volunteer Community: Results and Findings from Transcribe Bentham, Digital Humanities Quarterly, Vol. 6 (2012), http://www.digitalhumanities.org/dhq/vol/6/2/000125/000125.html

Submission instructions

The submissions will be received through the ImageCLEF 2016 system: go to "Runs", then "Submit run", and select the track "ImageCLEF handwritten retrieval".

The solution files are plain text and contain several rows, each representing the match of a particular query in a particular image segment. Each row has 4 or more fields separated by whitespace: the first field is an integer identifying the query, the second is the integer identifier of the segment, and the third field is a real number representing the relevance confidence score of the match (the higher the number, the more confident the result is). Finally, the fourth and subsequent fields represent the location(s) of each of the query words in the image segment (one field for each of the words in the query).

For each word in the query, its respective field should include the bounding boxes of all the locations where that word appears in the segment (including hyphenated words), separated by commas. Bounding boxes are represented by a string with the format "L:WxH+X+Y", where L is the absolute line number, W and H are the width and height of the box (in pixels), and X and Y are the left-top coordinates of the box (in pixels), respectively. Finally, since we aim to include hyphenated words in the retrieval results, a location can be divided into two different bounding boxes. In such a case, the bounding boxes of the two word segments are separated by the slash ("/") symbol.
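To make this format concrete, a minimal sketch of helpers for building and parsing such word fields (the function names are illustrative, not part of the official tools):

    import re

    def format_box(line, w, h, x, y):
        """Format one bounding box as "L:WxH+X+Y" (absolute line number, pixels)."""
        return f"{line}:{w}x{h}+{x}+{y}"

    def format_word_field(locations):
        """Format the field of one query word.

        `locations` is a list of locations; each location is a list of one or
        two (line, w, h, x, y) tuples (two when the word is hyphenated across
        lines). Locations are joined with "," and hyphen parts with "/".
        """
        return ",".join("/".join(format_box(*box) for box in loc) for loc in locations)

    _BOX_RE = re.compile(r"^(\d+):(\d+)x(\d+)\+(\d+)\+(\d+)$")

    def parse_word_field(field):
        """Parse a word field back into a list of locations of (line, w, h, x, y) tuples."""
        locations = []
        for loc in field.split(","):
            boxes = []
            for part in loc.split("/"):
                m = _BOX_RE.match(part)
                if not m:
                    raise ValueError("malformed bounding box: " + part)
                boxes.append(tuple(int(g) for g in m.groups()))
            locations.append(boxes)
        return locations

For example, parse_word_field("7102:170x45+1782+3823/7103:147x73+1822+3903") returns a single location made of two boxes, i.e. a word hyphenated across lines 7102 and 7103.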
For instance, Figure 2 shows the six lines composing segment number 7098 from the development set. In red, we marked the detected words for the query "building necessary" (query id 110). Observe that the segment includes one instance of the word "building" (line 7102) and two instances of the word "necessary", one of them hyphenated (lines 7101 and 7102/7103, respectively). Also note that if the second instance of "necessary" did not appear, then the segment would not be relevant due to the word order restriction. An ideal retrieval system should include in its submission file a row similar to the following one:

110 7098 1.0 7102:312x135+500+3807 7101:329x131+716+3641,7102:170x45+1782+3823/7103:147x73+1822+3903

Figure 2. Result of query 110 ("building necessary") in segment 7098 from the development set.

This submission format makes it possible to measure the performance of the system at segment level and word level, and to display and validate the submitted results in a very easy manner, while keeping the submission files small.

Evaluation methodology

The submitted solutions will be evaluated using different Information Retrieval metrics. We will compute at least the following scores (both at segment and word level):

Global Average Precision (AP): The (Global) Average Precision measures the performance of a set of retrieval results based on the precision (p) and recall (r) metrics. The "Global" keyword refers to the fact that the precision and recall are computed taking into account the global list of results submitted by each participant. If a submitted solution has N rows sorted by their confidence score, then the Global Average Precision is computed using the following formula:
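Assuming the standard definition of average precision over a single ranked list, where R is the total number of relevant (query, segment) pairs in the ground truth, p(k) is the precision over the top k rows, and rel_k is 1 if the k-th row is relevant and 0 otherwise, this is:

$$\mathrm{AP} = \frac{1}{R} \sum_{k=1}^{N} p(k)\,\mathrm{rel}_k$$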

Mean Average Precision (mAP): This metric differs from the Global Average Precision in that it first computes the Average Precision for each individual query (i.e. precision and recall only consider the results of that particular query), and then computes the mean over all queries.

Since precision and recall are only defined when there is at least one relevant segment, and some of the queries that we have included do not have any relevant segment associated, the mAP will only be computed taking into account the set of queries that have at least one relevant segment (Q_R). Note that the Global Average Precision does not present this problem as long as at least one query has at least one relevant segment, which is the case.
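Written out with this restriction, and denoting by AP(q) the average precision computed over the results of query q alone, this is:

$$\mathrm{mAP} = \frac{1}{|Q_R|} \sum_{q \in Q_R} \mathrm{AP}(q)$$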
Global Normalized Discounted Cumulative Gain (NDCG): The Normalized Discounted Cumulative Gain is a widely used metric to evaluate the quality of search engines. Here, we use the keyword "Global" as we did for the Average Precision, to remark the fact that all the retrieved results are considered at once, regardless of their query. The following definition of NDCG will be used:
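Assuming the common binary-relevance form with a logarithmic rank discount, consistent with the description of rel_k and Z below, this is:

$$\mathrm{NDCG} = \frac{1}{Z} \sum_{k=1}^{N} \frac{\mathrm{rel}_k}{\log_2(k+1)}$$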

Here, rel_k is a binary value denoting whether the result k in the submission file is relevant or not, according to the ground truth. Z is a normalization factor (sometimes denoted as INDCG) computed from the ideal submission file, so that the NDCG is a value between 0.0 and 1.0.
Mean Normalized Discounted Cumulative Gain (mNDCG): Usually, this metric is referred to simply as Normalized Discounted Cumulative Gain in the literature. However, we added the "Mean" keyword to denote the fact that the NDCG scores are computed individually for each query and then averaged over all queries, as in the distinction between Global and Mean Average Precision. The formula used to compute this metric is simply:
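Denoting by NDCG(q) the NDCG computed over the results of query q alone, and by Q_R the set of queries with at least one relevant segment (as for the mAP), this is:

$$\mathrm{mNDCG} = \frac{1}{|Q_R|} \sum_{q \in Q_R} \mathrm{NDCG}(q)$$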

As with the Average Precision, the NDCG is not well defined for queries that have no relevant documents associated, since the normalization factor is 0.0. Thus, the mNDCG will only be computed taking into account the set of queries with at least one relevant segment associated.

In order to compute these scores at word level, a measure of the degree of overlap between the reference and the retrieved bounding boxes also needs to be defined. The overlap score between two bounding boxes A and B will be computed taking into account the areas of their intersection and their union:
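This is presumably the standard intersection-over-union (IoU) score:

$$\mathrm{overlap}(A, B) = \frac{\mathrm{area}(A \cap B)}{\mathrm{area}(A \cup B)}$$

A minimal sketch of this computation for boxes given in the W, H, X, Y convention of the submission format (the function name is illustrative only):

    def bbox_overlap(a, b):
        """Intersection-over-union of two boxes given as (w, h, x, y) tuples in pixels."""
        aw, ah, ax, ay = a
        bw, bh, bx, by = b
        iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
        ih = max(0, min(ay + ah, by + bh) - max(ay, by))
        inter = iw * ih
        union = aw * ah + bw * bh - inter
        return inter / union if union > 0 else 0.0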

Finally, the software used to compute the previous scores from the ground truth and a submission file will be provided to the users in order to validate their systems.

Acknowledgements

This work has been partially supported through the EU H2020 grant READ (Recognition and Enrichment of Archival Documents) (Ref: 674943) and the EU 7th Framework Programme grant tranScriptorium (Ref: 600707).

Organisers

Joan Puigcerver <joapuipe@prhlt.upv.es>, Universitat Politècnica de València, Spain
Alejandro H. Toselli <ahector@prhlt.upv.es>, Universitat Politècnica de València, Spain
Joan Andreu Sánchez <jandreu@prhlt.upv.es>, Universitat Politècnica de València, Spain
Enrique Vidal <evidal@prhlt.upv.es>, Universitat Politècnica de València, Spain
Mauricio Villegas <mauvilsa@prhlt.upv.es>, Universitat Politècnica de València, Spain

