ImageCLEF 2016
Handwritten Retrieval Task
Overview
Motivation
In recent years there has been an increasing interest in digitizing the vast amounts of pre-digital age books and documents that exist throughout the world. Many of the emerging digitization initiatives are aimed at dealing with huge collections of handwritten documents, for which automatic recognition is not yet as mature as Optical Character Recognition (OCR) is for printed text. Thus, there is a need to develop reliable and scalable indexing techniques for manuscripts that target their particular challenges. Users of this technology could be libraries with fragile historical books, which are being scanned for preservation so that they can be made available to the public without the risk of further deterioration. Apart from making the scanned pages available, there is also interest in providing search facilities, so that people consulting these collections have the information access tools they are already accustomed to. The traditional approach is to transcribe manually and then use standard text retrieval; however, this becomes too expensive for large collections. Alternatively, handwritten text recognition (HTR) techniques can be used for automatic indexing, which requires transcribing only a small part of the document to learn the models, or reusing models obtained from similar manuscripts, thus requiring the least human effort.
Targeted participants
This task aims to allow easy participation from different research communities (by providing prepared data for each), with the goal of creating synergies between these communities and eliciting different ideas and solutions to the problems being addressed. In particular, we hope to have participation at least from groups working on handwritten text recognition, keyword spotting and (digital) text retrieval.
Task description
The task is aimed at evaluating the performance of indexing and retrieval systems for scanned handwritten documents. Specifically, the task targets the scenario of free text search in a document, in which the user wants to find sections of the document matching a given multi-word textual query. The results of the search are not pages but smaller regions (such as a paragraph), which may even span the end of one page and the start of the next. The system should also be able to handle words broken by hyphenation and words that were not seen in the data used for learning the recognition models.
Dataset description
The dataset used in this task is a subset of pages from unpublished manuscripts written by the philosopher and reformer Jeremy Bentham, which have been digitised and transcribed under the Transcribe Bentham project [Causer 2012]. The data is divided into three sets: training (363 pages), development (433 pages) and test (details will be given later). For all three sets, the original scanned page colour images are provided, each with an XML file in PAGE format that includes some metadata.
The training set XMLs include the transcripts, manually checked polygons surrounding every line, and word polygons that were obtained automatically (the latter intended for groups working on query-by-example word spotting). The development set additionally has transcripts, manually checked line and word polygons, a list of 510 queries and the respective retrieval ground truth (obtained using the word polygons). The test set will have baselines instead of the polygons surrounding the lines, and its list of queries will be different.
To ease participation, the text lines are also provided as extracted images. Additionally, for each line, n-best recognition lists are given (including log-likelihoods and bounding boxes), so that researchers who work in related fields (such as text retrieval) but don't normally work with images or HTR can easily participate.
[Causer 2012] T. Causer and V. Wallace, Building a Volunteer Community: Results and Findings from Transcribe Bentham, Digital Humanities Quarterly, Vol. 6 (2012), http://www.digitalhumanities.org/dhq/vol/6/2/000125/000125.html
Submission instructions
Submissions will be received through the ImageCLEF 2016 system: go to "Runs", then "Submit run", and select the track "ImageCLEF handwritten retrieval".
The solution files are plain text and contain one row per match of a particular query in a particular image segment. Each row has 4 or more fields separated by whitespace: the first field is an integer identifying the query, the second is the integer identifier of the segment, and the third is a real number representing the relevance confidence score of the match (the higher the number, the more confident the result). The fourth and subsequent fields give the location(s) of each of the query words in the image segment (one field for each word in the query).
For each word in the query, its respective field should include the bounding boxes of all the locations where that word appears in the segment (including hyphenated words). Bounding boxes are represented by a string with the format "L:WxH+X+Y", where L is the absolute line number, W and H are the width and height of the box (in pixels), and X and Y are the coordinates of the top-left corner of the box (in pixels). Finally, since we aim to include hyphenated words in the retrieval results, a location can be divided into two different bounding boxes. In that case, the bounding boxes of the two word segments are separated by a slash ("/").
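As a minimal sketch (not the official evaluation toolkit), the row and bounding-box formats described above can be parsed as follows; the function and field names are illustrative:

```python
import re

# Matches a single bounding box "L:WxH+X+Y" as defined above.
BOX_RE = re.compile(r"^(\d+):(\d+)x(\d+)\+(\d+)\+(\d+)$")

def parse_box(s):
    """Parse one bounding box string into a dict of integers."""
    m = BOX_RE.match(s)
    if not m:
        raise ValueError(f"bad bounding box: {s!r}")
    line, w, h, x, y = map(int, m.groups())
    return {"line": line, "w": w, "h": h, "x": x, "y": y}

def parse_row(row):
    """Parse "query segment score loc1 loc2 ..." into a structured record."""
    fields = row.split()
    query, segment, score = int(fields[0]), int(fields[1]), float(fields[2])
    words = []
    for field in fields[3:]:
        # Each field holds all locations of one query word, comma-separated;
        # a hyphenated location is two boxes joined by "/".
        locations = []
        for loc in field.split(","):
            locations.append([parse_box(part) for part in loc.split("/")])
        words.append(locations)
    return {"query": query, "segment": segment, "score": score, "words": words}

row = ("110 7098 1.0 7102:312x135+500+3807 "
       "7101:329x131+716+3641,7102:170x45+1782+3823/7103:147x73+1822+3903")
rec = parse_row(row)
```

For the example row above, `rec["words"]` holds two entries (one per query word), and the second word's second location consists of two boxes, reflecting the hyphenation.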
For instance, Figure 2 shows the six lines composing segment number 7098 from the development set. In red, we marked the detected words for the query "building necessary" (query id 110). Observe that the segment includes one instance of the word "building" (line 7102) and two instances of the word "necessary" (one on line 7101 and one hyphenated across lines 7102 and 7103). Also note that if the second instance of "necessary" did not appear, the segment would not be relevant, due to the word order restriction. An ideal retrieval system would include in its submission file a row similar to the following:

110 7098 1.0 7102:312x135+500+3807 7101:329x131+716+3641,7102:170x45+1782+3823/7103:147x73+1822+3903

Figure 2. Result of query 110 ("building necessary") in segment 7098 from the development set.
This submission format makes it possible to measure the performance of the system at both segment level and word level, and to display and validate the submitted results easily, while keeping the submission files small.
Evaluation methodology
The submitted solutions will be evaluated using different information retrieval metrics. We will compute at least the following scores (both at segment and word level):
Global Average Precision (AP): The (Global) Average Precision measures the performance of a set of retrieval results based on the precision (p) and recall (r) metrics. The "Global" keyword refers to the fact that precision and recall are computed over the global list of results submitted by each participant. If a submitted solution has N rows sorted by confidence score, then the Global Average Precision is computed as:

AP = (1/|R|) * sum_{k=1}^{N} p(k) * rel_k

where p(k) is the precision at cut-off k, rel_k is 1 if the k-th row is relevant and 0 otherwise, and |R| is the total number of relevant results in the ground truth.
Mean Average Precision (mAP): This metric differs from the Global Average Precision in that it first computes the Average Precision for each individual query (i.e. precision and recall only consider results from that particular query), and then takes the mean over all queries.
Since precision and recall are only defined when there is at least one relevant segment, and some of the included queries do not have any relevant segment associated, the mAP will only be computed over the set of queries that have at least one relevant segment (Q_R). Note that the Global Average Precision does not have this problem as long as at least one query has at least one relevant segment, which is the case.
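The two Average Precision variants can be sketched as follows, assuming binary relevance judgements and rows already sorted by descending confidence; this is an illustrative sketch, not the official scoring software:

```python
def global_average_precision(rels, num_relevant):
    """rels: 0/1 relevance of each submitted row, in ranked order.
    num_relevant: total number of relevant results in the ground truth."""
    hits, ap = 0, 0.0
    for k, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            ap += hits / k  # precision at cut-off k
    return ap / num_relevant if num_relevant else 0.0

def mean_average_precision(per_query):
    """per_query: list of (rels, num_relevant) pairs, one per query.
    Queries with no relevant segment are excluded, as described above."""
    scores = [global_average_precision(r, n) for r, n in per_query if n > 0]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, a ranked list with relevances [1, 0, 1] and two relevant results in the ground truth yields an AP of (1/1 + 2/3) / 2 = 5/6.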
Global Normalized Discounted Cumulative Gain (NDCG): The Normalized Discounted Cumulative Gain is a widely used metric for evaluating the quality of search engines. Here, we use the keyword "Global" as we did for the Average Precision, to emphasize that all the retrieved results are considered at once, regardless of their query. The following definition of NDCG will be used:

NDCG = (1/Z) * sum_{k=1}^{N} (2^{rel_k} - 1) / log2(k + 1)

Here, rel_k is a binary value denoting whether the k-th result in the submission file is relevant or not, according to the ground truth. Z is a normalization factor (sometimes denoted IDCG), computed from the ideal submission file, so that the NDCG is a value between 0.0 and 1.0.
Mean Normalized Discounted Cumulative Gain (mNDCG): This metric is usually referred to simply as Normalized Discounted Cumulative Gain in the literature. However, we add the "Mean" keyword to denote the fact that the NDCG scores are computed individually for each query and then averaged over all queries, as in the distinction between Global and Mean Average Precision. The formula used to compute this metric is simply:

mNDCG = (1/|Q_R|) * sum_{q in Q_R} NDCG(q)

As with the Average Precision, the NDCG is not well defined when a query has no relevant documents, since the normalization factor is 0.0. Thus, the mNDCG will only be computed over the set of queries with at least one relevant segment associated.
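A binary-relevance NDCG following the definition above can be sketched like this (again an illustrative sketch, not the official scorer); the normalization factor Z is the DCG of an ideal submission that places all relevant results first:

```python
import math

def ndcg(rels, num_relevant):
    """rels: 0/1 relevance of each submitted row, in ranked order.
    num_relevant: number of relevant results in the ideal submission."""
    dcg = sum((2 ** rel - 1) / math.log2(k + 1)
              for k, rel in enumerate(rels, start=1))
    # Z (the IDCG): DCG of the ideal ranking, all relevant results first.
    z = sum(1.0 / math.log2(k + 1) for k in range(1, num_relevant + 1))
    return dcg / z if z else 0.0
```

With binary relevance, the gain term 2^{rel_k} - 1 reduces to rel_k itself, so relevant results simply contribute 1/log2(k + 1) at rank k.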
In order to compute these scores at word level, a measure of the degree of overlap between the reference and the retrieved bounding boxes also needs to be defined. The overlap score between two bounding boxes A and B is computed from the areas of their intersection and their union:

overlap(A, B) = area(A ∩ B) / area(A ∪ B)
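This intersection-over-union overlap can be sketched as follows, with each box given as (x, y, w, h) and (x, y) the top-left corner, matching the bounding-box format above; the tuple layout is an assumption for illustration:

```python
def overlap(a, b):
    """Intersection-over-union of two axis-aligned boxes (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the intersection rectangle (0 if disjoint).
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0
```

The score ranges from 0.0 (disjoint boxes) to 1.0 (identical boxes).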
Finally, the software used to compute these scores from the ground truth and a submission file will be provided to participants so that they can validate their systems.
Acknowledgements
This work has been partially supported through the EU H2020 grant READ (Recognition and Enrichment of Archival Documents) (Ref: 674943) and the EU 7th Framework Programme grant tranScriptorium (Ref: 600707).
Organisers