
Information Retrieval

Query processing for natural language queries

Supervisor: Theo Huibers

by Cristian Ciobotea and Martin Myslk

Abstract
This paper was written as part of an Information Retrieval course at the University of
Twente and the Wisenose project. It aims to improve query processing in a search engine
aimed at children. Children often tend to formulate their queries in natural language, and
this paper presents a technique for reformulating these queries so that the search
engine can return relevant results. The implementation and evaluation of this algorithm will
be discussed with references to other scientific papers.

Introduction
Information Retrieval is an important tool for obtaining information from large collections of
data, for example, the Internet. To satisfy an information need, we should take many aspects
into consideration, including the type of user that uses the search engine. For this
reason, a branch of IR has developed, which is called Information Retrieval for children.
As presented in the course mentioned in the Abstract section, children behave differently
when using Information Retrieval, so new challenges arise, such as reformulating the input
query, since many children tend to formulate their search queries in natural language. This
paper will discuss which techniques are used in natural language processing (NLP) and how
we are going to implement them in order to retrieve age-specific content for children. The
main focus of this article is natural language query processing.
In other words, given a collection of documents with vocabulary labels, a natural language
query and a specific age, how do we reformulate the query to retrieve documents relevant to
a user of the specified age?

General Plan
There are several steps to take before the final set of documents can be retrieved. We must
not only reformulate the query but also retrieve only those documents that match the
reading level of the user.
1. obtain a set of child queries
It is crucial to obtain a set containing natural language queries in order to better understand
the common behavior of children when formulating queries. This test data will be used to
test our algorithm.

2. parse the queries and identify keywords
Once the queries are obtained, they can be transformed into a list of keywords or tokens by
applying case normalization, stemming and stop-word removal.
3. retrieve documents
Once the query is specified, it is sent to the information retrieval system, which retrieves the
matching documents together with their age-specific labels. Those documents with the
appropriate label are then returned as the final result set.
4. evaluate the results
The final step is to test whether the documents retrieved in step 3 answer the query and are
appropriate for the child's reading level. The test will be done by manual verification
and, depending on the data set and API functionalities, by choosing appropriate traditional
evaluation metrics presented in the course.
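
Steps 2 to 4 can be illustrated with a minimal Python sketch. It is not the implementation used in the project, only an outline under stated assumptions: the stop-word list is a small hypothetical sample, the suffix stripper is a crude stand-in for a real stemmer, the age label is assumed to be a pair of hypothetical fields `min_age` and `max_age` on each document, and precision is chosen here as one example of a traditional evaluation metric.

```python
import re

# Assumption: a tiny illustrative stop-word list, not the one used in the project.
STOP_WORDS = {
    "a", "an", "the", "is", "are", "do", "does", "how", "what", "why",
    "where", "when", "who", "i", "can", "of", "to", "in", "for",
}


def naive_stem(token):
    """Strip a few common suffixes; a rough stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def extract_keywords(query):
    """Step 2: lowercase, tokenize, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def filter_by_age(documents, age):
    """Step 3: keep only retrieved documents whose age label covers `age`.

    Assumption: each document is a dict with hypothetical integer fields
    'min_age' and 'max_age'.
    """
    return [d for d in documents if d["min_age"] <= age <= d["max_age"]]


def precision(retrieved_ids, relevant_ids):
    """Step 4: fraction of retrieved documents judged relevant."""
    if not retrieved_ids:
        return 0.0
    hits = sum(1 for doc_id in retrieved_ids if doc_id in relevant_ids)
    return hits / len(retrieved_ids)
```

With these definitions, `extract_keywords("How do volcanoes erupt?")` yields `["volcano", "erupt"]`, which could then be sent to the retrieval system in place of the original question. The relevance judgments passed to `precision` would come from the manual verification described in step 4.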
