
Exercise 1

      lettuce  onion  salad  tomato
d1          1      0      1       1
d2          3      0      3       3
d3          0      0      0       9
d4          1      6      1       1
d5          1      1      0       1
d6          1      0      0       0

number of docs which contain terms of interest:
            5      2      3       5
number of docs in the collection N: 6
q1 1
q2 1 2
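
The "number of docs which contain terms of interest" row can be reproduced from the term counts. A minimal sketch in Python, assuming the blank cells are zeros and that the counts sit in the columns suggested by the patterns 1011, 3033, 9, 1611, 1101 and 1000 listed under Exercise 4. The names `TERMS`, `COUNTS` and `document_frequency` are hypothetical, chosen only for this illustration.

```python
# Term counts per document; blank cells are assumed to be 0.
TERMS = ["lettuce", "onion", "salad", "tomato"]
COUNTS = {
    "d1": [1, 0, 1, 1],
    "d2": [3, 0, 3, 3],
    "d3": [0, 0, 0, 9],
    "d4": [1, 6, 1, 1],
    "d5": [1, 1, 0, 1],
    "d6": [1, 0, 0, 0],
}


def document_frequency(counts):
    """Return, per term, how many documents contain it at least once."""
    n_terms = len(next(iter(counts.values())))
    return [
        sum(1 for row in counts.values() if row[j] > 0)
        for j in range(n_terms)
    ]


df = document_frequency(COUNTS)
print(dict(zip(TERMS, df)))  # {'lettuce': 5, 'onion': 2, 'salad': 3, 'tomato': 5}
print(len(COUNTS))           # 6, the collection size N
```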
Exercise 2- simply follow the cosine similarity formula or use an online calculator (http://calculator.vhex.net/calculator/distance/cosine-distance)
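
For checking the formula by hand, here is a small Python sketch of cosine similarity and cosine distance (usually defined as 1 minus the similarity, which is what the linked page's name suggests it reports). The query vector below, lettuce 1 and tomato 1, is an assumption taken from the 1001 pattern in the Exercise 4 rows; the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Cosine distance, taken here as 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


# Column order: lettuce, onion, salad, tomato.
d1 = [1, 0, 1, 1]
query = [1, 0, 0, 1]  # assumed query vector (pattern 1001)

print(round(cosine_similarity(d1, query), 3))  # 0.816
print(round(cosine_distance(d1, query), 3))    # 0.184
```

The distance for d1 comes out close to the 0.183 listed in the first Exercise 4 row, which suggests that column may hold distances rather than similarities.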
Exercise 3- for each term, take log2 of the total number of docs (we have 6) divided by the number of docs in which the term (in this case lettuce) appears
       lettuce     onion       salad       tomato
IDF    LOG2(6/5)   LOG2(6/3)   LOG2(6/4)   LOG2(6/5)
       0.263       1.000       1.500       0.263
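
The IDF formula can be sketched in a few lines of Python. This assumes N = 6 and the document counts 5, 2, 3, 5 from Exercise 1; `N`, `DF` and `idf` are hypothetical names for the illustration.

```python
import math

N = 6  # number of docs in the collection
DF = {"lettuce": 5, "onion": 2, "salad": 3, "tomato": 5}  # docs containing each term


def idf(n_docs, doc_freq):
    """Inverse document frequency: log2(N / df)."""
    return math.log2(n_docs / doc_freq)


for term, doc_freq in DF.items():
    print(term, round(idf(N, doc_freq), 3))
# lettuce 0.263
# onion 1.585
# salad 1.0
# tomato 0.263
```

With these counts the formula gives about 1.585 for onion and 1.000 for salad, which differs from the LOG2(6/3) and LOG2(6/4) entries in the IDF row, so it may be worth re-checking which denominators were intended there.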
Exercise 4- Rank = 6 and 3, 4, 5, 2 and 1 ????? :(
      cosine               lettuce   onion     salad     tomato    cosine
      similarity           TF-IDF    TF-IDF    TF-IDF    TF-IDF    similarity
      (for T1 t            weights   weights   weights   weights   for TF.IDF
d1    0.183       1011     0.2630    0.0000    1.5000    0.2630    0.7568870
d2    0.183       3033     0.7890    0.0000    4.5000    0.7890    0.7568870
d3    0.292       9        0.0000    0.0000    0.0000    2.3670    1.0000000
d4    0.773       1611     0.2630    6.0000    1.5000    0.2630    0.9827630
d5    0.183       1101     0.2630    1.0000    0.0000    0.2630    0.8257680
d6    0.292       1000     0.2630    0.0000    0.0000    0.0000    1.0000000

q1                1001     0.0000    0.0000    0.0000    0.2630    1.0000000
q2                1002     0.2630    0.0000    0.0000    0.5260    1.0000000
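
The TF-IDF weights are each term count multiplied by that term's IDF, and documents can then be ranked by cosine similarity to the weighted query. The Python sketch below assumes the IDF values exactly as listed in the worksheet (0.263, 1.000, 1.500, 0.263, not recomputed), zeros for blank cells, and a query of lettuce 1 and tomato 2 (pattern 1002). All names are hypothetical, and the ranking it prints is only what this sketch produces under those assumptions, not a confirmed answer to Exercise 4.

```python
import math

# Column order: lettuce, onion, salad, tomato.
IDF = [0.263, 1.0, 1.5, 0.263]  # as listed in the worksheet (assumption)
DOCS = {
    "d1": [1, 0, 1, 1],
    "d2": [3, 0, 3, 3],
    "d3": [0, 0, 0, 9],
    "d4": [1, 6, 1, 1],
    "d5": [1, 1, 0, 1],
    "d6": [1, 0, 0, 0],
}
QUERY = [1, 0, 0, 2]  # assumed query (pattern 1002)


def tf_idf(counts, idf):
    """Weight each raw term count by the term's IDF."""
    return [tf * w for tf, w in zip(counts, idf)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


weights = {name: tf_idf(row, IDF) for name, row in DOCS.items()}
query_weights = tf_idf(QUERY, IDF)

# Highest similarity first.
ranking = sorted(
    weights,
    key=lambda name: cosine_similarity(weights[name], query_weights),
    reverse=True,
)

print([round(w, 3) for w in weights["d2"]])  # [0.789, 0.0, 4.5, 0.789]
print(ranking)
```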
