Information Retrieval

Summer semester 2014
Hinrich Schütze, Heike Adel, Sascha Rothe
We 12:15-13:45, L155
Th 12:15-13:45, L155

Downloads

All slides, including PDFs and sources (not included: semantic search and cross-language IR; see below)

Textbook

IIR: Introduction to Information Retrieval. Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze. Cambridge University Press, 2008. Web publication at http://informationretrieval.org

Assignments and solutions

assignment 1, solution 1

assignment 2, solution 2

assignment 3, solution 3

assignment 4, solution 4

assignment 5, solution 5

practice exam, solution

practice questions, solution

Schedule and Resources

Day Topic Chapter Slides Resources
IIR01 We 4/9 Boolean retrieval pdf html students
instructors
information retrieval links
search Shakespeare
IIR02 We 4/9 Term vocabulary & postings lists pdf html students
instructors
Porter stemmer
credit card number searches (disabled)
IIR03 Th 4/10 Dictionaries & tolerant retrieval pdf html students
instructors
trie vs hash vs ternary tree
wild card search on Google
edit distance demo
P. Norvig's spell corrector
spelling correction gone wrong (1)
spelling correction gone wrong (2)
freq(misspelling)>freq(correct)
soundex demo
Assignment 1 pdf
IIR04 We 4/16 Index construction pdf html students
instructors
MapReduce paper
SPIMI paper
Google data center tour
IIR05 Th 4/17 Index compression pdf html students
instructors
variable byte codes
word-aligned binary codes
pos/freq compression
IIR06 We 4/23 Scores, weights, vector spaces pdf html students
instructors
exploring the similarity space
Okapi BM25
Lillian Lee on pivoted document length normalization
Th 4/24 Practical exercise Slides from the practical exercise 1
Assignment 2 pdf
IIR07 We 4/30 Computing scores pdf html students
instructors
how Google tweaks ranking
interview with Google's Udi Manber
Amit Singhal on Google ranking
SEO perspective: ranking factors
Yahoo BOSS: opening up search
compare Google/Yahoo rankings
eye tracking at Google
Th 5/1 no class
IIR08 We 5/7 Evaluation & result summaries pdf html students
instructors
TREC at NIST
van Rijsbergen's definition of F
A/B testing
too much A/B testing?
early paper on dynamic summaries
search quality evaluation at Google
Th 5/8 Practical exercise Slides from the practical exercise 2
IIR09 We 5/14 Relevance feedback, query expansion pdf html students
instructors
original relevance feedback paper
relevance feedback at Excite
Justin Bieber: related searches fail
WordSpace
automatic word sense discrimination
Assignment 3 pdf
IIR13 Th 5/15 Text classification, Naive Bayes pdf html students
instructors
Weka (includes Naive Bayes)
Reuters-21578
vulgarity text classifier fail
IIR12 We 5/21 Language models for IR pdf html students
instructors
Ponte & Croft paper on LMs in IR
Zhai & Lafferty
Lemur Toolkit
Th 5/22 Practical exercise
IIR14 We 5/28 Vector space classification pdf html students
instructors
curseofdim.py
perceptron example
TC overview by Sebastiani
FSNLP (decision trees, perceptrons)
The elements of statistical learning
Assignment 4 pdf
Th 5/29 no class
IIR15-1 We 6/4 Support vector machines pdf html students
instructors
IIR15-2 Th 6/5 Learning to rank (LTR) pdf html students
instructors
Microsoft LTR datasets
IIR16 We 6/11 Flat clustering pdf html students
instructors
van Rijsbergen: Cluster Hypothesis
search result clustering: Yippy
search result clustering: Carrot2
search result clustering: Bing
number of clusterings: Stirling number
Th 6/12 Practical exercise
Assignment 5 pdf
IIR21 We 6/18 Link analysis pdf html students
instructors
more on PageRank math
Jon Kleinberg (inventor of HITS)
Google bomb (January 2008)
defused Google bomb (June 2009)
Th 6/19 no class
We 6/25 RDBMS for IR (LK) students
Th 6/26 Practical exercise Slides from the practical exercise 5
Practice exam pdf
Practice questions pdf
We 7/2 Review practice exam
IIR19 Web information retrieval pdf html students
instructors
how ads are priced
most expensive keywords
Geico search ca. 2004
geo-targeted ad
size of the web in 2007
size of the web in 2008
ad monitoring at Google
fighting webspam
Th 7/3 Final
We 7/9 Apache SOLR (LK) students
Th 7/10 Plagiarism detection (LK) students
Mo 7/21 Make-up final
IIR10 XML retrieval pdf html students
instructors
IIR11 Probabilistic information retrieval pdf html students
instructors
IIR17 Hierarchical clustering pdf html students
instructors
GoogleNews precursor: Newsblaster
PDDP algorithm
IIR18 Latent semantic indexing pdf html students
instructors
Original LSI paper
Probabilistic LSI
Dimensions of meaning: LSI for words
IIR20 Crawling pdf html students
instructors
Mercator web crawler
robots.txt standard
Google data centers
Semantic search (W. Kessler) instructors
CleverSearch
Yummly
SWSE
Ask The Wiki
Evi
PizzaFinder
Semantic Media Wiki
Cross-language IR (C. Lioma) instructors