
Day 
Topic 
Chapter 
Slides 
Resources 

IIR01 
We 4/9 
Boolean retrieval 
pdf
html

students
instructors

information retrieval links
search Shakespeare

IIR02 
We 4/9 
Term vocabulary & postings lists

pdf
html

students
instructors

Porter stemmer
credit card number searches (disabled)

IIR03 
Th 4/10 
Dictionaries & tolerant retrieval 
pdf
html

students
instructors

trie vs hash vs ternary tree
wildcard search on Google
edit distance demo
P. Norvig's spell corrector
spelling correction gone wrong (1)
spelling correction gone wrong (2)
freq(misspelling)>freq(correct)
soundex demo



Assignment 1 
pdf




IIR04 
We 4/16 
Index construction 
pdf
html

students
instructors

MapReduce paper
SPIMI paper
Google data center tour

IIR05 
Th 4/17 
Index compression 
pdf
html

students
instructors

variable byte codes
word-aligned binary codes
pos/freq compression


IIR06 
We 4/23 
Scores, weights, vector spaces 
pdf
html

students
instructors

exploring the similarity space
Okapi BM25
Lillian Lee on pivoted document length normalization


Th 4/24 
Practical exercise 


Slides from practical exercise 1



Assignment 2 
pdf




IIR07 
We 4/30 
Computing scores 
pdf
html

students
instructors

how Google tweaks ranking
interview with Google's Udi Manber
Amit Singhal on Google ranking
SEO perspective: ranking factors
Yahoo BOSS: opening up search
compare Google/Yahoo rankings
eye tracking at Google


Th 5/1 
no class 




IIR08 
We 5/7 
Evaluation & result summaries 
pdf
html

students
instructors

TREC at NIST
van Rijsbergen's definition of F
A/B testing
too much A/B testing?
early paper on dynamic summaries
search quality evaluation at Google


Th 5/8 
Practical exercise 


Slides from practical exercise 2


IIR09 
We 5/14 
Rel. feedback, query expansion 
pdf
html

students
instructors

original relevance feedback paper
relevance feedback at Excite
Justin Bieber: related searches fail
WordSpace
automatic word sense discrimination



Assignment 3 
pdf



IIR13 
Th 5/15 
Text classification, Naive Bayes 
pdf
html

students
instructors

Weka (includes Naive Bayes)
Reuters-21578
vulgarity text classifier fail


IIR12 
We 5/21 
Language models for IR 
pdf
html

students
instructors

Ponte & Croft paper on LMs in IR
Zhai & Lafferty
Lemur Toolkit


Th 5/22 
Practical exercise 




IIR14 
We 5/28 
Vector space classification 
pdf
html

students
instructors

curseofdim.py
perceptron example
TC overview by Sebastiani
FSNLP (decision trees, perceptrons)
The Elements of Statistical Learning



Assignment 4 
pdf




Th 5/29 
no class 




IIR15-1 
We 6/4 
Support vector machines 
pdf
html

students
instructors


IIR15-2 
Th 6/5 
Learning to rank (LTR) 
pdf
html

students
instructors

Microsoft LTR datasets


IIR16 
We 6/11 
Flat clustering 
pdf
html

students
instructors

van Rijsbergen: Cluster Hypothesis
search result clustering: Yippy
search result clustering: Carrot2
search result clustering: Bing
number of clusterings: Stirling number


Th 6/12 
Practical exercise 





Assignment 5 
pdf




IIR21 
We 6/18 
Link analysis 
pdf
html

students
instructors

more on PageRank math
Jon Kleinberg (inventor of HITS)
Google bomb (January 2008)
defused Google bomb (June 2009)


Th 6/19 
no class 





We 6/25 
RDBMS for IR (LK) 

students



Th 6/26 
Practical exercise 


Slides from practical exercise 5



Practice exam 
pdf





Practice questions 
pdf





We 7/2 
Review practice exam 



IIR19 

Web information retrieval 
pdf
html

students
instructors

how ads are priced
most expensive keywords
Geico search ca. 2004
geotargeted ad
size of the web in 2007
size of the web in 2008
ad monitoring at Google
fighting webspam


Th 7/3 
Final 





We 7/9 
Apache SOLR (LK) 

students 


Th 7/10 
Plagiarism detection (LK) 

students 



Mo 7/21 
Make up final 




IIR10 

XML retrieval 
pdf
html

students
instructors



IIR11 

Probabilistic information retrieval 
pdf
html

students
instructors


IIR17 

Hierarchical clustering 
pdf
html

students
instructors

GoogleNews precursor: Newsblaster
PDDP algorithm

IIR18 

Latent semantic indexing 
pdf
html

students
instructors

Original LSI paper
Probabilistic LSI
Dimensions of meaning: LSI for words

IIR20 

Crawling 
pdf
html

students
instructors

Mercator web crawler
robots.txt standard
Google data centers



Semantic search (W. Kessler) 

instructors

CleverSearch
Yummly
SWSE
Ask The Wiki
Evi
PizzaFinder
Semantic Media Wiki



Cross-language IR (C. Lioma) 

instructors
