CS 5880: Information Retrieval
Instructor: Jugal Kalita
Relevant Links
Important: If you are prompted to log in, use "CS5880" as the username. The password is the username with "2012" appended. Do not type the double quotes.
Class Material
Lectures
- Lecture 1 (1/18): Discussion of syllabus, grading scheme, etc.
- Lectures 2 & 3 (1/23, 1/25): Introduction to Information Retrieval
- Lecture 3 (1/25): Boolean Retrieval. We also read the following papers: Salton, Gerard, Edward Fox and Harry Wu, Extended Boolean Information Retrieval, Communications of the ACM, December 1983, Volume 26, Number 12, pages 1022-1036; Parameterised Compression for Sparse Bitmaps by Moffat and Zobel, 1992; and Fast Phrase Querying with Combined Indexes by Williams et al., ACM Transactions on Information Systems, Vol. 22, No 4, pp 573-594.
- Lectures 4 and 5 (1/30 and 2/2): Term Vocabulary and Postings Lists, Chapter 2 from Introduction to Information Retrieval by Manning et al. 2008. We also looked at An algorithm for suffix stripping by Porter.
- Lectures 6 and 7 (2/6, 2/8): Dictionaries and Tolerant Retrieval, Chapter 3 from Introduction to Information Retrieval by Manning et al. 2008. We read about Hashing and B-Trees (Sections 7.3 and 7.4) of Design and Analysis of Algorithms by Levitin. We also looked at Automatic Spelling Correction in Scientific and Scholarly Text by Pollock and Zamora, Communications of the ACM, April 1984, Volume 27, No. 4, pp. 358-368; Syntactic Normalization of Twitter Messages by Kaufmann and Kalita, International Conference on Computational Linguistics (ICON) 2010; and tRuCasIng by Lita et al. from ACL 2003.
- Lectures 8 and 9 (2/8 and 2/13): Proposal presentations by students
- Lecture 10 (2/20): Index Construction, Chapter 4 from Introduction to Information Retrieval by Manning et al. 2008
- Lecture 11 (2/22): Continuation of discussion of Chapter 4 from Manning et al. Discussion of map and reduce functions in Common Lisp. We read the paper titled MapReduce: Simplified Data Processing on Large Clusters by Dean and Ghemawat, Communications of the ACM, January 2008, Vol. 51, No. 1.
- Lecture 12 (2/27): Finish up Chapter 4 from Manning et al. Start Chapter 5: Index Compression from Manning et al.
- Lecture 13 (2/29): Continuation of discussion of Chapter 5 from Manning et al.
- Lecture 14 (3/5): Read the paper titled "Inverted Index Compression Using Word-Aligned Binary Codes" by Anh and Moffat, Information Retrieval, Volume 8, pp. 151-166, 2005. Started Chapter 6: Scoring, Term Weighting, and the Vector Space Model from Manning et al.
- Lecture 15 (3/7): Continuation of Chapter 6 from Manning et al. Start Chapter 7: Computing Scores in a Complete Search System, of Manning et al.
- Lectures 16 and 17 (3/12 and 3/14): Midterm presentations by students. Each student has 15 minutes to present progress so far.
- Lecture 18 (3/19): Finish Chapter 7. Start Chapter 8: "Evaluation of Information Retrieval" of Manning et al.
- Lecture 19 (3/21): Continuation of Chapter 8.
- Lecture 20 (4/2): Chapter 9: "Relevance Feedback" of Manning et al.
- Lecture 21 (4/4): Continuation of Chapter 9.
- Lecture 22 (4/9): Chapter 19: "Web Search Basics" of Manning et al.
- Lecture 23 (4/11): Continuation of Chapter 19. We also looked at the following papers: Detecting Spam Web Pages through Content Analysis by Ntoulas et al., World Wide Web Conference 2006; and Combating Web Spam with TrustRank by Gyongyi et al., VLDB Conference 2004.
- Lecture 24 (4/16): Chapter 20: Web Crawling and Indexes. We also looked at the paper titled "Mercator: A scalable, extensible Web crawler" by Heydon and Najork, 1999. In addition, we looked at Section 7.4.2.3 titled "A Simple Web Crawler" by Kalita from the book On Perl: Perl for Students and Professionals, 2004.
- Lecture 25 (4/18): Chapter 21: Link Analysis. We looked at Section 4.4 titled "Introduction to Finite Markov Chains" from Statistical Methods in Bioinformatics by Ewens and Grant.
- Lecture 26 (4/23): We continued the discussion of Markov chains and started discussing PageRank. We discussed the paper The PageRank Citation Ranking: Bringing Order to the Web by Page, Brin, Motwani and Winograd (1998).
- Lecture 27 (4/25): We finished discussion of the PageRank algorithm. We started discussing Hubs and Authorities. We looked at the paper Authoritative Sources in a Hyperlinked Environment by Kleinberg, Journal of the ACM, 1999.
- Lecture 28 (4/30): We finished discussion of the paper by Kleinberg.