CSCE 5200 Information Retrieval and Web Search



announcements



syllabus

Download syllabus as a [pdf]
Instructor: Rada Mihalcea, Research Park F228, tel: 940-369-7630, email: rada at cs.unt.edu
TA: Iris Gomez-Lopez, Research Park F205, email: ing0001 at unt.edu
Class hours: TTh 12:30-01:50pm
Instructor office hours: TTh 04:00-05:00pm or by appointment
TA office hours: MW 8-10am, F205
Course description: This course will cover both traditional material and recent advances in Information Retrieval (IR), the study of indexing, processing, and querying textual data. Basic retrieval models, algorithms, and IR system implementations will be covered. The course will also address more advanced topics in "intelligent" IR, including Natural Language Processing techniques and "smart" Web agents.


class notes

Date Lecture Reading material NB
01/20/2009 Course overview [ppt] - -
01/22/2009 Introduction to IR models and methods [ppt] - -
01/27/2009 University closed because of weather conditions. - -
01/29/2009 Short Perl tutorial [ppt] One of the tutorials below [see the "Links" section] -
02/03/2009 Short Perl tutorial [ppt] One of the tutorials below [see the "Links" section] -
02/05/2009 Text processing [ppt] Porter stemmer
Chap.2: The term vocabulary & postings lists
-
02/10/2009 Text properties [ppt]
Web Spidering [ppt]
Chap.2: The term vocabulary & postings lists Assignment 1 issued
02/12/2009 Web Spidering [ppt]
Practical problems in web spidering [ppt]
Chap.5: Index compression, sect.5.1
Chap.20: Web crawling and indexes
Optional reading: Baeza-Yates chapter 6.3
-
02/17/2009 Boolean model and extensions [ppt] Chap.1: Boolean retrieval -
02/19/2009 Vector space model [ppt] Chap.6: Scoring, term weighting and the vector space model -
02/24/2009 Vector space model [ppt] Chap.6: Scoring, term weighting and the vector space model -
02/26/2009 Term weighting schemes Chap.6: Scoring, term weighting and the vector space model
[Sparck-Jones] Term weighting approaches, pg. 323
Assignment 2 issued
03/03/2009 Alternative IR models. [ppt] Chap.11: Probabilistic IR
Chap.18: LSA
Guest lecture by Samer Hassan
03/05/2009 No class. - -
03/11/2009 Review: exam I preparation All the material studied so far -
03/13/2009 Exam I - -
03/18/2009 Spring break - -
03/20/2009 Spring break - -
03/24/2009 IR evaluation and IR test collections. [ppt] Chap.8: Evaluation in information retrieval
-
03/26/2009 Relevance feedback. [ppt] Chap.9: Relevance feedback and query expansion
Assignment 2 due.
03/31/2009 Text classification [ppt] [CM] Chapter 13. Assignment 3 issued.
04/02/2009 Text classification [ppt]
[CM] Chapter 13. -
04/07/2009 Link analysis. HITS. PageRank. [ppt] [CM] Chapter 21.
Page, L. et al. The PageRank Citation Ranking: Bringing Order to the Web
Also check this page.
-
04/09/2009 Topic Sensitive PageRank Haveliwala. "Topic-Sensitive PageRank" [pdf] -
04/14/2009 Question Answering [ppt]
Check the TREC Q&A site -
04/16/2009 Cross-language Information Retrieval [ppt] Check the Cross Language Evaluation Forum (CLEF) -
04/21/2009 Keyword Extraction [ppt] - Assignment 3 due
04/23/2009 Automatic Summarization - -
04/28/2009 Wrap-up. Review for second exam. All material studied so far (papers included) -
04/30/2009 Exam II - All material studied so far (papers included) -
05/05/2009 Project presentations I. - -
05/07/2009 Project presentations II. - -




assignments



readings

  • (required) Introduction to Information Retrieval
    (online version available) Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze
    Cambridge University Press, 2008.
  • (recommended) Readings in Information Retrieval
    K. Sparck Jones and P. Willett
    Morgan Kaufmann, 1997
  • (recommended) Modern Information Retrieval
    Ricardo Baeza-Yates and Berthier Ribeiro-Neto
    Addison-Wesley, 1999


links