|
Instructor:
|
Rada Mihalcea, Research Park F228, tel: 940-369-7630, email: rada at cs.unt.edu |
|
TA:
|
Iris Gomez-Lopez, Research Park F205, email: ing0001 at unt.edu |
|
Class hours:
|
TTh 12:30-01:50pm
|
|
Instructor office hours:
|
TTh 04:00-05:00pm or by appointment
|
|
TA office hours:
|
MW 8-10am, F205
|
|
Course description:
|
This course will cover traditional material as well as recent advances in Information Retrieval (IR), the study of indexing, processing, and querying textual data. Basic retrieval models, algorithms, and IR system implementations will be covered. The course will also address more advanced topics in "intelligent" IR, including Natural Language Processing techniques and "smart" Web agents.
|
|
Date
|
Lecture
|
Reading material
|
NB
|
|
01/20/2009
|
Course overview [ppt]
|
-
|
-
|
|
01/22/2009
|
Introduction to IR models and methods [ppt]
|
-
|
-
|
|
01/27/2009
|
University closed because of weather conditions.
|
-
|
-
|
|
01/29/2009
|
Short Perl tutorial [ppt]
|
One of the tutorials below [see the "Links" section]
|
-
|
|
02/03/2009
|
Short Perl tutorial [ppt]
|
One of the tutorials below [see the "Links" section]
|
-
|
|
02/05/2009
|
Text processing [ppt]
|
Porter stemmer
Chap.2: The term vocabulary & postings lists
|
-
|
|
02/10/2009
|
Text properties [ppt]
Web Spidering [ppt]
|
Chap.2: The term vocabulary & postings lists
|
Assignment 1 issued
|
|
02/12/2009
|
Web Spidering [ppt]
Practical problems in web spidering [ppt]
|
Chap.5: Index compression, sect.5.1
Chap.20: Web crawling and indexes
Optional reading: Baeza-Yates chapter 6.3
|
-
|
|
02/17/2009
|
Boolean model and extensions [ppt]
|
Chap.1: Boolean retrieval
|
-
|
|
02/19/2009
|
Vector space model [ppt]
|
Chap.6: Scoring, term weighting and the vector space model
|
-
|
|
02/24/2009
|
Vector space model [ppt]
|
Chap.6: Scoring, term weighting and the vector space model
|
-
|
|
02/26/2009
|
Term weighting schemes
|
Chap.6: Scoring, term weighting and the vector space model
[Sparck-Jones] Term weighting approaches, pg. 323
|
Assignment 2 issued
|
|
03/03/2009
|
Alternative IR models. [ppt]
|
Chap.11: Probabilistic IR
Chap.18: LSA
|
Guest lecture by Samer Hassan
|
|
03/05/2009
|
No class.
|
-
|
-
|
|
03/11/2009
|
Review: exam I preparation
|
All the material studied so far
|
-
|
|
03/13/2009
|
Exam I
|
-
|
-
|
|
03/18/2009
|
Spring break
|
-
|
-
|
|
03/20/2009
|
Spring break
|
-
|
-
|
|
03/24/2009
|
IR evaluation and IR test collections. [ppt]
|
Chap.8: Evaluation in information retrieval
|
-
|
|
03/26/2009
|
Relevance feedback. [ppt]
|
Chap.9: Relevance feedback and query expansion
|
Assignment 2 due.
|
|
03/31/2009
|
Text classification [ppt]
|
[CM] Chapter 13.
|
Assignment 3 issued.
|
|
04/02/2009
|
Text classification [ppt]
|
[CM] Chapter 13.
|
-
|
|
04/07/2009
|
Link analysis.
HITS. PageRank. [ppt]
|
[CM] Chapter 21.
Page, L. et al. The PageRank Citation Ranking: Bringing Order to the Web
Also check this page.
|
-
|
|
04/09/2009
|
Topic-Sensitive PageRank
|
Haveliwala. "Topic-Sensitive PageRank" [pdf]
|
-
|
|
04/14/2009
|
Question Answering [ppt]
|
Check the TREC Q&A site
|
-
|
|
04/16/2009
|
Cross-language Information Retrieval [ppt]
|
Check the Cross Language Evaluation Forum CLEF
|
-
|
|
04/21/2009
|
Keyword Extraction [ppt]
|
-
|
Assignment 3 due
|
|
04/23/2009
|
Automatic Summarization
|
-
|
-
|
|
04/28/2009
|
Wrap-up. Review for second exam.
|
All material studied so far (papers included)
|
-
|
|
04/30/2009
|
Exam II
|
-
|
All material studied so far (papers included)
|
-
|
05/05/2009
|
Project presentations I.
|
-
|
-
|
05/07/2009
|
Project presentations II.
|
-
|
-
|