|
Instructor:
|
Rada Mihalcea, Research Park F228, email: rada at cs.unt.edu
|
TA:
|
Ziming Zhang, ZimingZhang at my.unt.edu
|
|
Class hours:
|
TTh 12:30-01:50pm
|
|
Instructor office hours:
|
T 03:00-05:00pm or by appointment
|
|
TA office hours:
|
Th 02:30-04:30pm
|
|
Course description:
|
This course covers both traditional material and recent advances in Information Retrieval (IR), the study of indexing, processing, and querying textual data. It covers basic retrieval models, algorithms, and IR system implementations, and also addresses more advanced topics in "intelligent" IR, including Natural Language Processing techniques and "smart" Web agents.
|
|
Date
|
Lecture
|
Reading material
|
NB
|
|
01/15/2013
|
Course overview [pptx]
|
-
|
-
|
|
01/17/2013
|
Introduction to IR models and methods [pptx]
|
-
|
-
|
|
01/22/2013
|
Text properties [pptx]
|
Chap.2: The term vocabulary & postings lists
|
-
|
|
01/24/2013 (11am)
|
Short Perl tutorial [pptx]
|
One of the tutorials below [see the "Links" section]
|
-
|
|
01/24/2013
|
Text processing [pptx]
|
Porter stemmer
Chap.2: The term vocabulary & postings lists
|
Assignment 1 issued
|
|
01/29/2013
|
Web Spidering [pptx]
Practical problems in web spidering [pptx]
|
Chap.5: Index compression, sect.5.1
Chap.20: Web crawling and indexes
Optional reading: Baeza-Yates chapter 6.3
|
-
|
|
01/30/2013 (11am, F219)
|
Machine Learning and Natural Language Processing at Samsung, Speakers: Po-Hsiang Lai, Gabriel Nicolae
|
-
|
-
|
|
01/31/2013 (11am)
|
Short Perl tutorial [pptx]
|
One of the tutorials below [see the "Links" section]
|
-
|
|
01/31/2013
|
Boolean model and extensions [pptx]
|
Chap.1: Boolean retrieval
|
-
|
|
02/05/2013
|
Vector space model [pptx]
|
Chap.6: Scoring, term weighting and the vector space model
|
-
|
|
02/07/2013
|
Vector space model [pptx]
Term weighting schemes
|
Chap.6: Scoring, term weighting and the vector space model
[Sparck-Jones] Term weighting approaches, pg. 323
|
Assignment 1 due.
Assignment 2 issued.
|
|
02/11/2013 (4:30pm, F223)
|
Information Retrieval and Social Media, Speaker: Douglas Oard
|
-
|
-
|
|
02/12/2013
|
Alternative IR models [pptx]
|
Chap.11: Probabilistic IR
Chap.18: LSA
|
-
|
|
02/14/2013 (11am)
|
Keyword Extraction [pptx]
|
-
|
-
|
|
02/14/2013
|
IR evaluation and IR test collections [pptx]
|
Chap.8: Evaluation in information retrieval
|
-
|
|
02/19/2013
|
Exam I preparation
|
-
|
-
|
|
02/21/2013
|
Exam I
|
-
|
-
|
|
02/26/2013
|
Relevance feedback. [pptx]
|
Chap.9: Relevance feedback and query expansion
|
-
|
|
02/28/2013 (11am)
|
Question Answering [ppt]
|
Check the TREC Q&A site
|
-
|
|
02/28/2013
|
Relevance feedback. [pptx]
|
Chap.9: Relevance feedback and query expansion
|
-
|
|
03/05/2013
|
Text classification [pptx]
|
[CM] Chapter 13.
|
Assignment 3 issued.
|
|
03/07/2013
|
Text classification [pptx]
|
[CM] Chapter 13.
|
Assignment 2 due.
|
|
03/08/2013 (11am, F223)
|
Psycholinguistics, Speaker: James Pennebaker
|
-
|
-
|
|
03/12/2013
|
Spring break
|
-
|
-
|
|
03/14/2013
|
Spring break
|
-
|
-
|
|
03/19/2013
|
Link analysis.
HITS. PageRank. [pptx]
|
[CM] Chapter 21.
Page, L. et al. The PageRank Citation Ranking: Bringing Order to the Web
Also check this page.
|
-
|
|
03/21/2013
|
Topic-Sensitive PageRank
|
Haveliwala. "Topic-Sensitive PageRank" [pdf]
|
Assignment 3 due on 03/22
|
|
03/26/2013
|
Exam II
|
All the material studied so far (papers/seminars included)
|
-
|
|
04/04/2013 (11am, F223)
|
Machine Learning, Speaker: Ray Mooney
|
-
|
-
|
|
04/19/2013
|
Project materials due.
|
-
|
-
|