
Keyphrase Extraction in Document Networks

Keyphrases concisely describe a document using a small set of phrases. For example, the keyphrases "social networks" and "interest targeting" give a quick, high-level topical summary of a document about targeting users' interests in social networks to recommend services such as products and news. Given today’s very large document collections, keyphrases are extremely important not only for summarizing documents but also for searching and retrieving relevant information. However, keyphrases are often not provided explicitly; instead, they must be gleaned from the many details in a document's text. This project addresses automatic keyphrase extraction from research papers, which are central to sharing and disseminating scientific discoveries. Its goal is to explore accurate approaches that use document networks to automatically discover and extract keyphrases, helping readers handle and digest more information in less time in the era of "big data."

Although much research has been done on automatic keyphrase extraction, no previous approach has captured how documents influence one another through the citation relation that connects them in a network. This project will investigate models that take into account the links between citing and cited documents in a document network, and will explore qualitative and quantitative aspects of the question: "What are the key phrases or concepts in a document?" We will design and develop scalable iterative algorithms that capture different aspects of documents (e.g., topics or concepts), as well as the impact of one document on another (e.g., influence or topic evolution) in a document network. The results of this research will feed directly into the CiteSeerX digital library.
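To make the idea of an iterative algorithm over a document network more concrete, here is a minimal sketch of one possible design. It is not this project's method. It uses a TextRank-style word graph: words that co-occur in the target paper are linked, and the graph is then enriched with co-occurrences from citation contexts (for example, sentences in citing papers that mention the target), so that neighboring documents can shift which words rank highest. Weighted PageRank is run iteratively, and each candidate phrase is scored by summing its words' scores. Several things here are assumptions made purely for illustration: citation contexts arrive as plain strings, and the stopword list, the co-occurrence window, `context_weight`, and every function name are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "are", "by", "for", "from", "in", "is",
             "of", "on", "our", "the", "this", "to", "we", "with"}


def content_words(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def add_cooccurrence(graph, words, weight, window=2):
    """Link words that appear within `window` positions of each other."""
    if weight <= 0:
        return  # zero-weight edges would create nodes with no outgoing mass
    for i, w in enumerate(words):
        for v in words[i + 1:i + window]:
            if v != w:
                graph[w][v] += weight
                graph[v][w] += weight


def rank_words(target_text, citation_contexts, context_weight=1.0,
               damping=0.85, iterations=50):
    """Weighted PageRank over a word graph built from the target document
    plus citation contexts drawn from its neighbors in the citation network."""
    graph = defaultdict(lambda: defaultdict(float))
    add_cooccurrence(graph, content_words(target_text), 1.0)
    for context in citation_contexts:
        add_cooccurrence(graph, content_words(context), context_weight)
    nodes = list(graph)
    if not nodes:
        return {}
    n = len(nodes)
    strength = {u: sum(graph[u].values()) for u in nodes}  # weighted degree
    score = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Edges are symmetric, so graph[v][u] is also the weight of u -> v.
        score = {
            v: (1 - damping) / n
            + damping * sum(score[u] * w / strength[u] for u, w in graph[v].items())
            for v in nodes
        }
    return score


def candidate_phrases(text, max_len=3):
    """Maximal runs of non-stopwords inside punctuation-delimited fragments."""
    phrases = set()
    for fragment in re.split(r"[.,;:!?()]", text.lower()):
        run = []
        for w in re.findall(r"[a-z]+", fragment) + [None]:  # None flushes the last run
            if w is None or w in STOPWORDS:
                if 0 < len(run) <= max_len:  # longer runs are simply skipped
                    phrases.add(" ".join(run))
                run = []
            else:
                run.append(w)
    return phrases


def extract_keyphrases(target_text, citation_contexts, k=5, **rank_kwargs):
    """Score each candidate phrase by summing its words' PageRank scores."""
    scores = rank_words(target_text, citation_contexts, **rank_kwargs)
    phrase_scores = {p: sum(scores.get(w, 0.0) for w in p.split())
                     for p in candidate_phrases(target_text)}
    return sorted(phrase_scores, key=lambda p: (-phrase_scores[p], p))[:k]


if __name__ == "__main__":
    doc = "We study interest targeting for social networks."
    contexts = ["Social networks shape interest.", "Ads in social networks."]
    print(extract_keyphrases(doc, [], k=1))        # ['study interest targeting']
    print(extract_keyphrases(doc, contexts, k=1))  # ['social networks']
```

In this toy run, the document text alone favors the longer phrase. Once the two citation contexts reinforce "social networks", that phrase moves to the top. This is the kind of cross-document effect the project aims to model, though it uses a much simpler mechanism than the models described above. Summing word scores tends to favor longer phrases, and other aggregations (such as averaging) are equally plausible choices.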


Invited Talks and Presentations

International Summer Schools

Related Workshops


This research is supported by the National Science Foundation under awards #1423337 and #1422951.