Search from over 60,000 research works

Advanced Search

Automatic keyphrase extraction: a comparison of methods

Full text not archived in this repository.
Add to AnyAdd to TwitterAdd to FacebookAdd to LinkedinAdd to PinterestAdd to Email

Hussey, R., Williams, S. and Mitchell, R. (2012) Automatic keyphrase extraction: a comparison of methods. In: eKNOW, Proceedings of The Fourth International Conference on Information Process, and Knowledge Management , Valencia, Spain, pp. 18-23.

Abstract/Summary

There are many published methods available for creating keyphrases for documents. Previous work in the field has shown that in a significant proportion of cases author selected keyphrases are not appropriate for the document they accompany. This requires the use of such automated methods to improve the use of keyphrases. Often the keyphrases are not updated when the focus of a paper changes or include keyphrases that are more classificatory than explanatory. The published methods are all evaluated using different corpora, typically one relevant to their field of study. This not only makes it difficult to incorporate the useful elements of algorithms in future work but also makes comparing the results of each method inefficient and ineffective. This paper describes the work undertaken to compare five methods across a common baseline of six corpora. The methods chosen were term frequency, inverse document frequency, the C-Value, the NC-Value, and a synonym based approach. These methods were compared to evaluate performance and quality of results, and to provide a future benchmark. It is shown that, with the comparison metric used for this study Term Frequency and Inverse Document Frequency were the best algorithms, with the synonym based approach following them. Further work in the area is required to determine an appropriate (or more appropriate) comparison metric.

Additional Information ISBN 9781612081816
Item Type Conference or Workshop Item (Paper)
URI https://reading-clone.eprints-hosting.org/id/eprint/27770
Item Type Conference or Workshop Item
Refereed Yes
Divisions Science > School of Mathematical, Physical and Computational Sciences > Department of Computer Science
Uncontrolled Keywords Term Frequency, Inverse Document Frequency, C-Value, NC-Value, Synonyms, Comparisons, Automated Keyphrase Extraction, Document Classification
Additional Information ISBN 9781612081816
Publisher Statement Publisher makes available at http://www.thinkmind.org/index.php?view=article&articleid=eknow_2012_1_40_60072 as stated at http://www.iaria.org/conferences2012/eKNOW12.html
Download/View statistics View download statistics for this item

University Staff: Request a correction | Centaur Editors: Update this record

Search Google Scholar