Identifying problematic classes in text classification

Download

Full text not archived in this repository.

Roberts, P., Howroyd, J., Mitchell, R. and Ruiz, V. (2010) Identifying problematic classes in text classification. In: 9th IEEE International Conference on Cybernetic Intelligent Systems, 1-2 Sept 2010, Reading, UK, pp. 136-141.

Abstract/Summary

Real-world text classification tasks often suffer from poor class structure with many overlapping classes and blurred boundaries. Training data pooled from multiple sources tend to be inconsistent and contain erroneous labelling, leading to poor performance of standard text classifiers. The classification of health service products to specialized procurement classes is used to examine and quantify the extent of these problems. A novel method is presented to analyze the labelled data by selectively merging classes where there is not enough information for the classifier to distinguish them. Initial results show the method can identify the most problematic classes, which can be used either as a focus to improve the training data or to merge classes to increase confidence in the predicted results of the classifier.
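
The merging idea in the abstract can be illustrated with a minimal sketch. This is not the authors' method, which is not described in this record. It assumes that "not enough information to distinguish" is approximated by counting mutual misclassifications between two classes on held-out data. The function names and the product class labels are hypothetical.

```python
from collections import Counter


def most_confused_pair(true_labels, predicted_labels):
    """Return the pair of distinct classes with the most mutual
    misclassifications, or None if every prediction is correct.

    Assumption: mutual confusion count is a stand-in for "the classifier
    cannot distinguish these classes"; the paper may use another criterion.
    """
    mutual = Counter()
    for true, pred in zip(true_labels, predicted_labels):
        if true != pred:
            # Treat (a, b) and (b, a) as the same unordered pair.
            mutual[tuple(sorted((true, pred)))] += 1
    if not mutual:
        return None
    # Highest count first; ties broken alphabetically for determinism.
    return min(mutual, key=lambda pair: (-mutual[pair], pair))


def merge_classes(labels, pair):
    """Relabel every member of the two classes in `pair` with one merged label."""
    a, b = pair
    merged = f"{a}+{b}"
    return [merged if label in (a, b) else label for label in labels]


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true label."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


# Hypothetical procurement classes, for illustration only.
true = ["gloves", "gloves", "masks", "masks", "syringes", "syringes"]
pred = ["masks", "gloves", "gloves", "masks", "syringes", "syringes"]

pair = most_confused_pair(true, pred)
merged_true = merge_classes(true, pair)
merged_pred = merge_classes(pred, pair)
```

In this toy data the two confused classes are merged into one, so predictions that previously counted as errors between them become correct. The same pair could instead be flagged as the place to review and relabel the training data, which is the other use the abstract mentions.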

Additional Information	IEEE Conference ID Number: 17717
Item Type	Conference or Workshop Item (Paper)
URI	https://reading-clone.eprints-hosting.org/id/eprint/17362
Refereed	Yes
Divisions	Science > School of Mathematical, Physical and Computational Sciences > Department of Computer Science

Deposit Details

Date Deposited:	01 Feb 2011 11:50
Last Modified:	09 Jul 2021 08:08
