Pavlopoulou, N., Abushwashi, A., Stahl, F. ORCID: https://orcid.org/0000-0002-4860-0203 and Scibetta, V.
(2017)
A text mining framework for Big Data.
Expert Update, 17 (1).
ISSN 1465-4091
(Special Issue on the 1st BCS SGAI Workshop on Data Stream Mining Techniques and Applications)
Abstract/Summary
Text Mining is the ability to generate knowledge (insight) from text. This is a challenging task, especially when the target text databases are very large. Big Data has attracted much attention lately, both from academia and industry. A number of distributed databases, search engines and frameworks have been developed to handle the memory and time constraints involved in processing large amounts of data. However, there is no open-source end-to-end framework that can combine near real-time and batch processing of ingested big textual data along with user-defined options and provision of specific, reliable insight from the data. Such a framework is important because it makes new unstructured information accessible in near real-time, allows more personalised customer products to be created, and lets novel, unusual patterns be found and acted on quickly. This work focuses on a proprietary, complete, near real-time automated classification framework for unstructured data that uses Natural Language Processing and Machine Learning algorithms on Apache Spark. The evaluation of our framework shows that it achieves accuracy comparable to some of the best approaches presented in the literature.
Item Type | Article
URI | https://reading-clone.eprints-hosting.org/id/eprint/70108
Refereed | Yes |
Divisions | Science > School of Mathematical, Physical and Computational Sciences > Department of Computer Science |
Publisher | BCS Specialist Group on Artificial Intelligence