Shergill, J. S., Pravin, C. and Ojha, V. ORCID: https://orcid.org/0000-0002-9256-1192
(2021)
Accent and gender recognition from English language speech and audio using signal processing and deep learning.
In: International Conference on Hybrid Intelligent Systems, 14-16 Dec 2020, pp. 62-72.
doi: 10.1007/978-3-030-73050-5_7
Abstract/Summary
This research takes user input in the form of speech data and uses it to predict the speaker's gender and which region of the United Kingdom the speaker is from. The work covers regional accents, data preprocessing, Fourier transforms, and deep learning modeling. Because publicly available datasets for this type of research are lacking, a dataset was created from scratch (12 regions with a 1:1 gender ratio). In this paper, we propose modeling accent recognition and gender recognition from a speaker's voice as a classification task. We used a deep convolutional neural network and experimentally developed an architecture that maximized classification accuracy on both tasks simultaneously. We also tested the model on publicly available spoken digit datasets. We find that, in our proposed multi-class classification model, gender is easier to predict with high accuracy than accent. Accent classification proved difficult because regional accents overlap, which prevents them from being classified with high accuracy.
Item Type | Conference or Workshop Item (Paper) |
URI | https://reading-clone.eprints-hosting.org/id/eprint/97785 |
Refereed | Yes |
Divisions | Interdisciplinary Research Centres (IDRCs) > Centre for the Mathematics of Planet Earth (CMPE) Science > School of Mathematical, Physical and Computational Sciences > Department of Computer Science |