A corpus-based developmental investigation of linguistic complexity in children's writing

Text (Open Access) - Published Version. Available under license: Creative Commons Attribution.
Text - Accepted Version (restricted to repository staff only)


It is advisable to refer to the publisher's version if you intend to cite from this work.


Hsiao, Y. (ORCID: https://orcid.org/0000-0003-3986-5178), Dawson, N., Banerji, N. and Nation, K. (2024) A corpus-based developmental investigation of linguistic complexity in children's writing. Applied Corpus Linguistics, 4 (1). 100084. ISSN 2666-7991 doi: 10.1016/j.acorp.2024.100084

Abstract/Summary

Writing proficiency is associated with linguistic complexity. We used measures of linguistic complexity to investigate the development of children's narrative writing in a large corpus of short stories (N > 100,000) written by children aged 5–13 in the UK. Linguistic complexity was assessed using both lexical (N = 30) and syntactic (N = 14) measures. Most measures were associated with age: writing by older children showed greater lexical density, sophistication, and diversity than writing by younger children. Older children also produced longer sentences, T-units, and clauses, and their writing showed a higher density of smaller syntactic units within larger ones. Principal Component Analysis identified several dimensions associated with complexity, with the first two dimensions together capturing nearly 50% of the variance. Lexical diversity was represented mainly on the first dimension and syntactic complexity on the second. Across the age range, there was wider variation in syntactic complexity than in lexical diversity, suggesting that syntactic development is subject to more individual differences than the ability to use a diverse set of lexical items. Our findings quantify the nature and content of children's writing through mid-childhood, and we discuss the utility of analysing children's writing using a computational, data-driven approach.
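To make the two families of measures more concrete, the sketch below computes one very simple lexical diversity index (type-token ratio) and one very simple length-based syntactic index (mean sentence length in words). This is an illustrative assumption, not the authors' pipeline: the 30 lexical and 14 syntactic measures used in the study are not listed here, and the function names (`type_token_ratio`, `mean_sentence_length`) and the regex-based tokenizer and sentence splitter are hypothetical simplifications.

```python
import re


def tokenize_words(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercased alphabetic words, apostrophes kept inside words.
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def split_sentences(text: str) -> list[str]:
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def type_token_ratio(text: str) -> float:
    # Lexical diversity: distinct word types divided by total word tokens.
    words = tokenize_words(text)
    return len(set(words)) / len(words) if words else 0.0


def mean_sentence_length(text: str) -> float:
    # Length-based syntactic index: average number of word tokens per sentence.
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return sum(len(tokenize_words(s)) for s in sentences) / len(sentences)


story = "The dog ran. The dog ran to the big red house!"
print(type_token_ratio(story))      # 7 types / 11 tokens
print(mean_sentence_length(story))  # (3 + 8) / 2 words per sentence
```

Raw type-token ratio falls as texts get longer, so corpus studies of children's writing generally prefer length-corrected diversity indices; the raw ratio is used here only because it is the easiest to read. Measures based on T-units and clauses would additionally require a syntactic parser rather than punctuation-based splitting.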


Item Type: Article
URI: https://reading-clone.eprints-hosting.org/id/eprint/121027
Identification Number/DOI: 10.1016/j.acorp.2024.100084
Refereed: Yes
Divisions: No Reading authors. Back catalogue items; Life Sciences > School of Psychology and Clinical Language Sciences > Department of Psychology
Publisher: Elsevier