Spoken Learner Corpus

About the project

The Spoken Learner Corpus (SLC) Project is a collaboration between Trinity and the Centre for Corpus Approaches to Social Science (CASS) at Lancaster University.


The aim of the project has been to create a large corpus of learner (and examiner) speech that can be used in a wide range of research contexts, including Second Language Acquisition, language testing, L2 pedagogy and materials development.

The corpus currently stands at over 4 million words. It has been created from recordings of Trinity’s Graded Examinations in Spoken English (GESE) across a range of grades from B1 to C2 on the CEFR scale. It represents language used in a variety of speaking tasks which reflect speech events in the world outside the test, and it covers a range of language backgrounds.

How can we use the corpus?

As a unique research resource, the Trinity Lancaster Corpus enables the investigation of learner speech at different proficiency levels (advanced, intermediate and lower intermediate/threshold) and the analysis of spoken learner production across different tasks (both monologic and interactive). The corpus samples the language of learners with a variety of L1 backgrounds, representing English speakers from Italy, Spain, Mexico, Argentina, Brazil, China, India, Sri Lanka and Russia. This will allow us to report back to those learners on their specific proficiencies and needs for development. It also facilitates the development of locally focused teaching materials and test support activities. See our current range of Corpus-informed teaching resources.

Corpus analysis is likely to become more sophisticated in the future, especially with multiple layers of corpus annotation that allow searching according to different linguistic and background criteria. The Trinity Lancaster Corpus aspires to become a leading research tool in this respect.

What is a language corpus?

A language corpus is a collection of texts, either written or spoken, which is compiled digitally for the purpose of language analysis. Advances in computer technology mean that it is now possible to create very large corpora (millions of words), store them in digital form, and analyse them automatically or semi-automatically.

The recorded speech is entered and coded with a variety of tags so that users can examine all the texts in the corpus, or a sample of them, in order to determine how language is used in particular contexts (eg in formal or informal situations), by specific groups of people (eg different ages, different mother tongues), for specific purposes (eg for academic purposes, for social purposes), etc. The findings of such analyses can be used for many real-world purposes such as devising teaching materials, constructing tests and other assessment procedures, compiling accurate dictionaries or improving communication amongst different social or cultural groups.
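As a rough illustration of how tags let users examine a sample of a corpus, the minimal Python sketch below selects texts by background criteria and counts word frequencies in that sample. The tag names (`l1`, `level`, `task`), the sample texts and the helper functions are all invented for this example; they are assumptions, not the actual annotation scheme or tools of the Trinity Lancaster Corpus.

```python
from collections import Counter

# Hypothetical miniature corpus: each text carries background tags.
# Tag names and sample texts are invented for illustration only.
corpus = [
    {"l1": "Spanish", "level": "B2", "task": "interactive",
     "text": "I think maybe we could go there"},
    {"l1": "Italian", "level": "C1", "task": "monologic",
     "text": "I think this is probably the main reason"},
    {"l1": "Spanish", "level": "B1", "task": "interactive",
     "text": "yes I like it"},
]


def select(texts, **criteria):
    """Return the texts whose tags match every given criterion."""
    return [t for t in texts
            if all(t.get(key) == value for key, value in criteria.items())]


def word_frequencies(texts):
    """Count lower-cased, whitespace-separated words across the texts."""
    counts = Counter()
    for t in texts:
        counts.update(t["text"].lower().split())
    return counts


# Examine only interactive-task speech from L1 Spanish speakers.
spanish_interactive = select(corpus, l1="Spanish", task="interactive")
freqs = word_frequencies(spanish_interactive)
```

Changing the criteria passed to `select` (for example `level="C1"`) gives a different sample of the same corpus, which is the kind of comparison the tags are meant to support.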

The nature of the GESE test – one which focuses on communicative skills and allows test takers choice in their contributions – means that the Trinity Lancaster Corpus can offer unique insights into how learners choose to manage interaction and build meaning based on their own identity rather than being overly constrained by the test task.

Further information


Featured corpus research

Some featured papers and key articles

  • Gablasova, D. & Brezina, V. (2015) 'Does speaker role affect the choice of epistemic adverbials in L2 speech?: evidence from the Trinity Lancaster Corpus' Yearbook of Corpus Linguistics and Pragmatics. Romero-Trillo, J. (ed.). Springer.

From the Spoken Learner Corpus blog

Many speakers use English as their non-native language (L2) to communicate in a variety of situations: at school, at work or in other everyday situations. As well as needing to master the grammar and vocabulary of the English language, L2 users of English need to know how to react appropriately in different communicative situations. In linguistics, this aspect of language is studied under the label of “pragmatics”.

This briefing offers an exploration of the pragmatic features of L2 speech in the Trinity Lancaster Corpus of spoken L2 production.


Corpus resources

The corpus method in language teaching

Traditionally, language teaching focused on vocabulary and grammar as two separate components of linguistic skill. It was believed that once learners had acquired lexical items and internalised grammatical rules, they would be able to combine these components and apply them in communicative situations. However, a growing body of evidence shows that in order to communicate successfully, learners also need to acquire expressions and structures that lie between lexis and grammar, so-called lexico-grammar (collocations are one example of a lexico-grammatical feature).
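To give a sense of how collocations can be looked for in corpus data, the short Python sketch below counts adjacent word pairs in a handful of utterances. The sample utterances and the `bigram_counts` helper are hypothetical, and raw pair counts are only a crude starting point: corpus studies typically go further and use statistical association measures to decide which pairs count as collocations.

```python
from collections import Counter


def bigram_counts(utterances):
    """Count adjacent word pairs (bigrams) across a list of utterances."""
    counts = Counter()
    for utterance in utterances:
        words = utterance.lower().split()
        # zip pairs each word with the word that follows it
        counts.update(zip(words, words[1:]))
    return counts


# Invented sample utterances, for illustration only.
sample = [
    "I strongly agree with you",
    "I strongly agree",
    "I agree with that",
]

counts = bigram_counts(sample)
most_frequent = counts.most_common(3)
```

Pairs that recur, such as "strongly agree" in this sample, are candidates for the kind of lexico-grammatical pattern that learners may need to acquire as a unit.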

Download an information sheet

Areas of difference between more successful and less successful GESE candidates

The findings in this section are based on corpus analysis of 14 successful and 14 less successful Grade 7 & 8 candidates from L1 Spanish or Italian backgrounds. The successful candidates were defined as those who were awarded an ‘A’ for their performance, while the less successful candidates were awarded a ‘C’ or ‘D’. All of the findings are based on the Interactive task.


Learning from assessment

We have produced a range of classroom activity worksheets and accompanying teacher notes based on corpus data consisting of over 1,500 scripts of speaking tests from speakers with a variety of language backgrounds.

Samples from the corpus demonstrate many aspects of the test takers’ communication skills in the speaking tests. These include pragmatic and strategic aspects which are essential for communicating effectively and which are associated with success in the tests. These worksheets use extracts from the corpus and focus on activities to practise these features, which often need much more attention in class than coursebooks typically give them.

By focusing on these aspects of interaction, the worksheet activities will also help students improve their speaking skills in all situations, including in the world outside the test in both informal and more formal conversations.

Some of the worksheets also make reference to cultural norms. It may be useful for teachers to discuss any differences from the students’ own cultural norms, so that students can accommodate these differences when taking part in conversations with English speakers.

The following classroom activities are also available within the GESE and ISE resource sections of our website.

Classroom activities for speaking
