The Survey of English Usage
Annual Report 2004
1. Research
Creating a parsed and searchable diachronic corpus of present-day spoken English (ESRC R000239643)
The DCPSE project was graded 'Outstanding'
by the ESRC, which "indicates that a project has fully met its objectives
and has provided an exceptional research contribution well above
average or very high in relation to the level of award". A full
set of referees' comments can be found here.
In linguistics a distinction is traditionally made between diachronic
and synchronic approaches to the study of language. The former considers
language through time, whereas the latter takes a 'snapshot' look
at language viewed from the present. This dichotomy has recently
been questioned by some linguists, who have argued that the distinction
is an artificial one. They claim that languages change all the time,
even synchronically. As a result of these new attitudes to language
development there is an emerging research impetus in linguistics,
which concerns itself with recent change.
The aim of this project was the construction of a diachronic corpus
of spontaneous spoken English containing directly comparable material
from the London-Lund Corpus (LLC) and the British Component
of the International Corpus of English (ICE-GB). The resource has been made fully
searchable with the International Corpus of English Corpus Utility
Program (ICECUP) exploration software. Main results:
- We selected a total of 800,000 words of spoken English from comparable categories in the LLC and in ICE-GB (400,000 words from each corpus). The design of these corpora is similar, and it will thus be possible to study the linguistic features of analogous categories of spontaneous spoken English over time. As noted, in each case we have selected matching texts, and we cross-checked the structural markup and tagging in the LLC.
- We integrated the LLC and ICE-GB material. Very long monologue utterances (over 1,000 words) were broken into segments. These could then be read into ICECUP and indexed in an integrated fashion.
- ICECUP was originally developed to operate on ICE-GB. We modified it to handle the combined data in DCPSE.
- We parsed the LLC material. The parsing was carried out automatically to phrasal level, and we then corrected the trees by hand. This involved a number of technical challenges (for more information, see the final report on the link below). We manually checked the results of the automatic parsing process.
- We have written documentation for use with the corpus and software.
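The segmentation step mentioned above can be illustrated with a minimal sketch. Only the 1,000-word threshold comes from this report; the function name, the word-list representation and the fixed-size splitting rule are assumptions made for illustration, and the project may well have chosen segment boundaries on linguistic grounds instead.

```python
from typing import List

# Threshold taken from the report; everything else here is hypothetical.
MAX_WORDS = 1000


def split_long_utterance(words: List[str], max_words: int = MAX_WORDS) -> List[List[str]]:
    """Split an utterance into consecutive segments of at most max_words words.

    Utterances at or below the threshold are returned as a single segment.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if len(words) <= max_words:
        return [words]
    # Assumed rule: cut every max_words words, ignoring linguistic boundaries.
    return [words[i:i + max_words] for i in range(0, len(words), max_words)]
```

For example, a 2,500-word monologue would yield three segments of 1,000, 1,000 and 500 words under this assumed rule.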
The new corpus will provide linguists interested in recent linguistic
changes in English with a new, innovative and searchable database
containing spoken English covering a period of 25-30 years. We will
disseminate the corpus via the Survey of English Usage website later
this year. We believe that this resource offers unprecedented possibilities
for new research into changes in English. For full details, see
www.ucl.ac.uk/english-usage/projects/dcpse
A sample corpus is available for download from this site, packaged
with ICECUP.
ICECUP
A pre-release version of ICECUP 3.1 is available, with ICE-GB and
DCPSE sample corpora, for download from our website. See: www.ucl.ac.uk/english-usage/projects/ice-gb/beta
This beta release is accompanied by a set of release notes that
explain the additions to the software that were not covered
in the handbook (Nelson, Wallis and Aarts, 2002).
ICECUP 3.1 is an evolutionary advance on ICECUP 3.0. A user who
has learned how to use ICECUP 3.0 should feel entirely at home with
the new software. The software has been extended in a number of
important respects. The new ICECUP includes:
- An integrated lexicon derived from the corpus
- An integrated grammaticon of nodes
- Simple drag-and-drop statistics
- Enhanced Fuzzy Tree Fragments, with:
  - logical combinations of lexical wild cards
  - logic within nodes
  - additional structural features
  - improved user interface with 'floating' node properties/inspector window
- An improved FTF wizard
- Manual sentence selection 'query' control
- Improved browsing:
  - word wrapping
  - context options
  - enhanced concordancing
  - integrated speech playback (with optional separately available sound files)
- Improved user interface, including:
  - new tree editor with zooming and panning
  - new quick-find commands
  - faster and parallel searching
We are grateful for the comments and suggestions from beta-reviewers.
In many cases these have led to additional facilities in the software.
If you are interested in reviewing the software, please have a look
at our site.
There are currently a small number of outstanding issues with the
software, which will be tackled prior to release later this year,
including some compatibility issues with Windows XP. Naturally,
we have no intention of releasing any version of the software unless
it is extremely stable.
Finally, our programmer, Sean Wallis, is looking to the future in
a new proposal that would extend ICECUP to support cycles of formally
defined experiments in linguistics. ICECUP 3.1 consists of a series
of enhancements to the environment and the expressivity of searches,
but the only specifically new tools are a grammaticon and lexicon.
ICECUP 3.2 would, in our plan, provide a number of inter-related
tools that would allow a researcher to define and carry out any
number of experiments in grammar on the corpus.
ICE-GB
Sound recordings for the 300 spoken texts of ICE-GB (around 75 hours of speech) are available from us by order, as a set of CDs, in three formats (all 16kHz mono).
SET 1 | 12 CDs | one file per text | uncompressed wave files |
SET 2 | 11 CDs | one file per sentence/group | uncompressed wave files |
SET 3 | 5 CDs | one file per sentence/group | compressed (mp3) files |
These are currently available as standalone datasets at an equivalent cost to the computerised ICE-GB data. We also plan to release an integrated package, with sound files, ICE-GB Version II and ICECUP 3.1, later this year. This will permit the playback of sentences or groups of sentences from the corpus. The cost of an advance purchase of sound recordings will be subtracted from the cost of this 'ICE-GB+sound' package.
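As a rough guide to the size of the uncompressed recordings, the storage needed for 75 hours of 16kHz mono audio can be estimated as follows. The sample width of 16 bits (2 bytes) is an assumption, since it is not stated here, and the function name is hypothetical.

```python
def wav_size_bytes(hours: float, sample_rate_hz: int = 16_000,
                   channels: int = 1, bytes_per_sample: int = 2) -> int:
    """Estimate the raw audio payload of uncompressed PCM, ignoring file headers.

    bytes_per_sample=2 (16-bit audio) is an assumption, not a figure from the report.
    """
    seconds = hours * 3600
    return int(seconds * sample_rate_hz * channels * bytes_per_sample)


if __name__ == "__main__":
    total = wav_size_bytes(75)
    print(f"{total / 1e9:.2f} GB")  # prints "8.64 GB"
```

Under that assumption the estimate is about 8.64 GB, which is broadly in line with a set of 12 CDs.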
ICE worldwide
Professor K. K. Luke and his students visited UCL and the Survey
in July 2004. Gerry Nelson gave them a talk about the ICE project.
Gerry Nelson visited Hong Kong in August, in connection with the
ICE-HK project.
Gerry Nelson edited a special volume of World Englishes (May 2004,
Volume 23, Issue 2) on The International Corpus of English. The
table of contents is shown below:
Introduction | G. Nelson |
How to trace structural nativization: particle verbs in world Englishes | E. W. Schneider |
Cultural discourse in the Corpus of East African English and beyond: possibilities and problems of lexical and collocational research in a one million-word corpus | J. Schmied |
Conceptualization specifics in East African English: quantitative arguments from the ICE-East Africa corpus | C. Haase |
Emphasizer now in colloquial South African English | C. Jeffery and B. van Rooy |
Shared morpho-syntactic features in contact varieties of English: article use | A. Sand |
Negation of lexical have in conversational English | G. Nelson |
Comparing world Englishes: a research guide | H. Fallon |
For further details, see here.
The English Noun Phrase: an empirical study (AHRB B/RG/AN5308/APN10614)
We are pleased to report that Evelien Keizer’s research on this project will be published by Cambridge University Press in the monograph series Studies in English Language.
The London-Lund Corpus
The sound files of the London-Lund Corpus are now available upon request at the Survey. Please contact Christine Bowles (c.bowles@ucl.ac.uk).
2. Staff
Gerry Nelson was appointed Deputy Director of the Survey.
Christine Bowles joined the Survey on 1 November as part-time administrator.
She works on Mondays and Thursdays. Should you wish to contact her,
her email address is c.bowles@ucl.ac.uk
Sean Wallis continues as Principal Senior Research Fellow. He has
been working on the ESRC project and on the new version of ICECUP.
He is seconded part-time to the Human Resources Department at UCL.
Isaac Hallegua continues as Systems Administrator.
Our principal Research Assistants are Yordanka Kostadinova-Kavalova
and Gabriel Ozón. They were joined for shorter periods by
Dr Dirk Bury, Dr Amela Camdžic, Leslie Kirk, Dr Ann Law and
Kate Scott.
We congratulate Mariangela Spinillo on successfully defending her
PhD thesis, entitled 'Reconceptualising the English determiner class'.
Two people have left the Survey. Marie Gibney has retired as administrator
after working in the Survey for 21 years, first with Sidney Greenbaum,
then with Bas Aarts. She has done a wonderful job running the SEU
for so many years. We held a farewell party for her which was also
attended by many members of the English Department. Toshihiko Kubota
will leave the Survey in April after having spent two years as a
Visiting Scholar at the Survey. We wish him luck returning to his
teaching position in Japan.
3. Publications, conference presentations, talks, theses and other studies using Survey material
Please let us know if you would like us to include your publications based on SEU material. We would appreciate it if you could send us offprints of any such publications.
Aarts, Bas (2004) Fuzzy grammar: a reader. Oxford: Oxford
University Press. (Edited with David Denison, Evelien Keizer and
Gergana Popova.)
Aarts, Bas (2004) Fuzzy grammar: the nature of grammatical categories
and their representation. (With David Denison, Evelien Keizer
and Gergana Popova.) In: Bas Aarts, David Denison, Evelien Keizer
and Gergana Popova (eds.) Fuzzy grammar: a reader. Oxford: Oxford
University Press.
Aarts, Bas (2004) Modelling linguistic gradience. Studies in
Language 28.1. 1-49.
Aarts, Bas (2004) Grammatici certant. Review Article of Rodney Huddleston
and Geoffrey Pullum (2002) The Cambridge grammar of the English
language. Journal of Linguistics 40.2.
Aarts, Bas (2004) Conceptions of gradience in the history of linguistics.
Language Sciences 26.
Aarts, Bas (2004) Messy or orderly: the nature of grammatical categories.
Plenary lecture at the fiftieth anniversary meeting of the English
Language and Literature Association of Korea, Seoul.
Aarts, Bas (2004) Recent developments in corpus linguistics. Academy
of Korean Studies, Seoul and Pusan National University.
Aarts, Bas (2004) English Language and Linguistics. (With
David Denison and Richard Hogg.) Cambridge University Press. Volumes
8.1 and 8.2.
Aijmer, Karin and Anne-Marie Simon-Vandenbergen (2004) Modal adverbs
of certainty in the ICE-GB corpus. Paper presented at the 25th ICAME
conference, Verona.
De Clerck, Bernard (2004) Imperative subjects in English: a corpus-based
pragmatic analysis. Paper presented at the 25th ICAME conference,
Verona.
Depraetere, Ilse and Ann Verhulst (2004) Must and have
to in ICE-GB: a survey of its meanings. Paper presented at the
25th ICAME conference, Verona.
Fallon, Helen (2004) Comparing world Englishes: a research guide.
In: Gerald Nelson (2004)(ed.) World Englishes 23.2: Special issue
on the International Corpus of English. 309-316.
Gesuato, Sara (2004) To be going, to be doing. Paper presented
at the 25th ICAME conference, Verona.
Gilquin, Gaëtanelle (2004) A corpus-based cognitive study
of the main English causative verbs: a syntactic, semantic, lexical
and stylistic approach. Unpublished PhD Thesis. Louvain-la-Neuve:
Centre for English Corpus Linguistics, Université Catholique
de Louvain.
Hasselgård, Hilde (2004) The placement of adjuncts in clause-medial
position. Paper presented at the 25th ICAME conference, Verona.
Jeffery, Chris and Bertus van Rooy (2004) Emphasizer now
in colloquial South African English. In: Gerald Nelson (2004)(ed.)
World Englishes 23.2: Special issue on the International Corpus
of English. 269-280.
Kaltenböck, Gunther (2004) It-extraposition and non-extraposition
in English: a study of syntax in spoken and written English.
Vienna: Braumüller.
Keizer, Evelien (2004) Postnominal PP complements and modifiers:
a cognitive distinction. English Language and Linguistics
8.2. 323-350.
Kirk, John M., Jeffrey L. Kallen, Orla Lowry and Anne Rooney
(2004) Standard Irish English: the four hypotheses. Paper presented
at the 25th ICAME conference, Verona.
Kostadinova-Kavalova, Yordanka (2004) Integrated parentheticals
and discourse parentheticals. Paper presented at the 25th ICAME
conference, Verona.
Kostadinova-Kavalova, Yordanka (2004) Niche-filling: completing
a parsed corpus through evolution. Paper presented at the sixth
conference of General Linguistics, Santiago de Compostela, Spain.
(With Gabriel Ozón and Sean Wallis.)
Leech, Geoffrey (2004) A new Gray's anatomy of English grammar. Review
article of Rodney Huddleston and Geoffrey K. Pullum (2002) The
Cambridge grammar of the English language. English Language
and Linguistics 8.1, 121-147.
Martinez-Insua, A. E. and I. M. Palacios-Martinez (2003) A corpus-based
approach to non-concord in present day English there constructions.
English Studies 3. 362-383.
Meunier, Fanny (2004) Native corpora, learner corpora and ELT: the
winning team? Paper presented at the 25th ICAME conference, Verona.
Meyer, Charles F. and Hongyin Tao (2004) Grammar, pragmatics, introspection
and corpus linguistics: a critique of Newmeyer's 'Grammar is grammar
and usage is usage'. Paper presented at the 25th ICAME conference,
Verona.
Monschau, Jacqueline, Rolf Kreier and Joybrato Mukherjee (2004)
Syntax and semantics at tone unit boundaries. Anglia: Zeitschrift
für Englische Philologie 121.4. 581-609.
Mukherjee, Joybrato (2004) The state of the art in corpus linguistics:
three book-length perspectives. English Language and Linguistics
8.1, 103-119.
Nelson, Gerald (2004)(ed.) World Englishes 23.2: Special issue
on the International Corpus of English. Oxford: Blackwell Publishers.
Nelson, Gerald (2004) Negation of lexical have in conversational
English. In: Gerald Nelson (2004)(ed.) World Englishes 23.2: Special
issue on the International Corpus of English. 299-308.
Ni, Yibin (2003) Noun phrases in media texts: a quantificational
approach. In: Jean Aitchison and Diana M. Lewis (eds.) New media language.
London: Routledge. 159-168.
Ozón, Gabriel (2004) Ditransitive alternation: a weighty
account? A corpus-based study using ICECUP. Paper presented at the
25th ICAME conference, Verona.
Ozón, Gabriel (2004) Niche-filling: completing a parsed corpus
through evolution. Paper presented at the sixth conference of General
Linguistics, Santiago de Compostela, Spain. (With Yordanka Kostadinova-Kavalova
and Sean Wallis.)
Paradis, Carita (2004) On the importance of corpora to lexical semantic
theory: adjective-noun combinations in ICE-GB. Paper presented at
the 25th ICAME conference, Verona.
Paradis, Carita (2004) Where does metonymy stop? Senses, facets
and active zones. Metaphor & Symbol, 19.4, 245-264.
Sand, Andrea (2004) Shared morpho-syntactic features in contact
varieties of English: article use. In: Gerald Nelson (2004)(ed.)
World Englishes 23.2: Special issue on the International Corpus
of English. 281-298.
Schmied, Josef (2004) Cultural discourse in the Corpus of East African
English and beyond: possibilities and problems of lexical and collocational
research in a one million-word corpus. In: Gerald Nelson (2004)(ed.)
World Englishes 23.2: Special issue on the International Corpus
of English. 251-260.
Schneider, Edgar (2004) How to trace structural nativization: particle
verbs in world Englishes. In: Gerald Nelson (2004)(ed.) World
Englishes 23.2: Special issue on the International Corpus of English.
227-249.
Spinillo, Mariangela (2004) Reconceptualising the English determiner
class. PhD thesis, English Department, University College London.
Trudgill, Peter (2004) New-dialect formation: the inevitability
of colonial Englishes. Edinburgh: Edinburgh University Press.
Wallis, Sean (2004) ICECUP 3.1: a sneak preview. Paper presented
at the 25th ICAME conference, Verona.
Wallis, Sean (2004) Niche-filling: completing a parsed corpus through
evolution. Paper presented at the sixth conference of General Linguistics,
Santiago de Compostela, Spain. (With Gabriel Ozón and Yordanka
Kostadinova-Kavalova.)
Bas Aarts
Director
January 2005
This page last modified 17 February, 2023 by Survey Web Administrator.