UCL Psychology and Language Sciences

LLSD Research Tools

Three versions of the corpus are available below:

  1. Unigram (the most basic version, with OLD20, a measure of orthographic neighbourhood density)
  2. Lemmatised and Part-of-Speech Tagged (useful for finding the frequency of lemmas, their related forms and their parts of speech)
  3. Bigram (useful for collocation frequency and identifying compounds)

[Feel free to email kevin.tang.10@ucl.ac.uk for the details of these different versions]
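
For readers unfamiliar with OLD20: it is usually defined as the mean Levenshtein (edit) distance from a word to its 20 closest neighbours in a lexicon. The sketch below illustrates that definition only; it is not the script used to build the corpus, and the function names `levenshtein` and `old_n` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def old_n(word: str, lexicon: list[str], n: int = 20) -> float:
    """Mean edit distance from `word` to its n closest neighbours.

    Assumption: the word itself is excluded from its own neighbours,
    and fewer than n neighbours are averaged as-is.
    """
    distances = sorted(levenshtein(word, other)
                       for other in lexicon if other != word)
    nearest = distances[:n]
    if not nearest:
        raise ValueError("lexicon contains no other words")
    return sum(nearest) / len(nearest)
```

A lower value means the word sits in a denser orthographic neighbourhood. For a full word list this brute-force comparison is quadratic in the lexicon size, so a real computation would likely need a faster distance routine.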

  • Wuggy Brazilian Portuguese Module: Pseudo-word generator for Brazilian Portuguese using SUBTLEX-PT-BR word list. [Coming soon]
 
  • FindUniqueCharacters is a very short script that lists the unique characters in a file. (Useful for checking whether anything is missing from your transcription convention, for example.) This .exe requires .NET (Windows computers will already have it; other OSs can use Mono). [Download it here] [Creator: Elizabeth Eden, elizabeth.eden.11@ucl.ac.uk]
  • LatexifyUnicodeIPA takes a file with a list of words in Unicode and outputs a file with them in TIPA (LaTeX format), or vice versa. It can handle diacritics and other multi-character Unicode formats. It currently cannot handle multiple words on the same line. This .exe requires .NET (Windows computers will already have it; other OSs can use Mono). [Download it here] [Creator: Elizabeth Eden, elizabeth.eden.11@ucl.ac.uk]
  • CheckPinyin takes an input file with a list of words (separated by spaces if there is more than one per line) and checks that each one can be found in a reference file, ignoring case. (As the name suggests, it was written for Pinyin.) It outputs two files: a list of valid lines and a list of invalid lines. [Download it here] [Creator: Elizabeth Eden, elizabeth.eden.11@ucl.ac.uk]
  • Cleaning tool for Hayes' Phonotactic Learner [Coming soon]
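
For users who cannot run the .exe files, the checks that FindUniqueCharacters and CheckPinyin describe can be approximated in a few lines of Python. This is a rough sketch based only on the descriptions above, not the tools' actual code. The function names are hypothetical, and two behaviours are assumptions: a line counts as invalid if any of its words is missing from the reference, and blank lines are skipped.

```python
def unique_characters(text: str) -> list[str]:
    """Return each distinct character in `text`, sorted by code point."""
    return sorted(set(text))


def split_valid_invalid(
    lines: list[str], reference: set[str]
) -> tuple[list[str], list[str]]:
    """Partition lines by whether every space-separated word is in `reference`.

    Matching ignores case. Assumptions: one unknown word makes the whole
    line invalid, and blank lines are dropped from both outputs.
    """
    known = {word.casefold() for word in reference}
    valid: list[str] = []
    invalid: list[str] = []
    for line in lines:
        words = line.split()
        if not words:
            continue  # skip blank lines (assumption)
        if all(word.casefold() in known for word in words):
            valid.append(line)
        else:
            invalid.append(line)
    return valid, invalid
```

To use these on files, read the contents with `open(path, encoding="utf-8")` first, then write the two returned lists out to separate files.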