SketchEngine
The Sketch Engine is for anyone wanting to research how words behave. It is a Corpus Query System incorporating word sketches, one-page, automatic, corpus-derived summaries of a word's grammatical and collocational behaviour.
Not open source, not free.
The Sketch Engine is a product of Lexical Computing Ltd (LCL), a small research company founded by Adam Kilgarriff in 2003. It works at the intersection of corpus and computational linguistics, and is committed to an empiricist approach to the study of language in which corpora play a central role: for a very wide range of linguistic questions, if a suitable corpus is available, it will help our understanding. Its strap line is ‘corpora for all’.
To be able to provide corpus services, LCL needs corpora. As of May 2013 we have large corpora for 52 languages. (‘Large’ means over 1 million words; in most cases the corpora are over 100 million words.) For the most part these are collected from the web (LCL is a lead player in the ‘web as corpus’ initiative) and have involved collaborations with language experts for the languages in question, for example:
- with Paul Thompson, Hilary Nesi and colleagues at the Universities of Warwick, Reading, Birmingham and Coventry over the BASE and BAWE corpora of Academic English
- with Silvia Bernardini and colleagues at SSLMIT, University of Bologna, for their very large (ca 2 billion word) web corpora of German, Italian, English, French (DeWaC, ItWaC, UKWaC, FrWaC)
- with Simon Krek and colleagues at Ljubljana University, on corpora, lemmatisation, part-of-speech tagging and the Sketch Grammar for Slovene
- with Phuong Le-Hong for lemmatisation, part-of-speech tagging and the Sketch Grammar for Vietnamese