SketchEngine

The Sketch Engine is for anyone wanting to research how words behave. It is a corpus query system incorporating word sketches: one-page, automatic, corpus-derived summaries of a word's grammatical and collocational behaviour.

Not open source, not free.

The Sketch Engine is a product of Lexical Computing Ltd (LCL), a small research company founded by Adam Kilgarriff in 2003. It works at the intersection of corpus and computational linguistics, and is committed to an empiricist approach to the study of language in which corpora play a central role: for a very wide range of linguistic questions, if a suitable corpus is available, it will help our understanding. Its strapline is ‘corpora for all’.

To be able to provide corpus services, LCL needs corpora. As of May 2013, it has large corpora for 52 languages (‘large’ meaning over 1 million words; in most cases the corpora exceed 100 million words). For the most part these were collected from the web (LCL is a lead player in the ‘web as corpus’ initiative), and building them has involved collaborations with experts in the languages in question, for example:

  • with Paul Thompson, Hilary Nesi and colleagues at the Universities of Warwick, Reading, Birmingham and Coventry, on the BASE and BAWE corpora of academic English
  • with Silvia Bernardini and colleagues at SSLMIT, University of Bologna, on their very large (ca. 2-billion-word) web corpora of German, Italian, English and French (DeWaC, ItWaC, UKWaC, FrWaC)
  • with Simon Krek and colleagues at the University of Ljubljana, on corpora, lemmatisation, part-of-speech tagging and the Sketch Grammar for Slovene
  • with Phuong Le-Hong, on lemmatisation, part-of-speech tagging and the Sketch Grammar for Vietnamese