FreeEed Open-source eDiscovery engine

FreeEed™ is an open-source project published by SHMsoft™ and released under the Apache 2.0 License. It is based on Hadoop and other Big Data technologies. FreeEed™ is intended for use in eDiscovery, as the engine or kernel of a company's search application, or as an investigator's tool. It runs on a Windows, Mac, or Linux workstation, or on a Hadoop cluster.

 
How it works
 
Processing is organized by the Hadoop framework. The input data is first staged by zipping it into archives of a set size. During processing, each file is read from its archive, assigned a unique ID, and passed to Tika, which extracts text and metadata. The metadata, the extracted text, and the file itself are delivered as the processed results.
The current and future building blocks of the system are HDFS, Hadoop, Tika, Lucene, and Hive.
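The staging and processing steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not FreeEed's actual code: the function names `stage`, `extract`, and `process` are hypothetical, the archives are held in memory rather than on HDFS, and `extract` is a trivial stand-in for Tika.

```python
import io
import uuid
import zipfile


def stage(files, max_bytes):
    """Group (name, data) pairs into in-memory zip archives,
    each holding roughly max_bytes of input (hypothetical helper)."""
    archives, buf, zf, size = [], None, None, 0
    for name, data in files:
        # Start a new archive when adding this file would exceed the limit.
        if zf is None or (size and size + len(data) > max_bytes):
            if zf is not None:
                zf.close()
                archives.append(buf.getvalue())
            buf = io.BytesIO()
            zf = zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED)
            size = 0
        zf.writestr(name, data)
        size += len(data)
    if zf is not None:
        zf.close()
        archives.append(buf.getvalue())
    return archives


def extract(name, data):
    """Assumed stand-in for Tika: returns (text, metadata)."""
    text = data.decode("utf-8", errors="replace")
    return text, {"name": name, "size": len(data)}


def process(archive_bytes):
    """Read each file from a staged archive, assign a unique ID,
    and collect metadata, text, and the native file."""
    results = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            data = zf.read(info)
            text, metadata = extract(info.filename, data)
            results.append({
                "id": uuid.uuid4().hex,  # unique ID per file
                "metadata": metadata,
                "text": text,
                "native": data,
            })
    return results


files = [("a.txt", b"hello"), ("b.txt", b"world!"), ("c.txt", b"x")]
archives = stage(files, max_bytes=8)
records = [r for a in archives for r in process(a)]
```

In a cluster setting each staged archive would be a natural unit of work for one Hadoop task, which is presumably why the input is split into archives of a set size.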
 
Indexing
 
Each FreeEed project will create its own Lucene/Solr index for later searches.
 
Output
 
Metadata results are output as a CSV file, while the native files and the extracted text are stored in one or more zip files. The end results can be used for culling and for producing native files for legal review.
With a compiled distribution and professional support available for enterprise use, FreeEed brings high performance, scalability, and reliability to data processing at a fraction of the cost of proprietary products.
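As a rough illustration of the output layout described above (metadata in a CSV file, native files and text in a zip), the following sketch writes both from a list of processed records. The record shape, the `write_results` function, and the `native/` and `text/` folder names inside the zip are assumptions made for the example, not FreeEed's documented format.

```python
import csv
import io
import zipfile


def write_results(records, csv_file, zip_file):
    """Write one CSV row of metadata per record, and store each
    native file and its extracted text in a zip (assumed layout)."""
    # Column set is the union of all metadata keys, with the ID first.
    fields = ["id"] + sorted({k for r in records for k in r["metadata"]})
    writer = csv.DictWriter(csv_file, fieldnames=fields)
    writer.writeheader()
    with zipfile.ZipFile(zip_file, "w", zipfile.ZIP_DEFLATED) as zf:
        for r in records:
            writer.writerow({"id": r["id"], **r["metadata"]})
            zf.writestr(f"native/{r['id']}_{r['metadata']['name']}", r["native"])
            zf.writestr(f"text/{r['id']}.txt", r["text"])


records = [
    {"id": "0001", "metadata": {"name": "memo.txt", "size": 5},
     "native": b"hello", "text": "hello"},
]
csv_out, zip_out = io.StringIO(), io.BytesIO()
write_results(records, csv_out, zip_out)
names = zipfile.ZipFile(io.BytesIO(zip_out.getvalue())).namelist()
```

Keeping the ID in both the CSV row and the zip entry names lets a reviewer go from a metadata row to the matching native file and text.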
 
Supported file formats
MS Office and other formats (over 300)
PST processing
 
Other capabilities
Text extraction
Data culling
Native/Text/Metadata results delivery
Optical Character Recognition (OCR)
Imaging (PDF creation)
Instant search
Deduplication (configurable for emails)
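The source does not say how deduplication is implemented. One common approach, shown here purely as an assumption, is to hash each item and keep only the first occurrence of each digest. The `dedupe` function and the `email_key` fields are hypothetical; the `key` parameter illustrates how the comparison could be made configurable for emails.

```python
import hashlib


def dedupe(records, key=lambda r: r["native"]):
    """Keep the first record for each distinct key digest.
    `key` selects what is compared (assumed: file bytes by default)."""
    seen, unique = set(), []
    for r in records:
        material = key(r)
        if isinstance(material, str):
            material = material.encode("utf-8")
        digest = hashlib.sha256(material).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(r)
    return unique


def email_key(r):
    """Hypothetical email key: compare on selected header fields."""
    return "|".join([r["from"].lower(), r["subject"].strip(), r["date"]])


docs = [{"native": b"same"}, {"native": b"same"}, {"native": b"other"}]
unique_docs = dedupe(docs)

emails = [
    {"from": "A@x.org", "subject": "Hi ", "date": "2012-01-01"},
    {"from": "a@x.org", "subject": "Hi", "date": "2012-01-01"},
]
unique_emails = dedupe(emails, key=email_key)
```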