Details view: Scalable Knowledge Harvesting with High Precision and High Recall


Context

OpenSherlock Project » References » Conference Papers » Scalable Knowledge Harvesting with High Precision and High Recall

OpenSherlock Project: Fabricating possibly many open source variants along the lines of IBM's Watson. The project's name has changed from SolrDrWatson to SolrSherlock, with thanks to Tom Munnecke. The project is migrating to the OpenSherlock concept, which generalizes beyond Solr as the platform core.

References: Links to resources related to the SolrDrWatson project.

Linked node: Leveraging Knowledge Bases in Web Text Processing


Citations (1)

Link[1] Scalable Knowledge Harvesting with High Precision and High Recall

Author: Ndapandula Nakashole, Martin Theobald, Gerhard Weikum
Publication info: WSDM’11, February 9–12, 2011
Cited by: Jack Park 7:00 PM 6 November 2013 GMT
URL: http://www.mpi-inf.mpg.de/~nnakasho/papers/prospera-wsdm11.pdf

Excerpt / Summary

Harvesting relational facts from Web sources has received great attention for automatically constructing large knowledge bases. State-of-the-art approaches combine pattern-based gathering of fact candidates with constraint-based reasoning. However, they still face major challenges regarding the trade-offs between precision, recall, and scalability. Techniques that scale well are susceptible to noisy patterns that degrade precision, while techniques that employ deep reasoning for high precision cannot cope with Web-scale data. This paper presents a scalable system, called PROSPERA, for high-quality knowledge harvesting. We propose a new notion of ngram-itemsets for richer patterns, and use MaxSat-based constraint reasoning on both the quality of patterns and the validity of fact candidates. We compute pattern-occurrence statistics for two benefits: they serve to prune the hypotheses space and to derive informative weights of clauses for the reasoner. The paper shows how to incorporate these building blocks into a scalable architecture that can parallelize all phases on a Hadoop-based distributed platform. Our experiments with the ClueWeb09 corpus include comparisons to the recent ReadTheWeb experiment. We substantially outperform these prior results in terms of recall, with the same precision, while having low run-times.
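
Illustrative note (not from the paper): the excerpt mentions ngram-itemsets as a richer pattern representation but does not define them. The sketch below shows one plausible reading, in which the phrase between two entities is reduced to a set of word n-grams and two patterns are compared by set overlap. The function names, the choice of bigrams, and the use of Jaccard similarity are all assumptions made for illustration; the paper's actual definitions and weighting may differ.

```python
# Hypothetical sketch: phrases as sets of word n-grams, compared by overlap.
# None of these names or choices come from the PROSPERA paper.

def ngrams(tokens, n):
    """Return the set of word n-grams (joined by spaces) in a token list."""
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_itemset(phrase, n=2):
    """Lower-case and whitespace-tokenize a phrase, then collect its n-grams."""
    tokens = phrase.lower().split()
    return frozenset(ngrams(tokens, n))

def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# A seed pattern and two candidate phrases found between entity pairs.
seed = ngram_itemset("graduated with a degree from")
cand = ngram_itemset("graduated with a PhD degree from")
other = ngram_itemset("was born in the city of")

print(jaccard(seed, cand))   # partial overlap despite the inserted word
print(jaccard(seed, other))  # no shared bigrams
```

Under this reading, a candidate phrase that differs from a seed pattern by an inserted word still shares most of its n-grams with the seed, whereas an exact string match would miss it.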

About

Entered by: Jack Park
NodeID: #295376
Node type: Position
Entry date (GMT): 11/6/2013 6:58:00 PM
Last edit date (GMT): 11/6/2013 6:58:00 PM
Incoming cross-relations: 1
Outgoing cross-relations: 0
Average rating: 0 by 0 users