Inductive logic programming

Inductive logic programming (ILP) is a subfield of machine learning which uses logic programming as a uniform representation for examples, background knowledge and hypotheses. Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesised logic program which entails all the positive and none of the negative examples.

Schema: positive examples + negative examples + background knowledge => hypothesis.

Inductive logic programming is particularly useful in bioinformatics and natural language processing. The term "inductive logic programming" was first introduced[1] in a paper by Stephen Muggleton in 1991.[2] The term "inductive" here refers to philosophical induction (i.e. suggesting a theory to explain observed facts) rather than mathematical induction (i.e. proving a property for all members of a well-ordered set).

Formal definition

The background knowledge is given as a logical proposition B, commonly in the form of Horn clauses used in logic programming. The positive and negative examples are given as conjunctions F^+ and F^- of unnegated and negated ground literals, respectively. A hypothesis h is a logical proposition satisfying the following requirements.[3]

Necessity: B  \not\models  F^+
Sufficiency: B \land h  \models  F^+
Weak consistency: B \land h  \not\models  \textit{false}
Strong consistency: B \land h \land F^-  \not\models  \textit{false}

"Necessity" does not impose a restriction on h, but forbids any generation of a hypothesis as long as the positive facts are explainable without it. "Sufficiency" requires any generated hypothesis h to explain all positive examples F^+. "Weak consistency" forbids generation of any hypothesis h that contradicts the background knowledge B. "Strong consistency" also forbids generation of any hypothesis h that is inconsistent with the negative examples F^-, given the background knowledge B; it implies "Weak consistency"; if no negative examples are given, both requirements coincide. Džeroski [4] requires only "Sufficiency" (called "Completeness" there) and "Strong consistency".

Example

[Figure: assumed family relations used in this example]

The following well-known example about learning definitions of family relations uses the abbreviations \textit{par}: \textit{parent}, \textit{fem}: \textit{female}, \textit{dau}: \textit{daughter}, g: \textit{George}, h: \textit{Helen}, m: \textit{Mary}, t: \textit{Tom}, n: \textit{Nancy}, and e: \textit{Eve}. It starts from the background knowledge (cf. picture)

\textit{par}(h,m) \land \textit{par}(h,t) \land \textit{par}(g,m) \land \textit{par}(t,e) \land \textit{par}(n,e) \land \textit{fem}(h) \land \textit{fem}(m) \land \textit{fem}(n) \land \textit{fem}(e),

the positive examples

\textit{dau}(m,h) \land \textit{dau}(e,t),

and the trivial proposition \textit{true} to denote the absence of negative examples.

Plotkin's [5][6] "relative least general generalization (rlgg)" approach to inductive logic programming shall be used to obtain a suggestion about how to formally define the daughter relation \textit{dau}.

This approach uses the following steps.

  • Relativize each positive example literal with the complete background knowledge:
    • \textit{dau}(m,h) \leftarrow \textit{par}(h,m) \land \textit{par}(h,t) \land \textit{par}(g,m) \land \textit{par}(t,e) \land \textit{par}(n,e) \land \textit{fem}(h) \land \textit{fem}(m) \land \textit{fem}(n) \land \textit{fem}(e)
    • \textit{dau}(e,t) \leftarrow \textit{par}(h,m) \land \textit{par}(h,t) \land \textit{par}(g,m) \land \textit{par}(t,e) \land \textit{par}(n,e) \land \textit{fem}(h) \land \textit{fem}(m) \land \textit{fem}(n) \land \textit{fem}(e),
  • Convert into clause normal form:
    • \textit{dau}(m,h) \lor \lnot \textit{par}(h,m) \lor \lnot \textit{par}(h,t) \lor \lnot \textit{par}(g,m) \lor \lnot \textit{par}(t,e) \lor \lnot \textit{par}(n,e) \lor \lnot \textit{fem}(h) \lor \lnot \textit{fem}(m) \lor \lnot \textit{fem}(n) \lor \lnot \textit{fem}(e)
    • \textit{dau}(e,t) \lor \lnot \textit{par}(h,m) \lor \lnot \textit{par}(h,t) \lor \lnot \textit{par}(g,m) \lor \lnot \textit{par}(t,e) \lor \lnot \textit{par}(n,e) \lor \lnot \textit{fem}(h) \lor \lnot \textit{fem}(m) \lor \lnot \textit{fem}(n) \lor \lnot \textit{fem}(e),
  • Anti-unify each compatible [7] pair [8] of literals:
    • \textit{dau}(x_{me},x_{ht}) from \textit{dau}(m,h) and \textit{dau}(e,t),
    • \lnot \textit{par}(x_{ht},x_{me}) from \lnot \textit{par}(h,m) and \lnot \textit{par}(t,e),
    • \lnot \textit{fem}(x_{me}) from \lnot \textit{fem}(m) and \lnot \textit{fem}(e),
    • \lnot \textit{par}(g,m) from \lnot \textit{par}(g,m) and \lnot \textit{par}(g,m), similar for all other background-knowledge literals
    • \lnot \textit{par}(x_{gt},x_{me}) from \lnot \textit{par}(g,m) and \lnot \textit{par}(t,e), and many more negated literals
  • Delete all negated literals containing variables that don't occur in a positive literal:
    • after deleting all negated literals containing other variables than x_{me},x_{ht}, only \textit{dau}(x_{me},x_{ht}) \lor \lnot \textit{par}(x_{ht},x_{me}) \lor \lnot \textit{fem}(x_{me}) remains, together with all ground literals from the background knowledge
  • Convert clauses back to Horn form:
    • \textit{dau}(x_{me},x_{ht}) \leftarrow \textit{par}(x_{ht},x_{me}) \land \textit{fem}(x_{me}) \land (\text{all background knowledge facts})
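
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes function-free ground atoms (constants only, which is all this example needs), represents an atom as a (predicate, arguments) tuple, and names the variable that generalizes the constant pair (m, e) X_me to mirror x_{me}. The names BACKGROUND, POSITIVES, anti_unify_atom and rlgg are made up for this sketch. Body atoms stand for the negated literals of the clause, so no explicit sign is stored.

```python
from itertools import product

# Atoms are (predicate, args) tuples. Assumed convention for this sketch:
# lowercase strings are constants, strings starting uppercase are variables.
BACKGROUND = [
    ("par", ("h", "m")), ("par", ("h", "t")), ("par", ("g", "m")),
    ("par", ("t", "e")), ("par", ("n", "e")),
    ("fem", ("h",)), ("fem", ("m",)), ("fem", ("n",)), ("fem", ("e",)),
]
POSITIVES = [("dau", ("m", "h")), ("dau", ("e", "t"))]


def is_var(term):
    return term[0].isupper()


def variables(atom):
    """Set of variables occurring in an atom."""
    return {t for t in atom[1] if is_var(t)}


def anti_unify_term(a, b):
    """Generalize two constants: keep equal constants, otherwise introduce a
    variable named after the pair, so equal pairs share one variable."""
    return a if a == b else f"X_{a}{b}"


def anti_unify_atom(p, q):
    """Generalize two atoms, or return None if they are not compatible."""
    (pred1, args1), (pred2, args2) = p, q
    if pred1 != pred2 or len(args1) != len(args2):
        return None
    return (pred1, tuple(anti_unify_term(a, b) for a, b in zip(args1, args2)))


def rlgg(example1, example2, background):
    """Return (head, body) of the generalized clause for two positive examples."""
    # Head: anti-unify the two positive literals.
    head = anti_unify_atom(example1, example2)
    # Both relativized clauses have the whole background as body, so every
    # compatible pair of background atoms is anti-unified.
    body = set()
    for p, q in product(background, repeat=2):
        generalized = anti_unify_atom(p, q)
        if generalized is not None:
            body.add(generalized)
    # Deletion step: drop body atoms with variables that are not in the head.
    head_vars = variables(head)
    return head, {atom for atom in body if variables(atom) <= head_vars}


if __name__ == "__main__":
    head, body = rlgg(*POSITIVES, BACKGROUND)
    print("head:", head)
    print("non-ground body:", sorted(a for a in body if variables(a)))
    print("ground body:", sorted(a for a in body if not variables(a)))
```

On this input the non-ground part of the body consists of par(X_ht, X_me) and fem(X_me), and the ground part is the nine background facts, in line with the clause derived above.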

The resulting Horn clause is the hypothesis h obtained by the rlgg approach. Ignoring the background knowledge facts, the clause informally reads "x_{me} is called a daughter of x_{ht} if x_{ht} is the parent of x_{me} and x_{me} is female", which is a commonly accepted definition.

Concerning the above requirements, "Necessity" is satisfied because the predicate \textit{dau} doesn't appear in the background knowledge, which hence cannot imply any property containing this predicate, such as the positive examples. "Sufficiency" is satisfied by the computed hypothesis h, since it, together with \textit{par}(h,m) \land \textit{fem}(m) from the background knowledge, implies the first positive example \textit{dau}(m,h), and similarly h together with \textit{par}(t,e) \land \textit{fem}(e) from the background knowledge implies the second positive example \textit{dau}(e,t). "Weak consistency" is satisfied by h, since B \land h has a model, namely the (finite) Herbrand structure described by the background knowledge, extended by the \textit{dau} facts that h derives from it; the same holds for "Strong consistency", as there are no negative examples.
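
These checks can be reproduced mechanically for this example. The sketch below is an illustration under simplifying assumptions, not a general ILP checker: the background is a set of ground facts, the hypothesis is the single clause dau(X, Y) <- par(Y, X), fem(X) with the background facts omitted from its body, and entailment is decided by naive forward chaining to the least Herbrand model. The names consequences, HYPOTHESIS and NEGATIVES are hypothetical. For definite clauses, weak consistency always holds, so only necessity, sufficiency and strong consistency are tested.

```python
from itertools import product

# Atoms are (predicate, args) tuples; uppercase-initial strings are variables.
BACKGROUND = {
    ("par", ("h", "m")), ("par", ("h", "t")), ("par", ("g", "m")),
    ("par", ("t", "e")), ("par", ("n", "e")),
    ("fem", ("h",)), ("fem", ("m",)), ("fem", ("n",)), ("fem", ("e",)),
}
POSITIVES = {("dau", ("m", "h")), ("dau", ("e", "t"))}
NEGATIVES = set()  # the example gives no negative examples

# Hypothesis: dau(X, Y) <- par(Y, X), fem(X)
HYPOTHESIS = (("dau", ("X", "Y")), [("par", ("Y", "X")), ("fem", ("X",))])


def is_var(term):
    return term[0].isupper()


def substitute(atom, theta):
    """Apply a variable binding to an atom."""
    pred, args = atom
    return (pred, tuple(theta.get(a, a) for a in args))


def consequences(facts, rule):
    """Least Herbrand model of the facts plus one definite clause,
    computed by naive forward chaining over all constant bindings."""
    head, body = rule
    constants = sorted({a for _, args in facts for a in args})
    rule_vars = sorted({a for atom in [head, *body] for a in atom[1] if is_var(a)})
    model = set(facts)
    changed = True
    while changed:
        changed = False
        for values in product(constants, repeat=len(rule_vars)):
            theta = dict(zip(rule_vars, values))
            if all(substitute(b, theta) in model for b in body):
                derived = substitute(head, theta)
                if derived not in model:
                    model.add(derived)
                    changed = True
    return model


MODEL = consequences(BACKGROUND, HYPOTHESIS)

# Necessity: the background alone does not already contain all positives.
necessity = not POSITIVES <= BACKGROUND
# Sufficiency: every positive example is derivable from B and h.
sufficiency = POSITIVES <= MODEL
# Strong consistency: no negative example is derivable from B and h.
strong_consistency = not (MODEL & NEGATIVES)

if __name__ == "__main__":
    print(necessity, sufficiency, strong_consistency)
    print(sorted(a for a in MODEL if a[0] == "dau"))
```

Besides the two positive examples, the hypothesis also derives dau(m, g) and dau(e, n), which is harmless here because no negative examples were given.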

The common definition of the grandmother relation, viz. \textit{gra}(x,z) \leftarrow \textit{fem}(x) \land \textit{par}(x,y) \land \textit{par}(y,z), cannot be learned using the above approach, since the variable y occurs in the clause body only; the corresponding literals would have been deleted in the fourth step of the approach. To overcome this flaw, that step has to be modified such that it can be parametrized with different literal post-selection heuristics. Historically, the GOLEM implementation is based on the rlgg approach.
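
The effect of the deletion step on the grandmother clause can be seen in isolation. The following sketch does not run the full rlgg procedure; it only applies the deletion rule ("drop body literals containing variables that do not occur in the head") directly to the target clause, using a hypothetical helper delete_unlinked and the assumed convention that uppercase-initial strings are variables.

```python
def is_var(term):
    return term[0].isupper()


def variables(atom):
    """Set of variables occurring in a (predicate, args) atom."""
    return {t for t in atom[1] if is_var(t)}


def delete_unlinked(head, body):
    """Keep only body atoms whose variables all occur in the head."""
    head_vars = variables(head)
    return [atom for atom in body if variables(atom) <= head_vars]


# Target clause: gra(X, Z) <- fem(X), par(X, Y), par(Y, Z)
HEAD = ("gra", ("X", "Z"))
BODY = [("fem", ("X",)), ("par", ("X", "Y")), ("par", ("Y", "Z"))]

KEPT = delete_unlinked(HEAD, BODY)

if __name__ == "__main__":
    print(KEPT)
```

Both par literals mention Y and are therefore removed, leaving only fem(X); the surviving clause gra(X, Z) <- fem(X) no longer relates X to Z at all, which is why a different literal selection heuristic is needed.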

References

  1. De Raedt, Luc (1999). "A Perspective on Inductive Logic Programming". The Workshop on Current and Future Trends in Logic Programming, Shakertown, to appear in Springer LNCS. CiteSeerX 10.1.1.56.1790.
  2. Muggleton, S. (1991). "Inductive logic programming". New Generation Computing 8 (4): 295–318. doi:10.1007/BF03037089.
  3. Muggleton, Stephen (1999). "Inductive Logic Programming: Issues, Results and the Challenge of Learning Language in Logic". Artificial Intelligence 114: 283–296; here: Sect. 2.1.
  4. Džeroski, Sašo (1996). "Inductive Logic Programming and Knowledge Discovery in Databases". In Fayyad, U.M.; Piatetsky-Shapiro, G.; Smith, P. et al., Advances in Knowledge Discovery and Data Mining, MIT Press, pp. 117–152; here: Sect. 5.2.4.
  5. Plotkin, Gordon D. (1970). "A Note on Inductive Generalization". In Meltzer, B.; Michie, D., Machine Intelligence (Edinburgh University Press) 5: 153–163.
  6. Plotkin, Gordon D. (1971). "A Further Note on Inductive Generalization". In Meltzer, B.; Michie, D., Machine Intelligence (Edinburgh University Press) 6: 101–124.
  7. I.e. sharing the same predicate symbol and negated/unnegated status.
  8. In general: n-tuple when n positive example literals are given.
