Eclat algorithm

Eclat is a depth-first search algorithm using set intersection. The Eclat algorithm is used to perform itemset mining. Itemset mining let us find frequent patterns in data like if a consumer buys milk, he also buys bread. This type of pattern is called association rules and is used in many application domains.

The Eclat Algorithm[edit]

The Eclat algorithm is used to perform itemset mining. Itemset mining let us find frequent patterns in data like if a consumer buys milk, he also buys bread. This type of pattern is called association rules and is used in many application domains.

The basic idea for the eclat algorithm is use tidset intersections to compute the support of a candidate itemset avoiding the generation of subsets that does not exist in the prefix tree.

Algorithm[edit]

The Eclat algorithm is defined recursively. The initial call use all the single items with their tidsets. In each recursive call, the function IntersectTidsets verifies each itemset-tidset pair \left\langle {X,t(X)} \right\rangle with all the others pairs \left\langle {Y,t(Y)} \right\rangle to generate new candidates N_{XY}. If the new candidate is frequent, it is added to the set P_{X}. Then, recursively, it finds all the frequent itemsets in the X branch. The algorithm searchs in a DFS manner to find all the frequent sets.

Implementation[edit]

The eclat algorithm can be found in the arule package of R system.

   * R package: arules   * Method: eclat   * Documentation : arules package

Usage

eclat(data, parameter = NULL, control = NULL)

Arguments

data 
object of class transactions or any data structure which can be coerced into transactions (e.g., binary matrix, data.frame).
parameter 
object of class ECparameter or named list (default values are: support 0.1 and maxlen 5)
control 
object of class ECcontrol or named list for algorithmic controls.

Value

Returns an object of class itemsets

 

Example

data("Adult") ## Mine itemsets with minimum support of 0.1.itemsets <- eclat(Adult, parameter = list(supp = 0.1, maxlen = 15))

Visualization[edit]

The arules package implements some visualization methods for itemsets, which are the return type for the eclat algorithm. Here are some examples:

Example 1

data("Adult")  ## Mine frequent itemsets with Eclat.fsets <- eclat(Adult, parameter = list(supp = 0.5)) ## Display the 5 itemsets with the highest support.fsets.top5 <- SORT(fsets)1:5?inspect(fsets.top5) ## Get the itemsets as a listas(items(fsets.top5), "list") ## Get the itemsets as a binary matrixas(items(fsets.top5), "matrix") ## Get the itemsets as a sparse matrix, a ngCMatrix from package Matrix.## Warning: for efficiency reasons, the ngCMatrix you get is transposedas(items(fsets.top5), "ngCMatrix")

Example 2

## Create transaction data set.data <- list(  c("a","b","c"),  c("a","b"),  c("a","b","d"),  c("b","e"),  c("b","c","e"),  c("a","d","e"),  c("a","c"),  c("a","b","d"),  c("c","e"),  c("a","b","d","e")) t <- as(data, "transactions") ## Mine itemsets with tidLists.f <- eclat(data, parameter = list(support = 0, tidLists = TRUE)) ## Get dimensions of the tidLists.dim(tidLists(f)) ## Coerce tidLists to list.as(tidLists(f), "list") ## Inspect visually.image(tidLists(f)) ##Show the Frequent itemsets and respectives supportsinspect(f)
# items support
1 {b, c, e} 0.1
2 {a, b, c} 0.1
3 {a, c} 0.2
4 {b, c} 0.2
5 {c, e} 0.2
6 {a, b,d, e} 0.1
7 {a, d, e} 0.2
8 {b, d, e} 0.1
9 {a, b, d} 0.3
10 {a, d} 0.4
11 {b, d} 0.3
12 {d, e} 0.2
13 {a, b, e} 0.1
14 {a, e} 0.2
15 {b, e} 0.3
16 {a, b} 0.5
17 {a} 0.7
18 {b} 0.7
19 {e} 0.5
20 {d} 0.4
21 {c} 0.4

Use Case[edit]

To see some real example of the use of the Eclat algorithm it will be used some data from the northwind database. The northwind database is freely available for download and represents data from an enterprise. In this example it will be used the table order details from the database. The order details table is used to relate the orders with products (in a n to n relationship). The Eclat algorithm will be used to find frequent patterns from this data to see if there are any products that are bought together.

Scenario[edit]

Given the data from the order details table from the northwind database, find all the frequent itemsets with support = 0.1 and length of at least 2.

Input Data[edit]

The order details table has the fields:

ID
primary key
Order ID
foreign key from table Orders
Product ID
foreign key from table Products
Quantity
the quantity bought
Discount
the discount offered
Unit Price
the unit price of the product

To use the data, some pre-processing is necessary. The table may have many rows that belongs to the same order, so the table was converted in a way that all the rows for one order became only one row in the new table containing the product id's of the products belonging to that order. The fields ID, order id, quantity, discount and unit price was discarded. The data was saved in a txt file called northwind-orders.txt. The file was scripted in a way ready to be loaded as a list object in R.

Implementation[edit]

To run the example the package arules need to be loaded in R.

First, the data is loaded in a list object in R

## 1041 transactions is loaded, the listing below was shortened## some duplicates transactions was introduced to produce some results for the algorithmdata = list( c("2","3","4","6","7","8","10","12","13","14","16","20","23","32","39","41","46","52","55","60","64","66","73","75","77"), c("11","42","72"), c("14","51"), c("41","51","65"), c("22","57","65"), ...)

Second, the eclat algorithm is used.

itemsets <- eclat(data, parameter = list(support = 0.1, minlen=2, tidLists = TRUE, target="frequent itemsets"))
parameter specification: tidLists support minlen maxlen            target   ext     TRUE     0.1      2      5 frequent itemsets FALSEalgorithmic control: sparse sort verbose      7   -2    TRUEeclat - find frequent item sets with the eclat algorithmversion 2.6 (2004.08.16)         (c) 2002-2004   Christian Borgeltcreate itemset ... set transactions ...[78 item(s), 1041 transaction(s)] done [0.00s].sorting and recoding items ... [3 item(s)] done [0.00s].creating bit matrix ... [3 row(s), 1041 column(s)] done [0.00s].writing  ... [4 set(s)] done [0.00s].Creating S4 object  ... done [0.00s].

 

Output Data[edit]

The itemsets object holds the output of the execution of the eclat algorithm. As can be seen above, 4 sets was generated. To see the results it can be used:

inspect(itemsets)
  items   support1 {11,              42,              72}  0.19404422 {42,              72}  0.19404423 {11,              42}  0.19404424 {11,              72}  0.1959654

 

Analysis[edit]

As can be seen above, there are 4 frequent itemsets as result of the eclat algorithm. This output was induced by the replication of the transaction {11, 42, 72} many times in the data. This result shows that the tuples {11,42,72},{42,72} and {11,42} has a support of 19,40%; and the tuple {11,72} has a support of 19,60%.

The product id's 11, 42 and 72 represents the products Queso CabralesSingaporean Hokkien Fried Mee and Mozzarella di Giovanni, respectively. So, the output of the eclat algorithm suggests a strong frequent shop pattern of buying this items together.

References[edit]

  • Hahsler, M.; Buchta, C.; Gruen, B. & Hornik, K. arules: Mining Association Rules and Frequent Itemsets. 2009.
  • Hahsler, M.; Gruen, B. & Hornik, K. arules -- A Computational Environment for Mining Association Rules and Frequent Item Sets. Journal of Statistical Software, 2005, 14, 1-25
  • Zaki, M. J.;Meira Jr., W. Fundamentals of Data Mining Algorithms. Cambridge University Press, 2009
  • Mohammed J. Zaki, Srinivasan Parthasarathy, Mitsunori Ogihara, and Wei Li. New algorithms for fast discovery of association rules. Technical Report 651, Computer Science Department, University of Rochester, Rochester, NY 14627. 1997.
  • Christian Borgelt (2003) Efficient Implementations of Apriori and Eclat. Workshop of Frequent Item Set Mining Implementations (FIMI 2003, Melbourne, FL, USA).
RELATED ARTICLESExplain
Machine Learning Methods & Algorithms
Unsupervised learning
Association rule learning
Eclat algorithm
Apriori algorithm
FP-growth algorithm
Graph of this discussion
Enter the title of your article


Enter a short (max 500 characters) summation of your article
Enter the main body of your article
Lock
+Comments (0)
+Citations (0)
+About
Enter comment

Select article text to quote
welcome text

First name   Last name 

Email

Skip