Association Rules (in data mining)
Association rules are if/then statements that help uncover relationships between seemingly unrelated data in a relational database or other information repository. An example of an association rule would be: "If a customer buys a dozen eggs, they are 80% likely to also purchase milk."
An association rule has two parts: an antecedent (if) and a consequent (then). An antecedent is an item found in the data; a consequent is an item found in combination with the antecedent. Association rules are created by analyzing data for frequent if/then patterns and using the criteria of support and confidence to identify the most important relationships. Support indicates how frequently the items appear in the database. Confidence indicates how often the rule has been found to be true: of the transactions that contain the antecedent, the fraction that also contain the consequent. In data mining, association rules are useful for analyzing and predicting customer behavior. They play an important part in shopping basket data analysis, product clustering, catalog design and store layout. Programmers also use association rules to build programs capable of machine learning, a type of artificial intelligence (AI) that seeks to build programs that improve their performance without being explicitly programmed.
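As an illustration of these two measures, here is a minimal sketch in Python; the toy transactions and the `support`/`confidence` helpers are hypothetical examples, not from any particular library:

```python
# Toy transaction database (hypothetical shopping baskets).
transactions = [
    {"eggs", "milk", "bread"},
    {"eggs", "milk"},
    {"eggs", "butter"},
    {"milk", "bread"},
    {"eggs", "milk", "butter"},
]

def support(itemset, transactions):
    """Fraction of all transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction
    that also contain the consequent."""
    return (support(set(antecedent) | set(consequent), transactions)
            / support(antecedent, transactions))

print(support({"eggs", "milk"}, transactions))       # 3 of 5 baskets -> 0.6
print(confidence({"eggs"}, {"milk"}, transactions))  # 3 of 4 egg baskets -> 0.75
```

A rule like "eggs => milk" would thus have support 0.6 and confidence 0.75 in this toy data.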
The FP-Growth algorithm is an alternative way to find frequent itemsets without generating candidates, which improves performance. To do so, it uses a divide-and-conquer strategy [17]. The core of the method is a special data structure called the frequent-pattern tree (FP-tree), which retains the itemset association information. In simple terms, the algorithm works as follows: first, it compresses the input database into an FP-tree instance representing the frequent items. It then divides this compressed database into a set of conditional databases, each associated with one frequent pattern. Finally, each conditional database is mined separately. With this strategy, FP-Growth reduces search costs by looking for short patterns recursively and then concatenating them into longer frequent patterns, offering good selectivity. For large databases, it is often impossible to hold the whole FP-tree in main memory. One strategy to cope with this is to first partition the database into a set of smaller databases (called projected databases) and then construct an FP-tree from each of them. The next subsections describe the FP-tree structure and the FP-Growth algorithm; finally, an example is presented to make these concepts easier to understand.
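The compression step can be sketched with a minimal FP-tree node in Python. This is a toy illustration of the data structure only, not the full mining algorithm, and the `FPNode` class is an assumption made for this example:

```python
class FPNode:
    """One node of a frequent-pattern tree: an item, a count, and children."""
    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}

    def insert(self, ordered_items):
        """Insert one transaction whose items are already sorted by frequency."""
        node = self
        for item in ordered_items:
            node = node.children.setdefault(item, FPNode(item))
            node.count += 1

# Because transactions are inserted in frequency order, shared prefixes
# are compressed into shared tree paths with incremented counts.
root = FPNode()
root.insert(["old woman", "teenager", "adult"])
root.insert(["old woman", "teenager"])
print(root.children["old woman"].count)                       # 2
print(root.children["old woman"].children["teenager"].count)  # 2
```

The two transactions share the prefix "old woman, teenager", so that path is stored once with a count of 2 instead of twice.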
Apriori Principle and FP-Growth
Take minimum support as 30%.
TID | ITEMS
1 | child, adult, teenager, old woman
2 | teenager, adult, old man, child, old woman
3 | old man, adult, old woman, child
4 | old woman, adult, teenager
5 | teenager
6 | teenager, old woman
7 | adult, teenager, child
8 | old woman, old man
Step 1 - Calculate the minimum support count
First, calculate the minimum support count. The question says the minimum support should be 30%, and the database contains 8 transactions, so:

Minimum support count = 30/100 * 8 = 2.4

A fractional count of 2.4 is awkward to work with, so it is rounded up to the ceiling value. The minimum support count is therefore ceiling(30/100 * 8) = 3.
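This calculation is just the ceiling of the support fraction times the number of transactions:

```python
import math

num_transactions = 8  # rows in the database table
min_support = 0.30    # 30%, as given in the question

# Round the fractional count 2.4 up to the ceiling value.
min_support_count = math.ceil(min_support * num_transactions)
print(min_support_count)  # ceil(2.4) = 3
```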
Step 2 - Find the frequency of occurrence
Now find the frequency of occurrence of each item in the database table. Table 2 abbreviates the items as A = adult, B = old woman, C = old man, D = teenager and E = child. For example, item A (adult) occurs in row 1, row 2, row 3, row 4 and row 7, i.e. 5 times in total. The counted frequency of occurrence of each item is shown in Table 2.
Items | Frequency
A (adult) | 5
B (old woman) | 6
C (old man) | 3
D (teenager) | 6
E (child) | 4
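These frequencies can be checked with a few lines of Python, counting items over the transaction list from the table above with `collections.Counter`:

```python
from collections import Counter

# The eight transactions from the database table.
transactions = [
    ["child", "adult", "teenager", "old woman"],
    ["teenager", "adult", "old man", "child", "old woman"],
    ["old man", "adult", "old woman", "child"],
    ["old woman", "adult", "teenager"],
    ["teenager"],
    ["teenager", "old woman"],
    ["adult", "teenager", "child"],
    ["old woman", "old man"],
]

# Count how many transactions each item appears in.
freq = Counter(item for t in transactions for item in t)
print(freq)
# adult = 5, old woman = 6, old man = 3, teenager = 6, child = 4
```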
Step 3 - Prioritize the items
Items | Frequency | Priority
A (adult) | 5 | 3
B (old woman) | 6 | 1
C (old man) | 3 | 5
D (teenager) | 6 | 2
E (child) | 4 | 4

Old woman and teenager both occur 6 times; the tie is broken by giving old woman the higher priority, which matches the ordering used in the next table.
Order the items in each transaction according to this priority. All five items meet the minimum support count of 3, so none are discarded.
TID | Items | Ordered Items
1 | child, adult, teenager, old woman | old woman, teenager, adult, child
2 | teenager, adult, old man, child, old woman | old woman, teenager, adult, child, old man
3 | old man, adult, old woman, child | old woman, adult, child, old man
4 | old woman, adult, teenager | old woman, teenager, adult
5 | teenager | teenager
6 | teenager, old woman | old woman, teenager
7 | adult, teenager, child | teenager, adult, child
8 | old woman, old man | old woman, old man
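The ordering step above can be sketched in Python: keep only items that meet the minimum support count and sort each transaction by its priority rank (the `order_items` helper is an assumption for this example):

```python
# Frequencies and priority order from the tables above;
# the old woman / teenager tie is broken with old woman first.
freq = {"old woman": 6, "teenager": 6, "adult": 5, "child": 4, "old man": 3}
priority = ["old woman", "teenager", "adult", "child", "old man"]
rank = {item: i for i, item in enumerate(priority)}

def order_items(transaction, min_count=3):
    """Drop infrequent items, then sort the rest by priority rank."""
    return sorted((i for i in transaction if freq[i] >= min_count),
                  key=rank.__getitem__)

print(order_items(["child", "adult", "teenager", "old woman"]))
# ['old woman', 'teenager', 'adult', 'child']  (matches TID 1 above)
```

Here every item meets the minimum support count of 3, so the filter removes nothing; in general, infrequent items are dropped at this step.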
FP-Tree
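The FP-tree for the ordered transactions above can be constructed and printed with a short Python sketch; the `FPNode`, `build_fp_tree` and `show` names are illustrative, not from any library:

```python
class FPNode:
    """FP-tree node: an item, its count along this path, and child nodes."""
    def __init__(self, item=None):
        self.item, self.count, self.children = item, 0, {}

def build_fp_tree(ordered_transactions):
    """Insert each ordered transaction into the tree, merging shared prefixes."""
    root = FPNode()
    for items in ordered_transactions:
        node = root
        for item in items:
            node = node.children.setdefault(item, FPNode(item))
            node.count += 1
    return root

# The eight transactions from the table above, already in priority order.
ordered = [
    ["old woman", "teenager", "adult", "child"],
    ["old woman", "teenager", "adult", "child", "old man"],
    ["old woman", "adult", "child", "old man"],
    ["old woman", "teenager", "adult"],
    ["teenager"],
    ["old woman", "teenager"],
    ["teenager", "adult", "child"],
    ["old woman", "old man"],
]

def show(node, depth=0):
    """Print each node as item:count, indented by tree depth."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)

root = build_fp_tree(ordered)
show(root)
```

Shared prefixes are merged into single paths: six transactions start with "old woman", four of those continue with "teenager", and three of those with "adult", so the tree begins old woman:6 → teenager:4 → adult:3, with a second branch teenager:2 for transactions 5 and 7.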
Validation reference: https://books.google.co.id/books?id=1SylCgAAQBAJ&pg=PA79&lpg=PA79&dq=Association+Rules+to+Predict+the+Weather&source=bl&ots=8HGRwepxt9&sig=uaLGBk8AOQWge7uYBZNilGa-hB8&hl=jv&sa=X&redir_esc=y#v=onepage&q=Association%20Rules%20to%20Predict%20the%20Weather&f=false