Minggu, 20 November 2016

Big Data: Association Rule in the age whos leading a product maket



Association Rules (in data mining)
Association rules are if/then statements that help uncover relationships between seemingly unrelated data in a relational database or other information repository. An example of an association rule would be "If a customer buys a dozen eggs, he is 80% likely to also purchase milk."
An association rule has two parts, an antecedent (if) and a consequent (then). An antecedent is an item found in the data. A consequent is an item that is found in combination with the antecedent. Association rules are created by analyzing data for frequent if/then patterns and using the criteria support and confidence to identify the most important relationships. Support is an indication of how frequently the items appear in the database. Confidence indicates the number of times the if/then statements have been found to be true. In data mining, association rules are useful for analyzing and predicting customer behavior. They play an important part in shopping basket data analysis, product clustering, catalog design and store layout. Programmers use association rules to build programs capable of machine learning.  Machine learning is a type of artificial intelligence (AI) that seeks to build programs with the ability to become more efficient without being explicitly programmed.

TID
ITEMS
1
Kids, adult, teenager,old woman
2
teenager, adult, old man, child, old woman
3
old man, adult, old woman, child
4
old woman, adult, teenager
5
teenager
6
Teenager, old woman
7
adult, Teenager, child
8
old woman, old man


The FP-Growth Algorithm is an alternative way to find frequent itemsets without using candidate generations, thus improving performance. For so much it uses a divide-and-conquer strategy [17]. The core of this method is the usage of a special data structure named frequent-pattern tree (FP-tree), which retains the itemset association information. In simple words, this algorithm works as follows: first it compresses the input database creating an FP-tree instance to represent frequent items. After this first step it divides the compressed database into a set of conditional databases, each one associated with one frequent pattern. Finally, each such database is mined separately. Using this strategy, the FP-Growth reduces the search costs looking for short patterns recursively and then concatenating them in the long frequent patterns, offering good selectivity. In large databases, it’s not possible to hold the FP-tree in the main memory. A strategy to cope with this problem is to firstly partition the database into a set of smaller databases (called projected databases), and then construct an FP-tree from each of these smaller databases. The next subsections describe the FP-tree structure and FP-Growth Algorithm, finally an example is presented to make it easier to understand these 
concepts.

Apriori Principle and FP-Growth
Take minimum support as 30%.
TID
ITEMS
1
Kids, adult, teenager,old woman
2
teenager, adult, old man, child, old woman
3
old man, adult, old woman, child
4
old woman, adult, teenager
5
teenager
6
Teenager, old woman
7
adult, Teenager, child
8
old woman, old man

Step 1 - Calculate Minimum support
First should calculate the minimum support count. Question says minimum support should be 30%. It calculate as follows:
Minimum support count(30/100 * 8) = 2.4
As a result, 2.4 appears but to empower the easy calculation it can be rounded to to the ceiling value. Now, Minimum support count is ceiling(30/100 * 8) = 3

Step 2 - Find frequency of occurrence
Now time to find the frequency of occurrence of each item in the Database table. For example, item A occurs in row 1,row 2,row 3,row 4 and row 7. Totally 5 times occurs in the Database table. You can see the counted frequency of occurrence of each item in Table 2.

Items
Frequency
A
5
B
6
C
3
D
6
E
4

Step 3 - Prioritize the items
Items
Frequency
Priority
A
5
3
B
6
1
C
3
5
D
6
2
E
4
4

Order the items according to the priority
TID
Items
Ordered Items
1
Kids, adult, teenager,old woman
old woman, teenager, adult, child
2
teenager, adult, old man, child, old woman
old woman, teenager, adult, child Storm, old man
3
old man, adult, old woman, child
old woman, adult, child, old man
4
old woman, adult, teenager
old woman, teenager, adult
5
teenager
teenager
6
Teenager, old woman
old woman, teenager
7
adult, Teenager, child
teenager , adult, child
8
old woman, old man
old woman, old man

FP-Tree
 
 Validation
 

 reference : https://books.google.co.id/books?id=1SylCgAAQBAJ&pg=PA79&lpg=PA79&dq=Association+Rules+to+Predict+the+Weather&source=bl&ots=8HGRwepxt9&sig=uaLGBk8AOQWge7uYBZNilGa-hB8&hl=jv&sa=X&redir_esc=y#v=onepage&q=Association%20Rules%20to%20Predict%20the%20Weather&f=false

Tidak ada komentar:

Posting Komentar