I use RapidMiner because it has hundreds of thousands of active
users of its flagship predictive analytics tool. Used by business
analysts and scientists around the world, RapidMiner looks beyond what has
happened and helps predict what might happen: customer churn, factory
breakdowns, or better results from changes in advertising and marketing campaigns,
for example. In this case, I will use it on a "prediksi elektabilitas
caleg" (legislative candidate electability prediction) dataset
to predict whether the candidates are elected or not.
I use three of RapidMiner's main
algorithms: Decision Tree (C4.5), Naïve Bayes
(NB), and K-Nearest Neighbor (K-NN).
1. DECISION TREE
The goal of a decision tree is to create a model that predicts the value of a target
variable based on several input variables. Each interior node
corresponds to one of the input variables, with edges to children
for each of the possible values of that input variable. Each leaf
represents a value of the target variable given the values of the input
variables along the path from the root to that leaf.
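To make this concrete, here is a minimal sketch in Python with scikit-learn rather than the RapidMiner operator itself. Note that scikit-learn's DecisionTreeClassifier implements CART, a close relative of C4.5 (the entropy criterion approximates C4.5's information-gain splitting), and the file name and column names below are hypothetical placeholders for the candidate dataset.

```python
# Minimal decision-tree sketch (scikit-learn's CART, a close relative of C4.5).
# "caleg.csv" and the column names are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = pd.read_csv("caleg.csv")                            # candidate dataset
X = pd.get_dummies(data.drop(columns=["ELEKTABILITAS"]))   # input variables
y = data["ELEKTABILITAS"]                                  # target: YA / TIDAK

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Each interior node tests one input variable; each leaf assigns a target value.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=5)
tree.fit(X_train, y_train)
print("decision tree accuracy:", tree.score(X_test, y_test))
```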
2. NAÏVE BAYES (NB)
A Naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem (from Bayesian statistics) with strong (naive) independence assumptions. A more descriptive term for the underlying probability model would be 'independent feature model'. In simple terms, a Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class (i.e. attribute) is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even if these features depend on each other or upon the existence of the other features, a Naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.
The advantage of the Naive Bayes classifier is that it only requires a small amount of training data to estimate the means and variances of the variables necessary for classification. Because the variables are assumed to be independent, only the variances of the variables for each label need to be determined, not the entire covariance matrix.
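Continuing the sketch above with the same train/test split, scikit-learn's GaussianNB shows this point directly: it fits only a per-class mean and variance for each variable, never a covariance matrix.

```python
# Naive Bayes sketch: only per-class means and variances are estimated.
from sklearn.naive_bayes import GaussianNB

nb = GaussianNB()
nb.fit(X_train, y_train)          # reuses the split from the tree sketch
print("naive bayes accuracy:", nb.score(X_test, y_test))
print("per-class means shape:", nb.theta_.shape)   # (n_classes, n_features)
```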
3. K-NEAREST NEIGHBOR (K-NN)
The k-Nearest Neighbor algorithm is based on learning by analogy, that is, by comparing a given test example with training examples that are similar to it. The training examples are described by n attributes. Each example represents a point in an n-dimensional space. In this way, all of the training examples are stored in an n-dimensional pattern space. When given an unknown example, a k-nearest neighbor algorithm searches the pattern space for the k training examples that are closest to the unknown example. These k training examples are the k "nearest neighbors" of the unknown example. "Closeness" is defined in terms of a distance metric, such as the Euclidean distance.
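In the same sketch, k-NN needs no real training step at all: fitting just stores the pattern space, and each prediction searches it for the k nearest points under the chosen metric.

```python
# k-NN sketch: fit() stores the training points; predict() finds the k nearest.
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)         # "training" = storing the pattern space
print("k-NN accuracy:", knn.score(X_test, y_test))
```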
Finally, the model reaches an accuracy of 89.63%, with the following confusion matrix (YA = yes, TIDAK = no):

                true TIDAK    true YA
pred. TIDAK        358           25
pred. YA            19           23

That is, of the candidates predicted YA, 19 were actually TIDAK and 23 were correctly YA, while of those predicted TIDAK, 358 were correctly TIDAK and 25 were actually YA.
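For reference, accuracy is just the share of correct predictions in the confusion matrix; the sketch below recomputes it from the counts above (the small gap to the reported 89.63% presumably comes from rounding in the original RapidMiner output).

```python
# Accuracy recomputed from the confusion-matrix counts reported above.
pred_tidak_true_tidak = 358   # correct TIDAK
pred_tidak_true_ya    = 25    # YA misclassified as TIDAK
pred_ya_true_tidak    = 19    # TIDAK misclassified as YA
pred_ya_true_ya       = 23    # correct YA

correct = pred_tidak_true_tidak + pred_ya_true_ya
total = correct + pred_tidak_true_ya + pred_ya_true_tidak
print(f"accuracy: {correct / total:.2%}")   # ≈ 89.65%
```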