I use RapidMiner because it has hundreds of thousands of active
users of its flagship predictive analytics tool. Used by business
analysts and scientists around the world, RapidMiner looks beyond what has
happened and helps predict what might happen: customer churn, factory
breakdowns, or better results from changes in advertising and marketing campaigns,
for example. In this case, I will use it on a "prediksi elektabilitas
caleg" (legislative candidate electability prediction) dataset
to predict whether the candidates are elected or not.
I use three of RapidMiner's main
algorithms: Decision Tree (C4.5), Naïve Bayes
(NB), and K-Nearest Neighbor (K-NN).
1. DECISION TREE
The goal of a decision tree is to create a model that predicts the value of a target
variable based on several input variables. Each interior node
corresponds to one of the input variables, with edges to children
for each of the possible values of that input variable. Each leaf
represents a value of the target variable given the values of the input
variables along the path from the root to that leaf.
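To make this concrete, here is a minimal sketch in Python with scikit-learn rather than the RapidMiner operator itself. Note that scikit-learn's DecisionTreeClassifier implements CART, a close relative of C4.5 (the entropy criterion approximates C4.5's information-gain splitting), and the file name and column names below are hypothetical placeholders for the candidate dataset.

```python
# Minimal decision-tree sketch (scikit-learn's CART, a close relative of C4.5).
# "caleg.csv" and the column names are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = pd.read_csv("caleg.csv")                            # candidate dataset
X = pd.get_dummies(data.drop(columns=["ELEKTABILITAS"]))   # input variables
y = data["ELEKTABILITAS"]                                  # target: YA / TIDAK

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Each interior node tests one input variable; each leaf assigns a target value.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=5)
tree.fit(X_train, y_train)
print("decision tree accuracy:", tree.score(X_test, y_test))
```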
2. NAÏVE BAYES (NB)
A Naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem (from Bayesian statistics) with strong (naive) independence assumptions. A more descriptive term for the underlying probability model would be 'independent feature model'. In simple terms, a Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class (i.e. attribute) is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even if these features depend on each other or upon the existence of the other features, a Naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.
The advantage of the Naive Bayes classifier is that it only requires a small amount of training data to estimate the means and variances of the variables necessary for classification. Because the variables are assumed to be independent, only the variances of the variables for each label need to be determined, not the entire covariance matrix.
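Continuing the sketch above with the same train/test split, scikit-learn's GaussianNB shows this point directly: it fits only a per-class mean and variance for each variable, never a covariance matrix.

```python
# Naive Bayes sketch: only per-class means and variances are estimated.
from sklearn.naive_bayes import GaussianNB

nb = GaussianNB()
nb.fit(X_train, y_train)          # reuses the split from the tree sketch
print("naive bayes accuracy:", nb.score(X_test, y_test))
print("per-class means shape:", nb.theta_.shape)   # (n_classes, n_features)
```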
3. K-NEAREST NEIGHBOR (K-NN)
The k-Nearest Neighbor algorithm is based on learning by analogy, that is, by comparing a given test example with training examples that are similar to it. The training examples are described by n attributes. Each example represents a point in an n-dimensional space. In this way, all of the training examples are stored in an n-dimensional pattern space. When given an unknown example, a k-nearest neighbor algorithm searches the pattern space for the k training examples that are closest to the unknown example. These k training examples are the k "nearest neighbors" of the unknown example. "Closeness" is defined in terms of a distance metric, such as the Euclidean distance.
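In the same sketch, k-NN needs no real training step at all: fitting just stores the pattern space, and each prediction searches it for the k nearest points under the chosen metric.

```python
# k-NN sketch: fit() stores the training points; predict() finds the k nearest.
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)         # "training" = storing the pattern space
print("k-NN accuracy:", knn.score(X_test, y_test))
```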
Finally, the model reaches an accuracy of 89.63%, with the following confusion matrix (YA = yes, TIDAK = no):

                true TIDAK    true YA
pred. TIDAK        358           25
pred. YA            19           23

That is, of the candidates predicted YA, 19 were actually TIDAK and 23 were correctly YA, while of those predicted TIDAK, 358 were correctly TIDAK and 25 were actually YA.
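For reference, accuracy is just the share of correct predictions in the confusion matrix; the sketch below recomputes it from the counts above (the small gap to the reported 89.63% presumably comes from rounding in the original RapidMiner output).

```python
# Accuracy recomputed from the confusion-matrix counts reported above.
pred_tidak_true_tidak = 358   # correct TIDAK
pred_tidak_true_ya    = 25    # YA misclassified as TIDAK
pred_ya_true_tidak    = 19    # TIDAK misclassified as YA
pred_ya_true_ya       = 23    # correct YA

correct = pred_tidak_true_tidak + pred_ya_true_ya
total = correct + pred_tidak_true_ya + pred_ya_true_tidak
print(f"accuracy: {correct / total:.2%}")   # ≈ 89.65%
```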