Naive Bayes
Overview
Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features. In all trainers, class prior probabilities can be preset or calculated from the data; there is also an option to use equal priors.
Gaussian Naive Bayes
Gaussian Naive Bayes handles continuous data. The typical assumption is that the continuous values associated with each class are distributed according to a normal (Gaussian) distribution.
The model predicts that an observation x = (x_1, ..., x_n) belongs to the class

y = argmax_{k in [0..K]} p(C_k) * prod_{i=1..n} p(x_i | C_k)

where each likelihood p(x_i | C_k) is the normal density with the per-class mean mu_ki and variance sigma^2_ki of feature i:

p(x_i | C_k) = 1 / sqrt(2 * pi * sigma^2_ki) * exp(-(x_i - mu_ki)^2 / (2 * sigma^2_ki))

The model returns the number (index) of the most probable class. The trainer computes the mean and variance of each feature for each class.
GaussianNaiveBayesTrainer trainer = new GaussianNaiveBayesTrainer();
GaussianNaiveBayesModel mdl = trainer.fit(ignite, dataCache, vectorizer);
The full example can be found here.
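To make the formula above concrete, here is a minimal plain-Java sketch (independent of Ignite; all names are illustrative) of the scoring a trained Gaussian model performs: for each class it keeps a per-feature mean and variance, and prediction picks the class with the highest log p(C_k) plus the summed Gaussian log-likelihoods.

```java
public class GaussianNbSketch {
    /** Log-density of the normal distribution N(mean, var) at x. */
    static double logGaussian(double x, double mean, double var) {
        return -0.5 * Math.log(2 * Math.PI * var) - (x - mean) * (x - mean) / (2 * var);
    }

    /** Returns the index of the most probable class. */
    static int predict(double[] x, double[] priors, double[][] means, double[][] vars) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < priors.length; k++) {
            // Work in log space to avoid underflow from multiplying densities.
            double score = Math.log(priors[k]);
            for (int i = 0; i < x.length; i++)
                score += logGaussian(x[i], means[k][i], vars[k][i]);
            if (score > bestScore) {
                bestScore = score;
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Two classes, one feature: class 0 centered at 0, class 1 centered at 5.
        double[] priors = {0.5, 0.5};
        double[][] means = {{0.0}, {5.0}};
        double[][] vars = {{1.0}, {1.0}};
        System.out.println(predict(new double[] {4.6}, priors, means, vars)); // prints 1
    }
}
```

Working in log space rather than multiplying raw densities is the standard way to keep the product of many small likelihoods numerically stable.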
Discrete (Bernoulli) Naive Bayes
Discrete Naive Bayes applies the Naive Bayes algorithm over a Bernoulli or multinomial distribution. It can be used for non-continuous features. The thresholds used to convert each feature to a discrete value must be set on the trainer. If all features are binary, discrete Naive Bayes reduces to Bernoulli Naive Bayes.
The model predicts that a discretized observation x = (x_1, ..., x_n) belongs to the class

y = argmax_{k in [0..K]} p(C_k) * prod_{i=1..n} p_ki^{x_i} * (1 - p_ki)^{1 - x_i}

where x_i is the binary value of discrete feature i, p_ki is the probability of feature i taking value 1 in class C_k, and p(C_k) is the prior probability of class C_k. The model returns the number (index) of the most probable class.
// One threshold per feature: values above 0.5 fall into the second bucket.
double[][] thresholds = new double[][] {{.5}, {.5}, {.5}, {.5}, {.5}};

DiscreteNaiveBayesTrainer trainer = new DiscreteNaiveBayesTrainer()
    .setBucketThresholds(thresholds);

DiscreteNaiveBayesModel mdl = trainer.fit(ignite, dataCache, vectorizer);
The full example can be found here.
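The following plain-Java sketch (independent of Ignite; all names are illustrative) shows what the bucket thresholds do in the binary case and how the Bernoulli model scores a class: each continuous feature is compared against its threshold to obtain x_i, and the class score is log p(C_k) plus the summed Bernoulli log-likelihoods.

```java
public class BernoulliNbSketch {
    /** Binarize features: 1 if the value exceeds the feature's threshold, else 0. */
    static int[] toBinary(double[] x, double[] thresholds) {
        int[] b = new int[x.length];
        for (int i = 0; i < x.length; i++)
            b[i] = x[i] > thresholds[i] ? 1 : 0;
        return b;
    }

    /** Index of the most probable class; p[k][i] = P(x_i = 1 | C_k). */
    static int predict(int[] x, double[] priors, double[][] p) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < priors.length; k++) {
            double score = Math.log(priors[k]);
            for (int i = 0; i < x.length; i++)
                score += x[i] == 1 ? Math.log(p[k][i]) : Math.log(1 - p[k][i]);
            if (score > bestScore) {
                bestScore = score;
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] thresholds = {0.5, 0.5};
        // Class 0: features rarely active; class 1: features usually active.
        double[][] p = {{0.1, 0.2}, {0.8, 0.9}};
        int[] x = toBinary(new double[] {0.9, 0.7}, thresholds); // both above threshold
        System.out.println(predict(x, new double[] {0.5, 0.5}, p)); // prints 1
    }
}
```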
Compound Naive Bayes
Compound Naive Bayes is a composition of several Naive Bayes classifiers, where each classifier handles a subset of features of one type. The model combines a Gaussian and a discrete Naive Bayes classifier, and the user selects which subset of features each classifier is trained on. The model returns the number (index) of the most probable class.
double[] priorProbabilities = new double[] {.5, .5};
double[][] thresholds = new double[][] {{.5}, {.5}, {.5}, {.5}, {.5}};

CompoundNaiveBayesTrainer trainer = new CompoundNaiveBayesTrainer()
    .withPriorProbabilities(priorProbabilities)
    .withGaussianNaiveBayesTrainer(new GaussianNaiveBayesTrainer())
    // Features 3..7 are excluded from the Gaussian sub-model.
    .withGaussianFeatureIdsToSkip(asList(3, 4, 5, 6, 7))
    .withDiscreteNaiveBayesTrainer(new DiscreteNaiveBayesTrainer()
        .setBucketThresholds(thresholds))
    // Features 0..2 are excluded from the discrete sub-model.
    .withDiscreteFeatureIdsToSkip(asList(0, 1, 2));

CompoundNaiveBayesModel mdl = trainer.fit(ignite, dataCache, vectorizer);
The full example can be found here.
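The compound idea can be sketched in plain Java (independent of Ignite; all names are illustrative): the feature vector is split by index into two subsets, each sub-model produces a per-class log-likelihood for its own subset, and the final prediction sums those log-likelihoods with the shared prior. The sub-model scores here are assumed inputs standing in for the Gaussian and discrete sub-models.

```java
import java.util.Arrays;
import java.util.List;

public class CompoundNbSketch {
    /** Extract the features at the given indices (the complement of a skip list). */
    static double[] subset(double[] x, List<Integer> ids) {
        return ids.stream().mapToDouble(i -> x[i]).toArray();
    }

    /** Combine per-class log-likelihoods of two sub-models with a shared prior. */
    static int predict(double[] priors, double[] gaussianLogLik, double[] discreteLogLik) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < priors.length; k++) {
            double score = Math.log(priors[k]) + gaussianLogLik[k] + discreteLogLik[k];
            if (score > bestScore) {
                bestScore = score;
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] x = {0.2, 0.8, 0.4, 1.5, 2.7};
        // Features 0..2 go to the discrete sub-model, 3..4 to the Gaussian one.
        double[] discretePart = subset(x, Arrays.asList(0, 1, 2));
        double[] gaussianPart = subset(x, Arrays.asList(3, 4));
        System.out.println(discretePart.length + " " + gaussianPart.length); // prints 3 2

        // Hypothetical per-class log-likelihoods from the two sub-models.
        double[] priors = {0.5, 0.5};
        System.out.println(predict(priors, new double[] {-1.0, -3.0}, new double[] {-2.0, -1.5})); // prints 0
    }
}
```

Summing log-likelihoods across sub-models is valid because the naive independence assumption lets the joint likelihood factor across the two feature subsets.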
Apache, Apache Ignite, the Apache feather and the Apache Ignite logo are either registered trademarks or trademarks of The Apache Software Foundation.