October 20th, Q&A session: Get you issues solved and questions answered!

GitHub logo
Edit

Naive Bayes

Overview

Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features. In all trainers, prior probabilities can be preset or calculated. Also, there is an option to use equal probabilities.

Gaussian Naive Bayes

Gaussian Naive Bayes algorithm is based on this information.

When dealing with continuous data, a typical assumption is that the continuous values associated with each class are distributed according to a normal (or Gaussian) distribution

The model predicts the result value y belongs to a class C_k, k in [0..K] as

naive bayes

Where

naive bayes2

The model returns the number (index) of the most possible class. The trainer counts means and variances for each class.

GaussianNaiveBayesTrainer trainer = new GaussianNaiveBayesTrainer();

GaussianNaiveBayesModel mdl = trainer.fit(ignite, dataCache, vectorizer);

The full example could be found here.

Discrete (Bernoulli) Naive Bayes

Naive Bayes algorithm over Bernoulli or multinomial distribution based on next information.

It can be used for non-continuous features. The thresholds to convert a feature to a discrete value should be set to a trainer. If the features are binary, the discrete Bayes becomes Bernoulli.

The model predicts the result value y belongs to a class C_k, k in [0..K] as

naive bayes3

Where x_i is a discrete feature, p_ki is a prior probability of class p(C_k).

The model returns the number (index) of the most possible class.

double[][] thresholds = new double[][] {{.5}, {.5}, {.5}, {.5}, {.5}};

DiscreteNaiveBayesTrainer trainer = new DiscreteNaiveBayesTrainer()
  .setBucketThresholds(thresholds);

 DiscreteNaiveBayesModel mdl = trainer.fit(ignite, dataCache, vectorizer);

The full example could be found here.

Compound Naive Bayes

Compound Naive Bayes is a composition of several Naive Bayes classifiers where each classifier represents subset of features of one type.

The model contains both Gaussian and Discrete Bayes. A user can select which set of features will be trained on each model.

The model returns the number (index) of the most possible class.

double[] priorProbabilities = new double[] {.5, .5};

double[][] thresholds = new double[][] {{.5}, {.5}, {.5}, {.5}, {.5}};

CompoundNaiveBayesTrainer trainer = new CompoundNaiveBayesTrainer()
  .withPriorProbabilities(priorProbabilities)
  .withGaussianNaiveBayesTrainer(new GaussianNaiveBayesTrainer())
  .withGaussianFeatureIdsToSkip(asList(3, 4, 5, 6, 7))
  .withDiscreteNaiveBayesTrainer(new DiscreteNaiveBayesTrainer()
                                 .setBucketThresholds(thresholds))
  .withDiscreteFeatureIdsToSkip(asList(0, 1, 2));

  CompoundNaiveBayesModel mdl = trainer.fit(ignite, dataCache, vectorizer);

The full example could be found here.