Evaluator

Apache Ignite ML provides a number of machine learning algorithms for learning from data and making predictions. Once a model has been built with one of these algorithms, its performance needs to be evaluated against criteria that depend on the application and its requirements. For this purpose, Apache Ignite ML also provides a suite of classification and regression metrics.

Classification model evaluation

While there are many different types of classification algorithms, the evaluation of classification models follows similar principles in every case. In a supervised classification problem, each data point has a true output (the label) and a model-generated predicted output. The result for each data point can therefore be assigned to one of four categories:

  • True Positive (TP) - label is positive and prediction is also positive

  • True Negative (TN) - label is negative and prediction is also negative

  • False Positive (FP) - label is negative but prediction is positive

  • False Negative (FN) - label is positive but prediction is negative

These four categories are particularly important for binary classification.
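The four categories above can be counted with a short loop over labels and predictions. The following is a minimal sketch in plain Java; the class `ConfusionCounts` and its `count` method are hypothetical names used for illustration only and are not part of the Apache Ignite ML API.

```java
/** Hypothetical helper for illustration; not part of Apache Ignite ML. */
final class ConfusionCounts {
    /**
     * Counts outcomes for a binary problem.
     * Returns an array in the order {TP, TN, FP, FN}.
     * positiveLabel is the label value treated as the positive class (an assumption of this sketch).
     */
    static long[] count(double[] labels, double[] predictions, double positiveLabel) {
        if (labels.length != predictions.length)
            throw new IllegalArgumentException("labels and predictions must have the same length");

        long tp = 0, tn = 0, fp = 0, fn = 0;
        for (int i = 0; i < labels.length; i++) {
            boolean actual = labels[i] == positiveLabel;
            boolean predicted = predictions[i] == positiveLabel;

            if (actual && predicted) tp++;          // label positive, prediction positive
            else if (!actual && !predicted) tn++;   // label negative, prediction negative
            else if (!actual) fp++;                 // label negative, prediction positive
            else fn++;                              // label positive, prediction negative
        }
        return new long[] {tp, tn, fp, fn};
    }
}
```

For example, calling `ConfusionCounts.count` with labels `{1, 0, 1, 0, 1}`, predictions `{1, 0, 0, 1, 1}` and positive label `1.0` yields two true positives and one each of the other three categories.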

Caution
Multiclass classification evaluation is not supported yet in Apache Ignite ML.

The full list of binary classification metrics supported in Apache Ignite ML is as follows:

  • Accuracy

  • Balanced accuracy

  • F-Measure

  • FallOut

  • FN

  • FP

  • FDR

  • MissRate

  • NPV

  • Precision

  • Recall

  • Specificity

  • TN

  • TP

The explanation and formulas for these metrics can be found here.
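For quick reference, the listed metrics can be written in terms of the four counts TP, TN, FP and FN. The sketch below uses the standard textbook definitions; the class `BinaryMetricsSketch` is a hypothetical name, not an Apache Ignite ML class, and the library's own implementation may handle edge cases (such as a zero denominator) differently.

```java
/** Hypothetical helper for illustration; not part of Apache Ignite ML. */
final class BinaryMetricsSketch {
    // Counts are stored as doubles so that every division is floating-point.
    final double tp, tn, fp, fn;

    BinaryMetricsSketch(long tp, long tn, long fp, long fn) {
        this.tp = tp;
        this.tn = tn;
        this.fp = fp;
        this.fn = fn;
    }

    double accuracy()    { return (tp + tn) / (tp + tn + fp + fn); }
    double precision()   { return tp / (tp + fp); }
    double recall()      { return tp / (tp + fn); }   // true positive rate
    double specificity() { return tn / (tn + fp); }   // true negative rate
    double npv()         { return tn / (tn + fn); }   // negative predictive value
    double fdr()         { return fp / (fp + tp); }   // false discovery rate
    double fallOut()     { return fp / (fp + tn); }   // false positive rate
    double missRate()    { return fn / (fn + tp); }   // false negative rate

    double balancedAccuracy() { return (recall() + specificity()) / 2.0; }

    /** F-measure for a given beta; beta = 1 gives the usual F1 score. */
    double fMeasure(double beta) {
        double b2 = beta * beta;
        return (1 + b2) * precision() * recall() / (b2 * precision() + recall());
    }
}
```

In this sketch a zero denominator produces `NaN` rather than an exception, which is a simplification chosen for brevity.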

// Define the vectorizer.
Vectorizer<Integer, Vector, Integer, Double> vectorizer = new DummyVectorizer<Integer>()
   .labeled(Vectorizer.LabelCoordinate.FIRST);

// Define the trainer.
SVMLinearClassificationTrainer trainer = new SVMLinearClassificationTrainer();

// Train the model.
SVMLinearClassificationModel mdl = trainer.fit(ignite, dataCache, vectorizer);

// Calculate all classification metrics.
EvaluationResult res = Evaluator
  .evaluateBinaryClassification(dataCache, mdl, vectorizer);

double accuracy = res.get(MetricName.ACCURACY);

Regression model evaluation

Regression analysis is used when predicting a continuous output variable from a number of independent variables.

The full list of regression metrics supported in Apache Ignite ML is as follows:

  • MAE

  • R2

  • RMSE

  • RSS

  • MSE
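The regression metrics listed above have standard definitions in terms of the true values and the predicted values. The following plain-Java sketch shows those definitions; `RegressionMetricsSketch` and its method names are hypothetical and are not part of the Apache Ignite ML API, whose implementation may differ in detail.

```java
/** Hypothetical helper for illustration; not part of Apache Ignite ML. */
final class RegressionMetricsSketch {
    /** Residual sum of squares: sum of (y - yHat)^2. */
    static double rss(double[] y, double[] yHat) {
        checkLengths(y, yHat);
        double sum = 0.0;
        for (int i = 0; i < y.length; i++) {
            double err = y[i] - yHat[i];
            sum += err * err;
        }
        return sum;
    }

    /** Mean squared error: RSS divided by the number of points. */
    static double mse(double[] y, double[] yHat) {
        return rss(y, yHat) / y.length;
    }

    /** Root mean squared error: square root of MSE. */
    static double rmse(double[] y, double[] yHat) {
        return Math.sqrt(mse(y, yHat));
    }

    /** Mean absolute error: average of |y - yHat|. */
    static double mae(double[] y, double[] yHat) {
        checkLengths(y, yHat);
        double sum = 0.0;
        for (int i = 0; i < y.length; i++)
            sum += Math.abs(y[i] - yHat[i]);
        return sum / y.length;
    }

    /** Coefficient of determination: 1 - RSS / TSS, where TSS is the sum of (y - mean(y))^2. */
    static double r2(double[] y, double[] yHat) {
        double mean = 0.0;
        for (double v : y)
            mean += v;
        mean /= y.length;

        double tss = 0.0;
        for (double v : y)
            tss += (v - mean) * (v - mean);

        return 1.0 - rss(y, yHat) / tss;
    }

    private static void checkLengths(double[] y, double[] yHat) {
        if (y.length != yHat.length || y.length == 0)
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
    }
}
```

Lower values are better for MAE, MSE, RMSE and RSS, while an R2 closer to 1 indicates a better fit.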

// Define the vectorizer.
Vectorizer<Integer, Vector, Integer, Double> vectorizer = new DummyVectorizer<Integer>()
   .labeled(Vectorizer.LabelCoordinate.FIRST);

// Define the trainer.
KNNRegressionTrainer trainer = new KNNRegressionTrainer()
    .withK(5)
    .withDistanceMeasure(new ManhattanDistance())
    .withIdxType(SpatialIndexType.BALL_TREE)
    .withWeighted(true);

// Train the model.
KNNRegressionModel knnMdl = trainer.fit(ignite, dataCache, vectorizer);

// Calculate all regression metrics.
EvaluationResult res = Evaluator
  .evaluateRegression(dataCache, knnMdl, vectorizer);

double mse = res.get(MetricName.MSE);