Machine Learning

Apache Ignite Machine Learning (ML) is a set of simple, scalable and efficient tools that allow building predictive machine learning models without costly data transfers.

The rationale for adding machine learning and deep learning (DL) to Apache Ignite is quite simple. Today's data scientists have to deal with two major factors that keep ML from mainstream adoption.

Problem #1: Constant Data Movement (ETL)

First, models are trained in one system and deployed, once training is over, in another. Data scientists have to wait for ETL or some other data transfer process to move the data into a system such as Apache Mahout or Apache Spark for training. Then they have to wait for this process to complete and redeploy the models in a production environment. The whole process can take hours when terabytes of data are moved from one system to another. Moreover, training usually runs on a data set that is already out of date.

Problem #2: Lack of Horizontal Scalability

The second factor is related to scalability. The data sets that ML and DL algorithms have to process keep growing and no longer fit within a single server. This pushes data scientists to come up with sophisticated solutions or turn to distributed computing platforms such as Apache Spark and TensorFlow. However, those platforms mostly solve only one part of the puzzle, model training, and leave developers the burden of deciding how to deploy the models in production later.

Zero ETL and Massive Scalability

Ignite Machine Learning relies on Ignite's memory-centric storage, which brings massive scalability to ML and DL tasks and eliminates the wait imposed by ETL between different systems. For instance, it allows users to run ML/DL training and inference directly on data stored across memory and disk in an Ignite cluster. In addition, Ignite provides a host of ML and DL algorithms that are optimized for Ignite's collocated distributed processing. These implementations deliver in-memory speed and horizontal scalability when running in place against massive data sets or incrementally against incoming data streams, without requiring the data to be moved into another store. By eliminating data movement and long processing wait times, Ignite Machine Learning enables continuous learning that can improve decisions based on the latest data as it arrives in real time.
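The idea of collocated processing can be illustrated with a minimal sketch. It does not use the Ignite API; it is plain Python, and the names Stats, local_stats and fit_line are hypothetical, introduced only for this example. Each partition stands in for the data held by one node. A small summary is computed where the data lives, and only those summaries are merged to fit a straight line by least squares, so the raw rows never leave their partition.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Sufficient statistics for fitting y = slope * x + intercept."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def merge(self, other):
        # Merging is a field-wise sum, so partitions can be combined in any order.
        return Stats(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                     self.sxx + other.sxx, self.sxy + other.sxy)


def local_stats(rows):
    """Runs next to the data: one pass over a single partition's rows."""
    s = Stats()
    for x, y in rows:
        s.n += 1
        s.sx += x
        s.sy += y
        s.sxx += x * x
        s.sxy += x * y
    return s


def fit_line(partitions):
    """Combines the per-partition summaries and solves the normal equations."""
    total = Stats()
    for part in partitions:
        total = total.merge(local_stats(part))
    denom = total.n * total.sxx - total.sx ** 2
    slope = (total.n * total.sxy - total.sx * total.sy) / denom
    intercept = (total.sy - slope * total.sx) / total.n
    return slope, intercept


# Hypothetical data: three partitions holding points on the line y = 2x + 1.
partitions = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0)], [(3.0, 7.0), (4.0, 9.0)]]
slope, intercept = fit_line(partitions)
```

Only five numbers per partition cross the network in this sketch, regardless of how many rows each partition holds, which is the property that makes moving the computation cheaper than moving the data.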

Fault Tolerance and Continuous Learning

Apache Ignite Machine Learning is tolerant of node failures. This means that if nodes fail during the learning process, all recovery procedures are transparent to the user, learning processes are not interrupted, and results arrive in about the same time as when all nodes work fine.


Genetic Algorithms

The machine learning component comes with a set of genetic algorithms (GAs), a method of solving optimization problems by simulating the process of biological evolution.

GAs are excellent for searching through large and complex data sets for an optimal solution. Real-world applications of GAs include automotive design, computer gaming, robotics, investments, traffic and shipment routing, and more.
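To make the evolutionary loop concrete, here is a minimal sketch of a genetic algorithm in plain Python. It is not the Ignite GA API. The function names, parameters and the toy objective (maximizing the number of 1 bits in a chromosome) are assumptions chosen for illustration. The loop shows the usual steps: evaluate fitness, select parents, cross them over, mutate the offspring, and carry the best individual forward unchanged.

```python
import random


def fitness(chromosome):
    # Toy objective (an assumption for this sketch): count the 1 bits.
    return sum(chromosome)


def evolve(gene_count=20, population_size=30, generations=40,
           mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(gene_count)]
                  for _ in range(population_size)]
    history = []  # best fitness seen at the start of each generation

    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))

        # Elitism: the best individual survives unchanged.
        next_generation = [population[0][:]]

        while len(next_generation) < population_size:
            # Tournament selection: the fittest of three random candidates.
            parent_a = max(rng.sample(population, 3), key=fitness)
            parent_b = max(rng.sample(population, 3), key=fitness)

            # Single-point crossover.
            cut = rng.randrange(1, gene_count)
            child = parent_a[:cut] + parent_b[cut:]

            # Mutation: flip each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_generation.append(child)

        population = next_generation

    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve()
```

Because the best individual is copied into every new generation, the values in history never decrease. A real optimization problem would replace the fitness function and the bit-string encoding with ones suited to the domain.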

 
