Apache Hadoop Performance Acceleration

Apache Ignite® enables real-time analytics across Apache™ Hadoop® operational and historical data silos. The Ignite in-memory computing platform provides low-latency and high-throughput operations while Hadoop continues to be used for long-running OLAP workloads.

As the architecture diagram suggests, you can accelerate Hadoop-based systems by deploying Ignite as a separate distributed store that holds the data sets required for your low-latency operations or real-time reports.

First, depending on the data volume and available memory capacity, you can enable Ignite native persistence to store historical data sets on disk while dedicating memory to operational records. You can continue to use Hadoop as storage for less frequently used data or for long-running and ad-hoc analytical queries.
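
One way to reason about this first step is a small sizing check. The sketch below is illustrative only: `plan_storage`, `StoragePlan`, and the 20% headroom are hypothetical names and values, not part of any Ignite API, and the arithmetic deliberately ignores backups, indexes, and per-entry overhead. It assumes that if the whole data set fits in the cluster's usable memory, persistence is optional, and otherwise persistence is enabled and memory is sized for the operational records.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class StoragePlan:
    persistence_enabled: bool  # keep part of the data set on disk?
    memory_per_node_bytes: int  # memory to dedicate to records on each node


def plan_storage(total_data_bytes, operational_bytes, nodes,
                 ram_per_node_bytes, headroom=0.2):
    """Hypothetical sizing helper (an assumption, not an Ignite API).

    `headroom` is the fraction of RAM left for the OS and the JVM.
    """
    if nodes <= 0:
        raise ValueError("nodes must be positive")

    usable_per_node = int(ram_per_node_bytes * (1 - headroom))
    cluster_capacity = usable_per_node * nodes

    if total_data_bytes <= cluster_capacity:
        # Everything fits in memory: spread the full data set over the nodes.
        per_node = -(-total_data_bytes // nodes)  # ceiling division
        return StoragePlan(False, per_node)

    # The data set exceeds memory: keep historical data on disk and
    # size memory for the operational records, capped at what a node has.
    per_node = min(usable_per_node, -(-operational_bytes // nodes))
    return StoragePlan(True, per_node)


small = plan_storage(100 * GIB, 10 * GIB, nodes=4, ram_per_node_bytes=64 * GIB)
large = plan_storage(1000 * GIB, 40 * GIB, nodes=4, ram_per_node_bytes=64 * GIB)
print(small)
print(large)
```

In a real deployment, the result of such a check would inform the memory size and the persistence setting in the Ignite storage configuration; the numbers here are placeholders.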

Next, your applications and services should use Ignite native APIs to process the data residing in the in-memory cluster. Ignite provides SQL, compute (also known as MapReduce), and machine learning APIs for various data processing needs.

Finally, consider using the Apache Spark DataFrame API if an application needs to run federated or cross-database queries across Ignite and Hadoop clusters. Ignite is integrated with Spark, which natively supports Hive/Hadoop. Cross-database queries should be considered only in the limited number of scenarios where neither Ignite nor Hadoop contains the entire data set.
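
The rule in the last sentence can be read as a routing decision. The following is a minimal sketch under stated assumptions: `Engine`, `choose_engine`, and the table names are hypothetical, and preferring Ignite when both systems hold the required data is an assumption made here to favor low latency, not something this page prescribes.

```python
from enum import Enum


class Engine(Enum):
    IGNITE = "ignite"                    # Ignite native APIs
    HADOOP = "hadoop"                    # Hive/Hadoop query
    SPARK_FEDERATED = "spark-federated"  # Spark DataFrame query across both


def choose_engine(required_tables, ignite_tables, hadoop_tables):
    """Pick where to run a query (hypothetical helper, not an Ignite API)."""
    required = set(required_tables)
    ignite = set(ignite_tables)
    hadoop = set(hadoop_tables)

    unknown = required - ignite - hadoop
    if unknown:
        raise LookupError(f"tables not found in either system: {sorted(unknown)}")

    if required <= ignite:
        return Engine.IGNITE  # assumption: prefer the in-memory cluster
    if required <= hadoop:
        return Engine.HADOOP
    # Neither system holds the entire data set: fall back to a federated query.
    return Engine.SPARK_FEDERATED


ignite_tables = {"orders_recent", "customers"}
hadoop_tables = {"orders_archive", "customers"}

print(choose_engine({"orders_recent", "customers"}, ignite_tables, hadoop_tables))
print(choose_engine({"orders_archive"}, ignite_tables, hadoop_tables))
print(choose_engine({"orders_recent", "orders_archive"}, ignite_tables, hadoop_tables))
```

Only the third call, which needs one table from each system, ends up on the federated path; the other two are served by a single system.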

How to split data and operations between Ignite and Hadoop?

Consider using this approach:

Getting Started Checklist

Follow the steps below to implement the discussed architecture in practice: