Quick Start Guide

ARCHITECTURAL BACKBONE

Memory-First + MVCC + Distributed Replication

CORE FEATURES

DATA PLACEMENT & PROCESSING

OPERATIONAL EXCELLENCE

View all

PRIMARY USE CASES

ADDITIONAL USE CASES

SUPPORTING PATTERNS

View all

Project Info

ARCHITECTURAL BACKBONE

Memory-First + MVCC + Distributed Replication

CORE FEATURES

DATA PLACEMENT & PROCESSING

OPERATIONAL EXCELLENCE

View all

PRIMARY USE CASES

ADDITIONAL USE CASES

SUPPORTING PATTERNS

Accelerate Existing Hadoop Deployments

Accelerate the performance of Hadoop-based applications with Ignite
as a high-performance data access layer

Start Coding

Apache Ignite 2.x Use Case

This use case applies to Apache Ignite 2.x. For Ignite 3, see the new use cases.

How Does Apache Ignite Acceleration Work?

To achieve the performance acceleration of Hadoop-based systems, deploy Ignite as a separate distributed storage that maintains the data sets required for your low-latency operations or real-time reports

There are 3 basic steps:

Depending on the data volume and available memory capacity, you can enable Ignite native persistence to store historical data sets on disk while dedicating a memory space for operational records.

You can continue to use Hadoop as storage for less frequently-used data or for long-running and ad-hoc analytical queries.

Your applications and services should use Ignite native APIs to process the data residing in the in-memory cluster. Ignite provides SQL, compute (aka. map-reduce), and machine learning APIs for various data processing needs.

Consider using Apache Spark DataFrames APIs if an application needs to run federated or cross-database queries across Ignite and Hadoop clusters.

Ignite is integrated with Spark, which natively supports Hive/Hadoop. Cross-database queries should be considered only for a limited number of scenarios when neither Ignite nor Hadoop contains the entire data set.

How Can You Split Data And Operations Between Ignite And Hadoop?

Use Apache Ignite for tasks that require:
– Low-latency response time (microseconds, milliseconds, seconds)

– High-throughput operations (thousands and millions of operations per second)
– Real-time processing

Continue using Apache Hadoop for:
— High-latency operations (dozens of seconds, minutes, hours)
— Batch processing

5 Steps To Implement The Architecture In Practice

Download and install Apache Ignite to your system.

Select a list of operations for Ignite.

The best operations are those that require low-latency response time, high-throughput, and real-time analytics.

Consider enabling Ignite native persistence, or use Ignite as a pure in-memory cache, or in-memory data grid that persists changes to Hadoop or another external database.

Update your applications

Ensure they use Ignite native APIs to process Ignite data and Spark for federated queries.

If you need to replicate changes between Ignite and Hadoop clusters, use existing change-data-capture solutions:

Debezium
Kafka

GridGain Data Lake Accelerator
Oracle GoldenGate

To write-through changes to Hadoop directly,
implement Ignite's CacheStore interface.

Ready to Start?

Discover our quick start guide and build your first
application in 5-10 minutes

Quick Start Guide

Want to Learn More?

Read the Apache Spark acceleration article

Apache Spark Acceleration Article