Memory-Centric Storage

Apache Ignite is based on a distributed memory-centric architecture that combines the performance and scale of in-memory computing with disk durability and strong consistency in one system.

When native persistence is turned on, Ignite functions as a memory-centric system-of-record, where most of the processing happens in memory on cached data, but the superset of data and indexes gets persisted to disk.

When persistence is turned off, Ignite functions as a memory-only store, in which case it can be treated as a Distributed Cache, In-Memory Database (IMDB) or In-Memory Data Grid (IMDG).

Database and Caching in One

One of the main advantages of Ignite is that it comes with a distributed in-memory cache and distributed on-disk storage in one platform. In other words, Ignite users get both a distributed cache and a distributed database.

In partitioned (not replicated) mode, the data is partitioned across multiple servers, with each server responsible for a subset of the data. Collectively, the full data set is stored across all servers. Each server has its subset persisted on disk. Depending on how much memory is available, each server also has either the whole subset or a portion of it cached in memory. This combination of memory and disk creates a distributed memory-centric storage.

The following memory and disk usage modes are supported:

Mode Description
In-Memory

The whole data set is stored in memory. To survive node failures, it is recommended to configure a number of redundant backup copies (also known as the replication factor) across the cluster.

Use cases: in-memory caches, in-memory data grids, in-memory computations, web-session caching, real-time processing of continuous data streams.

In-Memory + 3rd party database

Ignite can be used as a caching layer (also known as a data grid) over any existing 3rd party database. This mode is used to accelerate and scale the underlying database. Automatic integration is provided with most well-known databases, such as Oracle, MySQL, PostgreSQL, and Apache Cassandra.

Use cases: Ignite as an In-Memory Data Grid that adds acceleration and scale to existing database deployments (RDBMS, NoSQL, etc.).

In-Memory + Full Copy on Disk

The whole data set is stored in memory and on disk. The disk holds a recovery copy of the in-memory data, which is used after full cluster crashes and restarts.

Use cases: Ignite as an In-Memory Database that provides SQL, key-value, and collocated processing APIs over in-memory data.

100% on Disk + In-Memory Cache

100% of the data is persisted to disk and the same or a smaller amount is cached in memory. The more data is cached, the better the performance. The disk serves as the primary storage that survives any type of cluster failure or restart.

Use cases: Ignite as a Memory-Centric Distributed Database that provides SQL, key-value, and collocated processing APIs.
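
The last mode above (all data on disk, with the same or a smaller amount cached in memory) can be illustrated with a small, self-contained Java sketch. It is not Ignite code: the class MemoryCentricStore, its methods, and the least-recently-used eviction policy are assumptions made for illustration only, and plain maps stand in for disk files and for memory.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not an Ignite API: a bounded in-memory cache
// in front of a complete "on-disk" copy of the data.
class MemoryCentricStore {
    // Stands in for the files on disk; always holds 100% of the data.
    private final Map<String, String> disk = new HashMap<>();
    // Bounded cache; LRU eviction is an assumption of this sketch.
    private final Map<String, String> memory;
    private int diskReads = 0;

    MemoryCentricStore(final int memoryCapacity) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.memory = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > memoryCapacity;
            }
        };
    }

    void put(String key, String value) {
        disk.put(key, value);   // every entry is persisted
        memory.put(key, value); // and cached while it fits
    }

    String get(String key) {
        String cached = memory.get(key);
        if (cached != null) {
            return cached;      // served from memory
        }
        String stored = disk.get(key);
        if (stored != null) {
            diskReads++;        // cache miss: fall back to disk
            memory.put(key, stored);
        }
        return stored;
    }

    int diskReads() { return diskReads; }

    int cachedEntries() { return memory.size(); }
}
```

With a capacity of two entries and three keys written, one key is served from the simulated disk on its next read while the other two are served from memory, which mirrors the statement that more cached data means better performance.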

Collocated vs Client-Server Processing

Disk-centric systems, such as RDBMS or NoSQL databases, generally use the classic client-server approach, in which data is brought from the server to the client, processed there, and then usually discarded. This approach does not scale well because moving data over the network is the most expensive operation in a distributed system.

A much more scalable approach is collocated processing, which reverses the flow by bringing the computations to the servers where the data resides. This approach lets you execute advanced logic or distributed SQL with JOINs exactly where the data is stored, avoiding expensive serialization and network trips.
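
The difference between the two approaches can be made concrete with a toy simulation. The sketch below is not Ignite code: the class CollocationSketch and its methods are hypothetical, each inner list stands in for the data held by one server, and the transferred counter is an assumed stand-in for network traffic.

```java
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical sketch: counts how many values cross a simulated network
// for client-side processing versus collocated processing.
class CollocationSketch {
    // Number of values that travelled between servers and the client.
    long transferred = 0;

    // Client-server style: each server ships all of its data to the client,
    // and the client does the work.
    long sumOnClient(List<List<Integer>> servers) {
        long sum = 0;
        for (List<Integer> data : servers) {
            transferred += data.size(); // every value travels to the client
            for (int value : data) {
                sum += value;
            }
        }
        return sum;
    }

    // Collocated style: the task runs where the data lives,
    // and only one partial result per server travels back.
    long sumOnServers(List<List<Integer>> servers, ToLongFunction<List<Integer>> task) {
        long sum = 0;
        for (List<Integer> data : servers) {
            sum += task.applyAsLong(data); // executed "on the server"
            transferred += 1;              // a single partial result
        }
        return sum;
    }

    // The computation that is sent to each server in the collocated case.
    static long localSum(List<Integer> data) {
        long sum = 0;
        for (int value : data) {
            sum += value;
        }
        return sum;
    }
}
```

Both methods return the same total, but the collocated variant moves one value per server instead of every stored value, so the gap widens as the data set grows.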

Partitioning & Replication

Depending on the configuration, Ignite can either partition or replicate data across its memory-centric storage. Unlike REPLICATED mode, where data is fully replicated across all nodes in the cluster, in PARTITIONED mode Ignite evenly splits the data across multiple cluster nodes, which allows terabytes of data to be stored both in memory and on disk.
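
As a rough illustration of the two modes, the sketch below maps keys to owning nodes. It is a simplification under stated assumptions: the class PlacementSketch is hypothetical, and the hash-then-modulo mapping is a stand-in chosen for brevity, not the affinity function Ignite actually uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of key placement; the modulo mapping is an assumption.
class PlacementSketch {
    // PARTITIONED: a key belongs to one partition, and a partition has one
    // primary owner (backup copies are ignored in this sketch).
    static int partitionedOwner(Object key, int partitions, int nodes) {
        int partition = Math.floorMod(key.hashCode(), partitions);
        return partition % nodes;
    }

    // REPLICATED: every node holds a copy of every key.
    static List<Integer> replicatedOwners(int nodes) {
        List<Integer> owners = new ArrayList<>();
        for (int node = 0; node < nodes; node++) {
            owners.add(node);
        }
        return owners;
    }
}
```

In the partitioned case each node stores roughly 1/N of the keys, so capacity grows with the cluster; in the replicated case every node stores everything, so capacity is bounded by a single node.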

Durability

Ignite provides strong ACID durability guarantees for the data:

  • Committed transactions will always survive any failures.
  • The cluster can always be recovered to the latest successfully committed transaction.
  • Cluster restarts are very fast.

Redundancy

Ignite also lets you configure multiple backup copies to guarantee data resiliency in case of failures.
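
To show why backup copies help, the following sketch assigns each partition a primary node plus a configurable number of backups and checks whether the partition is still reachable after some nodes fail. The class BackupSketch and the "next nodes in order" assignment are assumptions made for illustration; they do not describe how Ignite places backups.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: primary and backup owners of a partition.
class BackupSketch {
    // Returns the primary node followed by up to `backups` distinct backup nodes.
    // Assumes partition >= 0 and nodes >= 1.
    static List<Integer> owners(int partition, int nodes, int backups) {
        int copies = Math.min(backups + 1, nodes); // cannot exceed the cluster size
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            result.add((partition + i) % nodes);
        }
        return result;
    }

    // A partition stays available while at least one of its owners is alive.
    static boolean survives(List<Integer> owners, List<Integer> failedNodes) {
        for (int owner : owners) {
            if (!failedNodes.contains(owner)) {
                return true;
            }
        }
        return false;
    }
}
```

With one backup, a partition survives the loss of any single owner but not the simultaneous loss of both, which is why the number of backups is chosen to match the failures the cluster is expected to tolerate.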

Consistency

Regardless of which replication scheme is used, Ignite guarantees data consistency across all cluster members.

Write-Ahead Log

Every time data is updated in memory, the update is appended to the tail of the write-ahead log (WAL). The purpose of the WAL is to propagate updates to disk in the fastest way possible and to provide a consistent recovery mechanism that supports full cluster failures.
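
The append-and-replay idea behind a write-ahead log can be sketched in a few lines. This is a toy model, not Ignite's implementation: the class WalSketch and its Record type are hypothetical, an in-memory list stands in for the durable log file, and appending to the log before applying the update in memory is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a write-ahead log with replay-based recovery.
class WalSketch {
    // One logical update; real log records are binary and carry more detail.
    static final class Record {
        final String key;
        final String value;

        Record(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Stands in for the durable, append-only log file.
    private final List<Record> wal = new ArrayList<>();
    // Volatile in-memory state that is lost on a crash.
    private final Map<String, String> memory = new HashMap<>();

    void update(String key, String value) {
        wal.add(new Record(key, value)); // append to the tail of the log
        memory.put(key, value);          // then apply the update in memory
    }

    // Simulates a crash: memory is wiped, the log survives.
    void crash() {
        memory.clear();
    }

    // Rebuilds the in-memory state by replaying the log in order.
    void recover() {
        for (Record record : wal) {
            memory.put(record.key, record.value);
        }
    }

    String get(String key) {
        return memory.get(key);
    }
}
```

Because records are replayed in the order they were appended, the last write to each key wins after recovery, matching the state before the crash.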

Checkpointing

As the WAL grows, it is periodically checkpointed to the main storage. Checkpointing is the process of copying dirty pages from memory to the partition files on disk. A dirty page is a page that was updated in memory and appended to the WAL but has not yet been written to its partition file on disk.
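
Dirty-page tracking can be sketched as follows. The class CheckpointSketch and its methods are hypothetical, pages are modeled as strings keyed by a page id, and a map stands in for the partition files on disk; the WAL itself is left out to keep the sketch short.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of checkpointing dirty pages to a partition file.
class CheckpointSketch {
    // Current page contents in memory.
    private final Map<Integer, String> memoryPages = new HashMap<>();
    // Stands in for the partition files on disk.
    private final Map<Integer, String> partitionFile = new HashMap<>();
    // Pages updated in memory but not yet written to the partition file.
    private final Set<Integer> dirtyPages = new HashSet<>();

    void updatePage(int pageId, String content) {
        memoryPages.put(pageId, content);
        dirtyPages.add(pageId); // the page is now dirty
    }

    // Copies every dirty page to the partition file and returns how many were written.
    int checkpoint() {
        int written = 0;
        for (int pageId : dirtyPages) {
            partitionFile.put(pageId, memoryPages.get(pageId));
            written++;
        }
        dirtyPages.clear(); // all pages are clean again
        return written;
    }

    int dirtyCount() {
        return dirtyPages.size();
    }

    String onDisk(int pageId) {
        return partitionFile.get(pageId);
    }
}
```

A page that is updated several times between checkpoints is written to the partition file only once, with its latest content.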

Persistence Configuration

To enable Ignite persistence, turn on the persistence setting in the configuration of each cluster node.
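
As a hedged illustration of enabling persistence programmatically, the following Java sketch assumes Ignite 2.3 or later, where the DataStorageConfiguration and DataRegionConfiguration classes are available (earlier releases used different configuration classes), and assumes the Ignite library is on the classpath. The wrapper class PersistenceConfigSketch and its method name are hypothetical.

```java
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

// Hypothetical helper; assumes Ignite 2.3+ on the classpath.
class PersistenceConfigSketch {
    // Builds a node configuration with native persistence enabled
    // for the default data region.
    static IgniteConfiguration persistentNodeConfig() {
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();

        // Turn persistence on for the default data region.
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDataStorageConfiguration(storageCfg);
        return cfg;
    }
}
```

The returned configuration would then be passed to Ignition.start(...) on each node. The same setting can also be expressed in a Spring XML configuration file by setting the persistenceEnabled property on the default data region.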

More on Memory-Centric Storage

Feature Description
Persistence

Ignite native persistence is a distributed, ACID-compliant, and SQL-compliant disk store that transparently integrates with Ignite memory-centric storage.

Partitioning & Replication

Depending on the configuration, Ignite can either partition or replicate data. Unlike REPLICATED mode, where data is fully replicated across all nodes in the cluster, in PARTITIONED mode Ignite evenly splits the data across multiple cluster nodes.

Distributed Database

Apache Ignite can be used as an all-in-one distributed database that supports SQL, key-value, compute, machine learning, and other data processing APIs.

In-Memory Database

Apache Ignite can be used as a distributed and horizontally scalable in-memory database (IMDB).

Data Grid

Ignite can act as a data grid, that is, a distributed, transactional key-value store. Unlike other in-memory data grids (IMDG), Ignite can store data both in memory and on disk, and is therefore able to store more data than fits in physical memory.

Database Caching

Ignite can be used as a caching layer (also known as a data grid) above 3rd party databases such as RDBMS, Apache Cassandra, and MongoDB.