Apache Ignite is based on a distributed, memory-centric architecture that combines the performance and scale of in-memory computing with the durability and strong consistency of disk storage in one system.
When native persistence is turned on, Ignite functions as a memory-centric distributed database.
When persistence is turned off, Ignite functions as a memory-only store, in which case it can be treated as a Distributed Cache, In-Memory Database (IMDB) or In-Memory Data Grid (IMDG).
One of Ignite's main advantages is that it combines a distributed in-memory cache and distributed on-disk storage in one platform. In other words, Ignite users get a distributed cache and a distributed database together.
In partitioned (not replicated) mode, data is split across multiple servers, with each server responsible for a subset of the data. Collectively, the servers store the full data set. Each server persists its subset on disk and, depending on how much memory is available, also caches either the whole subset or a portion of it in memory. This combination of memory and disk creates a distributed, memory-centric storage.
The following memory and disk usage modes are supported:
100% In-Memory
The whole data set is stored in memory. To survive node failures, it is recommended to configure one or more redundant backup copies (also known as the replication factor) across the cluster.
Use cases: in-memory caches, in-memory data grids, in-memory computations, web-session caching, real-time processing of continuous data streams.
In-Memory + 3rd Party Database
Ignite can be used as a caching layer (aka. data grid) over any existing 3rd party database. This mode accelerates and scales the underlying database. Automatic integration is provided for most well-known databases, such as Oracle, MySQL, PostgreSQL, and Apache Cassandra.
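The read-through/write-through behavior behind this mode can be sketched in plain Java. This is not Ignite's API (Ignite connects to external databases through its own CacheStore interface); the class ReadThroughCache is hypothetical, and a HashMap stands in for the 3rd party database so the example runs with only the JDK.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: an in-memory layer that reads through to, and writes
// through to, a slower backing store. A HashMap stands in for the external DB.
class ReadThroughCache<K, V> {
    private final Map<K, V> memory = new HashMap<>();
    private final Map<K, V> database;   // stand-in for an RDBMS or NoSQL store
    private int databaseReads = 0;      // counts round trips to the database

    ReadThroughCache(Map<K, V> database) {
        this.database = database;
    }

    // Read-through: serve from memory, fall back to the database on a miss.
    V get(K key) {
        V value = memory.get(key);
        if (value == null) {
            databaseReads++;
            value = database.get(key);
            if (value != null) {
                memory.put(key, value);  // keep it in memory for later reads
            }
        }
        return value;
    }

    // Write-through: update the database and the in-memory copy together.
    void put(K key, V value) {
        database.put(key, value);
        memory.put(key, value);
    }

    int databaseReads() {
        return databaseReads;
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>();
        db.put("user:1", "Alice");
        ReadThroughCache<String, String> cache = new ReadThroughCache<>(db);
        cache.get("user:1");         // miss: loaded from the database
        cache.get("user:1");         // hit: served from memory
        cache.put("user:2", "Bob");  // written to both layers
        System.out.println("database reads: " + cache.databaseReads());
    }
}
```

Only the first read of a key reaches the database; later reads of the same key are answered from memory, which is where the acceleration comes from.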
Use cases: Ignite as an In-Memory Data Grid - adds acceleration and scale to existing database deployments (RDBMS, NoSQL, etc.).
In-Memory + Full Copy on Disk
The whole data set is stored both in memory and on disk. The disk copy is used for data recovery in case of full cluster crashes and restarts.
Use cases: Ignite as an In-Memory Database (IMDB).
100% on Disk + In-Memory Cache
100% of the data is persisted to disk, and the same or a smaller amount is cached in memory. The more data is cached, the better the performance. The disk serves as the primary storage and survives any type of cluster failure or restart.
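The relationship between a complete on-disk copy and a smaller in-memory cache can be illustrated with a bounded LRU cache in front of a full store. This is a simplified model, not Ignite's page-replacement mechanism (Ignite manages memory in pages and applies its own replacement policy); the class DiskBackedStore and its capacity parameter are hypothetical, and a HashMap stands in for the disk.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model: every entry lives on "disk"; at most `capacity` entries
// are also cached in memory, evicting the least recently used one.
class DiskBackedStore {
    private final Map<Integer, String> disk = new HashMap<>();
    private final Map<Integer, String> memory;
    private int diskReads = 0;

    DiskBackedStore(int capacity) {
        // accessOrder = true turns LinkedHashMap into an LRU structure.
        this.memory = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > capacity;
            }
        };
    }

    void put(int key, String value) {
        disk.put(key, value);    // disk holds the primary, complete copy
        memory.put(key, value);  // memory holds a bounded subset
    }

    String get(int key) {
        String value = memory.get(key);
        if (value == null) {
            diskReads++;         // cache miss: fall back to disk
            value = disk.get(key);
            if (value != null) {
                memory.put(key, value);
            }
        }
        return value;
    }

    int diskReads() {
        return diskReads;
    }

    // Stores `entries` keys, scans all of them `rounds` times, returns disk reads.
    static int diskReadsFor(int entries, int capacity, int rounds) {
        DiskBackedStore store = new DiskBackedStore(capacity);
        for (int k = 0; k < entries; k++) {
            store.put(k, "value-" + k);
        }
        for (int r = 0; r < rounds; r++) {
            for (int k = 0; k < entries; k++) {
                store.get(k);
            }
        }
        return store.diskReads();
    }

    public static void main(String[] args) {
        System.out.println("all cached:  " + diskReadsFor(100, 100, 3));
        System.out.println("half cached: " + diskReadsFor(100, 50, 3));
    }
}
```

With the whole data set cached, the scans never touch disk (0 reads); with half of it cached, a sequential scan larger than an LRU cache evicts each entry before it is reused, so all 300 reads go to disk. Workloads that repeatedly touch a hot subset of keys typically fare much better than this worst case.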
Use cases: Ignite as a Memory-Centric Distributed Database - provides a distributed database with SQL, key-value, and collocated processing APIs.
Disk-centric systems, such as RDBMS or NoSQL databases, generally use the classic client-server approach, where data is brought from the server to the client, processed there, and then usually discarded. This approach does not scale well because moving data over the network is the most expensive operation in a distributed system.
A much more scalable approach is collocated processing, which reverses the flow by bringing the computations to the servers where the data actually resides. This approach allows you to execute advanced logic or distributed SQL with JOINs exactly where the data is stored, avoiding expensive serialization and network trips.
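The difference between the two approaches can be made concrete by counting how many values cross the network. In this hypothetical model, each inner list is the data held by one node; the client-server version ships every value to the client, while the collocated version runs a function against each node's data and ships back one partial result per node. It is a conceptual sketch only; Ignite exposes collocated execution through its own compute API (for example, IgniteCompute.affinityRun).

```java
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical cluster model: each inner list is the data stored on one node.
class CollocationDemo {
    // Client-server: every value travels to the client, which sums them.
    // Returns {sum, valuesSentOverNetwork}.
    static long[] pullAndSum(List<List<Long>> cluster) {
        long sum = 0;
        long transferred = 0;
        for (List<Long> nodeData : cluster) {
            for (long value : nodeData) {
                transferred++;   // one value sent over the network
                sum += value;
            }
        }
        return new long[] {sum, transferred};
    }

    // Collocated: the task runs where the data lives; only one partial
    // result per node travels back. Returns {sum, valuesSentOverNetwork}.
    static long[] sendComputeAndSum(List<List<Long>> cluster,
                                    ToLongFunction<List<Long>> task) {
        long sum = 0;
        long transferred = 0;
        for (List<Long> nodeData : cluster) {
            long partial = task.applyAsLong(nodeData);  // executed "on" the node
            transferred++;                              // only the partial is sent
            sum += partial;
        }
        return new long[] {sum, transferred};
    }

    public static void main(String[] args) {
        List<List<Long>> cluster = List.of(
            List.of(1L, 2L, 3L),
            List.of(4L, 5L),
            List.of(6L, 7L, 8L, 9L));
        long[] pulled = pullAndSum(cluster);
        long[] collocated = sendComputeAndSum(cluster,
            data -> data.stream().mapToLong(Long::longValue).sum());
        System.out.println("pull:       sum=" + pulled[0] + ", transferred=" + pulled[1]);
        System.out.println("collocated: sum=" + collocated[0] + ", transferred=" + collocated[1]);
    }
}
```

Both versions compute a sum of 45, but pulling moves all nine values, while the collocated version moves only three partial sums. The gap grows with the amount of data stored per node.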
Depending on the configuration, Ignite can either partition or replicate data across its memory-centric storage. Unlike REPLICATED mode, where data is fully replicated across all nodes in the cluster, in PARTITIONED mode Ignite splits the data evenly across multiple cluster nodes, allowing it to store TBs of data both in memory and on disk.
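One way to picture how PARTITIONED mode spreads data is rendezvous (highest-random-weight) hashing, the idea behind Ignite's default RendezvousAffinityFunction. The sketch below is a simplified illustration, not Ignite's implementation: keys are hashed into a fixed number of partitions, each partition is owned by the backups + 1 nodes with the highest hash weight (the first is the primary), and REPLICATED mode is modeled as every node owning everything. The class PartitionMap, its methods, and the MurmurHash3-style mixing function are hypothetical choices.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PartitionMap {
    static final int PARTITIONS = 1024;  // Ignite's default partition count is also 1024

    // Map a key to a partition (simplified; not Ignite's exact hashing).
    static int partitionOf(Object key) {
        return Math.floorMod(key.hashCode(), PARTITIONS);
    }

    // Deterministic pseudo-random weight of (node, partition): a 64-bit
    // MurmurHash3-style finalizer; any well-mixing hash would do.
    static long weight(String node, int partition) {
        long h = node.hashCode() * 0x9E3779B97F4A7C15L + partition;
        h ^= h >>> 33;
        h *= 0xFF51AFD7ED558CCDL;
        h ^= h >>> 33;
        h *= 0xC4CEB9FE1A85EC53L;
        h ^= h >>> 33;
        return h;
    }

    // Rendezvous hashing: rank nodes by weight; the top (backups + 1) own
    // the partition. Element 0 is the primary, the rest are backups.
    static List<String> ownersOf(int partition, List<String> nodes, int backups) {
        List<String> ranked = new ArrayList<>(nodes);
        ranked.sort(Comparator.comparingLong((String n) -> weight(n, partition)).reversed());
        return new ArrayList<>(ranked.subList(0, Math.min(backups + 1, ranked.size())));
    }

    // REPLICATED mode modeled as "every node owns every partition".
    static List<String> replicatedOwners(List<String> nodes) {
        return List.copyOf(nodes);
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node-A", "node-B", "node-C", "node-D");
        int p = partitionOf("user:42");
        System.out.println("partition " + p + " -> primary, backup: " + ownersOf(p, nodes, 1));
        System.out.println("replicated -> " + replicatedOwners(nodes));
    }
}
```

A useful property of this scheme is that removing a node that does not own a partition leaves that partition's owners unchanged, so only the data held by a departing node has to move.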
Ignite provides strong ACID durability guarantees for the data:
- Committed transactions always survive any failures.
- The cluster can always be recovered to the latest successfully committed transaction.
- Cluster restarts are very fast.
Ignite also lets you configure multiple backup copies to guarantee data resiliency in case of failures.
Regardless of which replication scheme is used, Ignite guarantees data consistency across all cluster members.
Every time data is updated in memory, the update is appended to the tail of the write-ahead log (WAL). The purpose of the WAL is to propagate updates to disk as fast as possible and to provide a consistent recovery mechanism that supports full cluster failures.
As the WAL grows, it is periodically checkpointed to the main storage. Checkpointing is the process of copying dirty pages from memory to the partition files on disk. A dirty page is a page that has been updated in memory and appended to the WAL but not yet written to its partition file on disk.
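The interplay of in-memory updates, the WAL, dirty pages, and checkpoints can be modeled in a few lines. This is a toy model under stated assumptions, not Ignite's implementation: pages are plain map entries, the WAL is a list, the partition files are another map, and the WAL is simply emptied after each checkpoint, which is a simplification. The class WalModel and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of WAL-based durability with checkpointing.
class WalModel {
    // One WAL record: page `pageId` now holds `value`.
    static final class WalRecord {
        final int pageId;
        final String value;
        WalRecord(int pageId, String value) { this.pageId = pageId; this.value = value; }
    }

    final Map<Integer, String> memoryPages = new HashMap<>();    // lost on crash
    final Set<Integer> dirtyPages = new HashSet<>();             // updated, not yet checkpointed
    final List<WalRecord> wal = new ArrayList<>();               // durable, append-only
    final Map<Integer, String> partitionFiles = new HashMap<>(); // durable main storage

    // Update in memory and append to the WAL tail; the page becomes dirty.
    void update(int pageId, String value) {
        memoryPages.put(pageId, value);
        wal.add(new WalRecord(pageId, value));
        dirtyPages.add(pageId);
    }

    // Checkpoint: copy dirty pages to the partition files. The WAL records
    // they cover are no longer needed for recovery, so this model drops them.
    void checkpoint() {
        for (int pageId : dirtyPages) {
            partitionFiles.put(pageId, memoryPages.get(pageId));
        }
        dirtyPages.clear();
        wal.clear();
    }

    // Crash wipes memory; recovery loads the partition files, then replays
    // the WAL records written since the last checkpoint.
    void crashAndRecover() {
        memoryPages.clear();
        dirtyPages.clear();
        memoryPages.putAll(partitionFiles);
        for (WalRecord r : wal) {
            memoryPages.put(r.pageId, r.value);
            dirtyPages.add(r.pageId);  // replayed pages are newer than the files
        }
    }

    public static void main(String[] args) {
        WalModel store = new WalModel();
        store.update(1, "a");
        store.update(2, "b");
        store.checkpoint();        // pages 1 and 2 now in partition files
        store.update(2, "b2");     // logged in the WAL; page 2 dirty again
        store.update(3, "c");
        store.crashAndRecover();   // rebuild from partition files + WAL
        System.out.println("page 2 after recovery: " + store.memoryPages.get(2));
    }
}
```

Recovery restores "b2" for page 2 even though the partition file still holds "b": the checkpointed state supplies the baseline, and the WAL supplies every update made after it.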
To enable Ignite native persistence, set the persistenceEnabled property of the data region configuration to true in the cluster's node configuration.
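A minimal sketch of such a configuration in Java, assuming Apache Ignite 2.9 or later with ignite-core on the classpath (older versions activate the cluster with ignite.cluster().active(true) instead of ClusterState): persistence is enabled on the default data region, and the cluster is then activated, because a newly started cluster with persistence enabled is inactive until activated.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.ClusterState;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

class PersistentNode {
    public static void main(String[] args) {
        // Turn on native persistence for the default data region.
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDataStorageConfiguration(storageCfg);

        // Start a server node with this configuration.
        Ignite ignite = Ignition.start(cfg);

        // A new cluster with persistence enabled must be activated before use.
        ignite.cluster().state(ClusterState.ACTIVE);
    }
}
```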
Ignite native persistence is a distributed, ACID, and SQL-compliant disk store.
Apache Ignite can be used as an all-in-one distributed database that supports SQL, key-value, compute, machine learning, and other data processing APIs.
Apache Ignite can be used as a distributed, horizontally scalable in-memory database (IMDB).
Ignite can act as a data grid, that is, a distributed, transactional key-value store. Unlike other in-memory data grids (IMDGs), Ignite can store data both in memory and on disk, and can therefore hold more data than fits in physical memory.
Ignite can be used as a caching layer (aka. data grid) above 3rd party databases such as RDBMS, Apache Cassandra, and MongoDB.