Change Data Capture
Overview
Change Data Capture (CDC) is a data processing pattern used to asynchronously receive entries that have been changed on the local node so that action can be taken using the changed entry.
Warning
|
CDC is an experimental feature. API or design architecture might be changed. |
Below are some CDC use cases:
-
Streaming changes in Warehouse;
-
Updating search index;
-
Calculating statistics (streaming queries);
-
Auditing logs;
-
Async interaction with extenal system: Moderation, business process invocation, etc.
Ignite implements CDC with the ignite-cdc.sh
application and Java API.
Below are the CDC application and the Ignite node integrated via WAL archive segments:
When CDC is enabled, the Ignite server node creates a hard link to each WAL archive segment in the special db/cdc/{consistency_id}
directory.
The ignite-cdc.sh
application runs on a different JVM and processes newly archived WAL segments.
When the segment is fully processed by ignite-cdc.sh
, it is removed. The actual disk space is free when both links (archive and CDC) are removed.
State of consumption is a pointer to the last processed event.
Consumer can tell to ignite-cdc.sh
to save the consumption state.
On startup event processing will be continued from the last saved state.
Configuration
Ignite Node
Name | Description | Default value |
---|---|---|
|
Flag to enable CDC on the server node. |
|
|
Path to the CDC directory |
|
|
Timeout to forcefully archive the WAL segment even it is not complete. |
|
CDC Application
CDC is configured in the same way as the Ignite node - via the spring XML file:
-
ignite-cdc.sh
requires both Ignite and CDC configurations to start; -
IgniteConfiguration
is used to determine common options like a path to the CDC directory, node consistent id, and other parameters; -
CdcConfiguration
containsignite-cdc.sh
-specific options.
Name | Description | Default value |
---|---|---|
|
Timeout to wait for lock acquiring. CDC locks directory on a startup to ensure there is no concurrent |
1000 milliseconds. |
|
Amount of time application sleeps between subsequent checks when no new files available. |
1000 milliseconds. |
|
Flag to specify if key and value of changed entries should be provided in binary format. |
|
|
Implementation of |
null |
|
Array of SPI’s to export CDC metrics. See metrics documentation, also. |
null |
API
org.apache.ignite.cdc.CdcEvent
Below is a single change of the data reflected by CdcEvent
.
Name | Description |
---|---|
|
Key for the changed entry. |
|
Value for the changed entry. This method will return |
|
ID of the cache where the change happens. The value is equal to the |
|
Partition of the changed entry. |
|
Flag to distinguish if operation happens on the primary or a backup node. |
|
|
org.apache.ignite.cdc.CdcConsumer
The consumer of change events. It should be implemented by the user.
Name | Description |
---|---|
|
Invoked one-time at the start of the CDC application. |
|
The main method that processes changes. When this method returns |
|
Invokes one-time at the stop of the CDC application. |
Metrics
ignite-cdc.sh
uses the same SPI to export metrics as Ignite does.
The following metrics are provided by the application (additional metrics can be provided by the consumer):
Name |
Description |
CurrentSegmentIndex |
Index of the currently processing WAL segment. |
CommittedSegmentIndex |
Index of the WAL segment that contains the last committed state. |
CommittedSegmentOffset |
Committed offset in bytes inside the WAL segment. |
LastSegmentConsumptionTime |
Timestamp (in milliseconds) indicating the last segment processing start. |
BinaryMetaDir |
Binary meta-directory the application reads data from. |
MarshallerDir |
Marshaller directory the application reads data from. |
CdcDir |
The CDC directory the application reads data from. |
Logging
ignite-cdc.sh
uses the same logging configuration as the Ignite node does. The only difference is that the log is written in the"ignite-cdc.log" file.
Lifecycle
Important
|
ignite-cdc.sh implements the fail-fast approach. It just fails in case of any error. The restart procedure should be configured with the OS tools.
|
-
Find the required shared directories. Take the values from the provided
IgniteConfiguration
. -
Lock the CDC directory.
-
Load the saved state.
-
Start the consumer.
-
Infinitely wait for the newly available segment and process it.
-
Stop the consumer in case of a failure or a received stop signal.
cdc-ext
Apache, Apache Ignite, the Apache feather and the Apache Ignite logo are either registered trademarks or trademarks of The Apache Software Foundation.