Handling Exceptions | Ignite Documentation

Handling Exceptions

This section outlines basic exceptions that can be generated by Ignite, and explains how to set up and use the critical failures handler.

Handling Ignite Exceptions

The exceptions thrown by the Ignite API, and the actions you can take in response, are described below. For checked exceptions, see the throws clause in the Javadoc.

| Exception | Description | Action | Runtime exception |
|---|---|---|---|
| CacheInvalidStateException | Thrown when you try to perform an operation on a cache in which some partitions have been lost. Depending on the partition loss policy configured for the cache, this exception is thrown on read operations, write operations, or both. See Partition Loss Policy for details. | Reset lost partitions. You may want to restore the data by returning the nodes that caused the partition loss to the cluster. | Yes |
| IgniteException | Indicates an error condition in the cluster. | Operation failed. Exit from the method. | Yes |
| IgniteClientDisconnectedException | Thrown by the Ignite API when a client node gets disconnected from the cluster. Thrown from cache operations, the compute API, and data structures. | Wait and use retry logic. | Yes |
| IgniteAuthenticationException | Thrown when there is either a node authentication failure or a security authentication failure. | Operation failed. Exit from the method. | No |
| IgniteClientException | Can be thrown from cache operations. | Check the exception message for the action to be taken. | Yes |
| IgniteDeploymentException | Thrown when the Ignite API fails to deploy a job or task on a node. Thrown from the compute API. | Operation failed. Exit from the method. | Yes |
| IgniteInterruptedException | Used to wrap the standard InterruptedException into IgniteException. | Retry after clearing the interrupted flag. | Yes |
| IgniteSpiException | Thrown by various SPIs (CollisionSpi, LoadBalancingSpi, TcpDiscoveryIpFinder, FailoverSpi, UriDeploymentSpi, etc.). | Operation failed. Exit from the method. | Yes |
| IgniteSQLException | Thrown when there is a SQL query processing error. This exception also provides query-specific error codes. | Operation failed. Exit from the method. | Yes |
| IgniteAccessControlException | Thrown when there is an authentication or authorization failure. | Operation failed. Exit from the method. | No |
| IgniteCacheRestartingException | Thrown from the Ignite cache API if a cache is restarting. | Wait and use retry logic. | Yes |
| IgniteFutureTimeoutException | Thrown when a future computation times out. | Either increase the timeout limit or exit from the method. | Yes |
| IgniteFutureCancelledException | Thrown when the result of a future computation cannot be retrieved because it was cancelled. | Use retry logic. | Yes |
| IgniteIllegalStateException | Indicates that the Ignite instance is in an invalid state for the requested operation. | Operation failed. Exit from the method. | Yes |
| IgniteNeedReconnectException | Indicates that a node should try to reconnect to the cluster. | Use retry logic. | No |
| IgniteDataIntegrityViolationException | Thrown if a data integrity violation is found. | Operation failed. Exit from the method. | Yes |
| IgniteOutOfMemoryException | Thrown when the system does not have enough memory to process Ignite operations. Thrown from cache operations. | Operation failed. Exit from the method. | Yes |
| IgniteTxOptimisticCheckedException | Thrown when a transaction fails optimistically. | Use retry logic. | No |
| IgniteTxRollbackCheckedException | Thrown when a transaction has been automatically rolled back. | Use retry logic. | No |
| IgniteTxTimeoutCheckedException | Thrown when a transaction times out. | Use retry logic. | No |
| ClusterTopologyException | Indicates an error with the cluster topology (e.g., a crashed node). Thrown from the compute and events APIs. | Wait on the future and use retry logic. | Yes |
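Several of the exceptions listed here call for retry logic. The block below is a minimal sketch of a bounded retry loop using only the JDK. The class name RetrySketch, the method callWithRetry, and its parameters are hypothetical names chosen for illustration and are not part of the Ignite API.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/** Bounded retry with a fixed pause between attempts (illustrative helper, not part of Ignite). */
final class RetrySketch {
    private RetrySketch() {
        // Static helper only.
    }

    /**
     * Runs op up to maxAttempts times. An exception is retried only if the
     * retryable predicate accepts it; otherwise it is rethrown immediately.
     */
    static <T> T callWithRetry(Callable<T> op, Predicate<Exception> retryable,
                               int maxAttempts, long pauseMillis) throws Exception {
        if (maxAttempts < 1)
            throw new IllegalArgumentException("maxAttempts must be >= 1");

        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            }
            catch (Exception e) {
                // Give up on non-retryable errors or when the attempt budget is spent.
                if (!retryable.test(e) || attempt >= maxAttempts)
                    throw e;

                // Wait before the next attempt.
                Thread.sleep(pauseMillis);
            }
        }
    }
}
```

In an Ignite application, the predicate would presumably test for the exception types whose action is "use retry logic", for example e -> e instanceof IgniteCacheRestartingException. For client disconnects, waiting on the future returned by IgniteClientDisconnectedException.reconnectFuture() may be a better fit than a fixed pause.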

Critical Failures Handling

Ignite is a robust, fault-tolerant system. In practice, however, unpredictable problems can arise that affect the state of an individual node or of the whole cluster. Such problems can be detected at runtime and handled by a preconfigured critical failure handler.

Critical Failures

The following failures are treated as critical:

  • System critical errors (e.g. OutOfMemoryError).

  • Unintentional system worker termination (e.g. due to an unhandled exception).

  • System workers hanging.

  • Cluster node segmentation.

A system critical error is an error which leads to the system’s inoperability. For example:

  • File I/O errors - file read/write operations usually throw IOException. This can happen when Ignite native persistence is enabled (e.g., when no space is left or on a device error), and also in in-memory mode, because Ignite uses disk storage for some metadata (e.g., when the file descriptor limit is exceeded or file access is prohibited).

  • Out of memory error - when Ignite memory management system fails to allocate more space (IgniteOutOfMemoryException).

  • Out of memory error - when a cluster node runs out of Java heap (OutOfMemoryError).

Failures Handling

When Ignite detects a critical failure, it handles the failure according to a preconfigured failure handler. The failure handler can be configured as follows:

XML:

<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="failureHandler">
        <bean class="org.apache.ignite.failure.StopNodeFailureHandler"/>
    </property>
</bean>

Java:

// Classes come from org.apache.ignite, org.apache.ignite.configuration, and org.apache.ignite.failure.
IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setFailureHandler(new StopNodeFailureHandler());
Ignite ignite = Ignition.start(cfg);

Ignite supports the following failure handlers:

| Class | Description |
|---|---|
| NoOpFailureHandler | Ignores any failures. Useful for testing and debugging. |
| RestartProcessFailureHandler | A specific implementation that can be used only with ignite.sh or ignite.bat. The process must be terminated by using the Ignition.restart(true) method. |
| StopNodeFailureHandler | Stops the node in case of critical errors by calling the Ignition.stop(true) or Ignition.stop(nodeName, true) methods. |
| StopNodeOrHaltFailureHandler | This is the default handler, which tries to stop a node. If the node can’t be stopped, then the handler terminates the JVM process. |

Critical Workers Health Check

Ignite has a number of internal workers that are essential for the cluster to function correctly. If one of them is terminated, the node can become inoperative.

The following system workers are considered mission critical:

  • Discovery worker - discovery events handling.

  • TCP communication worker - peer-to-peer communication between nodes.

  • Exchange worker - partition map exchange.

  • Workers of the system’s striped pool.

  • Data Streamer striped pool workers.

  • Timeout worker - timeouts handling.

  • Checkpoint thread - check-pointing in Ignite persistence.

  • WAL workers - write-ahead logging, segments archiving, and compression.

  • Expiration worker - TTL based expiration.

  • NIO workers - base networking.

Ignite has an internal mechanism for verifying that critical workers are operational. Each worker is checked regularly to confirm that it is alive and updating its heartbeat timestamp. If a worker fails this check, it is regarded as blocked and Ignite prints a message to the log file. You can set the allowed period of inactivity via the IgniteConfiguration.systemWorkerBlockedTimeout property.
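The heartbeat check described above can be pictured with a simplified model. The sketch below is a conceptual illustration only and does not reproduce Ignite's internal implementation. The class HeartbeatSketch and its methods are hypothetical, and timestamps are passed in explicitly so that the logic is easy to follow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Simplified heartbeat registry; a conceptual model, not Ignite's internal code. */
final class HeartbeatSketch {
    /** Last heartbeat timestamp per worker name, in milliseconds. */
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

    /** Plays the role of systemWorkerBlockedTimeout in this model. */
    private final long blockedTimeoutMillis;

    HeartbeatSketch(long blockedTimeoutMillis) {
        this.blockedTimeoutMillis = blockedTimeoutMillis;
    }

    /** Called by a worker each time it makes progress. */
    void heartbeat(String worker, long nowMillis) {
        lastHeartbeat.put(worker, nowMillis);
    }

    /** Returns the workers whose last heartbeat is older than the timeout. */
    List<String> blockedWorkers(long nowMillis) {
        List<String> blocked = new ArrayList<>();

        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (nowMillis - e.getValue() > blockedTimeoutMillis)
                blocked.add(e.getKey());
        }

        return blocked;
    }
}
```

In this model, a periodic checker would call blockedWorkers with the current time and log each returned name, which mirrors the log message Ignite prints for a blocked worker.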

Even though Ignite considers an unresponsive system worker to be a critical error, it doesn’t handle this situation automatically, other than by printing a message to the log file. To make a failure handler react to unresponsive system workers of all types, clear the handler’s ignoredFailureTypes property as shown below:

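XML:

<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- Treat a worker as blocked after one hour without a heartbeat. -->
    <property name="systemWorkerBlockedTimeout" value="#{60 * 60 * 1000}"/>

    <property name="failureHandler">
        <bean class="org.apache.ignite.failure.StopNodeFailureHandler">
            <!-- An empty list makes this handler react to unresponsive critical workers. -->
            <property name="ignoredFailureTypes">
                <list>
                </list>
            </property>
        </bean>
    </property>
</bean>

Java:

import java.util.Collections;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.failure.StopNodeFailureHandler;

// An empty set makes this handler react to unresponsive critical workers.
StopNodeFailureHandler failureHandler = new StopNodeFailureHandler();
failureHandler.setIgnoredFailureTypes(Collections.emptySet());
IgniteConfiguration cfg = new IgniteConfiguration().setFailureHandler(failureHandler);
Ignite ignite = Ignition.start(cfg);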