Interface IgniteDataStreamer<K,​V>

  • All Superinterfaces:
    AutoCloseable

    public interface IgniteDataStreamer<K,​V>
    extends AutoCloseable
    Data streamer is responsible for streaming external data into cache. It achieves it by properly buffering updates and properly mapping keys to nodes responsible for the data to make sure that there is the least amount of data movement possible and optimal network and memory utilization.

    Note that data streamer data manipulation methods do not support transactions. When updating data with allowOverwrite() set to false new entry is created on primary and backups if it has not existed. If allowOverwrite() is true then batches are applied with regular cache.put(..) methods starting implicit transactions if streamer is targeted to a transactional cache.

    However, explicit transactional updates inside are possible with custom StreamReceiver. This way batches can be applied within transaction(s) on target node. See receiver(StreamReceiver) for details.

    Data streamer doesn’t guarantee:

    • Data order. Data records may be loaded into a cache in a different order compared to putting into the streamer;
    • Immediate data loading. Data can be kept for a while before loading;
    • By default, data consistency until successfully finished;
    • By default, working with external storages.

    If allowOverwrite() setting is false (default), consider:

    • You should not have the same keys repeating in the data being streamed;
    • Streamer cancelation or streamer node failure can cause data inconsistency;
    • If loading into a persistent cache, concurrently created snapshot may contain inconsistent data and might not be restored entirely.
    Most important behaviour of data streamer is defined by StreamReceiver and allowOverwrite() property.

    Also note that IgniteDataStreamer is not the only way to add data into cache. Alternatively you can use IgniteCache.loadCache(IgniteBiPredicate, Object...) method to add data from underlying data store. You can also use standard cache put(...) and putAll(...) operations as well, but they most likely will not perform as well as this class for adding data. And finally, data can be added from underlying data store on demand, whenever it is accessed - for this no explicit data adding step is needed.

    IgniteDataStreamer supports the following configuration properties:

    • perNodeBufferSize(int) - when entries are added to data streamer via addData(Object, Object) method, they are not sent to in-memory data grid right away and are buffered internally for better performance and network utilization. This setting controls the size of internal per-node buffer before buffered data is sent to remote node. Default is defined by DFLT_PER_NODE_BUFFER_SIZE value.
    • perNodeParallelOperations(int) - sometimes data may be added to the data streamer via addData(Object, Object) method faster than it can be put in cache. In this case, new buffered stream messages are sent to remote nodes before responses from previous ones are received. This could cause unlimited heap memory utilization growth on local and remote nodes. To control memory utilization, this setting limits maximum allowed number of parallel buffered stream messages that are being processed on remote nodes. If this number is exceeded, then addData(Object, Object) method will block to control memory utilization. Default is equal to CPU count on remote node multiply by DFLT_PARALLEL_OPS_MULTIPLIER.
    • autoFlushFrequency(long) - automatic flush frequency in milliseconds. Essentially, this is the time after which the streamer will make an attempt to submit all data added so far to remote nodes. Note that there is no guarantee that data will be delivered after this concrete attempt (e.g., it can fail when topology is changing), but it won't be lost anyway. Disabled by default (default value is 0).
    • allowOverwrite(boolean) - Sets flag enabling overwriting existing values in cache. Data streamer will perform better if this flag is disabled, which is the default setting.
    • receiver(StreamReceiver) - defines how cache will be updated with added entries. It allows to provide user-defined custom logic to update the cache in the most effective and flexible way.
    • deployClass(Class) - optional deploy class for peer deployment. All classes streamed by a data streamer must be class-loadable from the same class-loader. Ignite will make the best effort to detect the most suitable class-loader for data loading. However, in complex cases, where compound or deeply nested class-loaders are used, it is best to specify a deploy class which can be any class loaded by the class-loader for given data.
    • Method Detail

      • cacheName

        String cacheName()
        Name of cache to stream data to.
        Returns:
        Cache name or null for default cache.
      • allowOverwrite

        boolean allowOverwrite()
        Gets flag enabling overwriting existing values in cache. Data streamer will perform better if this flag is disabled.

        This flag is disabled by default (default is false).

        Returns:
        True if overwriting is allowed or if receiver is changed by receiver(StreamReceiver). False otherwise.
      • allowOverwrite

        void allowOverwrite​(boolean allowOverwrite)
                     throws javax.cache.CacheException
        Sets flag enabling overwriting existing values in cache. Data streamer will perform better if this flag is disabled. Note that when this flag is false, updates will not be propagated to the cache store (i.e. skipStore() flag will be set to true implicitly).

        This flag is disabled by default (default is false).

        The flag has no effect when custom cache receiver set using receiver(StreamReceiver) method.

        Parameters:
        allowOverwrite - Flag value.
        Throws:
        javax.cache.CacheException - If failed.
      • skipStore

        boolean skipStore()
        Gets flag indicating that write-through behavior should be disabled for data streaming. Default is false.
        Returns:
        Skip store flag.
      • skipStore

        void skipStore​(boolean skipStore)
        Sets flag indicating that write-through behavior should be disabled for data streaming. Default is false.
        Parameters:
        skipStore - Skip store flag.
      • keepBinary

        boolean keepBinary()
        Gets flag indicating that objects should be kept in binary format when passed to the stream receiver. Default is false.
        Returns:
        Skip store flag.
      • keepBinary

        void keepBinary​(boolean keepBinary)
        Sets flag indicating that objects should be kept in binary format when passes to the steam receiver. Default is false.
        Parameters:
        keepBinary - Keep binary flag.
      • perNodeBufferSize

        int perNodeBufferSize()
        Gets size of per node key-value pairs buffer.
        Returns:
        Per node buffer size.
      • perNodeBufferSize

        void perNodeBufferSize​(int bufSize)
        Sets size of per node key-value pairs buffer.

        This method should be called prior to addData(Object, Object) call.

        If not provided, default value is DFLT_PER_NODE_BUFFER_SIZE.

        Parameters:
        bufSize - Per node buffer size.
      • perNodeParallelOperations

        int perNodeParallelOperations()
        Gets maximum number of parallel stream operations for a single node.
        Returns:
        Maximum number of parallel stream operations for a single node.
      • perThreadBufferSize

        void perThreadBufferSize​(int size)
        Allows to set buffer size for thread in case of stream by addData(Object, Object) call.
        Parameters:
        size - Size of buffer.
      • perThreadBufferSize

        int perThreadBufferSize()
        Gets buffer size set by perThreadBufferSize(int).
        Returns:
        Buffer size.
      • timeout

        void timeout​(long timeout)
        Sets the timeout that is used in the following cases:
        • any data addition method can be blocked when all per node parallel operations are exhausted. The timeout defines the max time you will be blocked waiting for a permit to add a chunk of data into the streamer;
        • Total timeout time for flush() operation;
        • Total timeout time for close() operation.
        By default the timeout is disabled.
        Parameters:
        timeout - Timeout in milliseconds.
        Throws:
        IllegalArgumentException - If timeout is zero or less than -1.
      • timeout

        long timeout()
        Gets timeout set by timeout(long).
        Returns:
        Timeout in milliseconds.
      • autoFlushFrequency

        long autoFlushFrequency()
        Gets automatic flush frequency. Essentially, this is the time after which the streamer will make an attempt to submit all data added so far to remote nodes. Note that there is no guarantee that data will be delivered after this concrete attempt (e.g., it can fail when topology is changing), but it won't be lost anyway.

        If set to 0, automatic flush is disabled.

        Automatic flush is disabled by default (default value is 0).

        Returns:
        Flush frequency or 0 if automatic flush is disabled.
        See Also:
        flush()
      • autoFlushFrequency

        void autoFlushFrequency​(long autoFlushFreq)
        Sets automatic flush frequency. Essentially, this is the time after which the streamer will make an attempt to submit all data added so far to remote nodes. Note that there is no guarantee that data will be delivered after this concrete attempt (e.g., it can fail when topology is changing), but it won't be lost anyway.

        If set to 0, automatic flush is disabled.

        Automatic flush is disabled by default (default value is 0).

        Parameters:
        autoFlushFreq - Flush frequency or 0 to disable automatic flush.
        See Also:
        flush()
      • future

        IgniteFuture<?> future()
        Gets future for this streaming process. This future completes whenever method close(boolean) completes. By attaching listeners to this future it is possible to get asynchronous notifications for completion of this streaming process.
        Returns:
        Future for this streaming process.
      • deployClass

        void deployClass​(Class<?> depCls)
        Optional deploy class for peer deployment. All classes added by a data streamer must be class-loadable from the same class-loader. Ignite will make the best effort to detect the most suitable class-loader for data loading. However, in complex cases, where compound or deeply nested class-loaders are used, it is best to specify a deploy class which can be any class loaded by the class-loader for given data.
        Parameters:
        depCls - Any class loaded by the class-loader for given data.