Apache Ignite

Backpressured Streaming

Reactive streams with automatic rate coordination

Apache Ignite provides reactive streaming with automatic backpressure control. The DataStreamer delivers high-throughput ingestion while respecting cluster capacity, coordinating producer and consumer rates automatically. This prevents memory overflow and keeps the cluster stable under high-velocity data streams.

Reactive Streaming with Backpressure

Publisher Interface

DataStreamer implements the reactive Publisher/Subscriber pattern. Producers publish data items, and the system requests items based on cluster capacity. This pull-based approach keeps producers from pushing data into the cluster faster than it can be processed.
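The pull-based pattern can be sketched with the JDK Flow API alone (no Ignite dependency; the subscriber below stands in for the cluster). It requests exactly one item at a time, so the producer never runs ahead of downstream demand:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

// Minimal sketch of pull-based backpressure: the subscriber controls the
// pace by calling request(1) only when it is ready for the next item.
public class PullDemo {
    static List<Integer> run() {
        List<Integer> received = new ArrayList<>();
        CountDownLatch done = new CountDownLatch(1);
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(new Flow.Subscriber<Integer>() {
                private Flow.Subscription subscription;

                @Override public void onSubscribe(Flow.Subscription s) {
                    subscription = s;
                    s.request(1);              // initial demand: one item
                }
                @Override public void onNext(Integer item) {
                    received.add(item);
                    subscription.request(1);   // pull the next item only when ready
                }
                @Override public void onError(Throwable t) { done.countDown(); }
                @Override public void onComplete() { done.countDown(); }
            });
            for (int i = 0; i < 5; i++) {
                publisher.submit(i);           // blocks when the subscriber has no demand
            }
        }                                      // close() triggers onComplete
        try { done.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(run());             // prints [0, 1, 2, 3, 4]
    }
}
```

The same contract lets a streaming client signal demand across a network boundary: the producer only emits what has been requested.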

Automatic Rate Coordination

The streaming system signals producers when ready for more data. Producers slow down when cluster capacity decreases. Producers speed up when capacity increases. This dynamic coordination happens automatically without manual tuning.

Buffer Management

DataStreamer maintains internal buffers sized based on cluster capacity. Buffers absorb temporary rate mismatches. The system applies backpressure before buffers overflow. This prevents out-of-memory conditions during ingestion spikes.
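The effect of a bounded buffer can be seen with the JDK's SubmissionPublisher (again illustrative, not Ignite's internals): here the per-subscriber buffer is capped at two items and the subscriber deliberately never signals demand, so once the buffer is full, offer() reports saturation instead of letting memory grow without bound:

```java
import java.util.concurrent.Flow;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.SubmissionPublisher;

// Sketch of bounded buffering: with a stalled consumer, the third offer
// cannot be buffered and is dropped, which offer() signals with a
// negative return value.
public class BufferDemo {
    static int[] run() {
        SubmissionPublisher<String> publisher =
                new SubmissionPublisher<>(ForkJoinPool.commonPool(), 2);
        publisher.subscribe(new Flow.Subscriber<String>() {
            @Override public void onSubscribe(Flow.Subscription s) { /* no request(): simulate a stalled consumer */ }
            @Override public void onNext(String item) { }
            @Override public void onError(Throwable t) { }
            @Override public void onComplete() { }
        });
        int a = publisher.offer("a", (sub, item) -> false); // buffered: non-negative lag
        int b = publisher.offer("b", (sub, item) -> false); // buffered
        int c = publisher.offer("c", (sub, item) -> false); // buffer full: dropped, negative
        publisher.close();
        return new int[] {a, b, c};
    }

    public static void main(String[] args) {
        int[] lags = run();
        System.out.println(lags[0] >= 0 && lags[1] >= 0 && lags[2] < 0); // prints true
    }
}
```

A production streamer applies backpressure before this point is reached; the drop handler here simply makes the saturation visible.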

Flow Control

The Publisher interface provides natural flow control. Applications receive signals when the cluster needs more data. Applications pause data generation when backpressure applies. This coordination works across network boundaries.

High-Throughput Ingestion

Batching and Buffering

DataStreamer groups individual items into batches automatically. Batch sizes adapt to network conditions and cluster load. Larger batches reduce network overhead. Smaller batches reduce latency. The system balances throughput and latency dynamically.
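The batching idea reduces to grouping items before shipping them. A minimal sketch (with a fixed batch size for clarity; the real streamer adapts the size to load):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of batching: items are grouped into batches of at most
// `batchSize`, and a final partial batch carries the remainder.
public class Batcher {
    static <T> List<List<T>> toBatches(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            // one network round trip per batch instead of one per item
            batches.add(new ArrayList<>(items.subList(i, Math.min(i + batchSize, items.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(toBatches(List.of(1, 2, 3, 4, 5, 6, 7), 3));
        // prints [[1, 2, 3], [4, 5, 6], [7]]
    }
}
```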

Partition-Aware Distribution

The streamer routes data items to partition owners directly. Single-hop writes avoid coordinator overhead. Items for the same partition group together in batches. This optimization delivers maximum throughput for partitioned data.
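The routing step can be sketched as hashing each key to a partition and grouping items per partition, so each batch can be sent to its owner node in a single hop. The modulo hash below is illustrative, not Ignite's actual affinity function:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of partition-aware grouping: items bound for the same partition
// end up in the same batch, addressed directly to the partition's owner.
public class PartitionRouter {
    static int partitionFor(Object key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    static Map<Integer, List<Integer>> group(List<Integer> keys, int partitions) {
        Map<Integer, List<Integer>> byPartition = new TreeMap<>();
        for (Integer key : keys) {
            byPartition.computeIfAbsent(partitionFor(key, partitions),
                                        p -> new ArrayList<>()).add(key);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        System.out.println(group(List.of(0, 1, 2, 3, 4, 5), 3));
        // prints {0=[0, 3], 1=[1, 4], 2=[2, 5]}
    }
}
```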

Parallel Processing

DataStreamer processes multiple batches in parallel across cluster nodes. Each node processes its partition data independently. This parallelism scales linearly with cluster size. Adding nodes increases total ingestion throughput proportionally.

Memory-First Writes

Streamed data writes directly to memory. No disk I/O during ingestion. Replication handles durability through distributed consensus. This memory-first approach delivers the throughput needed for high-velocity streams.

Integration with Transactions and MVCC

Transactional Streaming

DataStreamer supports transactional writes. Batches commit atomically. Failures trigger automatic rollback. This ensures consistency for streamed data. Applications choose between throughput-optimized non-transactional mode or consistency-optimized transactional mode.
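The all-or-nothing property can be illustrated with an in-memory stand-in (this is conceptual, not the Ignite transaction API): the batch is staged against a copy of the store, and only a fully validated batch replaces the live state, so a failure anywhere in the batch leaves the store untouched:

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch of atomic batch commit: stage on a copy, then either
// commit the whole batch or discard the copy on failure.
public class AtomicBatch {
    static boolean apply(Map<String, Integer> store, Map<String, Integer> batch) {
        Map<String, Integer> staged = new HashMap<>(store);    // stage changes on a copy
        for (Map.Entry<String, Integer> e : batch.entrySet()) {
            if (e.getValue() < 0) {
                return false;                                  // validation failure: discard the copy (rollback)
            }
            staged.put(e.getKey(), e.getValue());
        }
        store.clear();
        store.putAll(staged);                                  // commit: swap in the staged state
        return true;
    }

    public static void main(String[] args) {
        Map<String, Integer> store = new HashMap<>();
        store.put("a", 1);
        boolean ok = apply(store, Map.of("b", 2));             // commits
        boolean failed = apply(store, Map.of("c", -1));        // rolls back
        System.out.println(ok + " " + failed + " " + store.size()); // prints true false 2
    }
}
```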

MVCC Compatibility

Streaming writes create new MVCC versions. Concurrent queries see consistent snapshots. Long-running aggregations don't block streaming ingestion. Readers never block writers. This enables mixed streaming and analytical workloads.

Upsert Semantics

DataStreamer supports upsert operations: new records are inserted, existing records are updated in place. Applications specify keys for conflict resolution. This handles duplicate events in streaming scenarios without application-level deduplication logic.
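The effect of upsert semantics on a stream with duplicates can be sketched as replaying events into a keyed store, where a new key inserts and a repeated key overwrites, so the latest event per key wins without a separate deduplication pass:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of upsert semantics: put() inserts absent keys and updates
// present ones, absorbing duplicate events from the stream.
public class UpsertDemo {
    record Event(String key, int value) { }

    static Map<String, Integer> upsertAll(List<Event> events) {
        Map<String, Integer> store = new LinkedHashMap<>();
        for (Event e : events) {
            store.put(e.key(), e.value());   // insert if absent, update if present
        }
        return store;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("sensor-1", 10),
                new Event("sensor-2", 20),
                new Event("sensor-1", 15));  // duplicate key: latest value wins
        System.out.println(upsertAll(events));
        // prints {sensor-1=15, sensor-2=20}
    }
}
```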

Ordered Processing

The system preserves ordering within partitions. Events for the same key process in order. Events across partitions process in parallel. This ordering guarantee simplifies event stream processing logic.
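The ordering model can be sketched as per-key queues (standing in for partitions) that are processed in parallel relative to each other but drained sequentially, so every key observes its events in the original order:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Sketch of the ordering guarantee: parallelism across queues, strict
// sequential order within each queue.
public class OrderingDemo {
    record Event(String key, int seq) { }

    static Map<String, List<Integer>> process(List<Event> events) {
        // group by key, preserving arrival order within each group
        Map<String, List<Event>> queues = events.stream()
                .collect(Collectors.groupingBy(Event::key));
        Map<String, List<Integer>> seen = new ConcurrentHashMap<>();
        // queues run in parallel; each queue is drained sequentially
        queues.entrySet().parallelStream().forEach(entry ->
                seen.put(entry.getKey(),
                         entry.getValue().stream().map(Event::seq).toList()));
        return seen;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("a", 1), new Event("b", 1),
                new Event("a", 2), new Event("b", 2), new Event("a", 3));
        System.out.println(process(events));
        // per-key sequences stay in order: a=[1, 2, 3], b=[1, 2]
    }
}
```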

Use Cases

Event Stream Processing

Ingest event streams at high velocity. Process events with transactional guarantees. Update multiple aggregations atomically. Backpressure prevents data loss during spikes. MVCC enables concurrent analytics on streaming data.

IoT Data Ingestion

Stream sensor data from millions of devices. Partition-aware routing delivers maximum throughput. Memory-first writes provide minimal latency. Backpressure protects cluster during device bursts. Upsert semantics handle sensor state updates.

Real-Time Aggregations

Stream events into base tables. Update materialized aggregations on write. Transactional streaming ensures consistent aggregates. Queries run against current aggregated state. No batch processing delays.

How Streaming Connects to the Foundation

Memory-First Ingestion

DataStreamer writes directly to memory without disk I/O. Distributed replication provides durability. This memory-first approach delivers the throughput needed for high-velocity event streams.

Transactional Guarantees

Streaming writes support full ACID transactions. Batches commit atomically. MVCC enables concurrent queries during ingestion. This provides consistency without sacrificing streaming throughput.

Partition-Aware Routing

DataStreamer routes items to partition owners directly. Single-hop writes eliminate coordinator overhead. Items group by partition in batches. This optimization scales linearly with cluster size.
Ready to Start?

Discover our quick start guide and build your first application in 5-10 minutes

Quick Start Guide
Read Documentation

Learn about DataStreamer configuration, batching strategies, and reactive patterns

Streaming Documentation