Processing Where Data Lives

Colocated Compute

Data Locality

The Compute API schedules jobs on nodes holding relevant data partitions. No data movement across the network. Jobs read and write local memory directly. This eliminates the network bottleneck that limits traditional distributed processing.

Key-Based Routing

Submit jobs with specific keys. The system routes each job to the node holding its key. Key-based routing works with colocation so that jobs and data reside on the same node. This single-hop execution delivers minimal latency for targeted operations.
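The routing idea can be sketched in plain Java. This is a minimal illustration that uses only the JDK, not the actual Compute API: the `KeyRouter` class, the node names, and the round-robin assignment of partitions to nodes are all assumptions made for the example.

```java
import java.util.List;

/** Illustrative only: maps a key to a partition, then to the node assumed to own it. */
final class KeyRouter {
    private final List<String> nodes;   // hypothetical node names
    private final int partitions;

    KeyRouter(List<String> nodes, int partitions) {
        this.nodes = List.copyOf(nodes);
        this.partitions = partitions;
    }

    /** Partition index for a key; floorMod keeps the result non-negative. */
    int partitionFor(Object key) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    /** Node that owns the key's partition (round-robin assignment is an assumption). */
    String nodeFor(Object key) {
        return nodes.get(partitionFor(key) % nodes.size());
    }

    public static void main(String[] args) {
        KeyRouter router = new KeyRouter(List.of("node-a", "node-b", "node-c"), 12);
        // The same key always resolves to the same node, so a job keyed by 42
        // would run where the row keyed by 42 is stored.
        System.out.println(router.nodeFor(42));
    }
}
```

Because the mapping is deterministic, a job and the rows that share its key resolve to the same node, which is what makes single-hop execution possible.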

Partition-Wide Operations

Execute jobs across entire partitions. The job receives all rows in the partition as input. Process partition data sequentially or build in-memory indexes. This enables operations that require full partition visibility.
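As a rough sketch of a partition-wide job, the plain-Java example below scans every row of one partition and builds an in-memory index. The `Order` row type and `indexByCustomer` method are hypothetical; a real job would receive rows from the data layer rather than from a list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PartitionIndexSketch {
    /** Hypothetical row type used only for illustration. */
    static final class Order {
        final int id;
        final String customer;

        Order(int id, String customer) {
            this.id = id;
            this.customer = customer;
        }
    }

    /** Scans the partition's rows once and groups order ids by customer. */
    static Map<String, List<Integer>> indexByCustomer(List<Order> partitionRows) {
        Map<String, List<Integer>> index = new HashMap<>();
        for (Order row : partitionRows) {
            index.computeIfAbsent(row.customer, k -> new ArrayList<>()).add(row.id);
        }
        return index;
    }

    public static void main(String[] args) {
        List<Order> rows = List.of(
            new Order(1, "alice"), new Order(2, "bob"), new Order(3, "alice"));
        System.out.println(indexByCustomer(rows).get("alice")); // prints [1, 3]
    }
}
```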

Broadcast Execution

Broadcast jobs to all nodes for cluster-wide operations. Each node processes its local partitions independently. Results aggregate at the coordinator. This pattern works for parallel aggregations and distributed transformations.
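The broadcast pattern might look like the following JDK-only sketch. It simulates nodes with a map from a hypothetical node name to that node's local rows; `broadcastSum` is an illustrative helper, not part of the Compute API.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

final class BroadcastSketch {
    /** Runs the job once per node against that node's local rows, then sums the partial results. */
    static long broadcastSum(Map<String, List<Integer>> rowsByNode,
                             Function<List<Integer>, Long> job) {
        // Fan out: one asynchronous task per simulated node.
        List<CompletableFuture<Long>> partials = rowsByNode.values().stream()
            .map(rows -> CompletableFuture.supplyAsync(() -> job.apply(rows)))
            .collect(Collectors.toList());
        // Coordinator step: wait for every node and combine the partial results.
        return partials.stream().mapToLong(CompletableFuture::join).sum();
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> cluster = Map.of(
            "node-a", List.of(1, 2, 3),
            "node-b", List.of(10));
        // Count rows across the simulated cluster.
        System.out.println(broadcastSum(cluster, rows -> (long) rows.size())); // prints 4
    }
}
```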

Compute Job Patterns

Stateless Jobs

Submit jobs that read data, perform calculations, and return results. No state persists between invocations. Jobs implement simple Java methods. The system handles serialization, routing, and result collection automatically.

MapReduce Operations

Implement map-reduce patterns with compute jobs. The map phase executes on data-holding nodes. The reduce phase aggregates their results. The framework handles distribution and coordination. This provides map-reduce semantics without a separate system.
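To make the two phases concrete, here is a small JDK-only sketch of a count-by-value job. The `mapPhase` and `reducePhase` names are hypothetical, and the sketch leaves out the distribution step that the framework would handle.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MapReduceSketch {
    /** Map phase: runs next to one partition and counts its rows by value. */
    static Map<String, Integer> mapPhase(List<String> partitionRows) {
        Map<String, Integer> counts = new HashMap<>();
        for (String row : partitionRows) {
            counts.merge(row, 1, Integer::sum);
        }
        return counts;
    }

    /** Reduce phase: merges the partial counts produced by every partition. */
    static Map<String, Integer> reducePhase(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((key, count) -> total.merge(key, count, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> p0 = mapPhase(List.of("red", "blue", "red"));
        Map<String, Integer> p1 = mapPhase(List.of("blue", "green"));
        System.out.println(reducePhase(List.of(p0, p1))); // prints {blue=2, green=1, red=2}
    }
}
```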

Async Execution

The Compute API returns a CompletableFuture for non-blocking operations. Submit multiple jobs in parallel. Compose operations with async combinators. This enables high-concurrency compute workloads without thread exhaustion.
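The composition style described here can be shown with the JDK's own `CompletableFuture`. In this sketch, `submit` is a hypothetical stand-in for job submission (it simply runs a supplier on the common pool), and `sumAll` combines results with `allOf` and `thenApply` so no caller thread blocks while jobs run.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

final class AsyncSketch {
    /** Hypothetical stand-in for submitting a job; returns immediately with a future. */
    static CompletableFuture<Integer> submit(Supplier<Integer> job) {
        return CompletableFuture.supplyAsync(job);
    }

    /** Completes with the sum of all job results once every job has finished. */
    static CompletableFuture<Integer> sumAll(List<CompletableFuture<Integer>> jobs) {
        return CompletableFuture
            .allOf(jobs.toArray(new CompletableFuture<?>[0]))
            // Every future is already complete here, so join() does not block.
            .thenApply(ignored -> jobs.stream().mapToInt(CompletableFuture::join).sum());
    }

    public static void main(String[] args) {
        List<CompletableFuture<Integer>> jobs =
            List.of(submit(() -> 1), submit(() -> 2), submit(() -> 3));
        System.out.println(sumAll(jobs).join()); // prints 6
    }
}
```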

Error Handling

Jobs execute within try-catch blocks. Exceptions propagate to the caller as a CompletionException. The system handles node failures transparently. Failed jobs retry on other nodes holding the same data partitions.
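The sketch below illustrates both ideas with JDK classes only: a failed asynchronous job surfaces as a `CompletionException` whose cause is the original exception, and the work is retried on the next replica. The `runWithFailover` helper is hypothetical and performs the retry on the caller side purely for illustration; it is not how the system implements failover internally.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Supplier;

final class FailoverSketch {
    /** Tries each replica in order and returns the first successful result. */
    static <R> R runWithFailover(List<Supplier<R>> replicas) {
        CompletionException last = null;
        for (Supplier<R> replica : replicas) {
            try {
                return CompletableFuture.supplyAsync(replica).join();
            } catch (CompletionException e) {
                // join() wraps the job's exception; the original is e.getCause().
                last = e;
            }
        }
        if (last == null) {
            throw new IllegalStateException("no replicas to run on");
        }
        throw last;
    }

    public static void main(String[] args) {
        List<Supplier<String>> replicas = List.of(
            () -> { throw new IllegalStateException("node down"); },
            () -> "ok");
        System.out.println(runWithFailover(replicas)); // prints ok
    }
}
```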

Integration with Data Layer

Table API Access

Compute jobs access tables through RecordView and KeyValueView, with the same partition-aware semantics as client access. Local reads avoid network overhead. This provides a consistent programming model across client and compute layers.

SQL Execution

Compute jobs can execute SQL queries on local partitions. Filter and aggregate local data with SQL. Combine procedural logic with declarative queries. This enables complex business logic at the data layer.

Transaction Support

Compute jobs execute within transactions. Begin transactions in compute code. Read and write data transactionally. Commit or rollback based on business logic. This ensures consistency for complex multi-step operations.
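The begin, read, write, then commit-or-rollback flow can be sketched with a toy in-memory store. Everything here (`TxSketch`, `Tx`, `transfer`) is hypothetical and single-threaded; it buffers writes until commit and provides none of the isolation or durability of a real transaction.

```java
import java.util.HashMap;
import java.util.Map;

final class TxSketch {
    private final Map<String, Long> committed = new HashMap<>();

    /** Toy transaction: writes stay in a private buffer until commit. */
    final class Tx {
        private final Map<String, Long> writes = new HashMap<>();

        long get(String key) {
            return writes.getOrDefault(key, committed.getOrDefault(key, 0L));
        }

        void put(String key, long value) { writes.put(key, value); }

        void commit() { committed.putAll(writes); writes.clear(); }

        void rollback() { writes.clear(); }
    }

    Tx begin() { return new Tx(); }

    long read(String key) { return committed.getOrDefault(key, 0L); }

    void deposit(String key, long amount) {
        Tx tx = begin();
        tx.put(key, tx.get(key) + amount);
        tx.commit();
    }

    /** Business rule: roll back if the source balance would go negative. */
    boolean transfer(String from, String to, long amount) {
        Tx tx = begin();
        long balance = tx.get(from);
        if (balance < amount) {
            tx.rollback();
            return false;
        }
        tx.put(from, balance - amount);
        tx.put(to, tx.get(to) + amount);
        tx.commit();
        return true;
    }

    public static void main(String[] args) {
        TxSketch store = new TxSketch();
        store.deposit("a", 100);
        System.out.println(store.transfer("a", "b", 30));  // prints true
        System.out.println(store.transfer("a", "b", 500)); // prints false
        System.out.println(store.read("a"));               // prints 70
    }
}
```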

Memory-First Performance

Compute jobs operate on memory-resident data. No disk I/O during execution. MVCC provides snapshot isolation for read operations. This delivers the performance needed for real-time compute workloads.

Use Cases

Real-Time Aggregations

Execute aggregation jobs on data-holding nodes. Process millions of rows in memory. Return aggregated results without data movement. Scale horizontally by adding nodes. Each node processes its partitions independently.

Complex Business Logic

Implement multi-step business rules in compute jobs. Access related data locally through colocation. Execute validation, transformation, and enrichment. Combine procedural and declarative logic at the data layer.

Stream Processing

Process event streams with colocated compute. Execute windowing and aggregation logic where data lands. Update derived tables and materialized views. Maintain complex state in memory for stateful stream processing.
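As one example of windowing logic, the following JDK-only sketch counts events per tumbling window and keeps that state in memory. The `TumblingWindowSketch` class and its method names are assumptions for illustration; a real job would read events from, and write results to, tables.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

final class TumblingWindowSketch {
    private final long windowMillis;
    // In-memory state: window start time -> number of events seen in that window.
    private final Map<Long, Long> countsByWindowStart = new TreeMap<>();

    TumblingWindowSketch(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Assigns the event to its fixed-size window and increments that window's count. */
    void onEvent(long timestampMillis) {
        long windowStart = Math.floorDiv(timestampMillis, windowMillis) * windowMillis;
        countsByWindowStart.merge(windowStart, 1L, Long::sum);
    }

    Map<Long, Long> counts() {
        return Collections.unmodifiableMap(countsByWindowStart);
    }

    public static void main(String[] args) {
        TumblingWindowSketch windows = new TumblingWindowSketch(1000);
        for (long ts : new long[] {10, 999, 1000, 2500}) {
            windows.onEvent(ts);
        }
        System.out.println(windows.counts()); // prints {0=2, 1000=1, 2000=1}
    }
}
```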

How Compute Connects to the Foundation

Colocation Enables Local Processing

Compute jobs execute on nodes holding colocated data. Schema-driven placement ensures data and compute reside together. This eliminates network overhead for complex operations.

Learn About Data Placement

Memory-First Execution

Compute jobs access data directly from memory. No disk I/O during execution. This memory-first approach delivers the performance needed for real-time compute workloads at scale.

Learn About Storage

Transactional Compute

Compute jobs execute within ACID transactions. MVCC provides snapshot isolation for read operations. This ensures consistency for complex multi-step operations executing at the data layer.

Learn About Transactions

Ready to Start?

Follow our quick start guide and build your first application in 5 to 10 minutes.

Quick Start Guide

Read Documentation

Learn about compute job submission, execution patterns, and colocated processing

Compute Documentation