Apache Ignite

Schema-Driven Data Placement

Keep related data together. Query locally.
Data Placement

Apache Ignite uses schema-driven colocation to keep related data together. The colocateBy annotation in table definitions specifies which columns control data placement, so related rows are stored on the same node. This turns cross-node queries into local, in-memory operations.

Colocation Through Schema Annotations

The colocateBy Annotation

Table definitions specify colocation keys using the colocateBy annotation. All rows with the same colocation key values are stored in the same partition. The system derives partition assignments from a hash of the colocation key, and this deterministic placement enables partition-aware operations.

Common Colocation Patterns

Colocate orders with customers. Colocate line items with orders. Colocate related business entities. The pattern is consistent: the parent entity's key becomes the child entity's colocation key. This keeps hierarchies together for local joins.
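Expressed in SQL DDL, the parent-child pattern might look like the following sketch (table and column names are illustrative; exact syntax can vary by Ignite 3 version):

```sql
-- Parent table: partitioned by its primary key (the customer id).
CREATE TABLE Customers (
    customer_id INT PRIMARY KEY,
    name        VARCHAR
);

-- Child table: the colocation column must be part of the primary key.
-- Rows land in the same partition as the Customers row with the
-- matching customer_id.
CREATE TABLE Orders (
    order_id    INT,
    customer_id INT,
    total       DECIMAL(10, 2),
    PRIMARY KEY (order_id, customer_id)
) COLOCATE BY (customer_id);
```

Note that the child's primary key includes the colocation column; Ignite requires colocation columns to be part of the primary key.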

Local Joins

Joins between colocated tables execute entirely on the node holding the data. No network traffic for join operations. Query execution happens in memory on a single node. This delivers join performance comparable to single-node databases at distributed scale.
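Assuming an Orders table colocated with Customers on customer_id (an illustrative schema, not a fixed API), a join on the colocation key can run entirely on the node owning that partition:

```sql
-- Both sides of the join live in the same partition, so each node
-- joins its local rows; no join input crosses the network.
SELECT c.name, o.order_id, o.total
FROM Customers AS c
JOIN Orders    AS o ON o.customer_id = c.customer_id
WHERE c.customer_id = 42;
```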

Partition-Aware Routing

The Table API calculates partition ownership from keys. Operations route directly to nodes holding the data. Single-hop access eliminates coordinator overhead. This works for point lookups, batch operations, and colocated queries.

Distribution Zones

Replica Configuration

Distribution zones define replica counts for table groups. Tables in the same zone share replication settings. Specify replica counts from 1 to cluster size. Higher replica counts increase availability and read throughput at the cost of write amplification.
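A zone's replica count is set when the zone is created. A minimal sketch in SQL DDL (zone name illustrative; the exact WITH options differ across Ignite 3 releases):

```sql
-- Keep three copies of every partition in this zone.
CREATE ZONE IF NOT EXISTS critical_zone WITH REPLICAS = 3;
```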

Node Filters

Distribution zones support node filters based on attributes. Restrict data to specific node subsets. Place hot data on high-memory nodes. Place archive data on cost-optimized nodes. This enables heterogeneous cluster configurations.
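Node filters are written as a JSONPath-style predicate over node attributes. A sketch, assuming nodes are started with a custom storage attribute (the attribute name and value are illustrative):

```sql
-- Restrict this zone's data to nodes whose attributes match the
-- filter, e.g. nodes started with storage=SSD.
CREATE ZONE hot_zone WITH
    REPLICAS = 2,
    DATA_NODES_FILTER = '$[?(@.storage == "SSD")]';
```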

Data Rebalancing

The system rebalances data automatically when topology changes. Adding nodes triggers partition migration. Removing nodes redistributes data. Rebalancing maintains target replica counts and respects node filters throughout topology changes.

Zone Management

Create zones with SQL DDL or management APIs. Assign tables to zones during table creation. Modify zone settings without recreating tables. The system applies changes atomically across the cluster.
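The full lifecycle in SQL DDL might look like this sketch (names illustrative; option syntax varies across Ignite 3 releases):

```sql
-- 1. Create a zone.
CREATE ZONE analytics_zone WITH REPLICAS = 2, PARTITIONS = 32;

-- 2. Assign a table to the zone at creation time.
CREATE TABLE Events (
    event_id BIGINT PRIMARY KEY,
    payload  VARCHAR
) ZONE analytics_zone;

-- 3. Adjust the zone later; tables are not recreated.
ALTER ZONE analytics_zone SET REPLICAS = 3;
```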

Performance Characteristics

Network Elimination

Colocated operations execute without network I/O. Joins process in memory on single nodes. Aggregations work on local partitions. This eliminates the network bottleneck that limits distributed query performance.

Cache Efficiency

Related data residing together improves CPU cache efficiency. Sequential scans benefit from memory locality. Index lookups access fewer cache lines. This memory-level optimization compounds with network elimination for maximum throughput.

Scalability

Colocation scales linearly with cluster size. Each node processes its partition data independently. Adding nodes increases total cluster capacity proportionally. No centralized coordination limits throughput.

Trade-offs

Colocation requires careful schema design. Choose colocation keys based on query patterns. Non-colocated joins require data shuffling. The system optimizes for colocated operations, accepting higher cost for cross-partition queries.

Use Cases

Multi-Tenant Applications

Colocate all tenant data by tenant ID. Tenant queries execute locally without cross-node traffic. Achieve single-tenant performance in multi-tenant systems. Scale tenants horizontally by adding nodes.
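A sketch of the pattern (schema illustrative): every table carries the tenant ID in its primary key and colocates on it, so a tenant-scoped query touches a single partition.

```sql
CREATE TABLE TenantDocuments (
    tenant_id INT,
    doc_id    BIGINT,
    body      VARCHAR,
    PRIMARY KEY (tenant_id, doc_id)
) COLOCATE BY (tenant_id);

-- All of tenant 7's documents live in one partition; this scan
-- executes locally on the node owning that partition.
SELECT doc_id, body
FROM TenantDocuments
WHERE tenant_id = 7;
```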

E-Commerce Systems

Colocate orders, line items, and shipping records by order ID. Order processing queries execute locally. Inventory checks, pricing calculations, and order totals compute in memory. No network overhead for transaction processing.
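For example (illustrative schema), line items colocated with their order make an order total a single-node, in-memory aggregation:

```sql
CREATE TABLE OrderItems (
    order_id INT,
    line_no  INT,
    price    DECIMAL(10, 2),
    qty      INT,
    PRIMARY KEY (order_id, line_no)
) COLOCATE BY (order_id);

-- Every line item for order 1001 sits in the same partition,
-- so the total is computed on one node without network I/O.
SELECT SUM(price * qty) AS order_total
FROM OrderItems
WHERE order_id = 1001;
```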

Time-Series Analytics

Colocate metrics by device ID or customer ID. Time-range queries execute locally per device. Aggregations compute on single nodes. This pattern works for IoT telemetry, financial tick data, and application metrics.
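A sketch (schema and names illustrative): metrics colocated by device ID keep each device's history in one partition, so a time-range aggregate runs locally.

```sql
CREATE TABLE Metrics (
    device_id INT,
    ts        TIMESTAMP,
    reading   DOUBLE,
    PRIMARY KEY (device_id, ts)
) COLOCATE BY (device_id);

-- Device 12's readings for the window are co-resident; the
-- average is computed on the single node owning that partition.
SELECT AVG(reading) AS avg_reading
FROM Metrics
WHERE device_id = 12
  AND ts BETWEEN TIMESTAMP '2024-01-01 00:00:00'
             AND TIMESTAMP '2024-01-02 00:00:00';
```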

How Data Placement Connects to the Foundation

Memory-First Operations

Colocated operations execute against in-memory data. Local joins access memory without disk I/O. This memory-first approach combined with colocation delivers the performance needed for complex queries at scale.

Distributed Replication for Availability

Distribution zones use Raft-based replication. Each partition replicates across configured replica count. Data remains available during node failures. Colocation works transparently with replication.

SQL and Table API Integration

Colocation works identically for SQL and Table API. Schema annotations drive placement for both access patterns. This unified approach simplifies application development.

Ready to Start?

Discover our quick start guide and build your first application in 5-10 minutes

Quick Start Guide
Read Documentation

Learn about colocation keys, distribution zones, and data placement strategies

Colocation Documentation