Apache Ignite uses schema-driven colocation to keep related data together. The colocateBy annotation in table definitions specifies which columns control data placement. Related rows are stored on the same node, which turns cross-node queries into local, in-memory operations.
Table definitions specify colocation keys using the colocateBy annotation. All rows with the same colocation key values are stored in the same partition. The system calculates partition assignments by hashing the colocation key, so placement is deterministic and enables partition-aware operations.
Colocate orders with customers. Colocate line items with orders. Colocate related business entities. The pattern: the parent entity's key becomes the child entity's colocation key. This keeps hierarchies together for local joins.
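As a sketch, the parent-key pattern might look like the following DDL. Table and column names are illustrative, and the COLOCATE BY syntax shown follows Apache Ignite 3 and may differ across versions. Note that a colocation column must be part of the table's primary key, which is why each child table carries the parent key in its own key:

```sql
-- Parent: customers are distributed by their own primary key.
CREATE TABLE Customers (
    customer_id INT PRIMARY KEY,
    name        VARCHAR
);

-- Child: orders carry the customer key and colocate by it,
-- so each customer's orders land on that customer's node.
CREATE TABLE Orders (
    order_id    INT,
    customer_id INT,
    total       DECIMAL(10, 2),
    PRIMARY KEY (order_id, customer_id)
) COLOCATE BY (customer_id);

-- Grandchild: line items also carry the top-level customer key,
-- keeping the whole customer/order/item hierarchy on one node.
CREATE TABLE LineItems (
    line_id     INT,
    customer_id INT,
    order_id    INT,
    sku         VARCHAR,
    PRIMARY KEY (line_id, customer_id)
) COLOCATE BY (customer_id);
```

One design choice worth noting: colocating LineItems by order_id alone would group items of the same order together, but not necessarily with the order's customer; carrying the top-level key through the hierarchy, as above, keeps three-way joins local.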
Joins between colocated tables execute entirely on the node holding the data. No network traffic for join operations. Query execution happens in memory on a single node. This delivers join performance comparable to single-node databases at distributed scale.
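For example, assuming hypothetical Customers and Orders tables both colocated on customer_id, a join on that key executes entirely on the node that owns the customer's partition:

```sql
-- Both sides of the join live on the same node for any given
-- customer_id, so no rows cross the network during execution.
SELECT c.name, o.order_id, o.total
FROM Customers c
JOIN Orders o ON o.customer_id = c.customer_id
WHERE c.customer_id = 42;
```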
The Table API calculates partition ownership from keys. Operations route directly to nodes holding the data. Single-hop access eliminates coordinator overhead. This works for point lookups, batch operations, and colocated queries.
Distribution zones define replica counts for table groups. Tables in the same zone share replication settings. Specify replica counts from 1 to cluster size. Higher replica counts increase availability and read throughput at the cost of write amplification.
Distribution zones support node filters based on attributes. Restrict data to specific node subsets. Place hot data on high-memory nodes. Place archive data on cost-optimized nodes. This enables heterogeneous cluster configurations.
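Both settings can be sketched in zone DDL. The WITH options shown here, REPLICAS and DATA_NODES_FILTER with a JSONPath expression over node attributes, follow Apache Ignite 3 syntax, which may vary by version; the zone name and the "region" attribute are illustrative:

```sql
-- Three replicas, restricted to nodes whose "region" attribute
-- is "us-east" (node attributes are set in node configuration).
CREATE ZONE IF NOT EXISTS hot_zone
WITH REPLICAS=3, DATA_NODES_FILTER='$[?(@.region == "us-east")]';
```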
The system rebalances data automatically when topology changes. Adding nodes triggers partition migration. Removing nodes redistributes data. Rebalancing maintains target replica counts and respects node filters throughout topology changes.
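Rebalancing timing can be tuned per zone. As one hedged example, Ignite 3 zones expose auto-adjust timeouts that delay scaling after topology changes; the option name and syntax below may vary by version, and the zone name is illustrative:

```sql
-- Wait 300 seconds after a node joins before rebalancing
-- partitions onto it, absorbing short-lived topology changes.
ALTER ZONE hot_zone SET DATA_NODES_AUTO_ADJUST_SCALE_UP=300;
```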
Create zones with SQL DDL or management APIs. Assign tables to zones during table creation. Modify zone settings without recreating tables. The system applies changes atomically across the cluster.
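A minimal sketch of both steps, assuming a previously created zone named hot_zone (the ZONE clause and ALTER ZONE syntax follow Apache Ignite 3 and may differ across versions; the table is illustrative):

```sql
-- Assign a table to a zone at creation time.
CREATE TABLE Events (
    event_id INT PRIMARY KEY,
    payload  VARCHAR
) ZONE hot_zone;

-- Later, raise the zone's replica count without recreating
-- any of the tables assigned to it.
ALTER ZONE hot_zone SET REPLICAS=5;
```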
Colocated operations execute without network I/O. Joins process in memory on single nodes. Aggregations work on local partitions. This eliminates the network bottleneck that limits distributed query performance.
Related data residing together improves CPU cache efficiency. Sequential scans benefit from memory locality. Index lookups access fewer cache lines. This memory-level optimization compounds with the elimination of network transfer to maximize throughput.
Colocation scales linearly with cluster size. Each node processes its partition data independently, so adding nodes increases total cluster capacity proportionally. No centralized coordinator limits throughput.
Colocation requires careful schema design. Choose colocation keys based on query patterns. Non-colocated joins require data shuffling. The system optimizes for colocated operations, accepting higher cost for cross-partition queries.