Distributed Joins

A distributed join is a SQL statement with a join clause that combines two or more partitioned tables. If the tables are joined on the partitioning column (affinity key), the join is called a colocated join. Otherwise, it is called a non-colocated join.

Colocated joins are more efficient because each node can join the rows it stores locally, without requesting data from other nodes.

By default, Ignite treats each join query as if it is a colocated join and executes it accordingly (see the corresponding section below).

Warning
If your query is non-colocated, you must enable the non-colocated mode of query execution by calling SqlFieldsQuery.setDistributedJoins(true); otherwise, the query may return incorrect results.
Caution
If you often join tables, we recommend that you partition them on the same column (the column on which you join them).

Non-colocated joins should be reserved for cases when it’s impossible to use colocated joins.

Colocated Joins

The following image illustrates the procedure of executing a colocated join. A colocated join (Q) is sent to all the nodes that store the data matching the query condition. Then the query is executed over the local data set on each node (E(Q)). The results (R) are aggregated on the node that initiated the query (the client node).

[Image: colocated joins]
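
The steps above (send Q to the nodes, run E(Q) on local data, aggregate R) can be sketched in plain Java without a cluster. This is a toy simulation, not Ignite code: the three "nodes" are in-memory collections, nodeFor is a hypothetical stand-in for an affinity function, and the Person and City tables are invented for illustration. Person rows are partitioned by cityId, which is also the join column, so every node can finish its share of the join from local data alone.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ColocatedJoinSketch {
    static final int NODES = 3;

    static final class Person {
        final String name;
        final int cityId; // assumed affinity key of the Person table
        Person(String name, int cityId) { this.name = name; this.cityId = cityId; }
    }

    // Hypothetical affinity function: maps a key to the node that stores it.
    static int nodeFor(int affinityKey) {
        return Math.floorMod(affinityKey, NODES);
    }

    // E(Q): one node joins only the rows it stores locally.
    static List<String> localJoin(List<Person> persons, Map<Integer, String> cities) {
        List<String> out = new ArrayList<>();
        for (Person p : persons) {
            String city = cities.get(p.cityId);
            if (city != null) out.add(p.name + "@" + city);
        }
        return out;
    }

    static List<String> run() {
        String[] cityNames = {"Berlin", "Oslo", "Lima"}; // city ids 1..3
        Person[] people = {
            new Person("Ann", 1), new Person("Bob", 2),
            new Person("Cy", 3), new Person("Dee", 1)
        };

        // Partition both tables with the same function on the join column.
        List<List<Person>> personParts = new ArrayList<>();
        List<Map<Integer, String>> cityParts = new ArrayList<>();
        for (int n = 0; n < NODES; n++) {
            personParts.add(new ArrayList<>());
            cityParts.add(new HashMap<>());
        }
        for (int id = 1; id <= cityNames.length; id++) {
            cityParts.get(nodeFor(id)).put(id, cityNames[id - 1]);
        }
        for (Person p : people) {
            personParts.get(nodeFor(p.cityId)).add(p);
        }

        // R: the initiating node concatenates the per-node results.
        List<String> result = new ArrayList<>();
        for (int n = 0; n < NODES; n++) {
            result.addAll(localJoin(personParts.get(n), cityParts.get(n)));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because both tables use the same partitioning function on the join column, the local joins together produce all four joined rows with no cross-node lookups.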

Non-colocated Joins

If you execute a query in non-colocated mode, the SQL engine executes the query locally on all the nodes that store the data matching the query condition. Because the data is not colocated, each node requests the missing data (the rows that are not present locally) from other nodes by sending either broadcast or unicast requests. This process is depicted in the image below.

[Image: non-colocated joins]

If the join is done on the primary or affinity key, the nodes send unicast requests because in this case the nodes know the location of the missing data. Otherwise, nodes send broadcast requests. For performance reasons, both broadcast and unicast requests are aggregated into batches.
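
The difference between the two execution modes can be sketched with a toy simulation in plain Java. It is not Ignite code: the nodes are in-memory maps, nodeFor is a hypothetical affinity function, and the tables are invented. Here the Person table is partitioned by personId while the join uses cityId, so the data is not colocated. The distributedJoins flag in the sketch only mimics the effect of the real setting. Because the join column is the City primary key, the owning node can be computed directly, which models the unicast case.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NonColocatedJoinSketch {
    static final int NODES = 3;

    // Each row is {personId, cityId}; Person is partitioned by personId.
    static final int[][] PEOPLE = { {1, 2}, {2, 3}, {3, 1}, {4, 1} };
    static final String[] CITY_NAMES = {"Berlin", "Oslo", "Lima"}; // city ids 1..3

    // Hypothetical affinity function: maps a key to the node that stores it.
    static int nodeFor(int key) {
        return Math.floorMod(key, NODES);
    }

    // City is partitioned by its primary key (cityId).
    static List<Map<Integer, String>> cityPartitions() {
        List<Map<Integer, String>> parts = new ArrayList<>();
        for (int n = 0; n < NODES; n++) parts.add(new HashMap<>());
        for (int id = 1; id <= CITY_NAMES.length; id++) {
            parts.get(nodeFor(id)).put(id, CITY_NAMES[id - 1]);
        }
        return parts;
    }

    static List<String> join(boolean distributedJoins) {
        List<Map<Integer, String>> cities = cityPartitions();
        List<String> result = new ArrayList<>();
        for (int[] p : PEOPLE) {
            int node = nodeFor(p[0]);                 // node that stores this person row
            String city = cities.get(node).get(p[1]); // local lookup only
            if (city == null && distributedJoins) {
                // Simulated unicast request: the owner of the missing
                // City row is known from its primary key.
                city = cities.get(nodeFor(p[1])).get(p[1]);
            }
            if (city != null) result.add("person" + p[0] + "@" + city);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("local only:  " + join(false));
        System.out.println("distributed: " + join(true));
    }
}
```

With the flag off, only the one person whose city happens to live on the same node is joined, which illustrates why running a non-colocated query in the default mode can return incomplete results. With the flag on, all four rows are returned at the cost of extra lookups on other nodes.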

Enable the non-colocated mode of query execution by setting a JDBC/ODBC parameter or, if you use the SQL API, by calling SqlFieldsQuery.setDistributedJoins(true).

Warning
If you use a non-colocated join on a column from a replicated table, the column must have an index. Otherwise, you will get an exception.

Hash Joins

To boost the performance of join queries, Ignite supports the hash join algorithm. Hash joins can be more efficient than nested loop joins in many scenarios, except when the probe side of the join is very small. However, hash joins can only be used with equi-joins, that is, joins whose join predicate is an equality comparison.
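
To make the comparison concrete, here is a minimal, generic sketch of the two algorithms in plain Java. It illustrates the textbook technique only and is not Ignite's internal implementation; the row layout ({joinKey, payload}) and the sample data are assumptions made for the example. The hash join builds a hash table over one input and probes it with the other, which is why it needs an equality predicate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HashJoinSketch {
    // Sample rows in the assumed layout {joinKey, payload}.
    static final int[][] A = { {1, 10}, {2, 20}, {2, 21}, {4, 40} };
    static final int[][] B = { {2, 200}, {1, 100}, {2, 201}, {3, 300} };

    // Nested loop join: compares every row of a with every row of b.
    static List<int[]> nestedLoopJoin(int[][] a, int[][] b) {
        List<int[]> out = new ArrayList<>();
        for (int[] ra : a) {
            for (int[] rb : b) {
                if (ra[0] == rb[0]) out.add(new int[] {ra[1], rb[1]});
            }
        }
        return out;
    }

    // Hash join: build a hash table on b once, then probe it with each row of a.
    static List<int[]> hashJoin(int[][] a, int[][] b) {
        Map<Integer, List<int[]>> index = new HashMap<>();
        for (int[] rb : b) {
            index.computeIfAbsent(rb[0], k -> new ArrayList<>()).add(rb);
        }
        List<int[]> out = new ArrayList<>();
        for (int[] ra : a) {
            // Equality lookup only: a hash table cannot answer range predicates.
            for (int[] rb : index.getOrDefault(ra[0], Collections.<int[]>emptyList())) {
                out.add(new int[] {ra[1], rb[1]});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("nested loop rows: " + nestedLoopJoin(A, B).size());
        System.out.println("hash join rows:   " + hashJoin(A, B).size());
    }
}
```

Both methods return the same joined rows. The nested loop does work proportional to the product of the input sizes, while the hash join does one pass over each input plus the matches. When one input is very small, the cost of building the hash table can outweigh that saving.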

To enforce the use of hash joins:

  1. Use the enforceJoinOrder option:

    Java:

    SqlFieldsQuery query = new SqlFieldsQuery(
            "SELECT * FROM TABLE_A, TABLE_B USE INDEX(HASH_JOIN_IDX)"
                    + " WHERE TABLE_A.column1 = TABLE_B.column2").setEnforceJoinOrder(true);

    JDBC:

    Class.forName("org.apache.ignite.IgniteJdbcThinDriver");

    // Open the JDBC connection.
    Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1?enforceJoinOrder=true");

    C#/.NET:

    var query = new SqlFieldsQuery("SELECT * FROM TABLE_A, TABLE_B USE INDEX(HASH_JOIN_IDX) WHERE TABLE_A.column1 = TABLE_B.column2")
    {
        EnforceJoinOrder = true
    };

    C++:

    SqlFieldsQuery query = SqlFieldsQuery("SELECT * FROM TABLE_A, TABLE_B USE INDEX(HASH_JOIN_IDX) WHERE TABLE_A.column1 = TABLE_B.column2");
    query.SetEnforceJoinOrder(true);

  2. Specify USE INDEX(HASH_JOIN_IDX) on the table for which you want to create the hash-join index:

    SELECT * FROM TABLE_A, TABLE_B USE INDEX(HASH_JOIN_IDX) WHERE TABLE_A.column1 = TABLE_B.column2