How to Improve Queries With EXPLAIN Command
The SQL EXPLAIN command is a powerful tool used to analyze and understand the execution plan of a query without actually executing it.
When you use the EXPLAIN command, it returns the query execution plan, which includes details such as:
-
The order in which tables are accessed.
-
The type of join operations used (for example, nested loops, hash joins, or merge joins).
-
Any indexes that are used to speed up data retrieval.
-
Estimated costs and row counts for different parts of the query.
This information is crucial for optimizing query performance, identifying bottlenecks, and making informed decisions about database schema design and indexing strategies.
EXPLAIN Command Syntax
Apache Ignite supports two variations of the EXPLAIN command:
EXPLAIN [PLAN | MAPPING] FOR <query>
If neither PLAN nor MAPPING is specified, then PLAN is implicit.
Parameters:
-
PLAN- explains query in terms of relational operators tree. This representation is suitable for investigation of performance issues related to the optimizer. -
MAPPING- explains query in terms of mapping of query fragment to a particular node of the cluster. This representation is suitable for investigation of performance issues related to the data colocation.
Examples:
EXPLAIN SELECT * FROM lineitem;
EXPLAIN PLAN FOR SELECT * FROM lineitem;
EXPLAIN MAPPING FOR SELECT * FROM lineitem;
Understanding The Output
Each query plan is represented as a tree-like structure composed of relational operators.
A node in the plan includes:
-
A name, indicating the relational operator (e.g.,
TableScan,IndexScan,Sort,Jointypes) -
A set of attributes, relevant to that specific operator
OperatorName
attribute1: value1
attribute2: value2
Operator Naming
The operator name reflects the specific algorithm or strategy used. For example:
-
TableScan– Full scan of a base table. -
IndexScan– Access via index, possibly sorted. -
Sort– Explicit sorting step. -
HashJoin,MergeJoin,NestedLoopJoin– Types of join algorithms. -
Limit,Project,Exchange– Execution-related transformations and controls.
Hierarchical Plan Structure
The plan is structured as a tree, where:
-
Leaf nodes represent data sources (e.g.,
TableScan) -
Internal nodes represent data transformations (e.g.,
Join,Sort) -
The root node (topmost) is the final operator that produces the result
Common Query Optimization Issues
SQL EXPLAIN output analysis can help you optimize slow query execution. You can avoid common bottlenecks in SQL execution by following these guidelines:
-
Avoid scanning an entire table.
-
Avoid scanning non-optimal indexes.
-
Avoid suboptimal join ordering or join algorithm.
-
Ensure optimal data colocation for your queries.
In the following sections, we will see some common issues with queries and ways to identify and fix them.
Full Scan Instead of Index Scan
Suppose related sql execution flow looks like:
CREATE TABLE t (id INT PRIMARY KEY, col1 VARCHAR);
CREATE INDEX t_col1_idx ON t(col1);
SELECT id FROM t WHERE col1 = '1';
And possible EXPLAIN output:
TableScan
table: PUBLIC.T
predicate: =(COL1, _UTF-8'1')
fieldNames: [ID]
est: (rows=1)
|
Note
|
For simplicity, here and below, information that is not related to the example is omitted from the EXPLAIN output. |
We can see a full scan (TableScan operator) with predicate.
The execution planner chooses which scan implementation (TableScan or IndexScan) to use.
If you expect that index scan is preferable, you can use the FORCE_INDEX hint to manually force IndexScan approach:
SELECT /*+ FORCE_INDEX(t_col1_idx) */ id FROM t WHERE col1 = '1';
Will show a different plan, like:
IndexScan
table: PUBLIC.T
index: T_COL1_IDX
type: SORTED
predicate: =(COL1, _UTF-8'1')
searchBounds: [ExactBounds [bound=_UTF-8'1']]
ieldNames: [ID]
collation: []
est: (rows=1)
Suboptimal Indexes
Indexes with less prediction can be chosen, for example schema and query may look as follows:
CREATE TABLE t (id INT PRIMARY KEY, col1 VARCHAR, col2 VARCHAR);
CREATE INDEX t_col1_col2_idx ON T(col1, col2);
CREATE INDEX t_col1_idx ON t(col1);
SELECT id FROM t WHERE col1 = '1' AND col2 = '2';
and a possible plan would be:
IndexScan
table: PUBLIC.T
index: T_COL1_IDX
...
We can see that the execution uses the T_COL1_IDX index, through both predicates COL1 = '1' AND COL2 = '2' are involved and the T_COL1_COL2_IDX is preferable. In this case, the optimal plan would be:
IndexScan
table: PUBLIC.T
index: T_COL1_COL2_IDX
...
You can also use the FORCE_INDEX hint to achieve this:
SELECT /*+ FORCE_INDEX(t_col1_col2_idx) */ id FROM t WHERE col1 = '1' AND col2 = '2';
Unexpected Sort Operation
By default, sorted indexes store their entries in ascending order. You can adjust the ordering of a sorted index by including the options ASC or DESC.
Let’s suppose the schema and related query look like this:
CREATE TABLE t1 (id INT PRIMARY KEY, col1 VARCHAR);
CREATE TABLE t2 (id INT PRIMARY KEY, col1 VARCHAR);
CREATE INDEX t1_col1_desc_idx ON t1(col1 DESC);
CREATE INDEX t2_col1_idx ON t2(col1);
SELECT t1.id as t1id, t2.id as t2id FROM t1 JOIN t2 USING (col1);
And the possible execution plan looks like this:
MergeJoin
...
Sort
collation: [COL1 ASC]
...
IndexScan
index: T2_COL1_IDX
...
In the example above, the planner adds the Sort operation before performing the IndexScan, as the index used is sorted in descending order, while ascending order is required.
Extra Sort operations adds performance costs, and we can avoid it by creating an index with the appropriate sort ordering:
CREATE INDEX t1_col1_idx ON t1(col1);
And plan will no longer display the Sort operation, improving query execution speed:
MergeJoin
...
IndexScan
index: T1_COL1_IDX
...
IndexScan
index: T2_COL1_IDX
...
Performance Impact of Correlated Subqueries
The SQL-99 standard allows for nested subqueries at nearly all places within a query, so Ignite 3 supports nested subqueries, both correlated and not. Performance of certain complex correlated subqueries may be insufficient. Let`s consider a correlated query:
CREATE TABLE emp(dept_id INTEGER PRIMARY KEY, name VARCHAR, salary INTEGER);
CREATE TABLE dept(id INTEGER PRIMARY KEY, name VARCHAR);
SELECT emp.name, (SELECT dept.name FROM dept WHERE emp.dept_id=dept.id)
FROM emp
WHERE emp.salary > 1000;
We can see nested correlated subquery here, lets check the plan:
CorrelatedNestedLoopJoin
...
TableScan
table: PUBLIC.EMP
predicate: >(SALARY, 1000)
...
ColocatedHashAggregate
...
TableScan
table: PUBLIC.DEPT
predicate: =($cor1.DEPT_ID, ID)
...
The example above shows the slow CorrelatedNestedLoopJoin operation. Queries with this operation may cause a number of issues:
-
Such subqueries may become bottlenecks.
-
Queries can cause high CPU load.
-
Certain queries may perform slower than expected.
If performance issues are found in similar queries, it would be more efficient to rewrite the query without nested subqueries, for example:
SELECT emp.name, dept.name
FROM emp, dept
WHERE emp.salary > 1000 AND emp.dept_id=dept.id;
And new plan becomes:
HashJoin
predicate: =(DEPT_ID, ID)
...
TableScan
table: PUBLIC.EMP
predicate: >(SALARY, 1000)
...
TableScan
table: PUBLIC.DEPT
...
Without the CorrelatedNestedLoopJoin operation, the query should perform much better than the previous one.
Excessive Sorting
Lets explain we have an index involved two columns one of them is participate in predicate and other in ordering, or in sql terms:
CREATE TABLE emp(dept_id INTEGER PRIMARY KEY, name VARCHAR, salary INTEGER);
CREATE INDEX emp_salary_name_idx ON emp(salary, name);
SELECT dept_id FROM emp WHERE salary = 1 ORDER BY name;
Expectations:
-
Index need to be used here.
-
No additional sort is needed because index is ordered by name column is satisfies initial query ordering.
But the real execution plan shows a different result:
Sort
collation: [NAME ASC]
...
TableScan
table: PUBLIC.EMP
predicate: =(SALARY, 1)
...
We can see a redundant Sort operator. A bit query refactoring can help to avoid excessive sorting:
SELECT dept_id FROM emp WHERE salary = 1 ORDER BY salary, name;
And the plan becomes as follows:
IndexScan
table: PUBLIC.EMP
index: EMP_SALARY_NAME_IDX
predicate: =(SALARY, 1)
...
Select Count Optimization
Some queries can be optimized to use more optimal plans which brings performance speed up. For example, plan for:
SELECT COUNT(*) FROM emp;
Can look like this:
SelectCount
table: PUBLIC.EMP
est: (rows=43)
...
But there are numerous cases where such optimization is not applicable. In such a cases, a plan can be different and the execution may require more time.
The same query as above, but with explicit transaction may produce a different plan, for example:
ReduceSortAggregate
...
MapSortAggregate
...
TableScan
table: PUBLIC.EMP
est: (rows=43)
...
Index Scan Without Exact Search Bounds
Table scans are available in two implementations: direct table scan and scan through index. Index scans contain predicate and search bounds. Predicate provides final rows comparison. If search bounds are absentm the query degenerates into table scan through index scan (requiring an additional store look up), with further predicate comparison, that incurs additional performance overhead costs.
Let`s suppose we have schema and query like this:
CREATE TABLE t (id INTEGER PRIMARY KEY, col1 DECIMAL(5, 3));
CREATE INDEX t_col1_idx ON t(col1);
SELECT id FROM t WHERE col1 = 43;
And possible plan would look like this:
IndexScan
table: PUBLIC.T
index: T_COL1_IDX
predicate: =(CAST(COL1):DECIMAL(13, 3), 43.000)
...
We can see here - only predicate (and no searchBounds) which means that all rows from index will go through predicate and bring additional performance penalty.
Two type of solutions are possible here:
-
You can prohibit suboptimal index usage;
-
You can explicitly help the planner with type derivation.
Prohibit Index Usage
For the first approach, use the NO_INDEX hint to prohibit index usage:
SELECT /*+ NO_INDEX */ id FROM t WHERE col1 = 43;
-- or with direct index mention:
SELECT /*+ NO_INDEX(t_col1_idx) */ id FROM t WHERE col1 = 43;
As a result, you will have a plan similar to this:
TableScan
table: PUBLIC.T
predicate: =(CAST(COL1):DECIMAL(13, 3), 43.000)
...
Manual Type Casting
You can append additional cast to the same query to explicitly cast data as a specific type:
SELECT id FROM t WHERE col1 = 43::DECIMAL(5, 3);
IndexScan
table: PUBLIC.T
index: T_COL1_IDX
predicate: =(COL1, 43.000)
searchBounds: [ExactBounds [bound=43.000:DECIMAL(5, 3)]]
...
We can see here both searchBounds and predicate which means that only exact lookup through index will be involved.
The same case as above but for a bit complicated query:
CREATE TABLE t (id INT PRIMARY KEY, col1 INT);
CREATE INDEX t_col1_asc_idx ON t (col1);
SELECT * FROM t WHERE col1::varchar = SUBSTR(CURRENT_DATE::varchar, 4);
Possible plan:
IndexScan
table: PUBLIC.T
index: T_COL1_IDX
predicate: =(CAST(COL1):VARCHAR CHARACTER SET "UTF-8", SUBSTR(CAST(CURRENT_DATE):VARCHAR CHARACTER SET "UTF-8" NOT NULL, 4))
...
And we also can see that no search bounds are involved here.
Try to change it like:
SELECT * FROM t WHERE col1 = SUBSTR(CURRENT_DATE::varchar, 4)::int;
And the possible plan will become:
IndexScan
table: PUBLIC.T
index: T_COL1_ASC_IDX
predicate: =(COL1, CAST(SUBSTR(CAST(CURRENT_DATE):VARCHAR CHARACTER SET "UTF-8" NOT NULL, 4)):INTEGER NOT NULL)
searchBounds: [ExactBounds [bound=CAST(SUBSTR(CAST(CURRENT_DATE):VARCHAR CHARACTER SET "UTF-8" NOT NULL, 4)):INTEGER]]
...
We can see that searchBounds are present - thus more productive execution flow is expected here.
Colocation Usage
As mentioned above, right colocated columns choice plays a significant role in query execution performance. For example, if initially tables are created without any thoughts about further usage columns colocation you can have the following scenario:
-- by default, the table is implicitly colocated by PRIMARY KEY
CREATE TABLE emp(dept_id INTEGER, name VARCHAR, salary INTEGER, PRIMARY KEY(dept_id, name));
-- implicitly colocated by PRIMARY KEY
CREATE TABLE dept(id INTEGER, name VARCHAR, PRIMARY KEY(name, id));
And query as follows:
SELECT emp.name, dept.name FROM emp JOIN dept ON emp.dept_id = dept.id AND emp.salary > 1000;
Bring plan like:
HashJoin
predicate: =(DEPT_ID, ID)
...
Exchange
...
TableScan
table: PUBLIC.EMP
...
Exchange
...
TableScan
table: PUBLIC.DEPT
...
We can see two Exchange operators, which means that all rows are transferred into a single node and then are joined. This execution flow brings a performance cost and slows down query execution.
Let`s try to improve it by adding explicit colocation for the dept table by the ID column:
-- implicitly colocated by PRIMARY KEY
CREATE TABLE emp(dept_id INTEGER, name VARCHAR, salary INTEGER, PRIMARY KEY(dept_id, name));
-- explicitly colocated by ID
CREATE TABLE dept(id INTEGER, name VARCHAR, PRIMARY KEY(name, id)) COLOCATE BY (id);
Now the dependent rows from emp table are transferred into the appropriate node where dept holds the rows according to DEPT.ID distribution:
HashJoin
predicate: =(DEPT_ID, ID)
...
Exchange
distribution: table PUBLIC.DEPT in zone "Default" by [DEPT_ID]
...
TableScan
table: PUBLIC.EMP
...
TableScan
table: PUBLIC.DEPT
...
Only one Exchange operator for now, which, once again, mean only rows transferring from emp table to appropriate dept one.
And finally, both join predicate related columns are colocated:
|
Note
|
The following colocation example will only work if the emp and dept tables belong to the same distribution zone. |
-- explicitly colocated by DEPT_ID
CREATE TABLE emp(dept_id INTEGER, name VARCHAR, salary INTEGER, PRIMARY KEY(dept_id, name)) COLOCATE BY(dept_id);
-- explicitly colocated by ID
CREATE TABLE dept(id INTEGER, name VARCHAR, PRIMARY KEY(id, name)) COLOCATE BY(id);
Now, the emp and dept tables are both colocated.
And the final plan will look like this:
HashJoin
predicate: =(DEPT_ID, ID)
...
TableScan
table: PUBLIC.EMP
...
TableScan
table: PUBLIC.DEPT
...
No Exchange operators are involved in the explanation, which means that no excessive rows transfer has occurred.
Comprehensive EXPLAIN Examples
Example: Complex Join Query
EXPLAIN PLAN FOR
SELECT
U.UserName, P.ProductName, R.ReviewText, R.Rating
FROM Users U, Reviews R, Products P
WHERE U.UserID = R.UserID
AND R.ProductID = P.ProductID
AND P.ProductName = 'Product_' || ?::varchar
The resulting output is:
Project
fieldNames: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING]
projection: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING]
est: (rows=16650)
HashJoin
predicate: =(USERID$0, USERID)
fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING, PRODUCTID$0, PRODUCTNAME, USERID$0, USERNAME]
type: inner
est: (rows=16650)
HashJoin
predicate: =(PRODUCTID, PRODUCTID$0)
fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING, PRODUCTID$0, PRODUCTNAME]
type: inner
est: (rows=16650)
Exchange
distribution: single
est: (rows=50000)
TableScan
table: PUBLIC.REVIEWS
fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING]
est: (rows=50000)
Exchange
distribution: single
est: (rows=1665)
TableScan
table: PUBLIC.PRODUCTS
predicate: =(PRODUCTNAME, ||(_UTF-8'Product_', CAST(?0):VARCHAR CHARACTER SET "UTF-8"))
fieldNames: [PRODUCTID, PRODUCTNAME]
est: (rows=1665)
Exchange
distribution: single
est: (rows=10000)
TableScan
table: PUBLIC.USERS
fieldNames: [USERID, USERNAME]
est: (rows=10000)
This execution plan represents a query that joins three tables: USERS, REVIEWS, and PRODUCTS, and selects four fields after filtering by product name.
-
Project (root node): Outputs the final selected fields —
USERNAME,PRODUCTNAME,REVIEWTEXT, andRATING. -
HashJoins (two levels): Perform the inner joins.
-
The first (bottom-most) joins
REVIEWSwithPRODUCTSonPRODUCTID. -
The second joins the result with
USERSonUSERID.
-
-
TableScans: Each table is scanned:
-
REVIEWSis fully scanned. -
PRODUCTSis scanned with a filter onPRODUCTNAME. -
USERSis fully scanned.
-
-
Exchange nodes: Indicate data redistribution between operators.
Each node includes:
-
fieldNames: Output columns at that stage. -
predicate: Join or filter condition. -
est: Estimated number of rows at that point in the plan.
Example: Query Mapping
A result of EXPLAIN MAPPING command includes additional metadata providing insight at how the query is mapped on cluster topology. So, for the command like below:
EXPLAIN MAPPING FOR
SELECT
U.UserName, P.ProductName, R.ReviewText, R.Rating
FROM Users U, Reviews R, Products P
WHERE U.UserID = R.UserID
AND R.ProductID = P.ProductID
AND P.ProductName = 'Product_' || ?::varchar
The resulting output is:
Fragment#0 root
distribution: single
executionNodes: [node_1]
tree:
Project
fieldNames: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING]
projection: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING]
est: (rows=1)
HashJoin
predicate: =(USERID$0, USERID)
fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING, PRODUCTID$0, PRODUCTNAME, USERID$0, USERNAME]
type: inner
est: (rows=1)
HashJoin
predicate: =(PRODUCTID, PRODUCTID$0)
fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING, PRODUCTID$0, PRODUCTNAME]
type: inner
est: (rows=1)
Receiver
fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING]
sourceFragmentId: 1
est: (rows=1)
Receiver
fieldNames: [PRODUCTID, PRODUCTNAME]
sourceFragmentId: 2
est: (rows=1)
Receiver
fieldNames: [USERID, USERNAME]
sourceFragmentId: 3
est: (rows=1)
Fragment#1
distribution: random
executionNodes: [node_1, node_2, node_3]
partitions: [REVIEWS=[node_1={0, 2, 5, 6, 7, 8, 9, 10, 12, 13, 20}, node_2={1, 3, 11, 19, 21, 22, 23, 24}, node_3={4, 14, 15, 16, 17, 18}]]
tree:
Sender
distribution: single
targetFragmentId: 0
est: (rows=50000)
TableScan
table: PUBLIC.REVIEWS
fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING]
est: (rows=50000)
Fragment#2
distribution: table PUBLIC.PRODUCTS in zone "Default"
executionNodes: [node_1, node_2, node_3]
partitions: [PRODUCTS=[node_1={0, 2, 5, 6, 7, 8, 9, 10, 12, 13, 20}, node_2={1, 3, 11, 19, 21, 22, 23, 24}, node_3={4, 14, 15, 16, 17, 18}]]
tree:
Sender
distribution: single
targetFragmentId: 0
est: (rows=1665)
TableScan
table: PUBLIC.PRODUCTS
predicate: =(PRODUCTNAME, ||(_UTF-8'Product_', CAST(?0):VARCHAR CHARACTER SET "UTF-8"))
fieldNames: [PRODUCTID, PRODUCTNAME]
est: (rows=1665)
Fragment#3
distribution: table PUBLIC.USERS in zone "Default"
executionNodes: [node_1, node_2, node_3]
partitions: [USERS=[node_1={0, 2, 5, 6, 7, 8, 9, 10, 12, 13, 20}, node_2={1, 3, 11, 19, 21, 22, 23, 24}, node_3={4, 14, 15, 16, 17, 18}]]
tree:
Sender
distribution: single
targetFragmentId: 0
est: (rows=10000)
TableScan
table: PUBLIC.USERS
fieldNames: [USERID, USERNAME]
est: (rows=10000)
Where:
-
Fragment#0 means fragment with id=0
-
A root marks a fragment which is considered as root fragment, i.e. a fragment which represents user’s cursor
-
A distribution attribute provides an insight into which mapping strategy was applied to this particular fragment
-
A executionNodes attribute provides a list of nodes this fragment will be executed on
-
A partitions attribute provides an insight into which partitions of which tables will be read from which nodes
-
A tree attribute specifies which part of the relational tree corresponds to this fragment
The output above shows how the query is broken into multiple execution fragments and distributed across the cluster. It gives insight into both the logical execution plan and how it maps to the physical topology.
The query starts execution in Fragment#0, which serves as the root of the plan — this is where the final result is produced. It runs on a single node (node_1) and contains the main logic of the query, including the projection and two nested hash joins. Instead of scanning tables directly, it receives data from other fragments through Receiver operators. These incoming streams correspond to the REVIEWS, PRODUCTS, and USERS tables.
The actual table scans happen in Fragments 1 through 3, each responsible for one of the involved tables. These fragments operate in parallel across the cluster. Each performs a scan on its respective table and then sends the results back to Fragment#0.
-
Fragment#1 handles the
REVIEWStable. It runs on all nodes and uses a random distribution strategy. Data is partitioned across nodes, and after scanning the table, results are sent upstream. -
Fragment#2 is in charge of the
PRODUCTStable. It also spans all nodes but follows a zone-based distribution linked to the table’s partitioning. There’s a filter applied toPRODUCTNAME, which limits the amount of data sent to the root. -
Fragment#3 covers the
USERStable. Like the others, it’s distributed and reads from table partitions spread across the cluster.
Each fragment includes metadata such as the nodes it’s executed on, how data is partitioned, and how results are sent between fragments. This layout provides a clear view of not only how the query is logically processed, but also how the workload is split and coordinated in a distributed environment.
Apache, Apache Ignite, the Apache feather and the Apache Ignite logo are either registered trademarks or trademarks of The Apache Software Foundation.
