Calcite-based SQL Engine
Starting the 2.13 version, Apache Ignite includes a new SQL engine based on the Apache Calcite framework.
Apache Calcite is a dynamic data management framework, which mainly serves for mediating between applications, one or more data storage locations, and data processing engines. For more information on Apache Calcite, please see the product documentation.
The current H2-based SQL engine has a number of fundamental limitations of query execution in a distributed environment. To address these limitations, a new SQL engine was implemented. The new engine uses tools provided by Apache Calcite for parsing and planning queries. It also has a new query execution flow.
Caution
|
The Calcite-based query engine is currently in beta status. |
Calcite Module Libraries
To use a Calcite-based engine, please make sure that the Calcite module libraries are in a classpath.
Standalone Mode
When starting a standalone node, move optional/ignite-calcite
folder to the libs
folder before running ignite.{sh|bat}
script. In this case, the content of the module folder is added to the classpath.
Maven Configuration
If you are using Maven to manage dependencies of your project, you can add Calcite module dependency as follows: Replace ${ignite.version}
with the actual Apache Ignite version you are interested in:
<dependency>
<groupId>org.apache.ignite</groupId>
<artifactId>ignite-calcite</artifactId>
<version>${ignite.version}</version>
</dependency>
Configuring Query Engines
To enable engine, add the explicit CalciteQueryEngineConfiguration
instance to the SqlConfiguration.QueryEnginesConfiguration
property.
Below is a configuration example of two configured query engines (H2-based and Calcite-based engines) where the Calcite-based engine is chosen as a default one:
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
<property name="sqlConfiguration">
<bean class="org.apache.ignite.configuration.SqlConfiguration">
<property name="queryEnginesConfiguration">
<list>
<bean class="org.apache.ignite.indexing.IndexingQueryEngineConfiguration">
<property name="default" value="false"/>
</bean>
<bean class="org.apache.ignite.calcite.CalciteQueryEngineConfiguration">
<property name="default" value="true"/>
</bean>
</list>
</property>
</bean>
</property>
...
</bean>
IgniteConfiguration cfg = new IgniteConfiguration().setSqlConfiguration(
new SqlConfiguration().setQueryEnginesConfiguration(
new IndexingQueryEngineConfiguration(),
new CalciteQueryEngineConfiguration().setDefault(true)
)
);
Routing Queries to Query Engine
Normally, all queries are routed to the query engine that is configured by default. If more than one engine is configured through queryEnginesConfiguration
, it’s possible to use another engine instead of the one configured default for individual queries or for the whole connection.
JDBC
To choose a query engine for the JDBC connection, use the queryEngine
connection parameter:
jdbc:ignite:thin://127.0.0.1:10800?queryEngine=calcite
ODBC
To configure the query engine for the ODBC connection, use the QUERY_ENGINE
property:
[IGNITE_CALCITE]
DRIVER={Apache Ignite};
SERVER=127.0.0.1;
PORT=10800;
SCHEMA=PUBLIC;
QUERY_ENGINE=CALCITE
SQL Reference
DDL
Data definition language (DDL) statements are compliant with the old H2-based engine. You can find the DDL syntax description here.
DML
The new SQL engine mostly inherits data manipulation language (DML) statements syntax from the Apache Calcite framework. See the Apache Calcite SQL grammar description here.
In most cases, statement syntax is compliant with the old SQL engine. But there are still some differences between DML dialects in H2-based engine and Calcite-based engine. For example, note the MERGE
statement syntax has changed.
Supported Functions
The Calcite-based SQL engine currently supports:
Group | Functions list |
---|---|
Aggregate functions |
|
String functions |
|
Math functions |
|
Date and time functions |
|
XML functions |
|
JSON functions |
|
Other functions |
|
For more information on these functions, please see the Apache Calcite SQL language reference.
Supported Data Types
Below are the data types supported by the Calcite-based SQL engine:
Data type | Mapped to Java class |
---|---|
BOOLEAN |
|
DECIMAL |
|
DOUBLE |
|
REAL/FLOAT |
|
INT |
|
BIGINT |
|
SMALLINT |
|
TINYINT |
|
CHAR/VARCHAR |
|
DATE |
|
TIME |
|
TIMESTAMP |
|
INTERVAL YEAR TO MONTH |
|
INTERVAL DAY TO SECOND |
|
BINARY/VARBINARY |
|
UUID |
|
OTHER |
|
Optimizer hints
The query optimizer does its best to build the fastest excution plan. However, this is a far way to create an optimizer which is the most effective for each case. You can better know about the data design, application design or data distribution in the cluster. SQL hints can help the optimizer to make optimizations more rationally or build execution plan faster.
Note
|
SQL hints are optional to apply and might be skipped in some cases. |
Hints format
SQL hints are defined by a special comment /*+ HINT */ reffered as a hint block. Spaces before and after the hint name are required. The hint block is placed right after a relation operator, often after SELECT. Several hint blocks for one relation operator are not allowed.
Example:
SELECT /*+ NO_INDEX */ T1.* FROM TBL1 where T1.V1=? and T1.V2=?
It is allowed to define several hints for the same relation operator. To use several hints, separate them by comma (spaces are optional).
Example:
SELECT /*+ NO_INDEX, EXPAND_DISTINCT_AGG */ SUM(DISTINCT V1), AVG(DISTINCT V2) FROM TBL1 GROUP BY V3 WHERE V3=?
Hint parameters
Hint parameters, if required, are placed in brackets after the hint name and separated by commas.
The hint parameter can be quoted. Quoted parameter is case-sensitive. The quoted and unquoted parameters cannot be defined for the same hint.
Example:
SELECT /*+ FORCE_INDEX(TBL1_IDX2,TBL2_IDX1) */ T1.V1, T2.V1 FROM TBL1 T1, TBL2 T2 WHERE T1.V1 = T2.V1 AND T1.V2 > ? AND T2.V2 > ?;
SELECT /*+ FORCE_INDEX('TBL2_idx1') */ T1.V1, T2.V1 FROM TBL1 T1, TBL2 T2 WHERE T1.V1 = T2.V1 AND T1.V2 > ? AND T2.V2 > ?;
Hint scope
Hints are defined for a relation operator, usually for SELECT. Most of the hints are "visible" to their relation operators, for the following operators, queries and subqueries. The hints defined in the subquery are "visible" only for this subquery and its subqueries. Hint is not "visible" to the previous relation operator if it is defined after it.
Example:
SELECT /*+ NO_INDEX(TBL1_IDX2), FORCE_INDEX(TBL2_IDX2) */ T1.V1 FROM TBL1 T1 WHERE T1.V2 IN (SELECT T2.V2 FROM TBL2 T2 WHERE T2.V1=? AND T2.V2=?);
SELECT T1.V1 FROM TBL1 T1 WHERE T1.V2 IN (SELECT /*+ FORCE_INDEX(TBL2_IDX2) */ T2.V2 FROM TBL2 T2 WHERE T2.V1=? AND T2.V2=?);
Note that only the first query has a hint in such a case as:
SELECT /*+ FORCE_INDEX */ V1 FROM TBL1 WHERE V1=? AND V2=?
UNION ALL
SELECT V1 FROM TBL1 WHERE V3>?
But there are exceptions: hints of engine or optimizer level, such as DISABLE_RULE or QUERY_ENGINE. Such hints should be defined at the beginning of the query and are related to the whole query.
Hints errors
The optimizer tries to apply every hint and its parameters, if possible. But it skips the hint or hint parameter if:
-
There is no such supported hint.
-
Required hint parameters are not passed.
-
The hint parameters have been passed, but the hint does not support any parameter.
-
The hint parameter is incorrect or refers to a nonexistent object, such as a nonexistent index or table.
-
The current hints or current parameters are incompatible with the previous ones, such as forcing the use and disabling of the same index.
Supportted hints
FORCE_INDEX / NO_INDEX
Forces or disables index scan.
Parameters:
-
Empty. To force an index scan for every undelying table. Optimizer will choose any available index. Or to disable all indexes.
-
Single index name to use or skip exactly this index.
-
Several index names. They can relate to different tables. The optimizer will choose indexes for scanning or skip them all.
Example:
SELECT /*+ FORCE_INDEX */ T1.* FROM TBL1 T1 WHERE T1.V1 = T2.V1 AND T1.V2 > ?;
SELECT /*+ FORCE_INDEX(TBL1_IDX2, TBL2_IDX1) */ T1.V1, T2.V1 FROM TBL1 T1, TBL2 T2 WHERE T1.V1 = T2.V1 AND T1.V2 > ? AND T2.V2 > ?;
SELECT /*+ NO_INDEX */ T1.* FROM TBL1 T1 WHERE T1.V1 = T2.V1 AND T1.V2 > ?;
SELECT /*+ NO_INDEX(TBL1_IDX2, TBL2_IDX1) */ T1.V1, T2.V1 FROM TBL1 T1, TBL2 T2 WHERE T1.V1 = T2.V1 AND T1.V2 > ? AND T2.V2 > ?;
ENFORCE_JOIN_ORDER
Forces join order as appears in a query. Fastens building of joins plan.
Example:
SELECT /*+ ENFORCE_JOIN_ORDER */ T1.V1, T2.V1, T2.V2, T3.V1, T3.V2, T3.V3 FROM TBL1 T1 JOIN TBL2 T2 ON T1.V3=T2.V1 JOIN TBL3 T3 ON T2.V3=T3.V1 AND T2.V2=T3.V2
SELECT t1.v1, t3.v2 FROM TBL1 t1 JOIN TBL3 t3 on t1.v3=t3.v3 WHERE t1.v2 in (SELECT /*+ ENFORCE_JOIN_ORDER */ t2.v2 FROM TBL2 t2 JOIN TBL3 t3 ON t2.v1=t3.v1)
EXPAND_DISTINCT_AGG
If the optimizer wraps aggregation operations with a join, forces expanding of only distinct aggregates to the join. Removes duplicates before the joining and speeds up it.
Example:
SELECT /*+ EXPAND_DISTINCT_AGG */ SUM(DISTINCT V1), AVG(DISTINCT V2) FROM TBL1 GROUP BY V3
QUERY_ENGINE
Selects a particular engine to run individual queries. This is an engine level hint.
Parameters:
Single parameter required: the engine name.
Example:
SELECT /*+ QUERY_ENGINE('calcite') */ V1 FROM TBL1
DISABLE_RULE
Disables certain optimizer rules. This is an optimizer level hint.
Parameters:
-
One or more optimizer rules for skipping.
Example:
SELECT /*+ DISABLE_RULE('MergeJoinConverter') */ T1.* FROM TBL1 T1 JOIN TBL2 T2 ON T1.V1=T2.V1 WHERE T2.V2=?
Apache, Apache Ignite, the Apache feather and the Apache Ignite logo are either registered trademarks or trademarks of The Apache Software Foundation.