Calcite-based SQL Engine | Ignite Documentation

Ignite Summit 2024 — Call For Speakers Now Open — Learn more

Edit

Calcite-based SQL Engine

Starting the 2.13 version, Apache Ignite includes a new SQL engine based on the Apache Calcite framework.

Apache Calcite is a dynamic data management framework, which mainly serves for mediating between applications, one or more data storage locations, and data processing engines. For more information on Apache Calcite, please see the product documentation.

The current H2-based SQL engine has a number of fundamental limitations of query execution in a distributed environment. To address these limitations, a new SQL engine was implemented. The new engine uses tools provided by Apache Calcite for parsing and planning queries. It also has a new query execution flow.

Caution
The Calcite-based query engine is currently in beta status.

Calcite Module Libraries

To use a Calcite-based engine, please make sure that the Calcite module libraries are in a classpath.

Standalone Mode

When starting a standalone node, move optional/ignite-calcite folder to the libs folder before running ignite.{sh|bat} script. In this case, the content of the module folder is added to the classpath.

Maven Configuration

If you are using Maven to manage dependencies of your project, you can add Calcite module dependency as follows: Replace ${ignite.version} with the actual Apache Ignite version you are interested in:

<dependency>
    <groupId>org.apache.ignite</groupId>
    <artifactId>ignite-calcite</artifactId>
    <version>${ignite.version}</version>
</dependency>

Configuring Query Engines

To enable engine, add the explicit CalciteQueryEngineConfiguration instance to the SqlConfiguration.QueryEnginesConfiguration property.

Below is a configuration example of two configured query engines (H2-based and Calcite-based engines) where the Calcite-based engine is chosen as a default one:

<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="sqlConfiguration">
        <bean class="org.apache.ignite.configuration.SqlConfiguration">
            <property name="queryEnginesConfiguration">
                <list>
                    <bean class="org.apache.ignite.indexing.IndexingQueryEngineConfiguration">
                        <property name="default" value="false"/>
                    </bean>
                    <bean class="org.apache.ignite.calcite.CalciteQueryEngineConfiguration">
                        <property name="default" value="true"/>
                    </bean>
                </list>
            </property>
        </bean>
    </property>
    ...
</bean>
IgniteConfiguration cfg = new IgniteConfiguration().setSqlConfiguration(
    new SqlConfiguration().setQueryEnginesConfiguration(
        new IndexingQueryEngineConfiguration(),
        new CalciteQueryEngineConfiguration().setDefault(true)
    )
);

Routing Queries to Query Engine

Normally, all queries are routed to the query engine that is configured by default. If more than one engine is configured through queryEnginesConfiguration, it’s possible to use another engine instead of the one configured default for individual queries or for the whole connection.

JDBC

To choose a query engine for the JDBC connection, use the queryEngine connection parameter:

jdbc:ignite:thin://127.0.0.1:10800?queryEngine=calcite

ODBC

To configure the query engine for the ODBC connection, use the QUERY_ENGINE property:

[IGNITE_CALCITE]
DRIVER={Apache Ignite};
SERVER=127.0.0.1;
PORT=10800;
SCHEMA=PUBLIC;
QUERY_ENGINE=CALCITE

SQL Reference

DDL

Data definition language (DDL) statements are compliant with the old H2-based engine. You can find the DDL syntax description here.

DML

The new SQL engine mostly inherits data manipulation language (DML) statements syntax from the Apache Calcite framework. See the Apache Calcite SQL grammar description here.

In most cases, statement syntax is compliant with the old SQL engine. But there are still some differences between DML dialects in H2-based engine and Calcite-based engine. For example, note the MERGE statement syntax has changed.

Supported Functions

The Calcite-based SQL engine currently supports:

Group Functions list

Aggregate functions

COUNT, SUM, AVG, MIN, MAX, ANY_VALUE, LISTAGG, GROUP_CONCAT, STRING_AGG, ARRAY_AGG, ARRAY_CONCAT_AGG, EVERY, SOME

String functions

UPPER, LOWER, INITCAP, TO_BASE64, FROM_BASE64, MD5, SHA1, SUBSTRING, LEFT, RIGHT, REPLACE, TRANSLATE, CHR, CHAR_LENGTH, CHARACTER_LENGTH, LENGTH, CONCAT, OVERLAY, POSITION, ASCII, REPEAT, SPACE, STRCMP, SOUNDEX, DIFFERENCE, REVERSE, TRIM, LTRIM, RTRIM, REGEXP_REPLACE

Math functions

MOD, EXP, POWER, LN, LOG10, ABS, RAND, RAND_INTEGER, ACOS, ASIN, ATAN, ATAN2, SQRT, CBRT, COS, COSH, COT, DEGREES, RADIANS, ROUND, SIGN, SIN, SINH, TAN, TANH, TRUNCATE, PI

Date and time functions

EXTRACT, FLOOR, CEIL, TIMESTAMPADD, TIMESTAMPDIFF, LAST_DATE, DAYNAME, MONTHNAME, DAYOFMONTH, DAYOFWEEK, DAYOFYEAR, YEAR, QUARTER, MONTH, WEEK, HOUR, MINUTE, SECOND, TIMESTAMP_SECONDS, TIMESTAMP_MILLIS, TIMESTAMP_MICROS, UNIX_SECONDS, UNIX_MILLIS, UNIX_MICROS, UNIX_DATE, DATE_FROM_UNIX_DATE, DATE, TIME, DATETIME, CURRENT_TIME, CURRENT_TIMESTAMP, CURRENT_DATE, LOCALTIME, LOCALTIMESTAMP

XML functions

EXTRACTVALUE, XMLTRANSFORM, EXTRACT, EXISTSNODE

JSON functions

JSON_VALUE, JSON_QUERY, JSON_TYPE, JSON_EXISTS, JSON_DEPTH, JSON_KEYS, JSON_PRETTY, JSON_LENGTH, JSON_REMOVE, JSON_STORAGE_SIZE, JSON_OBJECT, JSON_ARRAY

Other functions

ROW, CAST, COALESCE, NVL, NULLIF, CASE, DECODE, LEAST, GREATEST, COMPRESS, OCTET_LENGTH, TYPEOF, QUERY_ENGINE

For more information on these functions, please see the Apache Calcite SQL language reference.

Supported Data Types

Below are the data types supported by the Calcite-based SQL engine:

Data type Mapped to Java class

BOOLEAN

java.lang.Boolean

DECIMAL

java.math.BigDecimal

DOUBLE

java.lang.Double

REAL/FLOAT

java.lang.Float

INT

java.lang.Integer

BIGINT

java.lang.Long

SMALLINT

java.lang.Short

TINYINT

java.lang.Byte

CHAR/VARCHAR

java.lang.String

DATE

java.sql.Date

TIME

java.sql.Time

TIMESTAMP

java.sql.Timestamp

INTERVAL YEAR TO MONTH

java.time.Period

INTERVAL DAY TO SECOND

java.time.Duration

BINARY/VARBINARY

byte[]

UUID

java.util.UUID

OTHER

java.lang.Object

Optimizer hints

The query optimizer does its best to build the fastest excution plan. However, this is a far way to create an optimizer which is the most effective for each case. You can better know about the data design, application design or data distribution in the cluster. SQL hints can help the optimizer to make optimizations more rationally or build execution plan faster.

Note

SQL hints are optional to apply and might be skipped in some cases.

Hints format

SQL hints are defined by a special comment /*+ HINT */ reffered as a hint block. Spaces before and after the hint name are required. The hint block is placed right after a relation operator, often after SELECT. Several hint blocks for one relation operator are not allowed.

Example:

SELECT /*+ NO_INDEX */ T1.* FROM TBL1 where T1.V1=? and T1.V2=?

It is allowed to define several hints for the same relation operator. To use several hints, separate them by comma (spaces are optional).

Example:

SELECT /*+ NO_INDEX, EXPAND_DISTINCT_AGG */ SUM(DISTINCT V1), AVG(DISTINCT V2) FROM TBL1 GROUP BY V3 WHERE V3=?

Hint parameters

Hint parameters, if required, are placed in brackets after the hint name and separated by commas.

The hint parameter can be quoted. Quoted parameter is case-sensitive. The quoted and unquoted parameters cannot be defined for the same hint.

Example:

SELECT /*+ FORCE_INDEX(TBL1_IDX2,TBL2_IDX1) */ T1.V1, T2.V1 FROM TBL1 T1, TBL2 T2 WHERE T1.V1 = T2.V1 AND T1.V2 > ? AND T2.V2 > ?;

SELECT /*+ FORCE_INDEX('TBL2_idx1') */ T1.V1, T2.V1 FROM TBL1 T1, TBL2 T2 WHERE T1.V1 = T2.V1 AND T1.V2 > ? AND T2.V2 > ?;

Hint scope

Hints are defined for a relation operator, usually for SELECT. Most of the hints are "visible" to their relation operators, for the following operators, queries and subqueries. The hints defined in the subquery are "visible" only for this subquery and its subqueries. Hint is not "visible" to the previous relation operator if it is defined after it.

Example:

SELECT /*+ NO_INDEX(TBL1_IDX2), FORCE_INDEX(TBL2_IDX2) */ T1.V1 FROM TBL1 T1 WHERE T1.V2 IN (SELECT T2.V2 FROM TBL2 T2 WHERE T2.V1=? AND T2.V2=?);

SELECT T1.V1 FROM TBL1 T1 WHERE T1.V2 IN (SELECT /*+ FORCE_INDEX(TBL2_IDX2) */ T2.V2 FROM TBL2 T2 WHERE T2.V1=? AND T2.V2=?);

Note that only the first query has a hint in such a case as:

SELECT /*+ FORCE_INDEX */ V1 FROM TBL1 WHERE V1=? AND V2=?
UNION ALL
SELECT V1 FROM TBL1 WHERE V3>?

But there are exceptions: hints of engine or optimizer level, such as DISABLE_RULE or QUERY_ENGINE. Such hints should be defined at the beginning of the query and are related to the whole query.

Hints errors

The optimizer tries to apply every hint and its parameters, if possible. But it skips the hint or hint parameter if:

  • There is no such supported hint.

  • Required hint parameters are not passed.

  • The hint parameters have been passed, but the hint does not support any parameter.

  • The hint parameter is incorrect or refers to a nonexistent object, such as a nonexistent index or table.

  • The current hints or current parameters are incompatible with the previous ones, such as forcing the use and disabling of the same index.

Supportted hints

FORCE_INDEX / NO_INDEX

Forces or disables index scan.

Parameters:
  • Empty. To force an index scan for every undelying table. Optimizer will choose any available index. Or to disable all indexes.

  • Single index name to use or skip exactly this index.

  • Several index names. They can relate to different tables. The optimizer will choose indexes for scanning or skip them all.

Example:
SELECT /*+ FORCE_INDEX */ T1.* FROM TBL1 T1 WHERE T1.V1 = T2.V1 AND T1.V2 > ?;

SELECT /*+ FORCE_INDEX(TBL1_IDX2, TBL2_IDX1) */ T1.V1, T2.V1 FROM TBL1 T1, TBL2 T2 WHERE T1.V1 = T2.V1 AND T1.V2 > ? AND T2.V2 > ?;

SELECT /*+ NO_INDEX */ T1.* FROM TBL1 T1 WHERE T1.V1 = T2.V1 AND T1.V2 > ?;

SELECT /*+ NO_INDEX(TBL1_IDX2, TBL2_IDX1) */ T1.V1, T2.V1 FROM TBL1 T1, TBL2 T2 WHERE T1.V1 = T2.V1 AND T1.V2 > ? AND T2.V2 > ?;

ENFORCE_JOIN_ORDER

Forces join order as appears in a query. Fastens building of joins plan.

Example:
SELECT /*+ ENFORCE_JOIN_ORDER */ T1.V1, T2.V1, T2.V2, T3.V1, T3.V2, T3.V3 FROM TBL1 T1 JOIN TBL2 T2 ON T1.V3=T2.V1 JOIN TBL3 T3 ON T2.V3=T3.V1 AND T2.V2=T3.V2

SELECT t1.v1, t3.v2 FROM TBL1 t1 JOIN TBL3 t3 on t1.v3=t3.v3 WHERE t1.v2 in (SELECT /*+ ENFORCE_JOIN_ORDER */ t2.v2 FROM TBL2 t2 JOIN TBL3 t3 ON t2.v1=t3.v1)

EXPAND_DISTINCT_AGG

If the optimizer wraps aggregation operations with a join, forces expanding of only distinct aggregates to the join. Removes duplicates before the joining and speeds up it.

Example:
SELECT /*+ EXPAND_DISTINCT_AGG */ SUM(DISTINCT V1), AVG(DISTINCT V2) FROM TBL1 GROUP BY V3

QUERY_ENGINE

Selects a particular engine to run individual queries. This is an engine level hint.

Parameters:

Single parameter required: the engine name.

Example:
SELECT /*+ QUERY_ENGINE('calcite') */ V1 FROM TBL1

DISABLE_RULE

Disables certain optimizer rules. This is an optimizer level hint.

Parameters:
  • One or more optimizer rules for skipping.

Example:
SELECT /*+ DISABLE_RULE('MergeJoinConverter') */ T1.* FROM TBL1 T1 JOIN TBL2 T2 ON T1.V1=T2.V1 WHERE T2.V2=?