Import Model from Apache Spark | Ignite Documentation

Import Model from Apache Spark

Starting with Ignite 2.8, it’s possible to import the following models of Apache Spark ML:

  • Logistic regression

  • Linear regression

  • Decision tree

  • Support Vector Machine

  • Random forest

  • K-Means

  • Decision tree regression

  • Random forest regression

  • Gradient boosted trees regression

  • Gradient boosted trees

This feature works with models saved in snappy.parquet files.
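
Before handing a saved model to Ignite, it can help to confirm that the directory actually contains snappy.parquet files. The following is a minimal sketch using only the Java standard library. The class name SnappyParquetFinder is hypothetical and is not part of Ignite or Spark. The sketch walks the whole directory tree, so it does not assume a particular layout.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper for illustration; not part of Ignite or Spark. */
final class SnappyParquetFinder {
    private SnappyParquetFinder() {
        // Static helper only.
    }

    /**
     * Returns every regular file under modelDir whose name ends with
     * ".snappy.parquet", sorted by path.
     */
    static List<Path> find(Path modelDir) {
        try (Stream<Path> paths = Files.walk(modelDir)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".snappy.parquet"))
                .sorted()
                .collect(Collectors.toList());
        }
        catch (IOException e) {
            // Surface I/O problems (for example, a missing directory) as unchecked.
            throw new UncheckedIOException(e);
        }
    }
}
```

An empty result from SnappyParquetFinder.find(dir) suggests the model was saved in a different format or that the path points to the wrong directory.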

Supported and tested Spark version: 2.3.0. The feature might also work with the following Spark versions: 2.1, 2.2, 2.3, 2.4.
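
The version statement above can be turned into a small guard that runs before an import is attempted. This is only a sketch: the class SparkVersionSupport and its Level enum are hypothetical names, and treating 2.1, 2.2, 2.3, and 2.4 as minor release lines (so that 2.4.5 counts as 2.4) is an assumption about how the list should be read.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of Ignite or Spark. */
final class SparkVersionSupport {
    /** How confident we can be that a model from this Spark version imports. */
    enum Level { TESTED, MAY_WORK, UNKNOWN }

    // The only version the documentation names as supported and tested.
    private static final String TESTED_VERSION = "2.3.0";

    // Assumption: these are minor release lines, matched on "major.minor".
    private static final List<String> MAY_WORK_LINES =
        Arrays.asList("2.1", "2.2", "2.3", "2.4");

    private SparkVersionSupport() {
        // Static helper only.
    }

    static Level classify(String sparkVersion) {
        if (sparkVersion == null)
            return Level.UNKNOWN;

        String v = sparkVersion.trim();

        if (v.equals(TESTED_VERSION))
            return Level.TESTED;

        String[] parts = v.split("\\.");

        if (parts.length >= 2 && MAY_WORK_LINES.contains(parts[0] + "." + parts[1]))
            return Level.MAY_WORK;

        return Level.UNKNOWN;
    }
}
```

With this sketch, classify("2.3.0") yields TESTED, classify("2.4.5") yields MAY_WORK, and classify("3.0.1") yields UNKNOWN.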

To export a model from Spark ML, train it and then save the resulting model to a parquet file, as in the following example:

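import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

// TitanicUtils is a helper class assumed to be defined elsewhere; it is not part of Spark.
val spark: SparkSession = TitanicUtils.getSparkSession

val passengers = TitanicUtils.readPassengersWithCasting(spark)
    .select("survived", "pclass", "sibsp", "parch", "sex", "embarked", "age")

// Step 1: Build feature vectors from the dataframe's columns using VectorAssembler
val assembler = new VectorAssembler()
    .setInputCols(Array("pclass", "sibsp", "parch", "survived"))
    .setOutputCol("features")

// Step 2: Drop rows with missing values, then transform the dataframe into a vectorized one
val output = assembler.transform(
    passengers.na.drop(Array("pclass", "sibsp", "parch", "survived", "age"))
).select("features", "age")

// "age" is used as the label because it is the only non-feature column selected above
val lr = new LinearRegression()
    .setLabelCol("age")
    .setFeaturesCol("features")

// Fit the model
val model = lr.fit(output)

// Save the fitted model. modelPath is a placeholder, not a path from the original example.
val modelPath = "/path/to/model"
model.write.overwrite().save(modelPath)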

To load the model into Ignite ML, call the parse() method of the SparkModelParser class:

DecisionTreeModel mdl = (DecisionTreeModel)SparkModelParser.parse(

You can see more examples of using this API in the examples module in the package:

Loading from a Spark PipelineModel is not supported. Intermediate feature transformers from Spark are not supported either, because preprocessing works differently on the Ignite and Spark sides.
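
One way to catch the PipelineModel limitation early is to inspect the metadata that Spark ML stores next to a saved model. The sketch below assumes that this metadata is a single-line JSON document with a top-level "class" field holding the fully qualified Spark class name. That layout is an assumption about Spark's on-disk format, not something Ignite documents. The class SparkModelMetadata is a hypothetical name, and the regex-based lookup is a simplification that returns the first "class" field it finds rather than parsing the JSON properly.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Ignite or Spark. */
final class SparkModelMetadata {
    // Matches "class":"some.Class.Name" anywhere in a JSON string.
    private static final Pattern CLASS_FIELD =
        Pattern.compile("\"class\"\\s*:\\s*\"([^\"]+)\"");

    // Fully qualified name of Spark's fitted pipeline class.
    private static final String PIPELINE_MODEL = "org.apache.spark.ml.PipelineModel";

    private SparkModelMetadata() {
        // Static helper only.
    }

    /** Extracts the value of the first "class" field, if there is one. */
    static Optional<String> modelClass(String metadataJson) {
        Matcher m = CLASS_FIELD.matcher(metadataJson);

        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Returns true if the metadata describes a Spark PipelineModel. */
    static boolean isPipelineModel(String metadataJson) {
        return modelClass(metadataJson)
            .map(PIPELINE_MODEL::equals)
            .orElse(false);
    }
}
```

If isPipelineModel returns true, the individual model stage would need to be saved on its own in Spark before importing, and any preprocessing would need to be reproduced on the Ignite side.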