
Import Model from Apache Spark

Starting with Ignite 2.8, it is possible to import the following Apache Spark ML models:

  • Logistic regression (org.apache.spark.ml.classification.LogisticRegressionModel)

  • Linear regression (org.apache.spark.ml.regression.LinearRegressionModel)

  • Decision tree (org.apache.spark.ml.classification.DecisionTreeClassificationModel)

  • Support Vector Machine (org.apache.spark.ml.classification.LinearSVCModel)

  • Random forest (org.apache.spark.ml.classification.RandomForestClassificationModel)

  • K-Means (org.apache.spark.ml.clustering.KMeansModel)

  • Decision tree regression (org.apache.spark.ml.regression.DecisionTreeRegressionModel)

  • Random forest regression (org.apache.spark.ml.regression.RandomForestRegressionModel)

  • Gradient boosted trees regression (org.apache.spark.ml.regression.GBTRegressionModel)

  • Gradient boosted trees (org.apache.spark.ml.classification.GBTClassificationModel)

This feature works with models saved as snappy-compressed Parquet (snappy.parquet) files.
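
For orientation, a model saved via Spark's standard model persistence is a directory, not a single file: Spark writes a `metadata/` subdirectory with JSON metadata and a `data/` subdirectory containing the snappy-compressed Parquet part files. The exact part-file names vary between runs; the layout below is an illustrative sketch:

    /home/models/titanic/linreg/
        metadata/
            part-00000
        data/
            part-00000-<uuid>.snappy.parquet

The path passed to the Ignite-side parser should point at this saved model directory.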

Supported and tested Spark version: 2.3.0. It may also work with Spark versions 2.1, 2.2, 2.3, and 2.4, but those are not tested.

To export a model from Spark ML, save the trained model to a Parquet file, as in the example below:

import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

val spark: SparkSession = TitanicUtils.getSparkSession

val passengers = TitanicUtils.readPassengersWithCasting(spark)
    .select("survived", "pclass", "sibsp", "parch", "sex", "embarked", "age")

// Step - 1: Make Vectors from the dataframe's columns using a VectorAssembler
val assembler = new VectorAssembler()
    .setInputCols(Array("pclass", "sibsp", "parch", "survived"))
    .setOutputCol("features")

// Step - 2: Transform the dataframe to a vectorized dataframe, dropping rows with missing values
val output = assembler.transform(
    passengers.na.drop(Array("pclass", "sibsp", "parch", "survived", "age"))
).select("features", "age")


// Step - 3: Define a linear regression model with the "age" column as the label
val lr = new LinearRegression()
    .setMaxIter(100)
    .setRegParam(0.1)
    .setElasticNetParam(0.1)
    .setLabelCol("age")
    .setFeaturesCol("features")

// Fit the model
val model = lr.fit(output)
model.write.overwrite().save("/home/models/titanic/linreg")

To load the model into Ignite ML, use the SparkModelParser class and call its parse() method:

DecisionTreeNode mdl = (DecisionTreeNode)SparkModelParser.parse(
   SPARK_MDL_PATH,
   SupportedSparkModels.DECISION_TREE
);
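
Once parsed, the model can be used for local inference on Ignite ML vectors. The sketch below assumes the Ignite 2.8 packages (org.apache.ignite.ml.sparkmodelparser for the parser, org.apache.ignite.ml.tree.DecisionTreeNode, and VectorUtils for building vectors); the path and feature values are hypothetical, and the feature order must match the order used by the VectorAssembler on the Spark side:

import org.apache.ignite.ml.math.primitives.vector.Vector;
import org.apache.ignite.ml.math.primitives.vector.VectorUtils;
import org.apache.ignite.ml.sparkmodelparser.SparkModelParser;
import org.apache.ignite.ml.sparkmodelparser.SupportedSparkModels;
import org.apache.ignite.ml.tree.DecisionTreeNode;

// Hypothetical path to the directory saved by Spark's model.write().save(...)
String sparkMdlPath = "/home/models/titanic/dtree";

DecisionTreeNode mdl = (DecisionTreeNode)SparkModelParser.parse(
    sparkMdlPath,
    SupportedSparkModels.DECISION_TREE
);

// Build a feature vector in the same column order used on the Spark side
Vector features = VectorUtils.of(1.0, 0.0, 0.0, 1.0);
double prediction = mdl.predict(features);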

More examples of using this API are available in the examples module, in the package org.apache.ignite.examples.ml.inference.spark.modelparser.

Note
Loading a Spark PipelineModel is not supported. Intermediate feature transformers from Spark are also not supported, because preprocessing works differently on the Ignite and Spark sides.