Import Model from Apache Spark
Starting with Ignite 2.8, it’s possible to import the following models of Apache Spark ML:
-
Logistic regression (
org.apache.spark.ml.classification.LogisticRegressionModel
) -
Linear regression (
org.apache.spark.ml.classification.LogisticRegressionModel
) -
Decision tree (
org.apache.spark.ml.classification.DecisionTreeClassificationModel
) -
Support Vector Machine (
org.apache.spark.ml.classification.LinearSVCModel
) -
Random forest (
org.apache.spark.ml.classification.RandomForestClassificationModel
) -
K-Means (
org.apache.spark.ml.clustering.KMeansModel
) -
Decision tree regression (
org.apache.spark.ml.regression.DecisionTreeRegressionModel
) -
Random forest regression (
org.apache.spark.ml.regression.RandomForestRegressionModel
) -
Gradient boosted trees regression (
org.apache.spark.ml.regression.GBTRegressionModel
) -
Gradient boosted trees (
org.apache.spark.ml.classification.GBTClassificationModel
)
This feature works with models saved in snappy.parquet files.
Supported and tested Spark version: 2.3.0 Possibly might work with next Spark versions: 2.1, 2.2, 2.3, 2.4
To get the model from Spark ML you should save the model built as a result of training in Spark ML to the parquet file like in example below:
val spark: SparkSession = TitanicUtils.getSparkSession
val passengers = TitanicUtils.readPassengersWithCasting(spark)
.select("survived", "pclass", "sibsp", "parch", "sex", "embarked", "age")
// Step - 1: Make Vectors from dataframe's columns using special VectorAssmebler
val assembler = new VectorAssembler()
.setInputCols(Array("pclass", "sibsp", "parch", "survived"))
.setOutputCol("features")
// Step - 2: Transform dataframe to vectorized dataframe with dropping rows
val output = assembler.transform(
passengers.na.drop(Array("pclass", "sibsp", "parch", "survived", "age"))
).select("features", "age")
val lr = new LinearRegression()
.setMaxIter(100)
.setRegParam(0.1)
.setElasticNetParam(0.1)
.setLabelCol("age")
.setFeaturesCol("features")
// Fit the model
val model = lr.fit(output)
model.write.overwrite().save("/home/models/titanic/linreg")
To load in Ignite ML you should use SparkModelParser class via method parse() call
DecisionTreeModel mdl = (DecisionTreeModel)SparkModelParser.parse(
SPARK_MDL_PATH,
SupportedSparkModels.DECISION_TREE
);
You can see more examples of using this API in the examples module in the package: org.apache.ignite.examples.ml.inference.spark.modelparser
Note
|
It does not support loading from PipelineModel in Spark. It does not support intermediate feature transformers from Spark due to different nature of preprocessing on Ignite and Spark side. |
Apache, Apache Ignite, the Apache feather and the Apache Ignite logo are either registered trademarks or trademarks of The Apache Software Foundation.