Shared Apache Spark RDDs with Apache Ignite
Apache Ignite provides an implementation of the Spark RDD abstraction that makes it easy to share state in memory across multiple Spark jobs, either within the same application or between different Spark applications.
IgniteRDD is implemented as a view over a distributed Ignite cache, which may be deployed within the Spark job executing process, on a Spark worker, or in its own cluster.
Depending on the pre-configured deployment mode, the shared state may either exist only during the lifespan of a Spark application (embedded mode), or it may outlive the Spark application (standalone mode), in which case the state can be shared across multiple Spark applications.
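The two deployment modes can be sketched as follows. This is a minimal illustration, not a definitive setup: it assumes the Ignite 2.x `ignite-spark` module, where `IgniteContext` takes a configuration closure and an optional `standalone` flag; the application name, the local Spark master, and the object name are made up for the example, and standalone mode additionally assumes an Ignite cluster is already running and reachable.

```scala
import org.apache.ignite.configuration.{CacheConfiguration, IgniteConfiguration}
import org.apache.ignite.spark.IgniteContext
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical entry point; the names here are illustrative only.
object SharedRddDeploymentSketch {
  def main(args: Array[String]): Unit = {
    // A local Spark master is used only to keep the sketch self-contained.
    val sparkContext = new SparkContext(
      new SparkConf().setAppName("shared-rdd-sketch").setMaster("local[2]"))

    // Configuration closure that IgniteContext calls when it needs an Ignite node.
    val igniteCfg = () => {
      val cacheCfg = new CacheConfiguration[Integer, Integer]("partitioned")
      // Assumption: registering Integer/Integer exposes the cache to SQL queries.
      cacheCfg.setIndexedTypes(classOf[Integer], classOf[Integer])
      new IgniteConfiguration().setCacheConfiguration(cacheCfg)
    }

    // Standalone mode (the default): the state lives in an external Ignite
    // cluster and can outlive this Spark application.
    val igniteContext = new IgniteContext(sparkContext, igniteCfg)

    // Embedded mode alternative: Ignite nodes run inside the Spark processes,
    // so the state exists only for the lifespan of this application.
    // val igniteContext = new IgniteContext(sparkContext, igniteCfg, standalone = false)

    // Obtain a view over the shared cache and count its current entries.
    val sharedRdd = igniteContext.fromCache[Integer, Integer]("partitioned")
    println(s"Entries currently in the shared cache: ${sharedRdd.count()}")

    igniteContext.close()
    sparkContext.stop()
  }
}
```

The exact constructor signature differs between Ignite releases, and embedded mode is marked as deprecated in later 2.x versions, so the documentation for the version in use should be checked before relying on this.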
val sharedRdd = igniteContext.fromCache("partitioned")
// Store pairs of integers from 1 to 10000 into in-memory cache
// named "partitioned" using 10 parallel store operations.
sharedRdd.savePairs(sparkContext.parallelize(1 to 10000, 10).map(i => (i, i)))
// Run an SQL query over the shared RDD: select the values strictly between 10 and 100.
val sharedRdd = igniteContext.fromCache("partitioned")
val result = sharedRdd.sql(
  "select _val from Integer where _val > ? and _val < ?", 10, 100)
Spark does not support SQL indexes, while Ignite does. Thanks to its in-memory indexing capabilities, IgniteRDD can execute SQL queries hundreds of times faster than native Spark RDDs or DataFrames.
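As an illustration of how an index might be declared on the Ignite side, the sketch below marks a field of a value class as indexed and registers the class with a cache configuration. The `Person` class, its fields, and the cache name "persons" are hypothetical and do not appear in the examples above; the annotation and configuration calls are those of the Ignite 2.x Java API as used from Scala.

```scala
import org.apache.ignite.cache.query.annotations.QuerySqlField
import org.apache.ignite.configuration.CacheConfiguration
import scala.annotation.meta.field

// Hypothetical value type. Both fields are visible to SQL;
// index = true asks Ignite to maintain an index on "age".
case class Person(
  @(QuerySqlField @field) name: String,
  @(QuerySqlField @field)(index = true) age: Int)

object IndexedCacheSketch {
  // Builds a cache configuration that exposes Person as an SQL table.
  // The cache name "persons" is an assumption chosen for this sketch.
  def personCacheConfig(): CacheConfiguration[java.lang.Long, Person] = {
    val cacheCfg = new CacheConfiguration[java.lang.Long, Person]("persons")
    cacheCfg.setIndexedTypes(classOf[java.lang.Long], classOf[Person])
    cacheCfg
  }
}
```

An IgniteRDD created over such a cache could then be queried with a statement along the lines of `sharedRdd.sql("select name from Person where age > ?", 30)`. Whether a given query actually uses the index depends on the query and on the Ignite version.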