In-Memory File System

One of the unique capabilities of Ignite is a distributed in-memory file system called the Ignite File System (IGFS). IGFS delivers functionality similar to Hadoop HDFS, but only in memory. In addition to its own APIs, IGFS implements the Hadoop FileSystem API and can be transparently plugged into Hadoop or Spark deployments.

IGFS splits the data from each file into separate data blocks and stores them in a distributed in-memory cache. However, unlike Hadoop HDFS, IGFS does not need a name node and automatically determines file data locality using a hashing function.
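The idea of locating blocks by hashing rather than by asking a name node can be sketched in a few lines. The example below is a minimal illustration, not IGFS's actual mapping: the class `BlockLocalitySketch`, its methods, the CRC32-modulo scheme, and the file path, block size, and node count are all assumptions chosen for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

/** Illustrative only: a simple hash-modulo mapping, not IGFS's real affinity logic. */
final class BlockLocalitySketch {
    /** Number of fixed-size blocks needed to hold fileLength bytes. */
    static int blockCount(long fileLength, int blockSize) {
        return (int) ((fileLength + blockSize - 1) / blockSize);
    }

    /** Maps a (file path, block index) pair to a node index with a deterministic hash. */
    static int nodeFor(String path, int blockIdx, int nodeCount) {
        CRC32 crc = new CRC32();
        crc.update((path + "#" + blockIdx).getBytes(StandardCharsets.UTF_8));
        // CRC32.getValue() is non-negative, so the remainder is a valid node index.
        return (int) (crc.getValue() % nodeCount);
    }

    /** Computes the owning node for every block of a file. */
    static List<Integer> placement(String path, long fileLength, int blockSize, int nodeCount) {
        List<Integer> owners = new ArrayList<>();
        int blocks = blockCount(fileLength, blockSize);
        for (int i = 0; i < blocks; i++)
            owners.add(nodeFor(path, i, nodeCount));
        return owners;
    }

    public static void main(String[] args) {
        // Hypothetical 200 KB file, 64 KB blocks, 4 nodes.
        List<Integer> owners = placement("/data/events.log", 200L * 1024, 64 * 1024, 4);
        System.out.println("Block owners: " + owners);
    }
}
```

Because the mapping depends only on the path, the block index, and the node count, any node can compute where a block lives without consulting a central metadata server. A plain modulo scheme like this one reshuffles most blocks when the node count changes, which is one reason production systems tend to use more elaborate affinity functions.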

IGFS can be deployed standalone or on top of HDFS, in which case it becomes a transparent caching layer for the files stored in HDFS.

Tachyon Replacement

IGFS can transparently replace the Tachyon file system in Spark deployments. Because IGFS is based on the battle-tested Ignite data grid technology, it exhibits much better read and write performance than Tachyon and is more stable.

Hadoop File System

See the Hadoop integration documentation if you plan to use IGFS as a Hadoop file system. In this case, working with IGFS is no different from working with HDFS.

GitHub Native API Examples

Also see the IGFS native API examples available on GitHub.

Ignite File System Features

Feature Description
On-Heap and Off-Heap

IGFS can store files either on-heap or off-heap. For larger memory spaces, it is critical to use off-heap storage to avoid lengthy JVM garbage collection pauses.

IGFS as Hadoop FileSystem

IGFS implements the Hadoop FileSystem API and can be deployed as a native Hadoop file system, just like HDFS. This allows IGFS to be deployed natively into Hadoop or Spark installations in a plug-and-play fashion.

Hadoop FileSystem Cache

IGFS can also be deployed as a caching layer over another Hadoop file system, such as HDFS. In this case, if a file is updated in IGFS, the update is automatically written through to HDFS. Likewise, if a file is read and is not currently in IGFS, it is automatically loaded from HDFS.

Any Hadoop Distribution

IGFS integrates with the native Apache Hadoop distribution, as well as Cloudera CDH and Hortonworks HDP.
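
The caching-layer behavior described under Hadoop FileSystem Cache above (updates written through to the underlying file system, missing files loaded on read) can be sketched as follows. This is a minimal illustration under stated assumptions, not the IGFS implementation: the class `CachingLayerSketch` is hypothetical, and plain in-process maps stand in for both the in-memory layer and the secondary file system.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: plain maps stand in for IGFS and the underlying Hadoop file system. */
final class CachingLayerSketch {
    private final Map<String, byte[]> memory = new HashMap<>(); // in-memory layer
    private final Map<String, byte[]> secondary;                // stand-in for HDFS

    CachingLayerSketch(Map<String, byte[]> secondary) {
        this.secondary = secondary;
    }

    /** Write-through: update the in-memory copy and the secondary store together. */
    void write(String path, byte[] data) {
        memory.put(path, data);
        secondary.put(path, data);
    }

    /** Read-through: serve from memory, loading from the secondary store on a miss. */
    byte[] read(String path) {
        byte[] data = memory.get(path);
        if (data == null) {
            data = secondary.get(path);
            if (data != null)
                memory.put(path, data); // keep it in memory for later reads
        }
        return data;
    }

    /** Reports whether a file is currently held in the in-memory layer. */
    boolean isCached(String path) {
        return memory.containsKey(path);
    }

    public static void main(String[] args) {
        Map<String, byte[]> hdfs = new HashMap<>();
        hdfs.put("/logs/a.txt", "from secondary".getBytes(StandardCharsets.UTF_8));

        CachingLayerSketch fs = new CachingLayerSketch(hdfs);
        System.out.println("Cached before read: " + fs.isCached("/logs/a.txt"));
        fs.read("/logs/a.txt");
        System.out.println("Cached after read: " + fs.isCached("/logs/a.txt"));

        fs.write("/logs/b.txt", "new file".getBytes(StandardCharsets.UTF_8));
        System.out.println("Written through: " + hdfs.containsKey("/logs/b.txt"));
    }
}
```

The sketch keeps both stores in one process for simplicity; in a real deployment the in-memory layer is distributed and the secondary store is a separate Hadoop file system.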