Has anyone tried to use Apache Ignite on YARN with HDFS?

Guru

Has anyone tried to use Apache Ignite on YARN with HDFS? Specifically, the HDFS acceleration feature (I am guessing it is similar to Tachyon).

Master Mentor
@Vadim

See if this is helpful.

Guru

@Neeraj Sabharwal I have built systems on in-memory data grids (IMDGs) before and have read how the HDFS acceleration is supposed to work. I am asking whether you have tried it yourself and have any insight. Does it live up to what the marketing material claims? Do you see potential for it in the field?

Guru (accepted solution)

So I have now had a chance to do some reading and experimentation on my own, and Ignite seems very impressive in terms of the capabilities it provides. Besides the standard features of an in-memory data grid (caching data across many remote JVMs, distributing processing to where the data lives, executing functions on individual requests based on operation type, live event notification when data changes, read-through, write-through and write-behind to a data store, etc.), Ignite in particular has some excellent built-in integration with Hadoop.

Data grids generally let you plug in a persistence layer by writing code; Ignite allows configuration-based integration with HDFS, with no code. This means it is possible to support OLTP-style access to the data and, with it, new kinds of applications. It could also help speed up some long-running jobs by ensuring that really hot data is read from memory, or perhaps by using in-memory scratch space instead of local disk (this seemed possible, but it was not entirely clear). It also seems to do everything that Tachyon does.

The other key integration point is the ability to run Ignite on YARN, so the compute side of some of the acceleration features can be managed by the cluster as if it were native. There definitely seems to be some great synergy between Ignite and Hadoop.
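To make "distributing processing to where the data lives" concrete, here is a minimal Python sketch of affinity-based colocation. It is a toy model under stated assumptions, not Ignite's API or implementation: keys hash to a fixed number of partitions, each partition is assigned to a node with rendezvous (highest-random-weight) hashing, and a function for a given key runs against the owning node's local data only. `Cluster`, `run_on_owner`, `NUM_PARTITIONS` and the node names are all hypothetical.

```python
import hashlib

NUM_PARTITIONS = 16  # hypothetical; real data grids typically use far more partitions


def _stable_hash(text):
    """64-bit hash that is stable across processes (unlike Python's built-in hash())."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def partition_for(key):
    """Map a key to one of NUM_PARTITIONS fixed partitions."""
    return _stable_hash(key) % NUM_PARTITIONS


def owner_of(partition, nodes):
    """Rendezvous hashing: the node with the highest score for this partition owns it."""
    return max(nodes, key=lambda node: _stable_hash(f"{node}:{partition}"))


class Cluster:
    """Toy cluster in which each node holds only the entries whose partitions it owns."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.storage = {node: {} for node in self.nodes}

    def node_for(self, key):
        return owner_of(partition_for(key), self.nodes)

    def put(self, key, value):
        self.storage[self.node_for(key)][key] = value

    def run_on_owner(self, key, fn):
        """Ship fn to the node that owns key and apply it to that node's local copy."""
        node = self.node_for(key)
        return node, fn(self.storage[node][key])


# Usage: spread 100 entries over three nodes, then run a function where one entry lives.
cluster = Cluster(["node-a", "node-b", "node-c"])
for i in range(100):
    cluster.put(f"user-{i}", i)

node, result = cluster.run_on_owner("user-42", lambda value: value * 2)
print(node, result)  # result is 84; which node is printed depends on the hash
```

Rendezvous hashing was picked for the sketch because adding a node only moves the partitions the new node wins; every other partition keeps its current owner, so most data stays where it is.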

Master Guru

I have used Apache Ignite with Spark 1.6 on a standalone cluster, and it did provide acceleration for file reading.
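The file-reading speedup mentioned above comes from serving repeat reads of hot data from memory instead of going back to the underlying file system. Below is a minimal Python sketch of that read-through idea, assuming a hypothetical `SlowBlockStore` as a stand-in for HDFS, fixed-size blocks, and LRU eviction, all chosen purely for illustration. It is not how Ignite or its Spark integration is actually implemented.

```python
from collections import OrderedDict


class SlowBlockStore:
    """Hypothetical stand-in for HDFS: serves file blocks and counts how often it is hit."""

    def __init__(self, files):
        self.files = files  # path -> bytes
        self.reads = 0

    def read_block(self, path, index, block_size):
        self.reads += 1
        start = index * block_size
        return self.files[path][start:start + block_size]


class ReadThroughBlockCache:
    """In-memory LRU cache of file blocks; misses are read through to the backing store."""

    def __init__(self, store, block_size=4, capacity=8):
        self.store = store
        self.block_size = block_size
        self.capacity = capacity
        self.blocks = OrderedDict()  # (path, index) -> bytes, oldest first

    def _get_block(self, path, index):
        key = (path, index)
        if key in self.blocks:
            self.blocks.move_to_end(key)  # mark as most recently used
            return self.blocks[key]
        data = self.store.read_block(path, index, self.block_size)
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data

    def read(self, path, offset, length):
        """Read `length` bytes starting at `offset`, assembled from (possibly cached) blocks."""
        out = bytearray()
        end = offset + length
        first = offset // self.block_size
        last = (end - 1) // self.block_size
        for index in range(first, last + 1):
            block = self._get_block(path, index)
            block_start = index * self.block_size
            lo = max(offset, block_start) - block_start
            hi = min(end, block_start + len(block)) - block_start
            out += block[lo:hi]
        return bytes(out)


# Usage: the first read goes to the slow store, the second is served from memory.
store = SlowBlockStore({"/data/events.log": b"hello, in-memory world"})
cache = ReadThroughBlockCache(store, block_size=4, capacity=8)

cold = cache.read("/data/events.log", 0, 12)  # misses: blocks 0-2 are fetched from the store
warm = cache.read("/data/events.log", 0, 12)  # hits: no further store reads
print(cold, store.reads)  # b'hello, in-me' 3
```

The second read never touches the backing store, which is the effect behind the speedup. A real system also has to keep cached blocks consistent when the underlying files change; this sketch ignores that.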