HDFS over object storage for Hadoop on demand?

Guru

Azure HDInsight provides the capability to create a Hadoop cluster that can be torn down and brought back up without losing any data (including the metastore). Can this setup be achieved with OpenStack Swift and Cloudbreak? If so, what are the steps and considerations for implementing this architecture?

Accepted solution

Expert Contributor

Hi,

We have not tested Cloudbreak with Swift, but Hadoop supports Swift out of the box through its OpenStack Swift filesystem connector (the hadoop-openstack module), so it should work on a Hadoop cluster installed with Cloudbreak.

Attila


Guru

@Attila Kanto So if you have object storage and use Cloudbreak to install Hadoop, HDFS will sit on top of the object storage once everything is installed; I get that. But will the data show up in HDFS if the cluster is taken down and then brought back up with Cloudbreak, or will it have to be reloaded?

Expert Contributor

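@Vadim, no. HDFS and Swift object storage are two separate storage systems, and both work in parallel on the same cluster. Data written to HDFS lives on the cluster's own nodes, so it does not carry over when the cluster is torn down; only data kept in Swift remains available to a newly created cluster.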

1.) HDFS: Ambari installs the HDFS components, and you can store data in HDFS and access it as usual, e.g.:

hdfs dfs -ls /some_dir/ 

2.) Swift: you can connect to Swift with the Swift connector that ships with Hadoop, mentioned above. The connector lets you work with Swift through the Hadoop FileSystem API (the same API the HDFS shell uses) and translates those calls into requests the Swift object storage understands. Therefore commands like this will work:

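hdfs dfs -ls swift://some-container.some-provider/

Here some-provider is the Swift service name configured under fs.swift.service.some-provider.* in core-site.xml; the connector addresses objects as swift://<container>.<provider>/<path>, so a bare swift://<container>/ URL is not enough.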
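Because the two stores are separate, a cluster that is torn down and recreated only sees what was written to Swift. One way to carry HDFS data across is to copy it into Swift with DistCp before teardown, then copy it back (or read the swift:// path directly) once the new cluster is up. The Python sketch below only builds and prints the commands unless `dry_run=False` is passed. The container `some-container`, the provider `some-provider` and the `/some_dir` path are hypothetical, and the hostname-style name check reflects an assumption that the `<container>.<provider>` pair must parse as a hostname.

```python
import re
import subprocess

# Assumption: container and provider names must be hostname-safe because the
# connector reads "<container>.<provider>" from the host part of the URL.
_HOST_LABEL = re.compile(r"[A-Za-z0-9-]+")


def swift_url(container, provider, path=""):
    """Build a swift://<container>.<provider>/<path> URL for the Hadoop connector."""
    for label in (container, provider):
        if not _HOST_LABEL.fullmatch(label):
            raise ValueError(f"not hostname-safe: {label!r}")
    return f"swift://{container}.{provider}/{path.lstrip('/')}"


def distcp_command(src, dst):
    """Return the argv for a DistCp copy from src to dst."""
    return ["hadoop", "distcp", src, dst]


def run(argv, dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    print(" ".join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical names; replace with your own container and provider.
    archive = swift_url("some-container", "some-provider", "some_dir")
    # Before tearing the cluster down: copy the HDFS directory into Swift.
    run(distcp_command("hdfs:///some_dir", archive))
    # On the fresh cluster: copy it back into HDFS
    # (jobs can also read the swift:// path directly).
    run(distcp_command(archive, "hdfs:///some_dir"))
```

Keeping the commands as plain argv lists makes them easy to inspect in dry-run mode before anything touches the cluster.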