The Azure HDInsight service provides the capability to create a Hadoop cluster that can be torn down and brought back up without losing any data (including the metastore). Can this setup be achieved with OpenStack Swift and Cloudbreak? If so, what are the steps and considerations for implementing this architecture?
@Attila Kanto So if you have object storage and use Cloudbreak to install Hadoop, HDFS will sit on top of the object storage once everything is installed; I get that. But will the data show up in HDFS if the cluster is taken down and then brought back up with Cloudbreak, or will it have to be reloaded?
@Vadim, no: HDFS and the Swift object storage are two separate storage systems, and both work in parallel with the same cluster.
1.) HDFS: the HDFS components are installed by Ambari, and you can use HDFS storage as usual, storing data on it and accessing it, e.g.:
hdfs dfs -ls /some_dir/
2.) Swift: you can connect to Swift with the Swift connector shown in the link above. The connector lets you communicate with Swift through the HDFS API: it translates HDFS API calls into API calls that the Swift object storage understands. Therefore commands like this will work:
hdfs dfs -ls swift://some_container/
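Since the two storages are separate, one way to keep data across a teardown is to copy it from HDFS into a Swift container before the cluster goes away, and copy it back (or read it directly through swift://) after the new cluster is up. Below is a minimal sketch of that idea, not a tested procedure: the directory and container names are placeholders reused from the examples above, hadoop distcp is assumed to be the copy tool, and depending on how the connector is configured the container name in the URI may need a service suffix. The script only prints the commands so you can review them before running them on a cluster node.

```shell
#!/usr/bin/env bash
# Sketch only: SRC_DIR and CONTAINER are hypothetical placeholders,
# replace them with your own HDFS directory and Swift container.
SRC_DIR="/some_dir"
CONTAINER="some_container"

# Print the command that copies an HDFS directory into the Swift container
# (run it before the cluster is torn down).
backup_cmd() {
  echo "hadoop distcp ${SRC_DIR} swift://${CONTAINER}${SRC_DIR}"
}

# Print the command that copies the data back into HDFS
# (run it on the freshly created cluster).
restore_cmd() {
  echo "hadoop distcp swift://${CONTAINER}${SRC_DIR} ${SRC_DIR}"
}

backup_cmd
restore_cmd
```

Once the printed commands look right for your environment, you can run them by hand on a node where the Swift connector is configured.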