12-12-2015
02:09 PM
There isn't really much in the way of Ceph integration. There is a published filesystem client JAR which, if you get it on your classpath, should let you refer to data using ceph:// as the path. You also appear to need its native lib on the path, which is a bit trickier. This comes from the Ceph team, not the Hadoop people, and

1. I don't know how up to date or in sync it is with recent Hadoop versions.
2. It doesn't get released or tested by the Hadoop team: we don't know how well it works, or how it goes wrong.

Filesystems are an interesting topic in Hadoop. It's a core, critical part of the system: you don't want to lose data. And while there's lots of support for different filesystem implementations in Hadoop (s3n, avs, ftp, swift, file), HDFS is the one things are built and tested against. Object stores (s3, swift) are not real filesystems, and cannot be used in place of HDFS as the direct output of MR, Tez or Spark jobs; and absolutely never to run HBase or Accumulo atop.

I don't know where Ceph fits in here. It's probably safe to use it as a source of data; it's as the destination where the differences usually show up.

Finally: HDP is not tested on Ceph, so cannot be supported. We do test on HDFS, against Azure storage (in HDInsight), and on other filesystems (e.g. Isilon). I don't know of anyone else who tests Hadoop on Ceph the way, say, Red Hat do with GlusterFS.
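To make the source-versus-destination point concrete, here is a minimal sketch of a pre-flight check on a job's output path. It is not part of Hadoop or of the Ceph client: the function name, the scheme sets and the three labels are all hypothetical, chosen only to mirror the distinctions made above (HDFS is what things are tested against, s3/swift are object stores, Ceph is an unknown).

```python
from urllib.parse import urlparse

# Illustrative assumption: schemes treated as object stores, per the post above.
OBJECT_STORE_SCHEMES = {"s3", "s3n", "swift"}


def classify_output_path(path: str) -> str:
    """Label a job output path as 'tested', 'object-store' or 'unverified'.

    Hypothetical helper: the labels only restate the advice in the post,
    they are not anything Hadoop itself reports.
    """
    scheme = urlparse(path).scheme.lower()
    if scheme == "hdfs":
        # HDFS is the filesystem things are built and tested against.
        return "tested"
    if scheme in OBJECT_STORE_SCHEMES:
        # Object stores should not be the direct output of MR, Tez or Spark jobs.
        return "object-store"
    # Everything else, including ceph://, has unknown behaviour as a destination.
    return "unverified"


if __name__ == "__main__":
    for p in ("hdfs://namenode:8020/out", "s3n://bucket/out", "ceph://mon:6789/out"):
        print(p, "->", classify_output_path(p))
```

The host names and ports in the example paths are placeholders. A check like this only flags the destination; reading input from any of these schemes is left alone, since reading is where the differences are least likely to bite.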