How to size local & HDFS storage when input & result data is in an ObjectStore? (i.e. wasb, s3, gcs, swift)

How much storage is required and for what purposes? Those that come to mind:

  • HDFS intermediate
  • yarn.app.*.am.staging-dir
  • yarn.nodemanager.local-dirs
  • hadoop.tmp.dir
3 REPLIES

Re: How to size local & HDFS storage when input & result data is in an ObjectStore? (i.e. wasb, s3, gcs, swift)

@Sean Roberts

If you use an s3:// or wasb:// URL instead of an hdfs:// one, those calls bypass the DataNodes completely. I am not sure if that is what you wanted to know.

Re: How to size local & HDFS storage when input & result data is in an ObjectStore? (i.e. wasb, s3, gcs, swift)

That's not true. While the input/output will use the object store (e.g. s3a://) if specified, many things will still land locally, such as the Hadoop tmp dir, the YARN local-dirs, ...
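As a starting point for sizing, it can help to see how much space the local directories have today. This is a minimal Python sketch, not an official tool: the helper name `local_dir_usage` and the example paths are hypothetical, and you would substitute the values from your own core-site.xml and yarn-site.xml. It assumes the directory properties hold comma-separated lists of local paths.

```python
import os
import shutil


def local_dir_usage(dirs_by_property):
    """Return {property: [(path, total_gib, free_gib), ...]} for existing paths.

    dirs_by_property maps a property name to its configured value,
    which may be a comma-separated list of local directories.
    """
    gib = 1024 ** 3
    report = {}
    for prop, value in dirs_by_property.items():
        rows = []
        for path in (p.strip() for p in value.split(",")):
            if not path or not os.path.isdir(path):
                continue  # skip unset or missing directories
            usage = shutil.disk_usage(path)
            rows.append((path, usage.total / gib, usage.free / gib))
        report[prop] = rows
    return report


if __name__ == "__main__":
    # Hypothetical values; replace with what your cluster config contains.
    example = {
        "hadoop.tmp.dir": "/tmp",
        "yarn.nodemanager.local-dirs": "/tmp",
    }
    for prop, rows in local_dir_usage(example).items():
        for path, total, free in rows:
            print(f"{prop}: {path} total={total:.1f} GiB free={free:.1f} GiB")
```

Running this on a worker node while a representative job is in flight shows how much of each volume the job's intermediate data takes, which is the number that matters when input and output live in the object store.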

Re: How to size local & HDFS storage when input & result data is in an ObjectStore? (i.e. wasb, s3, gcs, swift)

@Sean Roberts

Sorry, my answer was very HDFS-specific; that is how HDFS operates. It is not concerned with the rest of the stack. I should have clarified that in my answer.