Difference between dfs.data.dir & dfs.datanode.data.dir

Hi Team, 

 

Can anyone please tell me the difference between these two parameters?

 

Cloudera Manager sets dfs.datanode.data.dir inside the /swap/ folder by default. On an EC2 instance, /swap is a temporary directory that gets deleted and recreated at boot.

 

Does that mean some blocks will be deleted at startup and the cluster will be corrupted?

 

I have set up a single-node cluster on an EC2 machine with CDH5.5.0 and am seeing cluster corruption right after a shutdown and restart.

 

Could this be one of the reasons?

 

Vikas

1 ACCEPTED SOLUTION

Hi Vikas,

In general, we recommend storing data on instance storage drives for EC2 since EBS volumes are slow and charge you per access. Instance storage is ephemeral, which means that whether the dir is named "/swap" or something else, its contents are lost when the instance is stopped or terminated (a plain reboot keeps them). You should back up your data to a safe location before powering down your EC2 machine, as discussed here:
http://www.cloudera.com/content/www/en-us/documentation/other/reference-architecture/PDF/cloudera_re...

The difference between dfs.data.dir and dfs.datanode.data.dir is that the first is a very old name used in CDH3 and perhaps earlier, while the second is the preferred current config name as of CDH4. Logically, they are the same thing.
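
To make the "same setting, two names" point concrete, here is a minimal Python sketch that reads the DataNode data directories out of hdfs-site.xml text, checking the current name first and falling back to the old one. Hadoop handles the old name through its own deprecation mechanism, so this is only an illustration: the helper datanode_data_dirs and the sample paths under /data are hypothetical, not part of Hadoop or Cloudera Manager.

```python
import xml.etree.ElementTree as ET
from typing import List

# Property names from this thread: the current name and the old CDH3-era name.
CURRENT_KEY = "dfs.datanode.data.dir"
OLD_KEY = "dfs.data.dir"


def datanode_data_dirs(hdfs_site_xml: str) -> List[str]:
    """Return the DataNode data directories found in hdfs-site.xml text.

    Hypothetical helper for illustration only: it prefers the current
    property name and falls back to the old one if the current is absent.
    """
    root = ET.fromstring(hdfs_site_xml)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        props[name] = value
    raw = props.get(CURRENT_KEY) or props.get(OLD_KEY) or ""
    # The value is a comma-separated list of directories.
    return [d.strip() for d in raw.split(",") if d.strip()]


# Sample config using the old name; the paths are made up for this example.
sample = """<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/data/1/dfs/dn,/data/2/dfs/dn</value>
  </property>
</configuration>"""

print(datanode_data_dirs(sample))  # ['/data/1/dfs/dn', '/data/2/dfs/dn']
```

Running the same function against a file that uses dfs.datanode.data.dir gives the same list, which is a quick way to confirm where the blocks actually live before deciding whether that location survives an instance stop.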

Thanks,
Darren
