Difference between dfs.data.dir & dfs.datanode.data.dir
- Labels: Cloudera Manager
Created on 11-27-2015 02:09 AM - edited 09-16-2022 02:50 AM
Hi Team,
Can anyone please explain the difference between these two parameters?
Cloudera Manager sets dfs.datanode.data.dir inside the /swap/ folder by default. On an EC2 instance, /swap is a temporary directory that gets deleted and recreated at boot.
Does that mean some blocks will be deleted at startup and the cluster will be corrupted?
I have set up a single-node cluster on an EC2 machine with CDH 5.5.0, and I see cluster corruption right after shutting down and restarting.
Could this be one of the reasons?
Vikas
Created 11-30-2015 01:50 PM
In general, we recommend storing data on instance storage drives for EC2 since EBS volumes are slow and charge you per access. Instance storage is ephemeral, which means that whether the dir is named "/swap" or something else, it'll disappear if you restart the machine. You should back up your data to a safe location before powering down your EC2 machine, as discussed here:
http://www.cloudera.com/content/www/en-us/documentation/other/reference-architecture/PDF/cloudera_re...
The difference between dfs.data.dir and dfs.datanode.data.dir is that the first is a very old name used in CDH3 and perhaps earlier, while the second is the preferred config name as of CDH4. Logically, both refer to the same setting.
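For illustration only, here is a minimal Python sketch (not part of Cloudera Manager or Hadoop) that reads an hdfs-site.xml fragment, resolves the DataNode data directories under either property name, and flags any that sit under a path assumed to be ephemeral. The helper names and the `/swap` prefix list are hypothetical. The sketch assumes the current name wins when both are set, and it does not reproduce Hadoop's full deprecation handling.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of mount points assumed to be wiped on reboot
# (e.g. EC2 instance storage); adjust for your own hosts.
EPHEMERAL_PREFIXES = ("/swap",)


def load_properties(xml_text):
    """Parse a Hadoop *-site.xml string into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name:
            props[name] = value
    return props


def effective_data_dirs(props):
    """Return DataNode data dirs, falling back to the old CDH3-era name.

    Assumption: if both names are present, dfs.datanode.data.dir wins.
    """
    value = props.get("dfs.datanode.data.dir")
    if value is None:
        value = props.get("dfs.data.dir")
    if not value:
        return []
    # The property holds a comma-separated list of directories.
    return [d.strip() for d in value.split(",") if d.strip()]


def on_ephemeral_storage(dirs, prefixes=EPHEMERAL_PREFIXES):
    """Return the dirs that live under one of the given prefixes."""
    flagged = []
    for d in dirs:
        # Values may be plain paths or file:// URIs.
        path = d[len("file://"):] if d.startswith("file://") else d
        if any(path == p or path.startswith(p.rstrip("/") + "/") for p in prefixes):
            flagged.append(d)
    return flagged


if __name__ == "__main__":
    sample = """<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/swap/dfs/dn,/data/1/dfs/dn</value>
  </property>
</configuration>"""
    dirs = effective_data_dirs(load_properties(sample))
    print(dirs)                        # ['/swap/dfs/dn', '/data/1/dfs/dn']
    print(on_ephemeral_storage(dirs))  # ['/swap/dfs/dn']
```

Any directory the check flags would lose its blocks whenever the instance storage is wiped, whichever of the two names was used to configure it.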
Thanks,
Darren
