
Data block storage directory isn't working

New Contributor

Hello,

I am trying to deploy a Hadoop cluster using Cloudera Manager API calls. I am able to configure the HDFS and MAPREDUCE services using the CM APIs. If I log on to https://IP:7180/ as "admin", I can see all of the configuration.

I am using EBS on EC2. I have mounted the volume on /data and want it to be used to store HDFS data, so I have set the 'dfs.datanode.data.dir' value to '/data/dn1,/data/dn2,/data/dn3', since I am using three data nodes. I can verify this configuration in the CM UI.
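
For reference, a minimal sketch of how such an update can be made through the CM REST API (the API version and the cluster/service/group names below are placeholders, not my actual values; note that CM exposes this property under its template name dfs_data_dir_list rather than the raw hdfs-site.xml key):

# Hypothetical sketch: set the DataNode data directories on the default
# DataNode role config group. "cluster1", "hdfs1", and the group name are placeholders.
curl -u admin:admin -X PUT \
  -H "Content-Type: application/json" \
  -d '{"items": [{"name": "dfs_data_dir_list", "value": "/data/dn1,/data/dn2,/data/dn3"}]}' \
  "https://IP:7180/api/v10/clusters/cluster1/services/hdfs1/roleConfigGroups/hdfs1-DATANODE-BASE/config"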

Now, once the setup is done, I see 0 capacity for HDFS. The following is the output of "hdfs dfsadmin -report":

 

OUTPUT:
###############################
-------------------------------------------------
Live datanodes (3):
Name: 10.125.17.204:50010 (ip-10-125-17-204.us-west-2.compute.internal)
Hostname: ip-10-125-17-204.us-west-2.compute.internal
Rack: /default
Decommission Status : Normal
Configured Capacity: 0 (0 B)
DFS Used: 28672 (28 KB)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Configured Cache Capacity: 4294967296 (4 GB)
Cache Used: 0 (0 B)
Cache Remaining: 4294967296 (4 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 2
Last contact: Thu Oct 08 12:36:09 UTC 2015
###############################

 

After debugging a little further, I realised that "/data/dn1,/data/dn2,/data/dn3" is not present in hdfs-site.xml. I verified the current value of 'dfs.datanode.data.dir' using "hdfs getconf -confKey dfs.datanode.data.dir"; the following is the output:

 

OUTPUT:
file:///tmp/hadoop-hdfs/dfs/data

 

Note that other values which I set using the API, such as the NameNode IP, do show up in hdfs-site.xml. Can somebody please help me with this?

1 REPLY

Re: Data block storage directory isn't working

Cloudera Employee

Hi,

 

A few questions and comments.

 

1. Did you start up the HDFS service, either via the API or via the CM web UI? (A sketch of the API call is shown after this list.)

2. Do the /data directory and its subdirectories exist on the servers? If so, what is in them? (See the shell checks after this list.)

3. What are the permissions and ownership of the /data directory? The checks after this list cover this as well; see the following page for more information: http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_hdfs_cluster_dep...

4. A list of values for dfs.datanode.data.dir is used to create multiple DataNode storage directories on each server, so that writes can be parallelized at the HDFS level. For example, with direct-attached storage, if you had 10 disks available for data storage, you would use something like "/data1/dn, /data2/dn, .../data10/dn". That doesn't apply here since you are using EBS, but it shows how the property is meant to be used. The comma-separated list is not how you create a separate data directory for each DataNode. What you are doing will technically work, but each server will end up with /data/dn1, /data/dn2, and /data/dn3 directories, each containing HDFS data blocks.
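
For point 1, here is a minimal sketch of starting the service through the API (the API version and the cluster/service names are placeholders; adjust them to your deployment):

# Hypothetical sketch: start the HDFS service via the CM REST API.
# "cluster1" and "hdfs1" are placeholder names.
curl -u admin:admin -X POST \
  "https://IP:7180/api/v10/clusters/cluster1/services/hdfs1/commands/start"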
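
For points 2 and 3, some quick checks to run on each DataNode host (a sketch; the data directories should be owned by the hdfs user, and the directory mode defaults to 700 per dfs.datanode.data.dir.perm):

# Do the configured data directories exist, and what is in them?
ls -ld /data /data/dn1 /data/dn2 /data/dn3
ls -lR /data/dn1
# Ownership and permissions (expected: hdfs-owned, mode 700 by default)
stat -c '%U:%G %a %n' /data/dn1 /data/dn2 /data/dn3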

 

Regards,

Justin
