Member since: 02-15-2017
Posts: 41
Kudos Received: 1
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 23822 | 07-21-2017 10:30 AM
 | 16284 | 07-04-2017 07:33 AM
07-21-2017
10:30 AM
Hello! I tracked the missing blocks and fortunately they belonged to a decommissioned DN, so I decided to remove them. That's it! Thanks for your help! Guido.
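For anyone hitting the same message later, a minimal sketch of the kind of check-then-remove cleanup described above, assuming the affected files really are expendable (the path below is hypothetical, verify what each file is before deleting anything):

```
# List the files whose blocks are reported as missing/corrupt:
hdfs fsck / -list-corruptfileblocks

# Inspect one of the affected files (block IDs and remaining replica locations):
hdfs fsck /path/to/affected/file -files -blocks -locations

# If the file is expendable, remove it so the NameNode stops reporting it:
hdfs fsck /path/to/affected/file -delete
```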
07-06-2017
12:11 PM
Thanks @mbigelow! I took a deep dive into those corrupt blocks and realized that they don't belong to HBase tables; they are just files in HDFS. I think I understand more or less what is happening, please feel free to correct me where I am wrong.

1) Under replicated blocks: 1169
2) Blocks with corrupt replicas: 0
3) Missing blocks: 1169
4) Missing blocks (with replication factor 1): 21062

1) I ran "hdfs fsck / -list-corruptfileblocks" in order to find out which files these blocks belong to. Then I listed those files and all of them had a replication factor of 1. The default replication factor in the cluster is 3, so no matter how long I wait for HDFS to automatically handle these under-replicated blocks, they will always be listed as under replicated. Am I right? The cluster also has lots of files with replication factor 1 that were not listed as "under replicated", and I can't understand why.
2) Nothing to add.
3) These blocks are missing from the entire cluster, they are dead, and there is no way to get them back without a backup. They are the same blocks as the under-replicated ones in 1). My question here is: why are these files not in "Missing blocks (with replication factor 1)"? Or maybe they are, but in that case why are there no more "under replicated" blocks?
4) Not much to add; once 3) is clarified I'll understand this better.

Thanks again!
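In case it helps anyone who finds this thread later, a minimal sketch of how the replication-factor-1 files that still have their single replica could be checked and protected (the path is hypothetical, not one of the files from this cluster):

```
# The second column of -ls output is the current replication factor:
hdfs dfs -ls /path/to/rf1/file

# Raise it to the cluster default and wait until the extra replicas are written:
hdfs dfs -setrep -w 3 /path/to/rf1/file

# For files whose only replica is already gone, -setrep cannot help;
# those are the ones counted under "Missing blocks (with replication factor 1)".
```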
07-05-2017
06:02 AM
Thanks @mbigelow for clarifying this. I've run the command (hdfs dfsadmin -report) as I did yesterday and the output is the same:

Configured Capacity: 1706034579673088 (1.52 PB)
Present Capacity: 1683460189648203 (1.50 PB)
DFS Remaining: 558365599904940 (507.83 TB)
DFS Used: 1125094589743263 (1023.27 TB)
DFS Used%: 66.83%
Under replicated blocks: 1169
Blocks with corrupt replicas: 0
Missing blocks: 1169
Missing blocks (with replication factor 1): 21062

I have a couple of questions that maybe you can help me with.
1) Is there a way to get rid of that message in the NameNode web console?
2) Is there a way to find out/list the missing files instead of the missing blocks?
3) Under replicated blocks stay steady at 1169; is CDH supposed to handle this?

An important thing I forgot to mention is that HBase is present in the cluster and there are 5 region servers. Maybe this question fits better in a new post, but as far as I know HBase and the HDFS balancer don't like each other, so I'm wondering if this situation could be the reason why CDH is not replicating the under-replicated blocks. Thanks again! Guido.
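Regarding question 2), a hedged sketch of the standard fsck options that list the affected files rather than raw block IDs (nothing here is specific to this cluster):

```
# Files with missing/corrupt blocks, one path per line:
hdfs fsck / -list-corruptfileblocks

# A fuller health report; the per-file lines name the problem blocks and the
# summary at the end repeats the counters shown by "hdfs dfsadmin -report":
hdfs fsck / | grep -iE 'under replicated|missing|corrupt' | head -n 40
```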
07-04-2017
08:04 AM
Hello community! I recently added 4 more DNs to my Hadoop cluster, so now there are 46 DNs up and running. I've been running the balancer for 5 days, and today a message appeared at the top of the NameNode web console (my_name_node_url:50070): "There are 1169 missing blocks. The following files may be corrupted:" followed by a list of some of these corrupted blocks. After I saw that message I ran the command "hdfs dfsadmin -report" and the result was:

Configured Capacity: 1706034579673088 (1.52 PB)
Present Capacity: 1683943231506526 (1.50 PB)
DFS Remaining: 559797934331658 (509.13 TB)
DFS Used: 1124145297174868 (1022.40 TB)
DFS Used%: 66.76%
Under replicated blocks: 1169
Blocks with corrupt replicas: 0
Missing blocks: 1169
Missing blocks (with replication factor 1): 21062

For storage capacity reasons a group of developers decided to go against my advice and set the replication factor to 1 for some files. What does "Missing blocks: 1169" mean? Is the "Missing blocks (with replication factor 1)" message telling me that those 21062 blocks from files with replication factor 1 cannot be recovered? I'll be very grateful if anyone can clarify this concept. Thanks! Guido.
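As a hedged illustration of the replication-factor-1 point above (file names are hypothetical): a file written with a single replica loses its block permanently if the DataNode holding that replica dies, and that is what the last counter reports.

```
# How a file ends up with a single replica, e.g. a client overriding the
# cluster default of 3 at write time:
hdfs dfs -D dfs.replication=1 -put big_dataset.csv /data/big_dataset.csv

# With one replica there is nothing to re-replicate from once that DataNode is
# lost; the block is simply gone. Checking which replicas a file still has:
hdfs fsck /data/big_dataset.csv -files -blocks -locations
```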
Labels:
- HDFS
07-04-2017
07:33 AM
Finally the node is up and running! There were lots of custom Java packages that I forgot to include in the new DN. It's still not clear to me how missing jar files can cause such strange behavior. Thanks for your help! Guido.
06-29-2017
05:40 AM
@csguna I'll try to do it. I realized the cluster is full of custom Java classes and dependencies, so I have to take a deep dive into the config in order to find out what is happening. Also, the cluster is plain CDH with no Cloudera Manager, so any issue is a bit more complex to solve. As soon as I solve this I'll let you know. Thanks!
06-28-2017
08:04 AM
Thanks @csguna! I ran the same command (hdfs datanode) on other nodes that are up and running and the error is the same. The strange thing here is that my service is not logging; I mean hadoop-hdfs-datanode-mynode.log is empty, blank, nothing... The YARN NodeManager is up and running on the same node and it's working fine, picking up tasks immediately. If I can't get logs from the datanode service (which is running but apparently doing nothing) I won't be able to do anything. Any help is very welcome! Thanks. Guido.
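A small sketch of one way to get the DataNode to log to the console when its log file stays empty. HADOOP_ROOT_LOGGER is a standard Hadoop environment variable; the paths in the last command are typical for package-based CDH installs and are assumptions, not confirmed for this cluster:

```
# Run the DataNode in the foreground as the hdfs service user,
# forcing the root logger to the console at DEBUG level:
export HADOOP_ROOT_LOGGER=DEBUG,console
hdfs datanode

# Also worth double-checking where the packaged service actually writes its logs:
grep -r 'HADOOP_LOG_DIR' /etc/hadoop/conf/ /etc/default/hadoop* 2>/dev/null
```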
06-27-2017
01:24 PM
The weird thing here is that the datanode service is not logging anything. The log file in /var/log/hadoop-hdfs/ has no text; there is only the .out file, and the .out file always looks fine. The service is running, but the node is not present in the cluster and is not logging. Is there a way to debug the datanode? Thanks!
06-27-2017
12:21 PM
Hello @csguna! The property "dfs.data.transfer.protection" is not present in the hdfs-site.xml file. Thanks! Guido.
06-27-2017
06:11 AM
Hello! I'm trying to add a new DataNode to a running cluster. The cluster is in HA for HDFS (NNs) and for YARN (RMs) and is secured with Kerberos integration. When I performed the necessary steps to add a new DN and started the hadoop-hdfs-datanode service, the new node didn't show up in the list of DNs (I performed a refresh on the NNs). In /var/log/hadoop-hdfs/hadoop-hdfs-datanode-mynode.log there is nothing logged. The output of the command "hdfs datanode" is:

2017-06-26 15:42:58,544 INFO security.UserGroupInformation: Login successful for user hdfs/my.datanode.fqdn@MY.REALM.FQDN using keytab file /path/to/hdfs.keytab
2017-06-26 15:42:59,292 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2017-06-26 15:42:59,333 INFO impl.MetricsSinkAdapter: Sink collectd started
2017-06-26 15:42:59,385 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2017-06-26 15:42:59,385 INFO impl.MetricsSystemImpl: DataNode metrics system started
2017-06-26 15:42:59,404 INFO datanode.DataNode: File descriptor passing is enabled.
2017-06-26 15:42:59,407 INFO datanode.DataNode: Configured hostname is my.datanode.fqdn
2017-06-26 15:42:59,415 FATAL datanode.DataNode: Exception in secureMain
java.lang.RuntimeException: Cannot start secure DataNode without configuring either privileged resources or SASL RPC data transfer protection and SSL for HTTP. Using privileged resources in combination with SASL RPC data transfer protection is not supported.
        at org.apache.hadoop.hdfs.server.datanode.DataNode.checkSecureConfig(DataNode.java:1205)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1106)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:451)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2406)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2293)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2340)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2517)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2541)
2017-06-26 15:42:59,436 INFO util.ExitUtil: Exiting with status 1
2017-06-26 15:42:59,475 INFO datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at my.datanode.fqdn/my-datanode-ip-address
************************************************************/

Thanks for your help! Guido.
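For reference, a hedged sketch of the two configurations that the check in checkSecureConfig accepts; the values shown are illustrative examples, not settings taken from this cluster:

```
# Option A - privileged ports (DataNode started as root via jsvc):
#   dfs.datanode.address        0.0.0.0:1004   (port < 1024)
#   dfs.datanode.http.address   0.0.0.0:1006   (port < 1024)
#
# Option B - SASL data transfer protection plus HTTPS (no privileged ports):
#   dfs.data.transfer.protection   authentication | integrity | privacy
#   dfs.http.policy                HTTPS_ONLY
#
# Mixing both (privileged ports AND dfs.data.transfer.protection) is rejected,
# as the exception message says. To see what this host actually resolves to:
hdfs getconf -confKey dfs.datanode.address
hdfs getconf -confKey dfs.datanode.http.address
hdfs getconf -confKey dfs.data.transfer.protection
hdfs getconf -confKey dfs.http.policy
```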
Labels:
- HDFS