Member since 10-09-2015
5 Posts
1 Kudos Received
1 Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
5350 | 10-19-2015 01:24 AM |
10-19-2015
01:24 AM
The error was initially encountered in an older version of CDH, and it disappeared when we also updated the client to the same version.
10-13-2015
05:41 AM
Using CDH 5.4.7-1.cdh5.4.7.p0.3, when I run multiple MapReduce jobs one after another, eventually one of the jobs fails with this stack trace:

2015-10-13 14:22:28,187 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: { hdfs://hdfs-nameservice/user/hdfs/.staging/job_1444734646472_0003/libjars/htrace-core-3.1.0-incubating.jar, 1444738926587, FILE, null } failed: Rename cannot overwrite non empty destination directory /yarn/nm/usercache/hdfs/filecache/945
java.io.IOException: Rename cannot overwrite non empty destination directory /yarn/nm/usercache/hdfs/filecache/945
at org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
at org.apache.hadoop.fs.FileContext.rename(FileContext.java:909)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:364)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-10-13 14:22:28,188 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://hdfs-nameservice/user/hdfs/.staging/job_1444734646472_0003/libjars/htrace-core-3.1.0-incubating.jar(->/yarn/nm/usercache/hdfs/filecache/945/htrace-core-3.1.0-incubating.jar) transitioned from DOWNLOADING to FAILED
2015-10-13 14:22:28,188 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e10_1444734646472_0003_01_000001 transitioned from LOCALIZING to LOCALIZATION_FAILED
2015-10-13 14:22:28,188 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: Container container_e10_1444734646472_0003_01_000001 sent RELEASE event on a resource request { hdfs://hdfs-nameservice/user/hdfs/.staging/job_1444734646472_0003/libjars/htrace-core-3.1.0-incubating.jar, 1444738926587, FILE, null } not present in cache.
2015-10-13 14:22:28,188 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Unknown localizer with localizerId container_e10_1444734646472_0003_01_000001 is sending heartbeat. Ordering it to DIE

The number in the path varies, but restarting the failed job does not get rid of the error. I have tried setting yarn.nodemanager.localizer.cache.target-size-mb to 0, restarting YARN, and waiting until after the cleanup, but it doesn't help. The file /yarn/nm/usercache/hdfs/filecache/812 does not seem to exist before or after running the job.

Has anybody experienced this, or does anybody have an explanation for why it happens?
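To check whether the destination directory really differs from one failure to the next, the NodeManager log can be scanned for these WARN entries. A minimal sketch in Python, assuming the log entries look like the one quoted above; the function name `failed_renames` and the regular expression are illustrative and not part of any Hadoop tooling:

```python
import re

# Matches the "{ resource, timestamp, type, pattern } failed: Rename cannot
# overwrite ..." part of the WARN entry and captures the resource URI and
# the local destination directory.
FAILED = re.compile(
    r"\{ (?P<resource>\S+), \d+, \w+, \S+ \} failed: "
    r"Rename cannot overwrite non empty destination directory (?P<dest>\S+)"
)


def failed_renames(log_text):
    """Return (resource, destination directory) pairs for failed renames."""
    return [(m.group("resource"), m.group("dest")) for m in FAILED.finditer(log_text)]


# Excerpt of the WARN entry from the post above.
SAMPLE = (
    "2015-10-13 14:22:28,187 WARN org.apache.hadoop.yarn.server.nodemanager."
    "containermanager.localizer.ResourceLocalizationService: "
    "{ hdfs://hdfs-nameservice/user/hdfs/.staging/job_1444734646472_0003/"
    "libjars/htrace-core-3.1.0-incubating.jar, 1444738926587, FILE, null } "
    "failed: Rename cannot overwrite non empty destination directory "
    "/yarn/nm/usercache/hdfs/filecache/945\n"
)

for resource, dest in failed_renames(SAMPLE):
    print(dest, "<-", resource)
```

Running this over a full NodeManager log would list every affected filecache directory, which makes it easier to see whether the same directory keeps reappearing.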
10-09-2015
08:12 AM
1 Kudo
Thank you! I actually came across that solution just an hour ago while reading this presentation: http://www.slideshare.net/cloudera/hadoop-troubleshooting-101-kate-ting-cloudera I had already implemented what you suggested, but was waiting to see whether the data would balance out evenly before posting. Regardless, thank you very much for your help in resolving this issue.
10-09-2015
03:41 AM
Thank you for the answer. It's all DFS usage:

$ df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/md2 3,6T 3,4T 4,0G 100% /
$ du -hs /dfs/dn/
3,4T /dfs/dn/

And it's not balanced at all, unfortunately (output from dfsadmin -report; see hdfs-8 and hdfs-9, which are the affected hosts):

Configured Capacity: 49343113617408 (44.88 TB)
Present Capacity: 47369416204288 (43.08 TB)
DFS Remaining: 21149683597312 (19.24 TB)
DFS Used: 26219732606976 (23.85 TB)
DFS Used%: 55.35%
Under replicated blocks: 16418
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (13):
Name: x.x.x.x:50010 (hdfs-8.xxx)
Hostname: hdfs-8.xxx
Rack: /dc19
Decommission Status : Normal
Configured Capacity: 3795624124416 (3.45 TB)
DFS Used: 3685917872128 (3.35 TB)
Non DFS Used: 105435455488 (98.19 GB)
DFS Remaining: 4270796800 (3.98 GB)
DFS Used%: 97.11%
DFS Remaining%: 0.11%
Configured Cache Capacity: 3406823424 (3.17 GB)
Cache Used: 0 (0 B)
Cache Remaining: 3406823424 (3.17 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 33
Last contact: Fri Oct 09 12:28:54 CEST 2015
Name: x.x.x.x:50010 (hdfs-2.xxx)
Hostname: hdfs-2.xxx
Rack: /default
Decommission Status : Normal
Configured Capacity: 3795624124416 (3.45 TB)
DFS Used: 1767240470528 (1.61 TB)
Non DFS Used: 108024397824 (100.61 GB)
DFS Remaining: 1920359256064 (1.75 TB)
DFS Used%: 46.56%
DFS Remaining%: 50.59%
Configured Cache Capacity: 3406823424 (3.17 GB)
Cache Used: 0 (0 B)
Cache Remaining: 3406823424 (3.17 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 208
Last contact: Fri Oct 09 12:28:55 CEST 2015
Name: x.x.x.x:50010 (hdfs-6.xxx)
Hostname: hdfs-6.xxx
Rack: /default
Decommission Status : Normal
Configured Capacity: 3795624124416 (3.45 TB)
DFS Used: 1631343329280 (1.48 TB)
Non DFS Used: 107747889152 (100.35 GB)
DFS Remaining: 2056532905984 (1.87 TB)
DFS Used%: 42.98%
DFS Remaining%: 54.18%
Configured Cache Capacity: 3406823424 (3.17 GB)
Cache Used: 0 (0 B)
Cache Remaining: 3406823424 (3.17 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 216
Last contact: Fri Oct 09 12:28:55 CEST 2015
Name: x.x.x.x:50010 (hdfs-15.xxx)
Hostname: hdfs-15.xxx
Rack: /default
Decommission Status : Normal
Configured Capacity: 3795624124416 (3.45 TB)
DFS Used: 1888278859776 (1.72 TB)
Non DFS Used: 101076938752 (94.14 GB)
DFS Remaining: 1806268325888 (1.64 TB)
DFS Used%: 49.75%
DFS Remaining%: 47.59%
Configured Cache Capacity: 3406823424 (3.17 GB)
Cache Used: 0 (0 B)
Cache Remaining: 3406823424 (3.17 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 190
Last contact: Fri Oct 09 12:28:55 CEST 2015
Name: x.x.x.x:50010 (hdfs-12.xxx)
Hostname: hdfs-12.xxx
Rack: /default
Decommission Status : Normal
Configured Capacity: 3795624124416 (3.45 TB)
DFS Used: 1545568223232 (1.41 TB)
Non DFS Used: 100874694656 (93.95 GB)
DFS Remaining: 2149181206528 (1.95 TB)
DFS Used%: 40.72%
DFS Remaining%: 56.62%
Configured Cache Capacity: 3406823424 (3.17 GB)
Cache Used: 0 (0 B)
Cache Remaining: 3406823424 (3.17 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 228
Last contact: Fri Oct 09 12:28:55 CEST 2015
Name: x.x.x.x:50010 (hdfs-13.xxx)
Hostname: hdfs-13.xxx
Rack: /default
Decommission Status : Normal
Configured Capacity: 3795624124416 (3.45 TB)
DFS Used: 1879598047232 (1.71 TB)
Non DFS Used: 100941463552 (94.01 GB)
DFS Remaining: 1815084613632 (1.65 TB)
DFS Used%: 49.52%
DFS Remaining%: 47.82%
Configured Cache Capacity: 3406823424 (3.17 GB)
Cache Used: 0 (0 B)
Cache Remaining: 3406823424 (3.17 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 206
Last contact: Fri Oct 09 12:28:54 CEST 2015
Name: x.x.x.x:50010 (hdfs-9.xxx)
Hostname: hdfs-9.xxx
Rack: /dc13
Decommission Status : Normal
Configured Capacity: 3795624124416 (3.45 TB)
DFS Used: 3690058039296 (3.36 TB)
Non DFS Used: 105563381760 (98.31 GB)
DFS Remaining: 2703360 (2.58 MB)
DFS Used%: 97.22%
DFS Remaining%: 0.00%
Configured Cache Capacity: 3406823424 (3.17 GB)
Cache Used: 0 (0 B)
Cache Remaining: 3406823424 (3.17 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 31
Last contact: Fri Oct 09 12:28:55 CEST 2015
Name: x.x.x.x:50010 (hdfs-1.xxx)
Hostname: hdfs-1.xxx
Rack: /default
Decommission Status : Normal
Configured Capacity: 3795624124416 (3.45 TB)
DFS Used: 1514972078080 (1.38 TB)
Non DFS Used: 728471805952 (678.44 GB)
DFS Remaining: 1552180240384 (1.41 TB)
DFS Used%: 39.91%
DFS Remaining%: 40.89%
Configured Cache Capacity: 3406823424 (3.17 GB)
Cache Used: 0 (0 B)
Cache Remaining: 3406823424 (3.17 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 205
Last contact: Fri Oct 09 12:28:55 CEST 2015
Name: x.x.x.x:50010 (hdfs-10.xxx)
Hostname: hdfs-10.xxx
Rack: /default
Decommission Status : Normal
Configured Capacity: 3795624124416 (3.45 TB)
DFS Used: 1584416079872 (1.44 TB)
Non DFS Used: 101046075392 (94.11 GB)
DFS Remaining: 2110161969152 (1.92 TB)
DFS Used%: 41.74%
DFS Remaining%: 55.59%
Configured Cache Capacity: 3406823424 (3.17 GB)
Cache Used: 0 (0 B)
Cache Remaining: 3406823424 (3.17 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 195
Last contact: Fri Oct 09 12:28:55 CEST 2015
Name: x.x.x.x:50010 (hdfs-5.xxx)
Hostname: hdfs-5.xxx
Rack: /default
Decommission Status : Normal
Configured Capacity: 3795624124416 (3.45 TB)
DFS Used: 1900345761792 (1.73 TB)
Non DFS Used: 108638838784 (101.18 GB)
DFS Remaining: 1786639523840 (1.62 TB)
DFS Used%: 50.07%
DFS Remaining%: 47.07%
Configured Cache Capacity: 3406823424 (3.17 GB)
Cache Used: 0 (0 B)
Cache Remaining: 3406823424 (3.17 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 205
Last contact: Fri Oct 09 12:28:55 CEST 2015
Name: x.x.x.x:50010 (hdfs-14.xxx)
Hostname: hdfs-14.xxx
Rack: /default
Decommission Status : Normal
Configured Capacity: 3795624124416 (3.45 TB)
DFS Used: 1457428815872 (1.33 TB)
Non DFS Used: 101012418560 (94.08 GB)
DFS Remaining: 2237182889984 (2.03 TB)
DFS Used%: 38.40%
DFS Remaining%: 58.94%
Configured Cache Capacity: 3406823424 (3.17 GB)
Cache Used: 0 (0 B)
Cache Remaining: 3406823424 (3.17 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 200
Last contact: Fri Oct 09 12:28:54 CEST 2015
Name: x.x.x.x:50010 (hdfs-11.xxx)
Hostname: hdfs-11.xxx
Rack: /default
Decommission Status : Normal
Configured Capacity: 3795624124416 (3.45 TB)
DFS Used: 1721947557888 (1.57 TB)
Non DFS Used: 100971548672 (94.04 GB)
DFS Remaining: 1972705017856 (1.79 TB)
DFS Used%: 45.37%
DFS Remaining%: 51.97%
Configured Cache Capacity: 3406823424 (3.17 GB)
Cache Used: 0 (0 B)
Cache Remaining: 3406823424 (3.17 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 202
Last contact: Fri Oct 09 12:28:54 CEST 2015
Name: x.x.x.x:50010 (hdfs-7.xxx)
Hostname: hdfs-7.xxx
Rack: /default
Decommission Status : Normal
Configured Capacity: 3795624124416 (3.45 TB)
DFS Used: 1952617472000 (1.78 TB)
Non DFS Used: 103892504576 (96.76 GB)
DFS Remaining: 1739114147840 (1.58 TB)
DFS Used%: 51.44%
DFS Remaining%: 45.82%
Configured Cache Capacity: 3406823424 (3.17 GB)
Cache Used: 0 (0 B)
Cache Remaining: 3406823424 (3.17 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 193
Last contact: Fri Oct 09 12:28:54 CEST 2015
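
To make the imbalance easier to see, the per-node DFS Used% values can be pulled out of the report and compared with the cluster-wide figure. A rough sketch in Python, assuming the report layout shown above; the helper names are hypothetical, and the 10-point margin mirrors the balancer's default threshold but only approximates how the balancer itself classifies nodes:

```python
import re


def parse_datanodes(report):
    """Map hostname -> DFS Used% for each datanode block in the report."""
    usage = {}
    host = None
    for line in report.splitlines():
        line = line.strip()
        m = re.match(r"Hostname:\s*(\S+)", line)
        if m:
            host = m.group(1)
            continue
        m = re.match(r"DFS Used%:\s*([\d.]+)%", line)
        # The cluster-wide "DFS Used%" line appears before any Hostname
        # line, so it is skipped here because host is still None.
        if m and host is not None:
            usage[host] = float(m.group(1))
            host = None
    return usage


def over_utilized(usage, cluster_pct, threshold=10.0):
    """Hosts whose DFS Used% exceeds the cluster figure by more than threshold."""
    return sorted(h for h, pct in usage.items() if pct - cluster_pct > threshold)


# Excerpt of the report above (three of the thirteen datanodes).
SAMPLE = """\
DFS Used%: 55.35%
Hostname: hdfs-8.xxx
DFS Used%: 97.11%
Hostname: hdfs-2.xxx
DFS Used%: 46.56%
Hostname: hdfs-9.xxx
DFS Used%: 97.22%
"""

print(over_utilized(parse_datanodes(SAMPLE), cluster_pct=55.35))
```

On the excerpt, only hdfs-8 and hdfs-9 exceed the cluster-wide 55.35% by more than 10 points, which matches the two hosts that ran out of space.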
10-09-2015
02:47 AM
We run Cloudera with HBase on a cluster with 13 DataNodes (+ 2 non-DataNode nodes). Each DataNode has 3.6 TiB of space (2 equally sized disks in RAID 0), and we have 24 TiB of data used by HDFS. However, the data is very unevenly distributed: two servers are now out of disk space, while the 11 other servers are using about 50% of their disk space. We have about 16 "red lights" in Cloudera right now that are all caused by running out of disk space on these two servers (mostly about missing disk space for logs).

When we try to run the balancer, it exits after a few minutes with this error:

No block has been moved for 5 iterations. Exiting.

Is this related to the following bug? https://issues.apache.org/jira/browse/HDFS-6621 The fix version of that bug is 2.6.0, and we're running HDFS 2.6.0-cdh5.4.2. And what can we do to fix it?
Labels: HDFS