Member since: 11-17-2016
Posts: 63
Kudos Received: 7
Solutions: 5
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2562 | 11-23-2017 10:50 AM
 | 5601 | 05-12-2017 02:13 PM
 | 17429 | 01-11-2017 04:20 PM
 | 11024 | 01-06-2017 04:03 PM
 | 7050 | 01-06-2017 03:49 PM
12-14-2017
11:15 AM
Hi @Harsh J, Yesterday I cleaned about 50 GB of files from HDFS using fs -rm. The daily incoming volume on HDFS is roughly 13-15 GB (including replication), yet today the DFS usage has again grown by roughly 30-55 GB. I don't understand why. On one DataNode alone, almost 15 GB of DFS files were generated:

[root@DataNode1 finalized]# ls -lrt | grep "Dec 13"
drwxr-xr-x 208 hdfs hdfs 4096 Dec 13 05:59 subdir130
drwxr-xr-x 195 hdfs hdfs 4096 Dec 13 06:52 subdir132
drwxr-xr-x 210 hdfs hdfs 4096 Dec 13 07:24 subdir134
drwxr-xr-x 188 hdfs hdfs 4096 Dec 13 07:32 subdir135
drwxr-xr-x 187 hdfs hdfs 4096 Dec 13 08:30 subdir138
drwxr-xr-x 210 hdfs hdfs 4096 Dec 13 09:09 subdir139
drwxr-xr-x 234 hdfs hdfs 12288 Dec 13 09:46 subdir173
drwxr-xr-x 234 hdfs hdfs 12288 Dec 13 10:07 subdir174
drwxr-xr-x 258 hdfs hdfs 12288 Dec 13 15:30 subdir211
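One thing worth checking (an assumption on my part: if HDFS trash is enabled, fs -rm only moves files into .Trash, so the blocks are not freed until the trash interval expires). Standard HDFS CLI commands to inspect and clear it:

hdfs dfs -du -s -h /user/root/.Trash     # how much deleted data is still retained in trash
hdfs dfs -expunge                        # remove trash checkpoints older than fs.trash.interval
hdfs dfs -rm -r -skipTrash /path/to/dir  # hypothetical path; deletes without going through trash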
12-11-2017
05:26 PM
Thanks for the explanation. Can you suggest a way to compress HDFS directories using any library?
12-11-2017
12:21 PM
Hi All, I have Cloudera 5.9 running on 3 nodes. Recently I have been getting HDFS disk space alerts very frequently. My total cluster size is 1.3 TB. I noticed that the size of my dfs/dn/current/<blockpool>/current/finalized directory is very high. I am aware that finalized contains blocks that have been completed and are no longer being written to by a client. However, whenever I move some of the subdirs to another mount, they are replaced within a couple of days by many new files (subdirs). I have these questions:

1. Can I delete old subdirs, since they only contain the info of files that were already written and completed?
2. Does the auto-generation of so many files in a day mean that the connectivity of that particular node to the NameNode is frequently going up and down, hence creating so many subdirs?

Size of dfs/dn/current/<blockpool>/current/finalized on the 3 nodes:

[hdfs@MasterNode1 current]$ du -sh finalized/
639G finalized/
[root@DataNode1 current]# du -sh finalized/
435G finalized/
[root@DataNode2 current]# du -sh finalized
426G finalized

Just for Nov 29 and 30, you can see how many subdirs were created, each almost 800 MB to 3 GB in size:

drwxr-xr-x 20 hdfs hdfs 4096 Nov 29 10:07 subdir41
drwxr-xr-x 13 hdfs hdfs 4096 Nov 29 10:09 subdir42
drwxr-xr-x 31 hdfs hdfs 4096 Nov 29 10:12 subdir43
drwxr-xr-x 24 hdfs hdfs 4096 Nov 29 10:17 subdir44
drwxr-xr-x 26 hdfs hdfs 4096 Nov 29 10:20 subdir45
drwxr-xr-x 17 hdfs hdfs 4096 Nov 29 10:24 subdir46
drwxr-xr-x 10 hdfs hdfs 4096 Nov 29 10:25 subdir47
drwxr-xr-x 29 hdfs hdfs 4096 Nov 29 10:32 subdir48
drwxr-xr-x 21 hdfs hdfs 4096 Nov 29 10:40 subdir51
drwxr-xr-x 12 hdfs hdfs 4096 Nov 29 10:40 subdir52
drwxr-xr-x 13 hdfs hdfs 4096 Nov 29 11:30 subdir53
drwxr-xr-x 27 hdfs hdfs 4096 Nov 29 11:30 subdir54
drwxr-xr-x 15 hdfs hdfs 4096 Nov 29 11:32 subdir55
drwxr-xr-x 117 hdfs hdfs 4096 Nov 29 13:48 subdir69
drwxr-xr-x 119 hdfs hdfs 4096 Nov 29 14:36 subdir71
drwxr-xr-x 136 hdfs hdfs 4096 Nov 29 15:18 subdir79
drwxr-xr-x 258 hdfs hdfs 12288 Nov 29 15:46 subdir193
drwxr-xr-x 89 hdfs hdfs 4096 Nov 29 16:06 subdir33
drwxr-xr-x 129 hdfs hdfs 4096 Nov 30 05:34 subdir72
drwxr-xr-x 122 hdfs hdfs 4096 Nov 30 06:21 subdir75
drwxr-xr-x 124 hdfs hdfs 4096 Nov 30 07:55 subdir77
drwxr-xr-x 95 hdfs hdfs 4096 Nov 30 08:32 subdir78
drwxr-xr-x 126 hdfs hdfs 4096 Nov 30 11:32 subdir85
drwxr-xr-x 124 hdfs hdfs 4096 Nov 30 12:08 subdir86
drwxr-xr-x 112 hdfs hdfs 4096 Nov 30 13:25 subdir88
drwxr-xr-x 130 hdfs hdfs 4096 Nov 30 14:25 subdir90
drwxr-xr-x 112 hdfs hdfs 4096 Nov 30 15:00 subdir91
drwxr-xr-x 57 hdfs hdfs 4096 Nov 30 18:23 subdir26
drwxr-xr-x 173 hdfs hdfs 4096 Nov 30 19:01 subdir34
drwxr-xr-x 30 hdfs hdfs 4096 Nov 30 19:03 subdir49
drwxr-xr-x 11 hdfs hdfs 4096 Nov 30 19:03 subdir50
drwxr-xr-x 27 hdfs hdfs 4096 Nov 30 19:06 subdir56
drwxr-xr-x 79 hdfs hdfs 4096 Nov 30 19:08 subdir57
drwxr-xr-x 141 hdfs hdfs 4096 Nov 30 19:49 subdir61
drwxr-xr-x 109 hdfs hdfs 4096 Nov 30 21:53 subdir64
drwxr-xr-x 126 hdfs hdfs 4096 Nov 30 22:08 subdir65
drwxr-xr-x 136 hdfs hdfs 4096 Nov 30 23:08 subdir68

Please advise. Thanks, Shilpa
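A minimal way to cross-check the logical usage the NameNode sees against the raw disk usage on the DataNodes (standard HDFS CLI; output formats vary by version):

hdfs dfs -du -s -h /             # logical size of the whole namespace, with and without replication
hdfs dfsadmin -report            # DFS Used / Remaining per DataNode, as the NameNode sees it
hdfs fsck / -blocks | tail -20   # block totals and replication health summary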
Labels:
- HDFS
11-23-2017
10:50 AM
I copied the older subdir* directories from dfs/dn/current/BP<...>/current/finalized to another storage mount/drive. This has not affected my data on HDFS. Please let me know if someone thinks there is a better way than this workaround. Thanks, Shilpa
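For anyone trying the same workaround, a quick sanity check that no blocks went missing afterwards (standard fsck invocations):

hdfs fsck / | grep -Ei 'missing|corrupt|status'   # overall health line plus any problem counts
hdfs fsck / -list-corruptfileblocks               # lists files with corrupt blocks, if any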
11-23-2017
09:50 AM
Does anyone have an actual solution to this problem, other than re-installing the cluster? I am facing the same error on one of my DataNodes: it becomes unavailable after a restart and throws similar exceptions/errors. PS: I cannot re-install because it is my production cluster. Looking for help. Thanks, Shilpa
11-14-2017
12:41 PM
Hi, My cluster has 3 nodes, which occupy almost 870 GB:

[hdfs@XXXX bin]$ hadoop fs -du -s -h /user/hdfs
435.3 G 870.7 G /user/hdfs

However, the space held by the dfs.data.dir directory on a single node is more than the total space occupied by the cluster:

[hdfs@XXXX bigdata]$ du -sh dfs
464G dfs
[hdfs@YYYY bin]$ du -sh /bigdata/dfs
746G /bigdata/dfs
[hdfs@ZZZZ ~]$ du -sh /bigdata/dfs
257G /bigdata/dfs

Aren't HDFS files stored in the dfs.data.dir directory? If so, how can the space it occupies be more than that of the whole cluster? Please help me reduce the size of the dfs directory, as it is in a critical state.

[hdfs@XXXX finalized]$ ls -lart
total 2008
drwxr-xr-x 244 hdfs hdfs 12288 Mar 7 2017 subdir0
drwxr-xr-x 258 hdfs hdfs 12288 Mar 7 2017 subdir1
drwxr-xr-x 258 hdfs hdfs 12288 Mar 10 2017 subdir2
drwxr-xr-x 258 hdfs hdfs 12288 Mar 12 2017 subdir3
drwxr-xr-x 258 hdfs hdfs 12288 Mar 14 2017 subdir4
drwxr-xr-x 258 hdfs hdfs 12288 Mar 15 2017 subdir5
drwxr-xr-x 258 hdfs hdfs 12288 Mar 15 2017 subdir6
drwxr-xr-x 258 hdfs hdfs 12288 Mar 16 2017 subdir7
drwxr-xr-x 258 hdfs hdfs 12288 Mar 17 2017 subdir8
drwxr-xr-x 258 hdfs hdfs 12288 Mar 17 2017 subdir9
drwxr-xr-x 258 hdfs hdfs 12288 Mar 17 2017 subdir10
.
.
.
drwxr-xr-x 258 hdfs hadoop 12288 Nov 14 03:16 subdir119
drwxr-xr-x 258 hdfs hadoop 12288 Nov 14 05:04 subdir122
drwxr-xr-x 5 hdfs hadoop 4096 Nov 14 14:25 subdir181
[hdfs@XXXX finalized]$ pwd
/bigdata/dfs/dn/current/BP-939287337-10.0.0.4-1484085163925/current/finalized
[hdfs@XXXX finalized]$
These subdirs have more subdirs under them, and finally the blocks themselves. Can I delete the older ones? Thanks, Shilpa
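A sketch of commands that can show which HDFS paths account for the raw usage (note that du on a DataNode counts every replica stored locally, while /user/hdfs is only one directory in the namespace):

hdfs dfs -du -h /                   # usage per top-level directory, including /tmp and others
hdfs dfs -du -s -h /user/*/.Trash   # deleted-but-retained data, if trash is enabled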
Labels:
- HDFS
08-01-2017
09:53 AM
Ok @mbigelow, thanks. I also researched more and found this Hortonworks link, https://community.hortonworks.com/questions/45962/dataxceiver-error-processing-write-block-operation.html, where they say we can ignore this error. This issue has already been fixed in version 2.3 of Ambari, but of course I am using CDH. Thanks, Shilpa
07-31-2017
06:15 PM
Hi, I have a 3-node Cloudera 5.9 cluster running on CentOS 6.7. Recently, during any write operation on Hadoop, I am seeing these errors in the DataNode logs. The writes do succeed, but I am concerned about why this is happening. PFB the stack trace.

2017-07-29 10:33:04,109 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: <datanodename>:50010:DataXceiver error processing WRITE_BLOCK
operation src: /Y.Y.Y.Y:43298 dst: /X.X.X.X:50010
java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:201)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:500)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:896)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:802)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
at java.lang.Thread.run(Thread.java:745)
2017-07-29 10:36:06,172 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DatanodeNetworkCounts of Hadoop:service=DataNode,name=DataNodeInfo threw an exception
javax.management.RuntimeMBeanException: java.lang.NullPointerException
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:839)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrowMaybeMBeanException(DefaultMBeanServerInterceptor.java:852)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:651)
at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
at org.apache.hadoop.jmx.JMXJsonServlet.writeAttribute(JMXJsonServlet.java:346)
at org.apache.hadoop.jmx.JMXJsonServlet.listBeans(JMXJsonServlet.java:324)
at org.apache.hadoop.jmx.JMXJsonServlet.doGet(JMXJsonServlet.java:217)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1296)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.datanode.DataNode.getDatanodeNetworkCounts(DataNode.java:1956)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:193)
at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:175)
at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:117)
at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:54)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83)
at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
... 31 more
2017-07-29 10:36:06,231 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute NamenodeAddresses of Hadoop:service=DataNode,name=DataNodeInfo threw an exception
javax.management.RuntimeMBeanException: java.lang.NullPointerException
2017-07-31 14:49:41,561 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: <datanodename>:50010:DataXceiver error processing WRITE_BLOCK operation src: /Y.Y.Y.Y:43298 dst: /X.X.X.X:50010
java.io.IOException: Not ready to serve the block pool, BP-939287337-X.X.X.X-1484085163925.
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAndWaitForBP(DataXceiver.java:1284)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1292)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:624)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
at java.lang.Thread.run(Thread.java:745)

Some important configurations of my cluster:
yarn.nodemanager.resource.memory-mb - 12GB
yarn.scheduler.maximum-allocation-mb - 16GB
mapreduce.map.memory.mb - 4GB
mapreduce.reduce.memory.mb - 4GB
mapreduce.map.java.opts.max.heap - 3GB
mapreduce.reduce.java.opts.max.heap - 3GB
namenode_java_heapsize - 6GB
secondarynamenode_java_heapsize - 6GB
dfs_datanode_max_locked_memory - 3GB
dfs blocksize - 128 MB

Can anyone please help me? Thanks, Shilpa
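Not a confirmed fix, but "Premature EOF from inputStream" on WRITE_BLOCK is often associated with the DataNode running out of transfer threads or hitting socket timeouts under load. A hypothetical hdfs-site.xml sketch to experiment with (values are illustrative only, not recommendations; in CDH these would go into the DataNode configuration safety valve):

<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>8192</value> <!-- default 4096; caps concurrent DataXceiver threads -->
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>960000</value> <!-- milliseconds; default 480000 (8 minutes) -->
</property>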
Labels:
- HDFS
05-19-2017
12:18 PM
Though Nutch is installed, it is NOT running on Hadoop; it is just installed on the VM. Can anyone help me run Nutch on top of the existing Hadoop cluster?
05-12-2017
02:13 PM
Nutch is installed. For this I had to download Ant and build the code. Make sure to set $JAVA_HOME correctly.

[hdfs@X.X.X.X apache-nutch-2.3.1]$ ant runtime

As I had to set it up with MongoDB, make these changes in $NUTCH_HOME/conf/nutch-site.xml:

<configuration>
<property>
<name>storage.data.store.class</name>
<value>org.apache.gora.mongodb.store.MongoStore</value>
<description>Default class for storing data</description>
</property>
</configuration>

Ensure the MongoDB gora-mongodb dependency is available in $NUTCH_HOME/ivy/ivy.xml; uncomment the line below in that file:

$ vim $NUTCH_HOME/ivy/ivy.xml
...
<dependency org="org.apache.gora" name="gora-mongodb" rev="0.5" conf="*->default" />
...
</dependencies>

Also, ensure that MongoStore is set as the default datastore in $NUTCH_HOME/conf/gora.properties, and fill in all the details related to MongoDB. Thanks, Shilpa
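For illustration, the gora.properties entries for MongoStore typically look like this (the server address and database name below are assumptions; adjust to your MongoDB setup):

gora.datastore.default=org.apache.gora.mongodb.store.MongoStore
gora.mongodb.override_hadoop_configuration=false
gora.mongodb.mapping.file=/gora-mongodb-mapping.xml
gora.mongodb.servers=localhost:27017
gora.mongodb.db=nutch

A quick smoke test of the build, assuming a seed list in ./urls/seed.txt (standard Nutch 2.x commands):

$NUTCH_HOME/runtime/local/bin/nutch inject urls
$NUTCH_HOME/runtime/local/bin/nutch generate -topN 10
$NUTCH_HOME/runtime/local/bin/nutch fetch -all
$NUTCH_HOME/runtime/local/bin/nutch parse -all
$NUTCH_HOME/runtime/local/bin/nutch updatedb -all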