Member since: 11-17-2016
Posts: 63
Kudos Received: 7
Solutions: 5
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2562 | 11-23-2017 10:50 AM
 | 5601 | 05-12-2017 02:13 PM
 | 17429 | 01-11-2017 04:20 PM
 | 11024 | 01-06-2017 04:03 PM
 | 7050 | 01-06-2017 03:49 PM
12-14-2017
11:15 AM
Hi @Harsh J, Yesterday I cleaned about 50 GB of files from HDFS using fs -rm. The daily incoming volume on HDFS is roughly 13-15 GB (including replication), yet today the DFS usage has again grown by roughly 30-55 GB. I don't understand why. On one DataNode alone, almost 15 GB of DFS files were generated:

[root@DataNode1 finalized]# ls -lrt | grep "Dec 13"
drwxr-xr-x 208 hdfs hdfs 4096 Dec 13 05:59 subdir130
drwxr-xr-x 195 hdfs hdfs 4096 Dec 13 06:52 subdir132
drwxr-xr-x 210 hdfs hdfs 4096 Dec 13 07:24 subdir134
drwxr-xr-x 188 hdfs hdfs 4096 Dec 13 07:32 subdir135
drwxr-xr-x 187 hdfs hdfs 4096 Dec 13 08:30 subdir138
drwxr-xr-x 210 hdfs hdfs 4096 Dec 13 09:09 subdir139
drwxr-xr-x 234 hdfs hdfs 12288 Dec 13 09:46 subdir173
drwxr-xr-x 234 hdfs hdfs 12288 Dec 13 10:07 subdir174
drwxr-xr-x 258 hdfs hdfs 12288 Dec 13 15:30 subdir211
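One thing worth checking (an assumption on my part: if HDFS trash is enabled, fs -rm only moves files into .Trash, so the blocks are not freed until the trash interval expires). Standard HDFS CLI commands to inspect and clear it:

hdfs dfs -du -s -h /user/root/.Trash     # how much deleted data is still retained in trash
hdfs dfs -expunge                        # remove trash checkpoints older than fs.trash.interval
hdfs dfs -rm -r -skipTrash /path/to/dir  # hypothetical path; deletes without going through trash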
12-11-2017
05:26 PM
Thanks for the explanation. Can you suggest a way to compress HDFS directories using any library?
12-11-2017
12:21 PM
Hi All, I have Cloudera 5.9 running on 3 nodes. Recently I have been getting HDFS disk space alerts very frequently. My total cluster size is 1.3 TB. I noticed that the size of my dfs/dn/current/<blockpool>/current/finalized directory is very high. I am aware that finalized contains blocks that have been completed and are no longer being written to by a client. However, whenever I move some of the subdirs to another mount, they are replaced within a couple of days by many new files (subdirs). I have these questions:

1. Can I delete old subdirs, since they only contain the info of files that were already written and completed?
2. Does the auto-generation of so many files in a day mean that the connectivity of that particular node to the NameNode is frequently going up and down, hence creating so many subdirs?

Size of dfs/dn/current/<blockpool>/current/finalized on the 3 nodes:

[hdfs@MasterNode1 current]$ du -sh finalized/
639G finalized/
[root@DataNode1 current]# du -sh finalized/
435G finalized/
[root@DataNode2 current]# du -sh finalized
426G finalized

Just for Nov 29 and 30, you can see how many subdirs were created, each almost 800 MB to 3 GB in size:

drwxr-xr-x 20 hdfs hdfs 4096 Nov 29 10:07 subdir41
drwxr-xr-x 13 hdfs hdfs 4096 Nov 29 10:09 subdir42
drwxr-xr-x 31 hdfs hdfs 4096 Nov 29 10:12 subdir43
drwxr-xr-x 24 hdfs hdfs 4096 Nov 29 10:17 subdir44
drwxr-xr-x 26 hdfs hdfs 4096 Nov 29 10:20 subdir45
drwxr-xr-x 17 hdfs hdfs 4096 Nov 29 10:24 subdir46
drwxr-xr-x 10 hdfs hdfs 4096 Nov 29 10:25 subdir47
drwxr-xr-x 29 hdfs hdfs 4096 Nov 29 10:32 subdir48
drwxr-xr-x 21 hdfs hdfs 4096 Nov 29 10:40 subdir51
drwxr-xr-x 12 hdfs hdfs 4096 Nov 29 10:40 subdir52
drwxr-xr-x 13 hdfs hdfs 4096 Nov 29 11:30 subdir53
drwxr-xr-x 27 hdfs hdfs 4096 Nov 29 11:30 subdir54
drwxr-xr-x 15 hdfs hdfs 4096 Nov 29 11:32 subdir55
drwxr-xr-x 117 hdfs hdfs 4096 Nov 29 13:48 subdir69
drwxr-xr-x 119 hdfs hdfs 4096 Nov 29 14:36 subdir71
drwxr-xr-x 136 hdfs hdfs 4096 Nov 29 15:18 subdir79
drwxr-xr-x 258 hdfs hdfs 12288 Nov 29 15:46 subdir193
drwxr-xr-x 89 hdfs hdfs 4096 Nov 29 16:06 subdir33
drwxr-xr-x 129 hdfs hdfs 4096 Nov 30 05:34 subdir72
drwxr-xr-x 122 hdfs hdfs 4096 Nov 30 06:21 subdir75
drwxr-xr-x 124 hdfs hdfs 4096 Nov 30 07:55 subdir77
drwxr-xr-x 95 hdfs hdfs 4096 Nov 30 08:32 subdir78
drwxr-xr-x 126 hdfs hdfs 4096 Nov 30 11:32 subdir85
drwxr-xr-x 124 hdfs hdfs 4096 Nov 30 12:08 subdir86
drwxr-xr-x 112 hdfs hdfs 4096 Nov 30 13:25 subdir88
drwxr-xr-x 130 hdfs hdfs 4096 Nov 30 14:25 subdir90
drwxr-xr-x 112 hdfs hdfs 4096 Nov 30 15:00 subdir91
drwxr-xr-x 57 hdfs hdfs 4096 Nov 30 18:23 subdir26
drwxr-xr-x 173 hdfs hdfs 4096 Nov 30 19:01 subdir34
drwxr-xr-x 30 hdfs hdfs 4096 Nov 30 19:03 subdir49
drwxr-xr-x 11 hdfs hdfs 4096 Nov 30 19:03 subdir50
drwxr-xr-x 27 hdfs hdfs 4096 Nov 30 19:06 subdir56
drwxr-xr-x 79 hdfs hdfs 4096 Nov 30 19:08 subdir57
drwxr-xr-x 141 hdfs hdfs 4096 Nov 30 19:49 subdir61
drwxr-xr-x 109 hdfs hdfs 4096 Nov 30 21:53 subdir64
drwxr-xr-x 126 hdfs hdfs 4096 Nov 30 22:08 subdir65
drwxr-xr-x 136 hdfs hdfs 4096 Nov 30 23:08 subdir68

Please advise. Thanks, Shilpa
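A minimal way to cross-check the logical usage the NameNode sees against the raw disk usage on the DataNodes (standard HDFS CLI; output formats vary by version):

hdfs dfs -du -s -h /             # logical size of the whole namespace, with and without replication
hdfs dfsadmin -report            # DFS Used / Remaining per DataNode, as the NameNode sees it
hdfs fsck / -blocks | tail -20   # block totals and replication health summary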
Labels:
- HDFS
11-23-2017
10:50 AM
I copied the older subdir* directories from dfs/dn/current/BP<...>/current/finalized to another storage mount/drive. This has not affected my data on HDFS. Please let me know if someone thinks there is a better way than this workaround. Thanks, Shilpa
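For anyone trying the same workaround, a quick sanity check that no blocks went missing afterwards (standard fsck invocations):

hdfs fsck / | grep -Ei 'missing|corrupt|status'   # overall health line plus any problem counts
hdfs fsck / -list-corruptfileblocks               # lists files with corrupt blocks, if any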
11-23-2017
09:50 AM
Does anyone have an actual solution to this problem, other than re-installing the cluster? I am facing the same error on one of my DataNodes: it becomes unavailable after a restart and throws similar exceptions/errors. PS: I cannot re-install because it is my production cluster. Looking for help. Thanks, Shilpa
11-14-2017
12:41 PM
Hi, My cluster has 3 nodes, which occupy almost 870 GB:

[hdfs@XXXX bin]$ hadoop fs -du -s -h /user/hdfs
435.3 G 870.7 G /user/hdfs

However, the space held by the dfs.data.dir directory on a single node is more than the total space occupied by the cluster:

[hdfs@XXXX bigdata]$ du -sh dfs
464G dfs
[hdfs@YYYY bin]$ du -sh /bigdata/dfs
746G /bigdata/dfs
[hdfs@ZZZZ ~]$ du -sh /bigdata/dfs
257G /bigdata/dfs

Aren't HDFS files stored in the dfs.data.dir directory? If so, how can the space it occupies be more than that of the whole cluster? Please help me reduce the size of the dfs directory, as it is in a critical state.

[hdfs@XXXX finalized]$ ls -lart
total 2008
drwxr-xr-x 244 hdfs hdfs 12288 Mar 7 2017 subdir0
drwxr-xr-x 258 hdfs hdfs 12288 Mar 7 2017 subdir1
drwxr-xr-x 258 hdfs hdfs 12288 Mar 10 2017 subdir2
drwxr-xr-x 258 hdfs hdfs 12288 Mar 12 2017 subdir3
drwxr-xr-x 258 hdfs hdfs 12288 Mar 14 2017 subdir4
drwxr-xr-x 258 hdfs hdfs 12288 Mar 15 2017 subdir5
drwxr-xr-x 258 hdfs hdfs 12288 Mar 15 2017 subdir6
drwxr-xr-x 258 hdfs hdfs 12288 Mar 16 2017 subdir7
drwxr-xr-x 258 hdfs hdfs 12288 Mar 17 2017 subdir8
drwxr-xr-x 258 hdfs hdfs 12288 Mar 17 2017 subdir9
drwxr-xr-x 258 hdfs hdfs 12288 Mar 17 2017 subdir10
.
.
.
drwxr-xr-x 258 hdfs hadoop 12288 Nov 14 03:16 subdir119
drwxr-xr-x 258 hdfs hadoop 12288 Nov 14 05:04 subdir122
drwxr-xr-x 5 hdfs hadoop 4096 Nov 14 14:25 subdir181
[hdfs@XXXX finalized]$ pwd
/bigdata/dfs/dn/current/BP-939287337-10.0.0.4-1484085163925/current/finalized
[hdfs@XXXX finalized]$
These subdirs have more subdirs under them, and finally the blocks themselves. Can I delete the older ones? Thanks, Shilpa
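A sketch of commands that can show which HDFS paths account for the raw usage (note that du on a DataNode counts every replica stored locally, while /user/hdfs is only one directory in the namespace):

hdfs dfs -du -h /                   # usage per top-level directory, including /tmp and others
hdfs dfs -du -s -h /user/*/.Trash   # deleted-but-retained data, if trash is enabled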
Labels:
- HDFS
08-01-2017
09:53 AM
Ok @mbigelow, thanks. I also researched more and found this Hortonworks link, https://community.hortonworks.com/questions/45962/dataxceiver-error-processing-write-block-operation.html, where they say we can ignore this error. This issue has already been fixed in version 2.3 of Ambari, but of course I am using CDH. Thanks, Shilpa
07-31-2017
06:15 PM
Hi, I have a 3-node Cloudera 5.9 cluster running on CentOS 6.7. Recently, during any write operation on Hadoop, I am seeing these errors in the DataNode logs. The writes do succeed, but I am concerned about why this is happening. PFB the stack trace.

2017-07-29 10:33:04,109 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: <datanodename>:50010:DataXceiver error processing WRITE_BLOCK
operation src: /Y.Y.Y.Y:43298 dst: /X.X.X.X:50010
java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:201)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:500)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:896)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:802)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
at java.lang.Thread.run(Thread.java:745)
2017-07-29 10:36:06,172 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DatanodeNetworkCounts of Hadoop:service=DataNode,name=DataNodeInfo threw an exception
javax.management.RuntimeMBeanException: java.lang.NullPointerException
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:839)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrowMaybeMBeanException(DefaultMBeanServerInterceptor.java:852)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:651)
at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
at org.apache.hadoop.jmx.JMXJsonServlet.writeAttribute(JMXJsonServlet.java:346)
at org.apache.hadoop.jmx.JMXJsonServlet.listBeans(JMXJsonServlet.java:324)
at org.apache.hadoop.jmx.JMXJsonServlet.doGet(JMXJsonServlet.java:217)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1296)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.datanode.DataNode.getDatanodeNetworkCounts(DataNode.java:1956)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:193)
at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:175)
at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:117)
at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:54)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83)
at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
... 31 more
2017-07-29 10:36:06,231 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute NamenodeAddresses of Hadoop:service=DataNode,name=DataNodeInfo threw an exception
javax.management.RuntimeMBeanException: java.lang.NullPointerException
2017-07-31 14:49:41,561 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: <datanodename>:50010:DataXceiver error processing WRITE_BLOCK operation src: /Y.Y.Y.Y:43298 dst: /X.X.X.X:50010
java.io.IOException: Not ready to serve the block pool, BP-939287337-X.X.X.X-1484085163925.
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAndWaitForBP(DataXceiver.java:1284)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1292)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:624)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
at java.lang.Thread.run(Thread.java:745)

Some important configurations of my cluster:
yarn.nodemanager.resource.memory-mb - 12GB
yarn.scheduler.maximum-allocation-mb - 16GB
mapreduce.map.memory.mb - 4GB
mapreduce.reduce.memory.mb - 4GB
mapreduce.map.java.opts.max.heap - 3GB
mapreduce.reduce.java.opts.max.heap - 3GB
namenode_java_heapsize - 6GB
secondarynamenode_java_heapsize - 6GB
dfs_datanode_max_locked_memory - 3GB
dfs blocksize - 128 MB

Can anyone please help me? Thanks, Shilpa
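Not a confirmed fix, but "Premature EOF from inputStream" on WRITE_BLOCK is often associated with the DataNode running out of transfer threads or hitting socket timeouts under load. A hypothetical hdfs-site.xml sketch to experiment with (values are illustrative only, not recommendations; in CDH these would go into the DataNode configuration safety valve):

<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>8192</value> <!-- default 4096; caps concurrent DataXceiver threads -->
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>960000</value> <!-- milliseconds; default 480000 (8 minutes) -->
</property>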
Labels:
- HDFS
05-19-2017
12:18 PM
Though Nutch is installed, it is NOT running on Hadoop; it is just installed on the VM. Can anyone help me run Nutch on top of the existing Hadoop cluster?
05-12-2017
02:13 PM
Nutch is installed. For this I had to download Ant and build the code. Make sure to set $JAVA_HOME correctly.

[hdfs@X.X.X.X apache-nutch-2.3.1]$ ant runtime

As I had to set it up with MongoDB, make these changes in $NUTCH_HOME/conf/nutch-site.xml:

<configuration>
<property>
<name>storage.data.store.class</name>
<value>org.apache.gora.mongodb.store.MongoStore</value>
<description>Default class for storing data</description>
</property>
</configuration>

Ensure the MongoDB gora-mongodb dependency is available in $NUTCH_HOME/ivy/ivy.xml; uncomment the line below in that file:

$ vim $NUTCH_HOME/ivy/ivy.xml
...
<dependency org="org.apache.gora" name="gora-mongodb" rev="0.5" conf="*->default" />
...
</dependencies>

Also, ensure that MongoStore is set as the default datastore in $NUTCH_HOME/conf/gora.properties, and fill in all the details related to MongoDB. Thanks, Shilpa
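For illustration, the gora.properties entries for MongoStore typically look like this (the server address and database name below are assumptions; adjust to your MongoDB setup):

gora.datastore.default=org.apache.gora.mongodb.store.MongoStore
gora.mongodb.override_hadoop_configuration=false
gora.mongodb.mapping.file=/gora-mongodb-mapping.xml
gora.mongodb.servers=localhost:27017
gora.mongodb.db=nutch

A quick smoke test of the build, assuming a seed list in ./urls/seed.txt (standard Nutch 2.x commands):

$NUTCH_HOME/runtime/local/bin/nutch inject urls
$NUTCH_HOME/runtime/local/bin/nutch generate -topN 10
$NUTCH_HOME/runtime/local/bin/nutch fetch -all
$NUTCH_HOME/runtime/local/bin/nutch parse -all
$NUTCH_HOME/runtime/local/bin/nutch updatedb -all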