Member since: 11-22-2017
Posts: 48
Kudos Received: 2
Solutions: 0
03-26-2019
07:28 PM
There seems to be a problem with the ZooKeeper election process. Regardless of the "server not running" messages, is ZooKeeper up and running and accepting client requests? Could you post the 30 lines above and below the log message "WARN org.apache.zookeeper.server.NIOServerCnxn:Exception" for review?
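If the log is large, a small script can pull out that context. The following is only a sketch: the function name `context_around` and the default radius are my own choices, and you would read the lines from wherever your ZooKeeper log actually lives.

```python
def context_around(lines, needle, radius=30):
    """Return one window per matching line, with up to `radius` lines
    of context before and after the match."""
    windows = []
    for i, line in enumerate(lines):
        if needle in line:
            start = max(0, i - radius)
            windows.append(lines[start:i + radius + 1])
    return windows


# Example use (the log path is yours to supply):
#   with open("/path/to/zookeeper.log", errors="replace") as fh:
#       for window in context_around(fh.read().splitlines(), "NIOServerCnxn"):
#           print("\n".join(window))
```

Overlapping matches produce overlapping windows, which is usually acceptable for a quick paste.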
03-11-2019
07:06 PM
You're observing JVM pauses but "No GCs detected". This indicates a problem with the underlying host, typically kernel-level CPU lockups or general process hangs. Check the host's /var/log/messages for clues about such issues. Once you find them, rectify them and then see whether you still face the Thrift issues. We'll take it further from there.
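As a rough aid for that search, a filter like the one below can flag the kernel messages that usually accompany lockups and hung tasks. This is a sketch: the pattern list is illustrative rather than exhaustive, and `find_stall_messages` is a name I made up.

```python
import re

# Phrases the Linux kernel logs for CPU lockups and hung tasks.
# Illustrative only; extend the pattern for your distribution.
STALL_PATTERN = re.compile(
    r"soft lockup|hard LOCKUP|blocked for more than \d+ seconds",
    re.IGNORECASE,
)


def find_stall_messages(lines):
    """Return the lines that look like kernel lockup or hung-task reports."""
    return [line for line in lines if STALL_PATTERN.search(line)]


# Example use:
#   with open("/var/log/messages", errors="replace") as fh:
#       for hit in find_stall_messages(fh):
#           print(hit, end="")
```

Compare the timestamps of any hits with the timestamps of the JVM pause warnings to see whether they line up.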
12-14-2018
03:29 AM
"connection-pending remote": whatever the remote IP is, this system is not able to connect to it, or maybe only to a specific port on it. This can happen if the remote system is busy, or if the port the connection is attempted on is busy or not open. You need to check your network settings and/or the state of the service that should be listening on the unresponsive port.
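A quick way to test reachability from the affected host is a plain TCP connect. This is a minimal sketch; `can_connect` is a hypothetical helper, and the host and port in the usage comment are placeholders for the values in your own log.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example use (placeholder values):
#   print(can_connect("remote-host.example.com", 50010))
```

A False result only tells you the connection failed from this host; it does not distinguish a firewall from a stopped service.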
12-05-2018
07:56 PM
You can go to the link again and click on "+ new paste" to get a new text field for posting the logs. Once done, scroll down and click on "create new paste". A link will be generated. Share that link with us.
12-03-2018
07:38 PM
Editing my update: Could you please post the DN logs via a Pastebin link (https://pastebin.com/) so we can have a look at them? The exceptions given in the description seem to be a consequence of an earlier problem, so looking at the DN logs before the mentioned exceptions should help us clarify the problem. Also, grep your DN logs for "xceiverCount" or "exceeds the limit of concurrent xcievers" and post the results here.
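If plain grep is not handy, the same search can be sketched in Python. The function name `grep_xceiver` is made up for illustration; the two marker strings are the ones quoted in this post.

```python
XCEIVER_MARKERS = (
    "xceiverCount",
    "exceeds the limit of concurrent xcievers",
)


def grep_xceiver(lines):
    """Return (line_number, text) pairs for lines mentioning the xceiver limit."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if any(marker in line for marker in XCEIVER_MARKERS)
    ]


# Example use (the DataNode log path is yours to supply):
#   with open("/path/to/datanode.log", errors="replace") as fh:
#       for number, text in grep_xceiver(fh):
#           print(number, text)
```

The line numbers make it easier to find what was logged just before the first hit.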
12-01-2018
05:19 AM
I would still check with the developer as to why it fails the first time and not again. A certain parameter is being hit that we cannot determine from our end.
11-29-2018
11:21 PM
The exception in the log snippet shown is related to the class "com.turn.platform.cheetah.storage.dmp.analytical_profile.merge.IncrementalProfileMergerMapper.close". Your DNs are aborting the operation while pointing to this class, which seems to be a custom third-party class. Kindly check with your vendor about this.
11-28-2018
06:21 PM
Let's start by fixing them one by one.
1. Start the ntpd service on all nodes to fix the clock offset problem, if the service is not already started. If it is started, make sure that all the nodes refer to the same NTP server.
2. Check the space utilization for the DNs that report the "Free Space" issue. I would assume that you're reaching a certain threshold, which is causing these alerts.
3. About the agent status, could you show what the actual message is for this one? Alternatively, restart the cloudera-scm-agent service on the nodes that are hitting this alert and see if the alerts go away.
4. Post the exact message for the Data Directory status.
5. Could you tell us more about the frame errors, such as the exact message or a screenshot?
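For the space check in step 2, a rough utilization report per data directory can be sketched as follows. The helper names and the 90% default are assumptions of mine, not the thresholds Cloudera Manager uses, and the directory list in the usage comment is a placeholder.

```python
import shutil


def percent_used(path):
    """Return the used percentage of the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def over_threshold(paths, threshold_pct=90.0):
    """Return {path: percent_used} for paths at or above threshold_pct."""
    report = {}
    for path in paths:
        pct = percent_used(path)
        if pct >= threshold_pct:
            report[path] = round(pct, 1)
    return report


# Example use (placeholder directories):
#   print(over_threshold(["/data/1/dfs/dn", "/data/2/dfs/dn"]))
```

The numbers may differ slightly from `df`, which accounts for blocks reserved for root.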
11-28-2018
01:15 AM
Check the disk status for the DataNode that is mentioned in the exception. Do you see any warning on your CM dashboard? If yes, can you post it?
11-22-2018
09:10 PM
I'm not sure at the moment whether we can build a chart, but if there are regions in transition, you can see them in the HMaster web UI. You can also get the Region-In-Transition count from the JMX metrics for the HMaster.
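Reading that count from the HMaster's JMX output could look roughly like this. Treat it as a sketch under assumptions: I am assuming the metric is exposed as `ritCount` in the `/jmx` JSON (as I recall from the AssignmentManager metrics), so verify the name against your HBase version. The URL in the usage comment is a placeholder.

```python
import json
from urllib.request import urlopen


def rit_count(jmx_payload):
    """Return the first ritCount found in a parsed /jmx payload, or None.

    Assumes the metric is named "ritCount"; check your HBase version.
    """
    for bean in jmx_payload.get("beans", []):
        if "ritCount" in bean:
            return bean["ritCount"]
    return None


def fetch_rit_count(master_url):
    """Fetch <master_url>/jmx and extract the regions-in-transition count."""
    with urlopen(master_url.rstrip("/") + "/jmx", timeout=10) as resp:
        return rit_count(json.load(resp))


# Example use (placeholder URL and port):
#   print(fetch_rit_count("http://hmaster.example.com:60010"))
```

Polling this value on a schedule and storing it would give you the data for a chart in whatever tool you prefer.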
11-22-2018
08:51 PM
We cannot be sure of the reasons for this message from the snippet that you have provided. If you notice, the connection is being set up successfully, but there is no response from the DN.
~~~
java.nio.channels.SocketChannel[connected local=/172.31.15.196:50010 remote=/172.31.1.81:57017]
~~~
It can happen for various reasons: the pipeline is interrupted, there is network congestion at play, the DN disk is not performing well, the DN host OS is having issues such as kernel soft lockups, or the DN is simply too heavily loaded to respond. You'd have to dig further into the logs and look for more information. See the messages logged before the exception you're getting in the DN logs.
11-22-2018
03:07 AM
1 Kudo
At this point I would assume that compaction of HFiles shouldn't affect the REST server. Could you check and post the stdout and stderr file contents? If you're sure about the OOM, you might want to increase the heap size for the REST server.
11-22-2018
12:13 AM
Just to add, you might be interested in the following solved thread, as it has more details: https://community.cloudera.com/t5/Storage-Random-Access-HDFS/HBase-Cell-level-TTL-does-not-work-when-after-memstore/td-p/69531
11-22-2018
12:08 AM
I am assuming you're referring to the HFile format version. Go to CM >> HBase >> Configuration >> "HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml" and add "hfile.format.version" as the Name and "3" as the Value. Later, you can add the same property to "HBase Client Advanced Configuration Snippet (Safety Valve) for hbase-site.xml".
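If you prefer to paste the property as XML instead of filling in the Name and Value fields, the snippet has the usual hbase-site.xml property shape. The small helper below is only an illustration that prints it; `property_xml` is a made-up name.

```python
import xml.etree.ElementTree as ET


def property_xml(name, value):
    """Build a Hadoop-style <property> element and return it as a string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


# Prints:
# <property><name>hfile.format.version</name><value>3</value></property>
print(property_xml("hfile.format.version", "3"))
```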
11-21-2018
10:34 PM
What alteration is the user trying to do? What was the TTL value set? Is the user trying to change the TTL value?
10-25-2018
11:25 PM
By default, HDFS fsck only checks files that are persisted on HDFS and skips files that are still open. Since you were seeing just one missing block in the UI warnings of CM and the NN, and no missing blocks in the fsck output, the missing block alert could be coming from a file that is still open for write. The fsck option "-openforwrite" should show whether that is the case. Just a reference in case you hit the issue again.
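A sketch of running that check from a script follows. The `-openforwrite` and `-files` options are real fsck options; the helper names are mine, and filtering on the text "OPENFORWRITE" assumes fsck marks open files that way in its per-file output, so confirm against what your version prints.

```python
import subprocess


def fsck_openforwrite_cmd(path="/"):
    """Build the fsck command that also reports files open for write."""
    return ["hdfs", "fsck", path, "-openforwrite", "-files"]


def open_files_from_output(output):
    """Return fsck output lines flagged as open for write.

    Assumes such lines contain the marker "OPENFORWRITE".
    """
    return [line for line in output.splitlines() if "OPENFORWRITE" in line]


def list_open_files(path="/"):
    """Run fsck on `path` and return the lines for files open for write."""
    result = subprocess.run(
        fsck_openforwrite_cmd(path),
        capture_output=True,
        text=True,
        check=False,
    )
    return open_files_from_output(result.stdout)


# Example use (needs the hdfs client on PATH and suitable permissions):
#   for line in list_open_files("/"):
#       print(line)
```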
05-10-2018
11:24 PM
1 Kudo
By default, HDFS fsck only checks files that are persisted on HDFS and skips files that are still open. Since you're seeing just one missing block in the UI warnings of CM and the NN, and no missing blocks in the fsck output, the missing block alert is probably coming from a file that is still open for write, and is most likely a false alarm. It should go away when the NN role or the cluster is restarted, probably during your next maintenance window.