Very slow hdfs command responses in cluster members other than namenodes
Labels: Apache Hadoop
Created ‎02-20-2017 12:21 PM
Hi,
I looked for answers in previous posts, but I couldn't find any.
I get very slow responses from commands like "hdfs dfs -ls /" when they are executed on cluster members other than the namenodes. A simple "hdfs dfs -ls /" takes 2 to 3 seconds on a namenode, but 22 seconds on any other cluster machine. I tried to debug the process, but I can't find any difference between them. When the response is slow, it always pauses for 20 seconds after "DEBUG ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@3c77d488" and before "DEBUG unix.DomainSocketWatcher: org.apache.hadoop.net.unix.DomainSocketWatcher$2@130263db: starting with interruptCheckPeriodMs = 60000".
Any help?
Best regards,
Silvio
Created ‎02-22-2017 04:56 PM
Hi @Silvio del Val. Is there good communication performance between the other nodes and the NameNode in general? Try to test how quickly ping works, for example. Those commands will contact the namenode for the information you request, so I'm thinking there might be some problem with your network performance.
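Besides ping, it can help to time a TCP connection from a slow node to the NameNode's RPC port. The sketch below is only an illustration: the function name is made up for this example, and the host and port in the usage note are placeholders (8020 is a common NameNode RPC port, but check fs.defaultFS in your core-site.xml). Note that the measured time also includes hostname resolution, so a slow result does not by itself point at the network.

```python
import socket
import time


def time_tcp_connect(host, port, timeout=30.0):
    """Return the seconds taken to open a TCP connection to host:port.

    The measurement includes name resolution of `host`, because
    socket.create_connection resolves the name before connecting.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection opened successfully; close it right away
    return time.monotonic() - start
```

For example, time_tcp_connect("namenode.example.com", 8020) run on a NameNode and on a slow node should give comparable numbers if the network path is healthy.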
Created ‎02-22-2017 05:02 PM
Hi Ana,
Thanks for your answer. Well, all cluster machines are connected to the same physical switches, so I don't think it's a network problem.
I think it has to be something in the configuration, but I don't know what.
Created ‎02-22-2017 05:45 PM
Hm, it's hard to tell. I doubt it is a Hadoop-specific configuration issue, because the command is fast on the NameNode machine and you would have the same *-site.xml files on the others. Are those other cluster members VMs or physical machines? Do they all consistently behave the same way? What else is running on them?
Also, have you checked whether those other cluster machines are busy, for example whether they have enough free memory to run the JVM?
Created ‎02-22-2017 07:02 PM
Can you try "export HADOOP_ROOT_LOGGER=TRACE,console" before running "hdfs dfs -ls /"? That will reveal more end-to-end RPC related traces for the root cause.
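To compare nodes, the same command can be run with that variable set while measuring the wall-clock time. This is a minimal sketch, assuming the hdfs CLI is on the PATH of the machine where it runs; the function name run_with_trace is made up for this example.

```python
import os
import subprocess
import time


def run_with_trace(command, logger="TRACE,console"):
    """Run `command` with HADOOP_ROOT_LOGGER set.

    Returns (elapsed_seconds, combined_output).
    """
    # Copy the current environment and add the logger setting for this run only.
    env = dict(os.environ, HADOOP_ROOT_LOGGER=logger)
    start = time.monotonic()
    completed = subprocess.run(
        command,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr, where console logging usually goes
        universal_newlines=True,
        check=False,
    )
    return time.monotonic() - start, completed.stdout
```

For example, run_with_trace(["hdfs", "dfs", "-ls", "/"]) returns the elapsed time and the full trace output, which can then be saved and compared between a fast and a slow node.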
Created ‎02-23-2017 11:34 AM
Well, I tried debugging some days ago, but I didn't understand why it paused for 20 seconds at a single point after the command started.
On nodes where the query is slow, it always pauses here:
17/02/23 12:28:50 DEBUG ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@1a942c18
17/02/23 12:29:10 DEBUG unix.DomainSocketWatcher: org.apache.hadoop.net.unix.DomainSocketWatcher$2@173ba36d: starting with interruptCheckPeriodMs = 60000
It seems to take 20 seconds (and always exactly 20 seconds) to get the client out of the cache, while on the namenodes it takes no time, which is why queries are faster there. But I don't know what "getting client out of cache" means.
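Pauses like this can be located automatically in a saved debug log. The sketch below assumes each log line starts with a "yy/MM/dd HH:mm:ss" timestamp, as in the two lines quoted in this post; the function name find_gaps and the 5-second default threshold are arbitrary choices for illustration.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%y/%m/%d %H:%M:%S"  # e.g. "17/02/23 12:28:50"
TIMESTAMP_LENGTH = 17  # number of characters in such a timestamp


def find_gaps(lines, min_gap_seconds=5.0):
    """Return (gap_seconds, line_before, line_after) for each long pause."""
    gaps = []
    previous = None  # (timestamp, text) of the last line that had a timestamp
    for raw in lines:
        line = raw.rstrip("\n")
        try:
            stamp = datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # skip lines without a leading timestamp (e.g. stack traces)
        if previous is not None:
            gap = (stamp - previous[0]).total_seconds()
            if gap >= min_gap_seconds:
                gaps.append((gap, previous[1], line))
        previous = (stamp, line)
    return gaps
```

Fed the two log lines above, it reports a single gap of 20 seconds between the ipc.Client line and the DomainSocketWatcher line. The gap only shows where the client was waiting, not what it was waiting for.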
Created ‎03-13-2017 06:55 PM
What is your version of Hadoop? Could you post the output from "hadoop version"?
Created ‎06-07-2017 09:19 AM
@Silvio del Val Try clearing /etc/resolv.conf. I think the problem is DNS resolution.
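One way to check the DNS theory before touching /etc/resolv.conf is to time name lookups on a slow node and on a NameNode. This is a rough sketch using only the Python standard library; the function names are made up for this example, and the hostnames you pass in (for instance the NameNode's hostname and the local machine's own hostname) are up to you. A lookup that takes many seconds on the slow nodes only would be consistent with an unreachable or slow nameserver, though it does not prove that this is the cause of the 20-second pause.

```python
import socket
import time


def time_lookup(func, *args):
    """Call a resolver function and return (elapsed_seconds, result_or_error)."""
    start = time.monotonic()
    try:
        result = func(*args)
    except OSError as exc:  # socket.gaierror and socket.herror are OSError subclasses
        result = exc
    return time.monotonic() - start, result


def resolution_report(hostnames):
    """Return a list of (hostname, seconds, resolved_ok) for each hostname."""
    report = []
    for name in hostnames:
        seconds, result = time_lookup(socket.getaddrinfo, name, None)
        report.append((name, seconds, not isinstance(result, OSError)))
    return report
```

For example, resolution_report([socket.gethostname(), "namenode.example.com"]) times the lookup of the local hostname and of a placeholder NameNode hostname.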
