Member since: 02-18-2019
Posts: 55
Kudos Received: 0
Solutions: 0
04-15-2021
12:20 AM
Hello, I was trying to generate a report from the HDFS Reports page and got the message below. Upon checking /var/log/cloudera-scm-headlamp I see the following error:
2021-04-15 18:09:44,934 ERROR com.cloudera.headlamp.HeadlampIndexManager: Index build failed for service hdfs
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: Unsupported layout version -64
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at com.cloudera.headlamp.AbstractIndexBuilder.buildIndex(AbstractIndexBuilder.java:80)
at com.cloudera.headlamp.HeadlampIndex.buildIndex(HeadlampIndex.java:257)
at com.cloudera.headlamp.HeadlampIndex.reindex(HeadlampIndex.java:325)
at com.cloudera.headlamp.HeadlampIndexManager.reindexIndexes(HeadlampIndexManager.java:240)
at com.cloudera.headlamp.HeadlampIndexManager.access$100(HeadlampIndexManager.java:57)
at com.cloudera.headlamp.HeadlampIndexManager$1.run(HeadlampIndexManager.java:494)
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: Unsupported layout version -64
at com.cloudera.headlamp.AbstractIndexBuilder$1.run(AbstractIndexBuilder.java:74)
at com.cloudera.cmf.cdhclient.CdhExecutor$RunnableWrapper.call(CdhExecutor.java:221)
at com.cloudera.cmf.cdhclient.CdhExecutor$RunnableWrapper.call(CdhExecutor.java:211)
at com.cloudera.cmf.cdhclient.CdhExecutor$CallableWrapper.doWork(CdhExecutor.java:236)
at com.cloudera.cmf.cdhclient.CdhExecutor$1.call(CdhExecutor.java:125)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.io.IOException: Unsupported layout version -64
at com.cloudera.headlamp.IndexBuilderCDH.buildIndexImpl(IndexBuilderCDH.java:77)
at com.cloudera.headlamp.AbstractIndexBuilder$1.run(AbstractIndexBuilder.java:72)
... 8 more
Caused by: java.io.IOException: Unsupported layout version -64
at org.apache.hadoop.hdfs.server.namenode.FSImageUtil.loadSummary(FSImageUtil.java:75)
at org.apache.hadoop.hdfs.tools.offlineImageViewer.CdhClientPBImageViewer.go(CdhClientPBImageViewer.java:113)
at com.cloudera.headlamp.IndexBuilderCDH.buildIndexImpl(IndexBuilderCDH.java:70)
... 9 more
CM / CDH: 6.3.3. Appreciate any help / guidance in fixing this issue. Thanks, Amn
03-26-2021
04:25 AM
Hello, I need some assistance / guidance on how we can reduce non-HDFS space. We see around 270 of non-HDFS space used, and as we are facing a space crunch we would like to explore possibilities for reducing it. I have cleared all YARN logs for applications that were killed / failed etc. (our /data mountpoint houses dfs, yarn, kudu, and impala), yet this does not solve our issue. Any assistance / guidance is much appreciated. Thanks, Amn
03-23-2021
09:17 PM
Hello, We are getting alerts for block count on one of our data nodes, as it has crossed the threshold of 10000. Since the HDFS balancer did not fix the issue, I next turned my focus to whether we are hitting a small-files issue. I was trying to put together a report via a terminal script (hdfs dfs -ls -R /tmp | grep ^- | awk '{if ($5 < 134217728) print $5, $8;}' | head -5 | column -t), but when I compare the script output against the HDFS report from Cloudera Manager, I see a difference in the size of the same file. Could anyone provide any guidance / assistance on this, or am I doing something wrong? Thanks, Amn
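As a cross-check on the awk one-liner, here is a small Python sketch of the same filter (134217728 bytes is one 128 MiB block; the function name and the sample listing lines are made up for illustration):

```python
# Sketch: filter "hdfs dfs -ls -R" output for regular files smaller than
# one HDFS block. Field 5 is the size in bytes; splitting with maxsplit=7
# keeps the path whole even if it contains spaces (the awk version would not).
THRESHOLD = 128 * 1024 * 1024  # 134217728 bytes

def small_files(ls_lines, threshold=THRESHOLD):
    """Yield (size, path) for regular files below the threshold."""
    for line in ls_lines:
        parts = line.split(None, 7)
        # a leading '-' in the permissions field marks a regular file
        if len(parts) == 8 and parts[0].startswith("-"):
            size = int(parts[4])
            if size < threshold:
                yield size, parts[7]
```

Note that `hdfs dfs -ls` reports the raw file length, before replication, which is one common reason a per-file number differs from an aggregate report.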
03-16-2021
03:03 AM
Thanks @tjangid Just one doubt: does it matter whether we use two dashes (--backend_client_rpc_timeout_ms) or one (-backend_client_rpc_timeout_ms)? Please confirm. Thanks, Amn
03-15-2021
10:26 AM
Hello, I need to increase the query timeout (backend_client_rpc_timeout_ms) from the current 5 minutes to 30 minutes. Could anyone guide me on where in the Impala configuration I can make this change? I did some checking but cannot find anything related; I appreciate any assistance / guidance. Thanks, Amn
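For context, backend_client_rpc_timeout_ms is an impalad startup flag rather than a query option, so in Cloudera Manager it would typically be entered as a command-line argument safety-valve snippet for the Impala Daemon. A sketch of the line, assuming 30 minutes = 1,800,000 ms:

```shell
# Illustrative safety-valve line only; verify the flag against your
# Impala version's startup options before applying.
--backend_client_rpc_timeout_ms=1800000
```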
03-08-2021
06:16 PM
Hello, We are seeing a concerning alert on one of our data nodes related to file descriptors (Concerning: Open file descriptors: 16,410. File descriptor limit: 32,768. Percentage in use: 50.08%. Warning threshold: 50.00%.). I would appreciate any help / guidance to fix this before it gets out of hand.

[user1@myserver ~]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1030544
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
[user1@myserver ~]$ cat /proc/sys/fs/file-max
26161091
[user1@myserver ~]$ cat /proc/sys/fs/file-nr
80400 0 26161091

Thanks, Amn
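To put the alert numbers in context, a small sketch (helper names are made up): the alert percentage is simply the role's open descriptors over its limit, while /proc/sys/fs/file-nr reports system-wide allocation, which is a separate budget:

```python
# Hypothetical helpers mirroring the numbers in the alert above.

def fd_usage_pct(open_fds, limit):
    """Percentage of a process's file-descriptor limit in use."""
    return 100.0 * open_fds / limit

def parse_file_nr(text):
    """Parse /proc/sys/fs/file-nr: (allocated, unused, system-wide max)."""
    allocated, unused, maximum = (int(x) for x in text.split())
    return allocated, unused, maximum
```

With the alert's values, fd_usage_pct(16410, 32768) reproduces the reported ~50.08%, confirming the warning fires on the per-role limit, not the system-wide one.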
11-10-2020
10:59 PM
Hello @PabitraDas, Appreciate your assistance. Below is the block count on our DNs; as mentioned earlier, we have allocated 6 GB of JVM heap for the DNs and 10 GB of heap for the NN & SNN. Do you suggest increasing the DN heap, or the NN / SNN heap as suggested by Shelton?

Block count:
Node 1 = 7421379
Node 2 = 5569699
Node 3 = 6003009
Node 4 = 7444205
Node 5 = 8770674
Node 6 = 8849641
Node 7 = 8232779
Node 8 = 8354714
Node 9 = 8860602

Also, I would greatly appreciate any pointers / suggestions (scripts etc.) to identify a small-files issue and possible remediation. Thanks, Amn
11-05-2020
07:33 PM
@Shelton Apologies for the delay in replying. For my understanding, if possible, would you please explain how increasing the NN heap would fix the DN pause duration? Thanks in advance, Amn
10-27-2020
12:41 AM
Hello @GangWar @Shelton Appreciate your assistance. Following is the information available from the NN WebUI: (23,326,719 files and directories, 22,735,340 blocks = 46,062,059 total filesystem object(s). Heap Memory used 5.47 GB of 10.6 GB Heap Memory. Max Heap Memory is 10.6 GB. Non Heap Memory used 120.51 MB of 122.7 MB Committed Non Heap Memory. Max Non Heap Memory is <unbounded>.) Could you please re-confirm whether I need to adjust the NN heap memory or the DN heap memory? The issue is seen on a data node, and only on one data node; the other eight seem to be running without any issues. Thanks, Amn
10-26-2020
11:32 PM
Hello, On our data nodes we are increasingly getting alerts related to Data Node Pause Duration. So far this is happening on a single data node out of nine. Following is the error captured from the DN logs:
2020-10-27 16:20:05,140 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1821ms GC pool 'ParNew' had collection(s): count=1 time=2075ms)
Current Java heap size of the data node is 6 GB. CM / CDH: 5.16.x. Any help is appreciated. Regards, Amn
09-13-2020
07:33 PM
Hello, How do we update the DNS part of a host name, for example from 192.168.0.1.test-1.pl to 192.168.0.1.test.co.pl, so that we can open the CM GUI via 192.168.0.1.test.co.pl:7180 and the other web GUIs (Hue, Impala etc.) with the new DNS name test.co.pl? Regards, Anm
07-19-2020
03:07 AM
Hi @GangWar Thanks for your reply.
07-17-2020
12:44 AM
Hello, I need to reboot my data node to fix an underlying network issue. The OS team needs around 3 hours (approx.) to complete this activity. What would be the best approach:
1. Decommission and recommission the host / DN, or
2. Move the DN to Offline Mode (a feature in CM 5.16.x)?
Any help / guidance is appreciated. Regards, Amn
07-05-2020
07:57 PM
Hello, We are observing an error in our Impala query status: InternalException: Error updating the catalog due to lock contention. When we look under Impala > Queries, the query appears to be Executing, but when we check the query details we see the error in the query status. Session Type: HIVESERVER2. Impala Version: impalad version 2.12.0-cdh5.16.2. Query Type: DML. Query State: EXCEPTION. Any assistance is appreciated. Regards, Amn.
06-17-2020
10:42 PM
Hi @tjangid Thanks for your reply. In my previous post I incorrectly mentioned that we want to move from MIT Kerberos to AD, whereas we currently have MIT Kerberos (local) working in our cluster and we need it to be integrated with AD. So basically I am looking for some detailed steps / guides on how to get this done. I have come across some blogs regarding one-way cross-realm trust etc., and am a bit confused by these. Appreciate any help in this regard. Thanks
06-05-2020
02:42 AM
Hello,
In our cluster we have MIT Kerberos authentication enabled, and we would like to move to AD authentication. I would appreciate it if someone could share best practices / documents / how-tos on how to move forward with this and what changes would be required in order to achieve it.
Regards
Amn
05-14-2020
06:02 AM
Hi @Madhur Appreciate your assistance. I am using CM; where would this setting be in CM for making the change at the cluster level, and to confirm, do these values have to be passed in seconds? Could you also provide steps / a document outlining how to change this while submitting Spark jobs?
05-14-2020
01:08 AM
Hi @Madhur This is happening with all Spark jobs; there have been no changes in the code or the cluster, and the failure is random.
05-13-2020
09:42 PM
Hello All, We are running Spark jobs via YARN and they are failing with the below error; any help / pointers to fix this are much appreciated. Shell output: main : command provided 1
main : run as user is TEST1
main : requested yarn user is TEST1
Writing to tmp file /data/8/yarn/nm/nmPrivate/application_1587389136999_0013/container_e56_1587389136999_0013_01_000477/container_e56_1587389136999_0013_01_000477.pid.tmp
Writing to cgroup task files...
Container exited with a non-zero exit code 1
org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:216)
at scala.util.Try$.apply(Try.scala:192)
at scala.util.Failure.recover(Try.scala:216)
at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at org.spark_project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:136)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
at scala.concurrent.Promise$class.complete(Promise.scala:55)
at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:78)
at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:54)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:106)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153)
at org.apache.spark.rpc.netty.NettyRpcEnv.org$apache$spark$rpc$netty$NettyRpcEnv$$onFailure$1(NettyRpcEnv.scala:205)
at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:239)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply in 120 seconds
... 8 more
Regards, Amn
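For reference, the timeout named in the error (spark.rpc.askTimeout, default 120 s, falling back to spark.network.timeout when unset) can be raised per job at submit time. A sketch only; the 600s value and the job name are illustrative, not a recommendation:

```shell
# Illustrative submit-time override of the RPC ask timeout from the
# error above; your_job.py is a placeholder for the actual job.
spark-submit \
  --conf spark.rpc.askTimeout=600s \
  --conf spark.network.timeout=600s \
  your_job.py
```

Raising the timeout only masks whatever is making the executor slow to reply (often GC pauses or network trouble), so it is a stopgap rather than a fix.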
04-09-2020
01:10 AM
Hi,
I wanted to know if a SAN can be used to store Kudu data instead of a physical HDD on the server.
Example:
drwx------.    2 kudu kudu  151552 Mar  9 16:24 consensus-meta
drwx------.    2 kudu kudu 4624384 Mar  9 16:20 data
-rw-------.    1 kudu kudu     681 Nov 16  2017 instance
drwx------.    2 kudu kudu  118784 Mar  9 16:23 tablet-meta
drwx------. 1775 kudu kudu  110592 Mar  9 13:49 wals
Thanks
Amn
04-01-2020
10:41 PM
Hi @venkatsambath Appreciate your help. I would like some assistance in configuring the NN heap size, and would like to know the best way to move forward:
1. Increase the NN heap via Cloudera Manager (HDFS > Configuration > NameNode > Heap (HA cluster)), or
2. Change it via /etc/hadoop/conf/hadoop-env.sh (HADOOP_NAMENODE_OPTS)?
Kindly advise. Regards, Amn
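For comparison, the hadoop-env.sh route would look roughly like the fragment below. This is a sketch under one important assumption: on a Cloudera Manager-managed cluster CM regenerates these config files, so the CM setting is the one that sticks, and the -Xmx8g value here is purely illustrative:

```shell
# /etc/hadoop/conf/hadoop-env.sh -- illustrative only. On a CM-managed
# cluster this file is regenerated, so prefer the NameNode Java heap
# size setting in Cloudera Manager instead.
export HADOOP_NAMENODE_OPTS="-Xms8g -Xmx8g ${HADOOP_NAMENODE_OPTS}"
```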
03-25-2020
12:55 AM
Hi @StevenOD I tried to run the rebalancer tool but I get the below error: Failed RPC negotiation. Trace:
0325 20:44:12.092074 (+ 0us) reactor.cc:577] Submitting negotiation task for server connection from XXX.XX.XXX.XXX:52183
0325 20:44:12.092167 (+ 93us) server_negotiation.cc:176] Beginning negotiation
0325 20:44:12.092170 (+ 3us) server_negotiation.cc:365] Waiting for connection header
0325 20:44:12.096890 (+ 4720us) server_negotiation.cc:373] Connection header received
0325 20:44:12.098104 (+ 1214us) server_negotiation.cc:329] Received NEGOTIATE NegotiatePB request
0325 20:44:12.098105 (+ 1us) server_negotiation.cc:412] Received NEGOTIATE request from client
0325 20:44:12.098128 (+ 23us) server_negotiation.cc:341] Sending NEGOTIATE NegotiatePB response
0325 20:44:12.098177 (+ 49us) server_negotiation.cc:197] Negotiated authn=SASL
0325 20:44:12.104531 (+ 6354us) server_negotiation.cc:329] Received TLS_HANDSHAKE NegotiatePB request
0325 20:44:12.106114 (+ 1583us) server_negotiation.cc:341] Sending TLS_HANDSHAKE NegotiatePB response
0325 20:44:12.115849 (+ 9735us) server_negotiation.cc:329] Received TLS_HANDSHAKE NegotiatePB request
0325 20:44:12.116299 (+ 450us) server_negotiation.cc:341] Sending TLS_HANDSHAKE NegotiatePB response
0325 20:44:12.116346 (+ 47us) server_negotiation.cc:581] Negotiated TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 TLSv1.2 Kx=ECDH Au=RSA Enc=AES(256) Mac=SHA384
0325 20:44:12.123359 (+ 7013us) negotiation.cc:304] Negotiation complete: Network error: Server connection negotiation failed: server connection from XXX.XX.XXX.XXX:52183: BlockingRecv error: failed to read from TLS socket: Cannot send after transport endpoint shutdown (error 108)
Metrics: {"server-negotiator.queue_time_us":53}
Thanks, Amn
03-22-2020
07:19 PM
Hi, We are repeatedly getting alerts for NN pause duration (The health test result for NAME_NODE_PAUSE_DURATION has become bad: Average time spent paused was 44.5 second(s) (74.25%) per minute over the previous 5 minute(s). Critical threshold: 60.00%.). CM / CDH: 5.16.2. Current NN heap size: 4 GB. Blocks used: 5 TB. I request some assistance in fixing this. Thanks, Amn
03-17-2020
09:39 PM
Hi, Do we need to stop all Hadoop services or put them into maintenance during our DB patching activity, or will just putting Cloudera Manager in maintenance mode suffice? Thanks, Anm
03-01-2020
08:50 PM
Hello,
I would like to know if there is a way to rebalance data in Kudu evenly across all Kudu tablet servers. Our Kudu deployment is as follows:
3 Kudu Masters
9 Tablet Servers
kudu 1.7.0-cdh5.16.2/ CM 5.16.2
Data across these 9 tablet servers is not evenly distributed; I see more data stored on 3 of them. I was going through some articles and found that currently there is no rebalance tool like HDFS has (https://community.cloudera.com/t5/Support-Questions/Kudu-Tablet-Server-Data-Directories-rebalancing/td-p/79649).
However, if we go to Clusters > Kudu > Actions, I see "Run Kudu Rebalancer Tool" and would like to know its purpose. Will this distribute data for Kudu overall, just the Kudu masters, or the tablet servers too? I request some advice / assistance on the same.
Thanks
Amn