About vidanimegh

vidanimegh · ‎04-29-2021

I had the exact same error. I'm sure it indicates that something is wrong with the Fair Scheduler config. If this setup is still under test, can you check whether it works with the Capacity Scheduler? That will help us confirm that the issue is indeed related to the YARN scheduler. Thanks, Megh

vidanimegh · ‎04-29-2021

Hi, can you share the impalad.FATAL error log from one of your Impala daemons? Thanks, Megh

vidanimegh · ‎04-28-2021

Hi @dmharshit, can you share the log snippets from Hive Metastore, HiveServer2 and YARN ResourceManager for the timeframe in which this query ran? Thanks, Megh

vidanimegh · ‎04-28-2021

Hi, I got this same error when I changed my YARN scheduler from Capacity Scheduler to Fair Scheduler. I reverted to the Capacity Scheduler and it went away. Is that the case for you as well? Thanks, Megh

vidanimegh · ‎04-27-2021

I think the repo itself is broken. Navigating to http://repos.bigtop.apache.org/releases/1.5.0/centos/7/$basearch returns this:

<Error>
<Code>NoSuchKey</Code>
<Message>The specified key does not exist.</Message>
<Key>releases/1.5.0/centos/7/$basearch</Key>
<RequestId>K3BYRJWH03RKGFVK</RequestId>
<HostId>l+++bHT4W/dNczJrC5KWH7vPQ4dydVA6kLD69001OO8XQKpj6ziHZxLIsVCEIBsoZZT7o+1ILEQ=</HostId>
</Error>

Can you point to the documentation you're following for this installation? Thanks, Megh
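One thing worth ruling out first: yum replaces $basearch with the machine architecture before it requests the URL, so a browser test with the literal variable in the path may not show what yum actually sees. Below is a minimal sketch of that substitution, assuming an x86_64 host on CentOS 7. The function name and default values are illustrative only, and the sketch says nothing about whether the expanded URL is actually served.

```python
def expand_yum_vars(baseurl: str, basearch: str = "x86_64", releasever: str = "7") -> str:
    """Substitute the yum variables $basearch and $releasever in a baseurl.

    The defaults are assumptions (x86_64 host, CentOS 7); yum derives the
    real values from the running system.
    """
    return baseurl.replace("$basearch", basearch).replace("$releasever", releasever)


# The URL from the post, with the variable expanded the way yum would do it
# on an x86_64 machine.
url = expand_yum_vars("http://repos.bigtop.apache.org/releases/1.5.0/centos/7/$basearch")
print(url)
```

Testing the printed URL (rather than the literal one) in a browser or with curl gives a fairer picture of whether the repo path exists.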

vidanimegh · ‎04-27-2021

Hi @ryu, it seems that your repo is not correctly configured. Can you paste the output of "yum list webhcat-tar-hive*"? Also, please share the contents of /etc/yum.repos.d/<your repo file>.repo. Thanks, Megh

vidanimegh · ‎04-24-2021

I came across this bug and followed the steps given in this comment. I haven't faced this issue since. Thanks, Megh

vidanimegh · ‎04-22-2021

Hello Everyone, I have an external table (text with Gzip compression) partitioned on a date column. I have created another external table with the same structure, but in Parquet format with Snappy compression. I want to copy all the data from the text table to the Parquet table, and I'm using the following query to test it for one day:

insert into parquet_table partition(asdt) select * from text_table where partition_col='somedate';

The partition contains almost 100M rows. The query runs for almost half an hour and then fails with this error:

ERROR : Vertex failed, vertexName=Reducer 2, vertexId=vertex_1619005705788_0004_2_01, diagnostics=[Task failed, taskId=task_1619005705788_0004_2_01_000240, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in Fetcher_O {Map_1} #1
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:306)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:288)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Map_1: Shuffle failed with too many fetch failures and insufficient progress!failureCounts=20, pendingInputs=20, fetcherHealthy=false, reducerProgressedEnough=true, reducerStalled=true
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleScheduler.isShuffleHealthy(ShuffleScheduler.java:1053)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleScheduler.copyFailed(ShuffleScheduler.java:794)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyFromHost(FetcherOrderedGrouped.java:318)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.fetchNext(FetcherOrderedGrouped.java:182)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.callInternal(FetcherOrderedGrouped.java:194)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.callInternal(FetcherOrderedGrouped.java:57)
    ... 7 more
, errorMessage=Shuffle Runner Failed:org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in Fetcher_O {Map_1} #1
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:306)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:288)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Map_1: Shuffle failed with too many fetch failures and insufficient progress!failureCounts=20, pendingInputs=20, fetcherHealthy=false, reducerProgressedEnough=true, reducerStalled=true
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleScheduler.isShuffleHealthy(ShuffleScheduler.java:1053)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleScheduler.copyFailed(ShuffleScheduler.java:794)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyFromHost(FetcherOrderedGrouped.java:318)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.fetchNext(FetcherOrderedGrouped.java:182)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.callInternal(FetcherOrderedGrouped.java:194)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.callInternal(FetcherOrderedGrouped.java:57)
    ... 7 more
], TaskAttempt 1 failed, info=[Error: Error while running task ( failure ) : org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in Fetcher_O {Map_1} #7
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:306)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:288)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Map_1: Shuffle failed with too many fetch failures and insufficient progress!failureCounts=24, pendingInputs=24, fetcherHealthy=false, reducerProgressedEnough=true, reducerStalled=true
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleScheduler.isShuffleHealthy(ShuffleScheduler.java:1053)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleScheduler.copyFailed(ShuffleScheduler.java:794)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyFromHost(FetcherOrderedGrouped.java:318)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.fetchNext(FetcherOrderedGrouped.java:182)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.callInternal(FetcherOrderedGrouped.java:194)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.callInternal(FetcherOrderedGrouped.java:57)
    ... 7 more
, errorMessage=Shuffle Runner Failed:org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in Fetcher_O {Map_1} #7
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:306)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:288)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

I'm running this in Hive on Tez on CDP 7.1.5. I've disabled the default transactional and insert-only properties in the Hive on Tez config. I think I'm hitting a bug, but I can't find any useful info in the HiveServer2, ResourceManager, Metastore or NodeManager logs. Has anybody else encountered this? Thanks, Megh

vidanimegh · ‎04-22-2021

Understood. In our case, the culprit turned out to be port 50075 being blocked by the firewall. I've added that as an answer. In any case, thanks for your support. Thanks, Megh

vidanimegh · ‎04-22-2021

This issue was resolved after we opened port 50075 (the DataNode WebHDFS port) between the source and destination clusters. I think the error org.apache.hadoop.tools.mapred.RetriableFileCopyCommand$CopyReadException: org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error is rather misleading. Once I ran distcp against the active NameNode instead of the nameservice, I could see the root cause: "connection timed out" on port 50075. Cloudera's official documentation does not mention this port as a requirement. Thanks, Megh
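For anyone who wants to rule out the firewall before digging into the StandbyException, here is a minimal sketch of a reachability check for port 50075. It assumes it is run from a node of one cluster against the DataNodes of the other. The function names and any hostnames you pass in are placeholders, and it only tests that a TCP connection can be opened, not Kerberos or WebHDFS itself.

```python
import socket


def port_reachable(host: str, port: int = 50075, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


def check_hosts(hosts, port: int = 50075, timeout: float = 5.0) -> dict:
    """Map each host name to whether the given port is reachable on it."""
    return {host: port_reachable(host, port, timeout) for host in hosts}
```

Calling check_hosts with the list of remote DataNode hostnames shows which hosts time out or refuse the connection. Any host that maps to False is a candidate for a missing firewall rule.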

Last Visited	‎08-22-2023 05:41 AM

Member Since	‎12-21-2020 04:03 AM
Posts	91
Kudos received	8

Cloudera Community

Re: 2 Doubts - Distcp between secure clusters in d...

Re: Analyze table commands not working in CDP

Re: How to define the retention period for Ranger ...

Re: Ranger

Re: Command Distcp is not working HDP 2.6.1 (kerbe...

Re: Impala Daemons Error

Re: Impala Daemons Error

Re: Error: Error while processing statement: FAILE...

Re: Impala Daemons Error

Re: hive installation error via ambari

Re: hive installation error via ambari

Re: Hive queries are failing

Hive queries are failing

Re: Distcp is not working after enabling Kerberos

Re: Distcp is not working after enabling Kerberos