Member since: 05-21-2021
Posts: 33
Kudos Received: 1
Solutions: 3

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 776 | 06-23-2022 01:06 AM |
 | 1730 | 04-22-2022 02:24 AM |
 | 8610 | 03-29-2022 01:20 AM |
04-21-2022
03:29 AM
Hello Team,
In our customer cluster, we are testing HDFS replication through Cloudera Manager. The replication policy looks as follows; all other configuration is left at the defaults. The replication has been hung in the state below for a long time. We looked into the Cloudera Manager logs and can see the following error occurring repeatedly. Can you please help us resolve the issue?
2022-04-21 12:27:57,199 ERROR CommandPusher-1:com.cloudera.cmf.service.AgentResultFetcher: Exception occured while handling tempfile com.cloudera.cmf.service.AgentResultFetcher@618eac09
Best Regards
Sayed Anisul Hoque
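P.S. For reference, the repeated error above can be pulled straight out of the Cloudera Manager server log with a simple grep; the path below is the default log location and may differ in your installation:
# Default Cloudera Manager server log location; adjust the path if needed
grep "AgentResultFetcher" /var/log/cloudera-scm-server/cloudera-scm-server.log | tail -n 20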
Labels:
- Cloudera Data Platform (CDP)
- HDFS
04-19-2022
04:06 PM
@Bharati Thank you! This worked. However, could you please share which logs showed that it was trying to copy the system database and information_schema?
04-19-2022
08:54 AM
Hello Team,
We are setting up Hive replication through Cloudera Manager. The replication policy looks as follows. Note that we have also enabled snapshots on the source cluster for the path /warehouse. However, when we press Save Policy, we get the notification below. We looked into the Cloudera Manager logs and can see the following error. Can you please help us with the correct configuration to resolve the issue?
2022-04-19 15:51:07,848 ERROR scm-web-1686:com.cloudera.server.web.cmf.WebController: getHiveWarehouseSnapshotsEnabled
javax.ws.rs.NotAuthorizedException: HTTP 401 Unauthorized
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.cxf.jaxrs.client.AbstractClient.convertToWebApplicationException(AbstractClient.java:507)
at org.apache.cxf.jaxrs.client.ClientProxyImpl.checkResponse(ClientProxyImpl.java:324)
at org.apache.cxf.jaxrs.client.ClientProxyImpl.handleResponse(ClientProxyImpl.java:878)
at org.apache.cxf.jaxrs.client.ClientProxyImpl.doChainedInvocation(ClientProxyImpl.java:791)
....
....
....
....
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
at java.lang.Thread.run(Thread.java:750)
2022-04-19 15:51:07,849 ERROR scm-web-1686:com.cloudera.server.web.common.JsonResponse: JsonResponse created with throwable:
javax.ws.rs.NotAuthorizedException: HTTP 401 Unauthorized
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.cxf.jaxrs.client.AbstractClient.convertToWebApplicationException(AbstractClient.java:507)
at org.apache.cxf.jaxrs.client.ClientProxyImpl.checkResponse(ClientProxyImpl.java:324)
at org.apache.cxf.jaxrs.client.ClientProxyImpl.handleResponse(ClientProxyImpl.java:878)
at org.apache.cxf.jaxrs.client.ClientProxyImpl.doChainedInvocation(ClientProxyImpl.java:791)
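Since this is an HTTP 401 raised by Cloudera Manager's own API client, one basic check we can run is whether the CM API answers with our admin credentials at all. The host, port, credentials, and API version below are placeholders, not our real values:
# Placeholder host/port/credentials; drop -k if your TLS certificate is trusted
curl -s -k -u admin:REDACTED "https://cm-host.example.com:7183/api/v41/clusters"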
03-29-2022
01:20 AM
With the help of @mszurap we could narrow down the issue. There were two issues: the first was caused by an OOM, and the second came from the application itself. Below are some of the logs we noticed during the Oozie job run.
22/03/23 13:18:54 INFO mapred.SparkHadoopMapRedUtil: attempt_20220323131847_0000_m_000000_0: Committed
22/03/23 13:18:54 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1384 bytes result sent to driver
22/03/23 13:19:55 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
22/03/23 13:19:55 INFO storage.DiskBlockManager: Shutdown hook called
22/03/23 13:19:55 INFO util.ShutdownHookManager: Shutdown hook called
From Miklos: the "executor.Executor" ... "RECEIVED SIGNAL TERM" message is completely normal; it only means an executor was killed by the AM/driver. Since the Spark job was succeeding in the lower environments (like Dev/Test), the suggestion was to check whether the application uses the same dependencies in those environments (compare the Spark event logs for a good run and a bad run), and also to check the driver YARN logs, since there could be an abrupt exit due to an OOM. We then looked in the direction of the OOM and also checked that there were no System.exit() calls in the Spark code. We updated the driver memory to 2 GB and re-ran the job, and now we can see the actual error (the error from the application itself). Hope this helps someone in the future.
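For anyone applying the same change, this is roughly what the driver-memory bump looks like as a plain spark-submit invocation (our job runs through Oozie, where the same option goes into the Spark action's spark-opts; the class name and jar below are placeholders):
# Placeholder class and jar; only --driver-memory 2g is the relevant change
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 2g \
  --class com.example.Main \
  my-spark-job.jar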
03-25-2022
10:02 AM
Hello Team,
A Spark job submitted through Oozie is failing with the below exception in the Prod cluster. Note that the same job passes in the lower clusters (e.g., Dev).
22/03/24 05:05:22 INFO spark.SparkContext: Invoking stop() from shutdown hook
22/03/24 05:05:22 ERROR scheduler.AsyncEventQueue: Listener EventLoggingListener threw an exception
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:477)
at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:627)
at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:583)
at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:134)
at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:145)
at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:145)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:145)
at org.apache.spark.scheduler.EventLoggingListener.onApplicationEnd(EventLoggingListener.scala:191)
at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:57)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:91)
at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$super$postToAll(AsyncEventQueue.scala:92)
at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply$mcJ$sp(AsyncEventQueue.scala:92)
at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:87)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$1$$anonfun$run$1.apply$mcV$sp(AsyncEventQueue.scala:83)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1231)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$1.run(AsyncEventQueue.scala:82)
22/03/24 05:05:22 INFO server.AbstractConnector: Stopped Spark@12bcc45b{HTTP/1.1, (http/1.1)}{0.0.0.0:0}
22/03/24 05:05:22 INFO ui.SparkUI: Stopped Spark web UI at http://xxxxxxxxxxxxxxx:33687
22/03/24 05:05:22 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
22/03/24 05:05:22 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
22/03/24 05:05:22 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
22/03/24 05:05:22 ERROR util.Utils: Uncaught exception in thread shutdown-hook-0
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:477)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1685)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1745)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1742)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1757)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1723)
at org.apache.spark.scheduler.EventLoggingListener.stop(EventLoggingListener.scala:249)
at org.apache.spark.SparkContext$$anonfun$stop$9$$anonfun$apply$mcV$sp$7.apply(SparkContext.scala:1966)
at org.apache.spark.SparkContext$$anonfun$stop$9$$anonfun$apply$mcV$sp$7.apply(SparkContext.scala:1966)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.SparkContext$$anonfun$stop$9.apply$mcV$sp(SparkContext.scala:1966)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1269)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1965)
at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:578)
at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1874)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
22/03/24 05:05:22 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/03/24 05:05:22 INFO memory.MemoryStore: MemoryStore cleared
22/03/24 05:05:22 INFO storage.BlockManager: BlockManager stopped
22/03/24 05:05:22 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
22/03/24 05:05:22 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/03/24 05:05:22 INFO spark.SparkContext: Successfully stopped SparkContext
22/03/24 05:05:22 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
22/03/24 05:05:22 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
We have already looked into the NameNode logs and could not find any ERROR related to this. Please help us resolve the issue.
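In case the full aggregated YARN log (driver and executors) is needed, it can be pulled with the standard YARN CLI; the application ID below is a placeholder for the one reported by the Oozie launcher:
# Placeholder application ID; use the one shown by the Oozie launcher / RM UI
yarn logs -applicationId application_XXXXXXXXXXXXX_XXXX > app.log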
Best Regards
Labels:
- Apache Oozie
- Apache Spark
03-24-2022
03:34 AM
@Scharan Can you please give a short explanation, as my customer is asking why the shadow file matters in this case, i.e. what is the relation between Knox and the shadow file? Thank you!
03-24-2022
03:22 AM
Yes, that resolved the issue! I had 000 as the permission on the shadow file. Thank you @Scharan, I appreciate the quick reply.
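For anyone hitting the same thing, a rough sketch of the check and one possible way to grant access (this assumes the Knox service runs as the knox user; the exact permission or ACL you choose may differ):
# /etc/shadow had mode 000, so PAM could not read it when invoked from the Knox process
ls -l /etc/shadow
# One possible fix: grant the Knox service account read access via an ACL instead of
# making the file world-readable (adjust the user name to your environment)
setfacl -m u:knox:r /etc/shadow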
03-24-2022
03:05 AM
Hello Team,
I have an issue with setting up Knox authentication with PAM. I have the default login service in /etc/pam.d/:
$ cat /etc/pam.d/login
#%PAM-1.0
auth [user_unknown=ignore success=ok ignore=ignore default=bad] pam_securetty.so
auth substack system-auth
auth include postlogin
account required pam_nologin.so
account include system-auth
password include system-auth
# pam_selinux.so close should be the first session rule
session required pam_selinux.so close
session required pam_loginuid.so
session optional pam_console.so
# pam_selinux.so open should only be followed by sessions to be executed in the user context
session required pam_selinux.so open
session required pam_namespace.so
session optional pam_keyinit.so force revoke
session include system-auth
session include postlogin
-session optional pam_ck_connector.so
The knoxsso configuration looks as follows (the default one). I created a user named test with a password. When I try to access the Knox Gateway UI, I get the issue. The Knox Gateway log says:
(KnoxPamRealm.java:handleAuthFailure(170)) - Shiro unable to login: null
Note: I am using CDP 7.1.6, and I can log in to my host (where the Knox Gateway is installed) using the test user. Also, there is no Kerberos setup. Please share if there is something that needs to be adjusted.
Best Regards
Sayed
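P.S. One sanity check that can be run outside of Knox (pamtester is not part of CDP and may need to be installed separately) is whether the login PAM service itself accepts the test user:
# Verify the "login" PAM service accepts the user before involving Knox;
# running it as the same OS user the Knox gateway runs as mimics its environment
pamtester login test authenticate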
03-02-2022
02:39 AM
Hello Team,
We are experiencing an issue with Hive-on-HBase tables, where we get the following error. Note: pure Hive tables and direct queries against HBase are working fine.
Failed after attempts=11, exceptions:
2022-02-28T12:12:54.719Z, java.net.SocketTimeoutException: callTimeout=60000, callDuration=60115: Call to rs.host.500/rs.host.ip.500:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=5,methodName=Scan], waitTime=60002, rpcTimeout=59991 row '' on table 'a-hbase-table' at region=a-hbase-table,,1598840250675.94c031e70c63dbb0f4726251987eb4ec., hostname=rs.host.500,16020,1645349772604, seqNum=550418
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=11, exceptions:
2022-02-28T12:12:54.719Z, java.net.SocketTimeoutException: callTimeout=60000, callDuration=60115: Call to rs.host.500/rs.host.ip.500:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=5,methodName=Scan], waitTime=60002, rpcTimeout=59991 row '' on table 'a-hbase-table' at region=a-hbase-table,,1598840250675.94c031e70c63dbb0f4726251987eb4ec., hostname=rs.host.500,16020,1645349772604, seqNum=550418
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:299)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:251)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:192)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:267)
at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:435)
at org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:310)
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:595)
at org.apache.hadoop.hbase.client.ResultScanner.next(ResultScanner.java:97)
at org.apache.hadoop.hbase.thrift.ThriftHBaseServiceHandler.scannerGetList(ThriftHBaseServiceHandler.java:858)
at sun.reflect.GeneratedMethodAccessor39.invoke(Unknown Source)
....
....
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=60115: Call to rs.host.500/rs.host.ip.500:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=5,methodName=Scan], waitTime=60002, rpcTimeout=59991 row '' on table 'a-hbase-table' at region=a-hbase-table,,1598840250675.94c031e70c63dbb0f4726251987eb4ec., hostname=rs.host.500,16020,1645349772604, seqNum=550418
at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:159)
at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to rs.host.500/rs.host.ip.500:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=5,methodName=Scan], waitTime=60002, rpcTimeout=59991
at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:209)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:383)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:91)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:414)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:410)
at org.apache.hadoop.hbase.ipc.Call.setTimeout(Call.java:110)
at org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:136)
at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
... 1 more
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=5,methodName=Scan], waitTime=60002, rpcTimeout=59991
at org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:137)
... 4 more
We increased the hbase.regionserver.handler.count property from 30 to 48, and hbase.rpc.timeout from 60 seconds to 90 seconds.
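For reference, one quick check here (the path below assumes the default client configuration location on a CM-managed node) is whether the client-side configuration actually carries the new timeout, since the trace above still reports rpcTimeout=59991:
# Show the current client-side values for the two scan-related timeouts
grep -A1 -E "hbase.rpc.timeout|hbase.client.scanner.timeout.period" /etc/hbase/conf/hbase-site.xml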
Note that I have already checked the RegionServer logs referenced in the error above, but I have not found any issue there. However, we still see the above error occurring. Also, even though the RPC timeout is set to 90 seconds, the error still reports a 60-second timeout.
Can you please share a solution for this?
Best Regards
Labels:
- Apache HBase
- Apache Hive