Member since: 05-21-2021
Posts: 34
Kudos Received: 2
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 802 | 06-23-2022 01:06 AM |
| | 1791 | 04-22-2022 02:24 AM |
| | 8947 | 03-29-2022 01:20 AM |
04-21-2022
07:18 AM
The logs from the CM agent on the host running the task are shown below (a quick way to pull the check's own stderr/stdout is sketched after the log).
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Launching process. one-off True, command dr/precopylistingcheck.sh, args [u'-bandwidth', u'100', u'-i', u'-m', u'20', u'-prbugpa', u'-skipAclErr', u'-update', u'-proxyuser', u'hbackup', u'-log', u'/user/PROXY_USER_PLACEHOLDER/.cm/distcp/2022-04-21_9975', u'-sequenceFilePath', u'/user/PROXY_USER_PLACEHOLDER/.cm/distcp-staging/2022-04-21-13-55-02-50a875dd/fileList.seq', u'-diffRenameDeletePath', u'/user/PROXY_USER_PLACEHOLDER/.cm/distcp-staging/2022-04-21-13-55-02-50a875dd/renamesDeletesList.seq', u'-sourceconf', u'source-client-conf', u'-sourceprincipal', u'hdfs/SOURCE_HOSTNAME', u'-sourcetktcache', u'source.tgt', u'-copyListingOnSource', u'-useSnapshots', u'distcp-33--26584462', u'-ignoreSnapshotFailures', u'-diff', u'-useDistCpFileStatus', u'-replaceNameservice', u'-strategy', u'dynamic', u'-filters', u'exclusion-filter.list', u'-scheduleId', u'33', u'-scheduleName', u'test-copy', u'/test-prod2-copy', u'/test-prod2-copy']
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue supervisor WARNING Failed while getting process info. Retrying. (<Fault 10: 'BAD_NAME: 2815-hdfs-precopylistingcheck-40444302'>)
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue supervisor INFO Triggering supervisord update.
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue util INFO Using generic audit plugin for process hdfs-precopylistingcheck-40444302
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue util INFO Creating metadata plugin for process hdfs-precopylistingcheck-40444302
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue util INFO Using specific metadata plugin for process hdfs-precopylistingcheck-40444302
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue util INFO Using generic metadata plugin for process hdfs-precopylistingcheck-40444302
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue process INFO Begin audit plugin refresh
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue throttling_logger INFO (22 skipped) Scheduling a refresh for Audit Plugin for hdfs-precopylistingcheck-40444302 with count 1 pipelines names [''].
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue process INFO Begin metadata plugin refresh
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue process INFO Not creating a monitor for 2815-hdfs-precopylistingcheck-40444302: should_monitor returns false
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue process INFO Daemon refresh complete for process 2815-hdfs-precopylistingcheck-40444302.
[21/Apr/2022 15:55:09 +0200] 1697 Metadata-Plugin navigator_plugin INFO Pipelines updated for Metadata Plugin: []
[21/Apr/2022 15:55:09 +0200] 1697 Metadata-Plugin throttling_logger INFO (22 skipped) Refreshing Metadata Plugin for hdfs-precopylistingcheck-40444302 with count 0 pipelines names [].
[21/Apr/2022 15:55:09 +0200] 1697 Audit-Plugin navigator_plugin INFO Pipelines updated for Audit Plugin: []
[21/Apr/2022 15:55:10 +0200] 1697 MainThread process INFO [2815-hdfs-precopylistingcheck-40444302] Unregistered supervisor process EXITED
[21/Apr/2022 15:55:10 +0200] 1697 MainThread supervisor INFO Triggering supervisord update.
[21/Apr/2022 15:55:10 +0200] 1697 MainThread throttling_logger INFO Removed keytab /var/run/cloudera-scm-agent/process/2815-hdfs-precopylistingcheck-40444302/hdfs.keytab as a candidate to kinit from
[21/Apr/2022 15:55:25 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'running': (True, False), u'run_generation': (1, 5)}
[21/Apr/2022 15:55:25 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:55:25 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:55:29 +0200] 1697 Metadata-Plugin navigator_plugin INFO stopping Metadata Plugin for hdfs-precopylistingcheck-40444302 with count 0 pipelines names [].
[21/Apr/2022 15:55:29 +0200] 1697 Audit-Plugin navigator_plugin INFO stopping Audit Plugin for hdfs-precopylistingcheck-40444302 with count 0 pipelines names [].
[21/Apr/2022 15:55:40 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (5, 8)}
[21/Apr/2022 15:55:40 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:55:40 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:55:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (8, 11)}
[21/Apr/2022 15:55:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:55:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:56:10 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (11, 15)}
[21/Apr/2022 15:56:10 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:56:10 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:56:25 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (15, 19)}
[21/Apr/2022 15:56:25 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:56:25 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:56:40 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (19, 23)}
[21/Apr/2022 15:56:40 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:56:40 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (23, 27)}
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
The following log lines keep repeating:
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (23, 27)}
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
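Since the agent launches the pre-copy listing check and then reports it as EXITED, the check's own output under the agent's process directory is usually the most telling place to look. A minimal sketch, reusing the process directory name from the log above and assuming a default CM agent installation:
$ # Inspect the stderr/stdout of the pre-copy listing check on this host
$ ls /var/run/cloudera-scm-agent/process/2815-hdfs-precopylistingcheck-40444302/logs/
$ tail -n 100 /var/run/cloudera-scm-agent/process/2815-hdfs-precopylistingcheck-40444302/logs/stderr.log
$ tail -n 100 /var/run/cloudera-scm-agent/process/2815-hdfs-precopylistingcheck-40444302/logs/stdout.log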
04-21-2022
03:29 AM
Hello Team, In our customer's cluster we are testing HDFS replication through Cloudera Manager. The replication policy looks as follows; all other configuration is left at its defaults. The replication has been hung in the state below for a long time. We looked into the Cloudera Manager logs and can see the error below occurring repeatedly. Can you please help us resolve the issue?
2022-04-21 12:27:57,199 ERROR CommandPusher-1:com.cloudera.cmf.service.AgentResultFetcher: Exception occured while handling tempfile com.cloudera.cmf.service.AgentResultFetcher@618eac09
Best Regards
Sayed Anisul Hoque
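The ERROR line above is only the first line of the entry; to capture the full stack trace we have been pulling it out of the Cloudera Manager server log. A minimal sketch, assuming the default CM server log location:
$ # Grab the full stack trace that follows the AgentResultFetcher error
$ grep -B 2 -A 40 'Exception occured while handling tempfile' /var/log/cloudera-scm-server/cloudera-scm-server.log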
Labels:
- Cloudera Data Platform (CDP)
- HDFS
04-19-2022
04:06 PM
@Bharati Thank you! This worked. However, could you please share which logs showed that it was trying to copy the system database and information_schema?
04-19-2022
08:54 AM
Hello Team, We are setting up Hive replication through Cloudera Manager. The replication policy looks as follows. Note that we also enabled snapshots on the source cluster for the path /warehouse. However, when we press Save Policy we get the notification below. We looked into the Cloudera Manager logs and can see the error below. Can you please help us find the correct configuration to resolve the issue? (A quick way to confirm the snapshottable path on the source side is sketched after the stack trace.)
2022-04-19 15:51:07,848 ERROR scm-web-1686:com.cloudera.server.web.cmf.WebController: getHiveWarehouseSnapshotsEnabled
javax.ws.rs.NotAuthorizedException: HTTP 401 Unauthorized
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.cxf.jaxrs.client.AbstractClient.convertToWebApplicationException(AbstractClient.java:507)
at org.apache.cxf.jaxrs.client.ClientProxyImpl.checkResponse(ClientProxyImpl.java:324)
at org.apache.cxf.jaxrs.client.ClientProxyImpl.handleResponse(ClientProxyImpl.java:878)
at org.apache.cxf.jaxrs.client.ClientProxyImpl.doChainedInvocation(ClientProxyImpl.java:791)
....
....
....
....
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
at java.lang.Thread.run(Thread.java:750)
2022-04-19 15:51:07,849 ERROR scm-web-1686:com.cloudera.server.web.common.JsonResponse: JsonResponse created with throwable:
javax.ws.rs.NotAuthorizedException: HTTP 401 Unauthorized
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.cxf.jaxrs.client.AbstractClient.convertToWebApplicationException(AbstractClient.java:507)
at org.apache.cxf.jaxrs.client.ClientProxyImpl.checkResponse(ClientProxyImpl.java:324)
at org.apache.cxf.jaxrs.client.ClientProxyImpl.handleResponse(ClientProxyImpl.java:878)
at org.apache.cxf.jaxrs.client.ClientProxyImpl.doChainedInvocation(ClientProxyImpl.java:791)
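A quick way to double-check on the source cluster that /warehouse is actually snapshottable, using the standard HDFS CLI (run as the HDFS superuser):
$ # Directories with snapshots enabled, as seen by the current user
$ hdfs lsSnapshottableDir
$ # The .snapshot pseudo-directory only exists if snapshots are enabled on the path
$ hdfs dfs -ls /warehouse/.snapshot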
03-29-2022
01:20 AM
With the help of @mszurap we could narrow down the issue. There were two issues: the first was caused by an OOM, and the second came from the application itself. Below are some of the logs that we noticed during the Oozie job run.
22/03/23 13:18:54 INFO mapred.SparkHadoopMapRedUtil: attempt_20220323131847_0000_m_000000_0: Committed
22/03/23 13:18:54 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1384 bytes result sent to driver
22/03/23 13:19:55 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
22/03/23 13:19:55 INFO storage.DiskBlockManager: Shutdown hook called
22/03/23 13:19:55 INFO util.ShutdownHookManager: Shutdown hook called
From Miklos: the "executor.Executor" ... "RECEIVED SIGNAL TERM" message is completely normal; it simply means an executor was killed by the AM/Driver. Since the Spark job was succeeding in the lower environments (like Dev/Test), the suggestion was to check whether the application uses the same dependencies in those environments (comparing the Spark event logs for a good run and a bad run), and also to check the driver YARN logs, since there could be an abrupt exit due to an OOM. We then looked in the direction of the OOM and also confirmed there were no System.exit() calls in the Spark code. We updated the driver memory to 2 GB and re-ran the job, and now we can see the actual error (the error from the application itself). Hope this helps someone in the future.
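For completeness, this is roughly the change that surfaced the real error. With a plain spark-submit the flag is --driver-memory; when the job is launched from an Oozie Spark action, the same flags go into the <spark-opts> element of the workflow. The class and jar names below are placeholders:
$ # Raise the driver heap so the driver no longer dies with an OOM before logging the real failure
$ spark-submit --master yarn --deploy-mode cluster --driver-memory 2g --class com.example.MyJob my-application.jar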
03-25-2022
10:02 AM
Hello Team,
A Spark job submitted through Oozie is failing with the exception below in the Prod cluster. Note that the same job passes in the lower clusters (e.g. Dev).
22/03/24 05:05:22 INFO spark.SparkContext: Invoking stop() from shutdown hook
22/03/24 05:05:22 ERROR scheduler.AsyncEventQueue: Listener EventLoggingListener threw an exception
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:477)
at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:627)
at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:583)
at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:134)
at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:145)
at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:145)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:145)
at org.apache.spark.scheduler.EventLoggingListener.onApplicationEnd(EventLoggingListener.scala:191)
at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:57)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:91)
at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$super$postToAll(AsyncEventQueue.scala:92)
at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply$mcJ$sp(AsyncEventQueue.scala:92)
at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:87)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$1$$anonfun$run$1.apply$mcV$sp(AsyncEventQueue.scala:83)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1231)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$1.run(AsyncEventQueue.scala:82)
22/03/24 05:05:22 INFO server.AbstractConnector: Stopped Spark@12bcc45b{HTTP/1.1, (http/1.1)}{0.0.0.0:0}
22/03/24 05:05:22 INFO ui.SparkUI: Stopped Spark web UI at http://xxxxxxxxxxxxxxx:33687
22/03/24 05:05:22 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
22/03/24 05:05:22 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
22/03/24 05:05:22 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
22/03/24 05:05:22 ERROR util.Utils: Uncaught exception in thread shutdown-hook-0
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:477)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1685)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1745)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1742)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1757)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1723)
at org.apache.spark.scheduler.EventLoggingListener.stop(EventLoggingListener.scala:249)
at org.apache.spark.SparkContext$$anonfun$stop$9$$anonfun$apply$mcV$sp$7.apply(SparkContext.scala:1966)
at org.apache.spark.SparkContext$$anonfun$stop$9$$anonfun$apply$mcV$sp$7.apply(SparkContext.scala:1966)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.SparkContext$$anonfun$stop$9.apply$mcV$sp(SparkContext.scala:1966)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1269)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1965)
at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:578)
at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1874)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
22/03/24 05:05:22 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/03/24 05:05:22 INFO memory.MemoryStore: MemoryStore cleared
22/03/24 05:05:22 INFO storage.BlockManager: BlockManager stopped
22/03/24 05:05:22 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
22/03/24 05:05:22 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/03/24 05:05:22 INFO spark.SparkContext: Successfully stopped SparkContext
22/03/24 05:05:22 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
22/03/24 05:05:22 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
We have already looked into the NameNode logs and couldn't find any ERROR related to this. Please help us resolve the issue.
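If it helps, the complete driver/AM container logs for a failed run can be pulled with the standard YARN CLI; a minimal sketch (the application ID below is a placeholder):
$ # Fetch the aggregated YARN logs for the failed Oozie launcher / Spark application
$ yarn logs -applicationId application_1648000000000_0001 > app_logs.txt
$ grep -iE 'error|exception|killed|outofmemory' app_logs.txt | head -n 50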
Best Regards
Labels:
- Apache Oozie
- Apache Spark
03-24-2022
03:34 AM
@Scharan Can you please give a short explanation (my customer is asking for it) of why the shadow file matters in this case, i.e. what is the relation between Knox and the shadow file? Thank you!
03-24-2022
03:22 AM
Yes, that resolved the issue! I had 000 as the permission on /etc/shadow. Thank you @Scharan, I appreciate the quick reply.
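For anyone who lands here later: as far as I understand, with this PAM setup authentication ultimately goes through pam_unix (via system-auth), and since the gateway does not run as root it needs to be able to read the hashes in /etc/shadow, which a 000 mode prevents. One way to grant only the Knox service account read access (assuming the gateway runs as the knox user; your fix may differ):
$ # Current mode of the shadow file
$ ls -l /etc/shadow
$ # Grant the knox service user read access via an ACL, then verify
$ setfacl -m u:knox:r /etc/shadow
$ getfacl /etc/shadow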
03-24-2022
03:05 AM
Hello Team, I have an issue setting up Knox authentication with PAM. I have the default login configuration in /etc/pam.d/:
$ cat /etc/pam.d/login
#%PAM-1.0
auth [user_unknown=ignore success=ok ignore=ignore default=bad] pam_securetty.so
auth substack system-auth
auth include postlogin
account required pam_nologin.so
account include system-auth
password include system-auth
# pam_selinux.so close should be the first session rule
session required pam_selinux.so close
session required pam_loginuid.so
session optional pam_console.so
# pam_selinux.so open should only be followed by sessions to be executed in the user context
session required pam_selinux.so open
session required pam_namespace.so
session optional pam_keyinit.so force revoke
session include system-auth
session include postlogin
-session optional pam_ck_connector.so
The Knox SSO configuration looks as follows (the default one). I created a user named test with a password. When I try to access the Knox Gateway UI, I get an error. The Knox Gateway log says:
(KnoxPamRealm.java:handleAuthFailure(170)) - Shiro unable to login: null
Note: I am using CDP 7.1.6, and I can log in to the host where the Knox Gateway is installed using the test user. Also, there is no Kerberos setup. Please share if there is something that needs to be adjusted.
Best Regards
Sayed
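For reference, some basic checks on the gateway host that might narrow this down; a rough sketch, assuming the Shiro realm points at the login PAM profile and the gateway runs as the knox service user:
$ # The PAM profile being used and the OS account being tested
$ ls -l /etc/pam.d/login
$ id test
$ # Knox is not root, so verify its service account can actually read the password hashes
$ sudo -u knox cat /etc/shadow > /dev/null && echo 'knox can read /etc/shadow' || echo 'knox cannot read /etc/shadow'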