Member since: 05-21-2021
Posts: 32
Kudos Received: 0
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 58 | 06-23-2022 01:06 AM
 | 184 | 04-22-2022 02:24 AM
 | 872 | 03-29-2022 01:20 AM
06-23-2022
01:06 AM
Update: After restarting the cluster, the issue went away. All good now.
06-23-2022
12:30 AM
Hello @araujo, by the time I logged in to the node to check the entropy_avail value, it had already recovered. This issue seems to resolve quickly; from the Cloudera alert mail I can see a good status within a minute of the issue occurring. Also, from the attached screenshot you can see the value was 1.
06-22-2022
01:09 PM
Hello Team, We are seeing a frequent entropy issue in our customer cluster, as shown below. /proc/sys/kernel/random/entropy_avail returns 3754. We have also installed rng-tools on all the nodes, and the rngd service is running. After checking the document https://docs.cloudera.com/cdp-private-cloud-base/7.1.6/installation/topics/cdpdc-data-at-rest-encryption-requirements.html#pnavId1 we can see the ExecStart should look as follows:
ExecStart=/sbin/rngd -f -r /dev/urandom
However, our ExecStart is the default one and looks as follows:
ExecStart=/sbin/rngd -f
Can you please share whether updating the ExecStart will solve the issue? Best Regards Sayed Anisul Hoque
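For reference, a minimal sketch of changing the rngd ExecStart via a systemd drop-in, assuming a RHEL-style host with rng-tools installed (the drop-in path follows the standard systemd convention):
# Create a drop-in that overrides the packaged unit's ExecStart
sudo mkdir -p /etc/systemd/system/rngd.service.d
sudo tee /etc/systemd/system/rngd.service.d/override.conf <<'EOF'
[Service]
# An empty ExecStart clears the packaged value before setting a new one
ExecStart=
ExecStart=/sbin/rngd -f -r /dev/urandom
EOF
# Reload unit files and restart the service
sudo systemctl daemon-reload
sudo systemctl restart rngd
# Verify the available entropy afterwards
cat /proc/sys/kernel/random/entropy_avail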
Labels: Cloudera Data Platform (CDP)
06-17-2022
04:10 AM
Hello @ywu Thank you for the links. This one helps. So, if I understand correctly, there is not much we can do from the YARN side to control the size of the logs; it has to be handled in the application itself, since the application log files will keep growing until the disk fills up and the NodeManager goes into the decommissioned state. Is that right?
06-17-2022
03:56 AM
Hello Team, We recently upgraded CM from version 7.2.X to 7.6.1. Since this is a production cluster, we haven't restarted the cluster yet. However, we are getting an alert for the YARN HistoryServer. After checking the logs, we couldn't find any issue in the YARN HistoryServer itself, but we found errors in the Cloudera agent. Please check the logs shown below.
[16/Jun/2022 07:50:44 +0200] 3249 GM JOBHISTORY throttling_logger ERROR (4 skipped) Error fetching metrics at 'https://xxx.xxx.com:19890/jmx'
Traceback (most recent call last):
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/monitor/generic/metric_collectors.py", line 223, in _collect_and_parse_and_return
self._adapter.safety_valve))
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/util/url.py", line 305, in urlopen_with_retry_on_authentication_errors
return function()
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/monitor/generic/metric_collectors.py", line 245, in _open_url
cipher_list=self._tls_cipher_list)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/util/url.py", line 104, in urlopen_with_timeout
return opener.open(url, data, timeout)
File "/usr/lib64/python2.7/urllib2.py", line 437, in open
response = meth(req, response)
File "/usr/lib64/python2.7/urllib2.py", line 550, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib64/python2.7/urllib2.py", line 469, in error
result = self._call_chain(*args)
File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/urllib2_kerberos.py", line 203, in http_error_401
retry = self.http_error_auth_reqed(host, req, headers)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/urllib2_kerberos.py", line 127, in http_error_auth_reqed
return self.retry_http_kerberos_auth(req, headers, neg_value)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/urllib2_kerberos.py", line 143, in retry_http_kerberos_auth
resp = self.parent.open(req)
File "/usr/lib64/python2.7/urllib2.py", line 437, in open
response = meth(req, response)
File "/usr/lib64/python2.7/urllib2.py", line 550, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib64/python2.7/urllib2.py", line 475, in error
return self._call_chain(*args)
File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/https.py", line 360, in http_error_default
raise e
HTTPError: HTTP Error 403: Forbidden
Could you please share what we can do to fix this error? Best Regards Sayed Anisul Hoque
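As a quick check of the same endpoint the agent is polling, one can test kerberized access to the JMX servlet directly. A hedged sketch (the hostname and port are taken from the error above; a valid Kerberos ticket is required, and the principal is illustrative):
# Obtain a ticket first
kinit your_user@YOUR.REALM
# Use SPNEGO/Kerberos against the JobHistory JMX endpoint;
# -k skips TLS verification, drop it if the CA chain is trusted
curl --negotiate -u : -k "https://xxx.xxx.com:19890/jmx"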
06-13-2022
04:56 AM
Hello Team, We had a situation where one application consumed over 1 TB of disk space, which eventually filled up the disk. We had to kill this application to free the space on the disk. To prevent this from happening in the future, we want to limit the storage consumption of YARN applications. Could you please share how to configure this? Best Regards
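Stock YARN has no per-application disk quota, but the NodeManager disk health checker can take a disk out of service before it fills up completely. A minimal sketch of the relevant yarn-site.xml properties (values are illustrative; the property names are from stock Hadoop, and in CDP they would go into a YARN safety valve instead):
# Append the disk health checker settings to a yarn-site.xml snippet
cat >> yarn-site-snippet.xml <<'EOF'
<property>
  <!-- Mark a local/log dir as bad once it is this percent full (default 90) -->
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>85</value>
</property>
<property>
  <!-- Minimum free space (MB) a disk must keep to stay healthy -->
  <name>yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb</name>
  <value>10240</value>
</property>
EOF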
Labels: Apache YARN
04-27-2022
03:34 AM
Hello Team, In a customer cluster we are facing an issue with too many open file descriptors. The CDP version is 7.1.5. Bad: Open file descriptors: 31,364. File descriptor limit: 32,768. Percentage in use: 95.72%. Critical threshold: 70.00%. Can you please share how to mitigate this issue? Is it okay to increase the maximum process file descriptors, and what would be the recommended value? Best Regards
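Before raising the limit, it can help to confirm which process is actually holding the descriptors. A small sketch using standard Linux tooling (the process pattern is illustrative; the thread tags suggest an Impala daemon):
# Find the PID of the suspect role process
pid=$(pgrep -f impalad | head -n1)
# Count its currently open descriptors and show its per-process limit
ls /proc/"$pid"/fd | wc -l
grep "Max open files" /proc/"$pid"/limits
# Break the open files down by type to spot leaks (sockets, files, pipes)
lsof -p "$pid" | awk '{print $5}' | sort | uniq -c | sort -rn | head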
Tags: CDP, descriptors, impala
04-22-2022
02:24 AM
The issue was resolved. The problem was the owner and group of the directories in the subfolders of /var/lib/cloudera-scm-server. The owner and group need to be cloudera-scm:cloudera-scm; somehow these values had changed to root:root.
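A minimal sketch of checking and restoring the ownership described above, run on the Cloudera Manager server host (stopping the server first is an assumption; adjust to your change process):
# Inspect current ownership under the CM server state directory
ls -lR /var/lib/cloudera-scm-server | head
# Restore the expected owner and group recursively
sudo systemctl stop cloudera-scm-server
sudo chown -R cloudera-scm:cloudera-scm /var/lib/cloudera-scm-server
sudo systemctl start cloudera-scm-server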
04-21-2022
07:18 AM
The logs from the CM agent on the host doing the task are shown below. [21/Apr/2022 15:55:04 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Launching process. one-off True, command dr/precopylistingcheck.sh, args [u'-bandwidth', u'100', u'-i', u'-m', u'20', u'-prbugpa', u'-skipAclErr', u'-update', u'-proxyuser', u'hbackup', u'-log', u'/user/PROXY_USER_PLACEHOLDER/.cm/distcp/2022-04-21_9975', u'-sequenceFilePath', u'/user/PROXY_USER_PLACEHOLDER/.cm/distcp-staging/2022-04-21-13-55-02-50a875dd/fileList.seq', u'-diffRenameDeletePath', u'/user/PROXY_USER_PLACEHOLDER/.cm/distcp-staging/2022-04-21-13-55-02-50a875dd/renamesDeletesList.seq', u'-sourceconf', u'source-client-conf', u'-sourceprincipal', u'hdfs/SOURCE_HOSTNAME', u'-sourcetktcache', u'source.tgt', u'-copyListingOnSource', u'-useSnapshots', u'distcp-33--26584462', u'-ignoreSnapshotFailures', u'-diff', u'-useDistCpFileStatus', u'-replaceNameservice', u'-strategy', u'dynamic', u'-filters', u'exclusion-filter.list', u'-scheduleId', u'33', u'-scheduleName', u'test-copy', u'/test-prod2-copy', u'/test-prod2-copy']
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue supervisor WARNING Failed while getting process info. Retrying. (<Fault 10: 'BAD_NAME: 2815-hdfs-precopylistingcheck-40444302'>)
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue supervisor INFO Triggering supervisord update.
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue util INFO Using generic audit plugin for process hdfs-precopylistingcheck-40444302
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue util INFO Creating metadata plugin for process hdfs-precopylistingcheck-40444302
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue util INFO Using specific metadata plugin for process hdfs-precopylistingcheck-40444302
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue util INFO Using generic metadata plugin for process hdfs-precopylistingcheck-40444302
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue process INFO Begin audit plugin refresh
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue throttling_logger INFO (22 skipped) Scheduling a refresh for Audit Plugin for hdfs-precopylistingcheck-40444302 with count 1 pipelines names [''].
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue process INFO Begin metadata plugin refresh
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue process INFO Not creating a monitor for 2815-hdfs-precopylistingcheck-40444302: should_monitor returns false
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue process INFO Daemon refresh complete for process 2815-hdfs-precopylistingcheck-40444302.
[21/Apr/2022 15:55:09 +0200] 1697 Metadata-Plugin navigator_plugin INFO Pipelines updated for Metadata Plugin: []
[21/Apr/2022 15:55:09 +0200] 1697 Metadata-Plugin throttling_logger INFO (22 skipped) Refreshing Metadata Plugin for hdfs-precopylistingcheck-40444302 with count 0 pipelines names [].
[21/Apr/2022 15:55:09 +0200] 1697 Audit-Plugin navigator_plugin INFO Pipelines updated for Audit Plugin: []
[21/Apr/2022 15:55:10 +0200] 1697 MainThread process INFO [2815-hdfs-precopylistingcheck-40444302] Unregistered supervisor process EXITED
[21/Apr/2022 15:55:10 +0200] 1697 MainThread supervisor INFO Triggering supervisord update.
[21/Apr/2022 15:55:10 +0200] 1697 MainThread throttling_logger INFO Removed keytab /var/run/cloudera-scm-agent/process/2815-hdfs-precopylistingcheck-40444302/hdfs.keytab as a candidate to kinit from
[21/Apr/2022 15:55:25 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'running': (True, False), u'run_generation': (1, 5)}
[21/Apr/2022 15:55:25 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:55:25 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:55:29 +0200] 1697 Metadata-Plugin navigator_plugin INFO stopping Metadata Plugin for hdfs-precopylistingcheck-40444302 with count 0 pipelines names [].
[21/Apr/2022 15:55:29 +0200] 1697 Audit-Plugin navigator_plugin INFO stopping Audit Plugin for hdfs-precopylistingcheck-40444302 with count 0 pipelines names [].
[21/Apr/2022 15:55:40 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (5, 8)}
[21/Apr/2022 15:55:40 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:55:40 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:55:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (8, 11)}
[21/Apr/2022 15:55:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:55:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:56:10 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (11, 15)}
[21/Apr/2022 15:56:10 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:56:10 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:56:25 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (15, 19)}
[21/Apr/2022 15:56:25 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:56:25 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:56:40 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (19, 23)}
[21/Apr/2022 15:56:40 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:56:40 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (23, 27)}
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
The below logs keep repeating:
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (23, 27)}
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
04-21-2022
03:29 AM
Hello Team, In our customer cluster we are testing HDFS replication through Cloudera Manager. The replication policy looks as follows; all the other configuration is the default. The replication has been hanging in the below state for a long time. We looked into the Cloudera Manager logs and can see the below error occurring repeatedly. Can you please help us resolve the issue?
2022-04-21 12:27:57,199 ERROR CommandPusher-1:com.cloudera.cmf.service.AgentResultFetcher: Exception occured while handling tempfile com.cloudera.cmf.service.AgentResultFetcher@618eac09
Best Regards Sayed Anisul Hoque
Labels: Cloudera Data Platform (CDP), HDFS
04-19-2022
04:06 PM
@Bharati Thank you! This worked. However, could you please share which logs had shown that it was trying to copy the system database and information_schema?
04-19-2022
08:54 AM
Hello Team, We are setting up Hive replication through Cloudera Manager. The replication policy looks as follows. Note that we also enabled snapshots on the source cluster for the path /warehouse. However, when we save the policy, we get the below notification. We looked into the Cloudera Manager logs and can see the below error. Can you please help us find the correct configuration to resolve the issue?
2022-04-19 15:51:07,848 ERROR scm-web-1686:com.cloudera.server.web.cmf.WebController: getHiveWarehouseSnapshotsEnabled
javax.ws.rs.NotAuthorizedException: HTTP 401 Unauthorized
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.cxf.jaxrs.client.AbstractClient.convertToWebApplicationException(AbstractClient.java:507)
at org.apache.cxf.jaxrs.client.ClientProxyImpl.checkResponse(ClientProxyImpl.java:324)
at org.apache.cxf.jaxrs.client.ClientProxyImpl.handleResponse(ClientProxyImpl.java:878)
at org.apache.cxf.jaxrs.client.ClientProxyImpl.doChainedInvocation(ClientProxyImpl.java:791)
....
....
....
....
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
at java.lang.Thread.run(Thread.java:750)
2022-04-19 15:51:07,849 ERROR scm-web-1686:com.cloudera.server.web.common.JsonResponse: JsonResponse created with throwable:
javax.ws.rs.NotAuthorizedException: HTTP 401 Unauthorized
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.cxf.jaxrs.client.AbstractClient.convertToWebApplicationException(AbstractClient.java:507)
at org.apache.cxf.jaxrs.client.ClientProxyImpl.checkResponse(ClientProxyImpl.java:324)
at org.apache.cxf.jaxrs.client.ClientProxyImpl.handleResponse(ClientProxyImpl.java:878)
at org.apache.cxf.jaxrs.client.ClientProxyImpl.doChainedInvocation(ClientProxyImpl.java:791)
Tags: cloudera-manager, Hive
03-29-2022
01:20 AM
With the help of @mszurap we could narrow down the issue. There were two issues: the first came from an OOM and the second from the application itself. Below are some of the logs that we noticed during the Oozie job run.
22/03/23 13:18:54 INFO mapred.SparkHadoopMapRedUtil: attempt_20220323131847_0000_m_000000_0: Committed
22/03/23 13:18:54 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1384 bytes result sent to driver
22/03/23 13:19:55 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
22/03/23 13:19:55 INFO storage.DiskBlockManager: Shutdown hook called
22/03/23 13:19:55 INFO util.ShutdownHookManager: Shutdown hook called
From Miklos: "executor.Executor" ... "RECEIVED SIGNAL TERM" is completely normal; it just means an executor was killed by the AM/driver. Since the Spark job was succeeding in the lower environments (like Dev/Test), the suggestion was to check whether the application uses the same dependencies in those environments (get the Spark event logs for the good and the bad run), and also to check the driver YARN logs, since there could be an abrupt exit due to an OOM. We then looked in the direction of the OOM and also checked that there were no System.exit() calls in the Spark code. We updated the driver memory to 2 GB and ran the job, and now we can see the actual error (the error from the application). Hope this helps someone in the future.
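For anyone hitting the same thing, a minimal sketch of raising the driver memory, shown here as a plain spark-submit (the class and jar names are placeholders); in an Oozie Spark action the same flag goes into the <spark-opts> element of the workflow:
# Run the job on YARN with 2 GB of driver memory
spark-submit --master yarn --deploy-mode cluster \
  --driver-memory 2g \
  --class com.example.YourApp yourapp.jar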
03-25-2022
10:02 AM
Hello Team,
A Spark job submitted through Oozie is failing with the below exception in the Prod cluster. Note that the same job passes in the lower clusters (e.g. Dev).
22/03/24 05:05:22 INFO spark.SparkContext: Invoking stop() from shutdown hook
22/03/24 05:05:22 ERROR scheduler.AsyncEventQueue: Listener EventLoggingListener threw an exception
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:477)
at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:627)
at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:583)
at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:134)
at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:145)
at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:145)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:145)
at org.apache.spark.scheduler.EventLoggingListener.onApplicationEnd(EventLoggingListener.scala:191)
at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:57)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:91)
at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$super$postToAll(AsyncEventQueue.scala:92)
at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply$mcJ$sp(AsyncEventQueue.scala:92)
at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:87)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$1$$anonfun$run$1.apply$mcV$sp(AsyncEventQueue.scala:83)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1231)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$1.run(AsyncEventQueue.scala:82)
22/03/24 05:05:22 INFO server.AbstractConnector: Stopped Spark@12bcc45b{HTTP/1.1, (http/1.1)}{0.0.0.0:0}
22/03/24 05:05:22 INFO ui.SparkUI: Stopped Spark web UI at http://xxxxxxxxxxxxxxx:33687
22/03/24 05:05:22 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
22/03/24 05:05:22 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
22/03/24 05:05:22 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
22/03/24 05:05:22 ERROR util.Utils: Uncaught exception in thread shutdown-hook-0
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:477)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1685)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1745)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1742)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1757)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1723)
at org.apache.spark.scheduler.EventLoggingListener.stop(EventLoggingListener.scala:249)
at org.apache.spark.SparkContext$$anonfun$stop$9$$anonfun$apply$mcV$sp$7.apply(SparkContext.scala:1966)
at org.apache.spark.SparkContext$$anonfun$stop$9$$anonfun$apply$mcV$sp$7.apply(SparkContext.scala:1966)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.SparkContext$$anonfun$stop$9.apply$mcV$sp(SparkContext.scala:1966)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1269)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1965)
at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:578)
at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1874)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
22/03/24 05:05:22 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/03/24 05:05:22 INFO memory.MemoryStore: MemoryStore cleared
22/03/24 05:05:22 INFO storage.BlockManager: BlockManager stopped
22/03/24 05:05:22 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
22/03/24 05:05:22 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/03/24 05:05:22 INFO spark.SparkContext: Successfully stopped SparkContext
22/03/24 05:05:22 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
22/03/24 05:05:22 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
We already looked into the NameNode logs and couldn't find any ERROR related to this. Please help in resolving the issue.
Best Regards
Labels: Apache Oozie, Apache Spark
03-24-2022
03:34 AM
@Scharan Can you please give a short explanation, as my customer is asking why the shadow file matters in this case, i.e. what the relation is between Knox and the shadow file? Thank you!
03-24-2022
03:22 AM
Yes, that resolved the issue! I had 000 as my permission. Thank you @Scharan, I appreciate the quick reply.
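For reference, a small sketch of checking and restoring the /etc/shadow permissions (the mode shown is illustrative; match your distribution's default or security policy, since a fully unreadable shadow file prevents Knox's PAM login from verifying passwords):
# Inspect current permissions on the shadow file
ls -l /etc/shadow
# Example: restore the expected owner/group and a readable-by-root mode
sudo chown root:root /etc/shadow
sudo chmod 400 /etc/shadow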
03-24-2022
03:05 AM
Hello Team, I have an issue setting up Knox authentication with PAM. I have the default login service in /etc/pam.d/:
$ cat /etc/pam.d/login
#%PAM-1.0
auth [user_unknown=ignore success=ok ignore=ignore default=bad] pam_securetty.so
auth substack system-auth
auth include postlogin
account required pam_nologin.so
account include system-auth
password include system-auth
# pam_selinux.so close should be the first session rule
session required pam_selinux.so close
session required pam_loginuid.so
session optional pam_console.so
# pam_selinux.so open should only be followed by sessions to be executed in the user context
session required pam_selinux.so open
session required pam_namespace.so
session optional pam_keyinit.so force revoke
session include system-auth
session include postlogin
-session optional pam_ck_connector.so
The Knox SSO topology looks as follows (the default one). I created a user named test with a password. I tried to access the Knox Gateway UI, but I get the error. The Knox Gateway log says:
(KnoxPamRealm.java:handleAuthFailure(170)) - Shiro unable to login: null
Note: I am using CDP 7.1.6, and I can log in to my host (where the Knox Gateway is installed) using the test user. Also, there's no Kerberos setup. Please share if there's something that needs to be adjusted. Best Regards Sayed
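One way to verify the PAM stack itself, independent of Knox, is pamtester, assuming that package is available on the host (the service name login matches the stack above):
# Authenticate the test user against the 'login' PAM service;
# this prompts for the password and reports success or failure
pamtester login test authenticate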
03-02-2022
02:39 AM
Hello Team,
We are experiencing an issue with Hive-on-HBase tables, where we get the following error. Note: pure Hive tables and direct queries on HBase work fine.
Failed after attempts=11, exceptions:
2022-02-28T12:12:54.719Z, java.net.SocketTimeoutException: callTimeout=60000, callDuration=60115: Call to rs.host.500/rs.host.ip.500:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=5,methodName=Scan], waitTime=60002, rpcTimeout=59991 row '' on table 'a-hbase-table' at region=a-hbase-table,,1598840250675.94c031e70c63dbb0f4726251987eb4ec., hostname=rs.host.500,16020,1645349772604, seqNum=550418
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=11, exceptions:
2022-02-28T12:12:54.719Z, java.net.SocketTimeoutException: callTimeout=60000, callDuration=60115: Call to rs.host.500/rs.host.ip.500:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=5,methodName=Scan], waitTime=60002, rpcTimeout=59991 row '' on table 'a-hbase-table' at region=a-hbase-table,,1598840250675.94c031e70c63dbb0f4726251987eb4ec., hostname=rs.host.500,16020,1645349772604, seqNum=550418
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:299)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:251)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:192)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:267)
at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:435)
at org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:310)
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:595)
at org.apache.hadoop.hbase.client.ResultScanner.next(ResultScanner.java:97)
at org.apache.hadoop.hbase.thrift.ThriftHBaseServiceHandler.scannerGetList(ThriftHBaseServiceHandler.java:858)
at sun.reflect.GeneratedMethodAccessor39.invoke(Unknown Source)
....
....
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=60115: Call to rs.host.500/rs.host.ip.500:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=5,methodName=Scan], waitTime=60002, rpcTimeout=59991 row '' on table 'a-hbase-table' at region=a-hbase-table,,1598840250675.94c031e70c63dbb0f4726251987eb4ec., hostname=rs.host.500,16020,1645349772604, seqNum=550418
at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:159)
at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to rs.host.500/rs.host.ip.500:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=5,methodName=Scan], waitTime=60002, rpcTimeout=59991
at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:209)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:383)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:91)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:414)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:410)
at org.apache.hadoop.hbase.ipc.Call.setTimeout(Call.java:110)
at org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:136)
at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
... 1 more
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=5,methodName=Scan], waitTime=60002, rpcTimeout=59991
at org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:137)
... 4 more
We increased the hbase.regionserver.handler.count property from 30 to 48, and hbase.rpc.timeout from 60 seconds to 90 seconds.
Note that I already checked the RegionServer logs referenced in the error above, but I haven't found any issue there. We still see the above error occurring. Also, although the RPC timeout is set to 90 seconds, the timeout in the error still shows 60 seconds.
Can you please share a solution for this?
Best Regards
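One hedged explanation for the timeout still showing 60 seconds is that scan timeouts are enforced by client-side settings, which must be present in the hbase-site.xml of the client making the call (here the Hive/HBase Thrift side), not only on the RegionServers. A sketch of the stock HBase properties involved (values are illustrative):
# Client-side hbase-site.xml snippet; in CDP these would normally be set
# through a Cloudera Manager safety valve for the client configuration
cat >> hbase-site-client-snippet.xml <<'EOF'
<property>
  <!-- RPC timeout as seen by the client issuing the call -->
  <name>hbase.rpc.timeout</name>
  <value>90000</value>
</property>
<property>
  <!-- Timeout for a single client scanner RPC -->
  <name>hbase.client.scanner.timeout.period</name>
  <value>90000</value>
</property>
EOF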
Labels: Apache HBase, Apache Hive
11-01-2021
08:21 AM
Hello @nthomas Sorry for the delayed reply. I set the LDAP user search filter and LDAP user search base in Cloudera Manager > Settings. By setting these values I could prevent the users from seeing the cluster information and the settings, but I couldn't completely block the users from logging in. The main intention was to block the users from logging in. Do you know how I can block the users completely?
10-21-2021
03:33 AM
Hello @smdas Thank you for your detailed reply. We looked into the ZooKeeper logs and couldn't find any issue there. After [2], Shard1 Replica1 of the RangerAudits collection kept showing the same error a couple of times and then the Solr server stopped. We investigated this further and found that there was a long GC pause at that time, due to which the application (the Solr server) lost its connection to ZooKeeper and started throwing the error. We have increased zkClientTimeout to 30 seconds and restarted the Solr service. We can now see that a leader is elected for the collection. Version: CDP 7.1.6 Thanks
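For reference, a sketch of where this timeout lives on a stock Solr install; in CDP the equivalent value is set through Cloudera Manager's Solr configuration instead:
# In solr.in.sh, pass the ZooKeeper client timeout (milliseconds) to Solr
SOLR_OPTS="$SOLR_OPTS -DzkClientTimeout=30000"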
10-20-2021
04:18 AM
Dear team, We are facing the below issue on one of the Solr nodes.
2021-10-17 04:05:57.006 ERROR (qtp1916575798-2477) [c:ranger_audits s:shard1 r:core_node2 x:ranger_audits_shard1_replica_n1] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
at org.apache.solr.update.processor.DistributedZkUpdateProcessor.zkCheck(DistributedZkUpdateProcessor.java:1245)
at org.apache.solr.update.processor.DistributedZkUpdateProcessor.setupRequest(DistributedZkUpdateProcessor.java:582)
at org.apache.solr.update.processor.DistributedZkUpdateProcessor.processAdd(DistributedZkUpdateProcessor.java:239)
at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
at org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:477)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
However, after some time the Solr server is able to reconnect.
2021-10-17 04:05:57.028 WARN (Thread-2414) [ ] o.a.z.Login TGT renewal thread has been interrupted and will exit.
2021-10-17 04:05:57.043 INFO (zkConnectionManagerCallback-11-thread-1-EventThread) [ ] o.a.s.c.c.ConnectionManager zkClient has connected
2021-10-17 04:05:57.043 INFO (zkConnectionManagerCallback-11-thread-1-EventThread) [ ] o.a.s.c.c.ConnectionManager Client is connected to ZooKeeper
2021-10-17 04:05:57.043 INFO (zkConnectionManagerCallback-11-thread-1-EventThread) [ ] o.a.s.c.ZkController ZooKeeper session re-connected ... refreshing core states after session expiration.
2021-10-17 04:05:57.047 WARN (qtp1916575798-2461) [c:ranger_audits s:shard1 r:core_node2 x:ranger_audits_shard1_replica_n1] o.a.h.s.a.u.KerberosName auth_to_local rule mechanism not set.Using default of hadoop
2021-10-17 04:05:57.072 INFO (zkConnectionManagerCallback-11-thread-1-EventThread) [ ] o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (3) -> (2)
2021-10-17 04:05:57.085 INFO (zkConnectionManagerCallback-11-thread-1-EventThread) [ ] o.a.s.c.Overseer Overseer (id=72334547140450792-192.168.0.17:8985_solr-n_0000000153) closing
2021-10-17 04:05:57.085 INFO (zkConnectionManagerCallback-11-thread-1-EventThread) [ ] o.a.s.c.Overseer Overseer (id=72334547140450792-192.168.0.17:8985_solr-n_0000000153) closing
2021-10-17 04:05:57.085 INFO (zkConnectionManagerCallback-11-thread-1-EventThread) [ ] o.a.s.c.Overseer Overseer (id=72334547140450792-192.168.0.17:8985_solr-n_0000000153) closing
2021-10-17 04:05:57.087 INFO (zkConnectionManagerCallback-11-thread-1-EventThread) [ ] o.a.s.c.Overseer Overseer (id=72334547140450792-192.168.0.17:8985_solr-n_0000000153) closing
2021-10-17 04:05:57.089 INFO (zkConnectionManagerCallback-11-thread-1-EventThread) [ ] o.a.s.c.ZkController Publish node=192.168.0.17:8985_solr as DOWN
2021-10-17 04:05:57.093 INFO (zkConnectionManagerCallback-11-thread-1-EventThread) [ ] o.a.s.c.ZkController Register node as live in ZooKeeper:/live_nodes/192.168.0.17:8985_solr
2021-10-17 04:05:57.097 INFO (zkCallback-10-thread-28) [ ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/ranger_audits/state.json] for collection [ranger_audits] has occurred - updating... (live nodes size: [2])
2021-10-17 04:05:57.098 INFO (coreZkRegister-1-thread-5) [ ] o.a.s.c.ZkController Registering core ranger_audits_shard1_replica_n1 afterExpiration? true
2021-10-17 04:05:57.099 INFO (coreZkRegister-1-thread-6) [ ] o.a.s.s.ZkIndexSchemaReader Creating ZooKeeper watch for the managed schema at /configs/ranger_audits/managed-schema
2021-10-17 04:05:57.099 INFO (zkConnectionManagerCallback-11-thread-1-EventThread) [ ] o.a.s.c.c.DefaultConnectionStrategy Reconnected to ZooKeeper
2021-10-17 04:05:57.099 INFO (zkConnectionManagerCallback-11-thread-1-EventThread) [ ] o.a.s.c.c.ConnectionManager zkClient Connected:true
2021-10-17 04:05:57.102 INFO (zkCallback-10-thread-24) [ ] o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (2) -> (3)
2021-10-17 04:05:57.102 INFO (Thread-2418) [ ] o.a.s.c.SolrCore config update listener called for core ranger_audits_shard1_replica_n1
2021-10-17 04:05:57.103 INFO (coreZkRegister-1-thread-6) [ ] o.a.s.s.ZkIndexSchemaReader Current schema version 0 is already the latest
2021-10-17 04:05:57.109 INFO (coreZkRegister-1-thread-5) [c:ranger_audits s:shard1 r:core_node2 x:ranger_audits_shard1_replica_n1] o.a.s.c.ShardLeaderElectionContextBase make sure parent is created /collections/ranger_audits/leaders/shard1
2021-10-17 04:05:57.114 INFO (coreZkRegister-1-thread-5) [c:ranger_audits s:shard1 r:core_node2 x:ranger_audits_shard1_replica_n1] o.a.s.c.ShardLeaderElectionContext Enough replicas found to continue.
2021-10-17 04:05:57.114 INFO (coreZkRegister-1-thread-5) [c:ranger_audits s:shard1 r:core_node2 x:ranger_audits_shard1_replica_n1] o.a.s.c.ShardLeaderElectionContext I may be the new leader - try and sync
2021-10-17 04:05:57.114 INFO (coreZkRegister-1-thread-5) [c:ranger_audits s:shard1 r:core_node2 x:ranger_audits_shard1_replica_n1] o.a.s.c.SyncStrategy Sync replicas to https://192.168.0.17:8985/solr/ranger_audits_shard1_replica_n1/
2021-10-17 04:05:57.114 INFO (coreZkRegister-1-thread-5) [c:ranger_audits s:shard1 r:core_node2 x:ranger_audits_shard1_replica_n1] o.a.s.c.SyncStrategy Sync Success - now sync replicas to me
2021-10-17 04:05:57.114 INFO (coreZkRegister-1-thread-5) [c:ranger_audits s:shard1 r:core_node2 x:ranger_audits_shard1_replica_n1] o.a.s.c.SyncStrategy https://192.168.0.17:8985/solr/ranger_audits_shard1_replica_n1/ has no replicas
2021-10-17 04:05:57.114 INFO (coreZkRegister-1-thread-5) [c:ranger_audits s:shard1 r:core_node2 x:ranger_audits_shard1_replica_n1] o.a.s.c.ShardLeaderElectionContextBase Creating leader registration node /collections/ranger_audits/leaders/shard1/leader after winning as /collections/ranger_audits/leader_elect/shard1/election/216449719079380911-core_node2-n_0000000061
But these keep on repeating, and after around 10 minutes we see the below error and the Solr server finally gives up.
2021-10-17 04:14:25.112 ERROR (qtp1916575798-2487) [c:ranger_audits s:shard1 r:core_node2 x:ranger_audits_shard1_replica_n1] o.a.s.u.p.DistributedZkUpdateProcessor ClusterState says we are the leader, but locally we don't think so
2021-10-17 04:14:25.112 ERROR (qtp1916575798-2325) [c:ranger_audits s:shard1 r:core_node2 x:ranger_audits_shard1_replica_n1] o.a.s.u.p.DistributedZkUpdateProcessor ClusterState says we are the leader, but locally we don't think so
2021-10-17 04:14:25.114 INFO (qtp1916575798-2487) [c:ranger_audits s:shard1 r:core_node2 x:ranger_audits_shard1_replica_n1] o.a.s.u.p.LogUpdateProcessorFactory [ranger_audits_shard1_replica_n1] webapp=/solr path=/update params={wt=javabin&version=2}{} 0 36703
2021-10-17 04:14:25.114 WARN (qtp1916575798-2492) [c:ranger_audits s:shard1 r:core_node2 x:ranger_audits_shard1_replica_n1] o.a.h.s.a.u.KerberosName auth_to_local rule mechanism not set.Using default of hadoop
2021-10-17 04:14:25.116 INFO (qtp1916575798-2325) [c:ranger_audits s:shard1 r:core_node2 x:ranger_audits_shard1_replica_n1] o.a.s.u.p.LogUpdateProcessorFactory [ranger_audits_shard1_replica_n1] webapp=/solr path=/update params={wt=javabin&version=2}{} 0 36707
2021-10-17 04:14:37.503 WARN (Thread-2474) [ ] o.a.z.Login TGT renewal thread has been interrupted and will exit.
2021-10-17 04:14:37.504 ERROR (qtp1916575798-2325) [c:ranger_audits s:shard1 r:core_node2 x:ranger_audits_shard1_replica_n1] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: ClusterState says we are the leader (https://192.168.0.17:8985/solr/ranger_audits_shard1_replica_n1), but locally we don't think so. Request came from null
at org.apache.solr.update.processor.DistributedZkUpdateProcessor.doDefensiveChecks(DistributedZkUpdateProcessor.java:1017)
at org.apache.solr.update.processor.DistributedZkUpdateProcessor.setupRequest(DistributedZkUpdateProcessor.java:655)
at org.apache.solr.update.processor.DistributedZkUpdateProcessor.setupRequest(DistributedZkUpdateProcessor.java:593)
at org.apache.solr.update.processor.DistributedZkUpdateProcessor.setupRequest(DistributedZkUpdateProcessor.java:585)
Please help to resolve this issue. Thanks
10-11-2021
09:27 AM
Hello Team, I have a requirement to apply specific filters to user logins in Cloudera Manager. I came across a configuration setting that allows using an external authentication script, but I am not clear on what the script should look like: https://docs.cloudera.com/cdp-private-cloud-base/7.1.7/security-kerberos-authentication/topics/cm-security-external-authentication.html Does anybody have an idea? Thanks
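Purely as an illustration of the shape such a script might take; the exact argument and exit-code contract is an assumption here and must be verified against the linked Cloudera Manager documentation:
#!/bin/bash
# Hypothetical external-auth skeleton. ASSUMPTION: CM passes the username
# as the first argument and the password on stdin, and interprets the exit
# code as the authentication result; verify this contract in the CM docs.
user="$1"
read -r -s password
# Example filter: only allow users listed in a local allowlist file
# (/etc/cm-allowed-users.txt is a hypothetical path)
if grep -qx "$user" /etc/cm-allowed-users.txt; then
    exit 0   # assumed "authenticated" exit code
else
    exit 1   # assumed "rejected" exit code
fi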
09-16-2021
08:17 AM
@asish Sorry for the delayed reply. We only saw this error in the log; however, I looked into the logs again later today and couldn't find "under construction" in the logs anymore. Is there any workaround so that this doesn't happen again in the future?
09-15-2021
05:04 AM
@balajip I set the config in the wrong service. After setting the config on the Hive service, it worked. Thank you.
09-15-2021
01:17 AM
Thank you @balajip. I tried the solution and updated the config on Hive on Tez, but I am still getting the issue. The full stack trace is given below.
[HiveServer2-Handler-Pool: Thread-114015]: Error fetching results:
org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: Task org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@4fcc96f rejected from java.util.concurrent.ThreadPoolExecutor@66bee5ac[Shutting down, pool size = 162, active threads = 0, queued tasks = 0, completed tasks = 5297]
at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:476) ~[hive-service-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:328) ~[hive-service-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:946) ~[hive-service-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:567) ~[hive-service-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:798) [hive-service-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1837) [hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1822) [hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) [hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) [hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:654) [hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) [hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_242]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_242]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]
Caused by: java.io.IOException: java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: Task org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@4fcc96f rejected from java.util.concurrent.ThreadPoolExecutor@66bee5ac[Shutting down, pool size = 162, active threads = 0, queued tasks = 0, completed tasks = 5297]
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:638) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:545) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:901) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:243) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:471) ~[hive-service-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
... 13 more
Caused by: java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: Task org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@4fcc96f rejected from java.util.concurrent.ThreadPoolExecutor@66bee5ac[Shutting down, pool size = 162, active threads = 0, queued tasks = 0, completed tasks = 5297]
at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:200) ~[hbase-client-2.2.3.7.1.6.0-297.jar:2.2.3.7.1.6.0-297]
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:267) ~[hbase-client-2.2.3.7.1.6.0-297.jar:2.2.3.7.1.6.0-297]
at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:435) ~[hbase-client-2.2.3.7.1.6.0-297.jar:2.2.3.7.1.6.0-297]
at org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:310) ~[hbase-client-2.2.3.7.1.6.0-297.jar:2.2.3.7.1.6.0-297]
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:595) ~[hbase-client-2.2.3.7.1.6.0-297.jar:2.2.3.7.1.6.0-297]
at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:211) ~[hbase-mapreduce-2.2.3.7.1.6.0-297.jar:2.2.3.7.1.6.0-297]
at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:133) ~[hbase-mapreduce-2.2.3.7.1.6.0-297.jar:2.2.3.7.1.6.0-297]
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$1.nextKeyValue(TableInputFormatBase.java:219) ~[hbase-mapreduce-2.2.3.7.1.6.0-297.jar:2.2.3.7.1.6.0-297]
at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat$1.next(HiveHBaseTableInputFormat.java:140) ~[hive-hbase-handler-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat$1.next(HiveHBaseTableInputFormat.java:101) ~[hive-hbase-handler-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:605) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:545) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:901) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:243) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:471) ~[hive-service-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
... 13 more
Caused by: java.util.concurrent.RejectedExecutionException: Task org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@4fcc96f rejected from java.util.concurrent.ThreadPoolExecutor@66bee5ac[Shutting down, pool size = 162, active threads = 0, queued tasks = 0, completed tasks = 5297]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063) ~[?:1.8.0_242]
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830) ~[?:1.8.0_242]
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379) ~[?:1.8.0_242]
at org.apache.hadoop.hbase.client.ResultBoundedCompletionService.submit(ResultBoundedCompletionService.java:171) ~[hbase-client-2.2.3.7.1.6.0-297.jar:2.2.3.7.1.6.0-297]
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.addCallsForCurrentReplica(ScannerCallableWithReplicas.java:329) ~[hbase-client-2.2.3.7.1.6.0-297.jar:2.2.3.7.1.6.0-297]
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:191) ~[hbase-client-2.2.3.7.1.6.0-297.jar:2.2.3.7.1.6.0-297]
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58) ~[hbase-client-2.2.3.7.1.6.0-297.jar:2.2.3.7.1.6.0-297]
at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:192) ~[hbase-client-2.2.3.7.1.6.0-297.jar:2.2.3.7.1.6.0-297]
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:267) ~[hbase-client-2.2.3.7.1.6.0-297.jar:2.2.3.7.1.6.0-297]
at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:435) ~[hbase-client-2.2.3.7.1.6.0-297.jar:2.2.3.7.1.6.0-297]
at org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:310) ~[hbase-client-2.2.3.7.1.6.0-297.jar:2.2.3.7.1.6.0-297]
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:595) ~[hbase-client-2.2.3.7.1.6.0-297.jar:2.2.3.7.1.6.0-297]
at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:211) ~[hbase-mapreduce-2.2.3.7.1.6.0-297.jar:2.2.3.7.1.6.0-297]
at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:133) ~[hbase-mapreduce-2.2.3.7.1.6.0-297.jar:2.2.3.7.1.6.0-297]
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$1.nextKeyValue(TableInputFormatBase.java:219) ~[hbase-mapreduce-2.2.3.7.1.6.0-297.jar:2.2.3.7.1.6.0-297]
at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat$1.next(HiveHBaseTableInputFormat.java:140) ~[hive-hbase-handler-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat$1.next(HiveHBaseTableInputFormat.java:101) ~[hive-hbase-handler-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:605) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:545) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:901) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:243) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:471) ~[hive-service-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
... 13 more
09-14-2021
04:18 AM
On CDP we are using both Hive (for the Hive Metastores) and Hive on Tez (for the HiveServers). We are getting the below error while trying to run a query with a condition. I can't share the table information and the exact query, but it looks something like the below.
CREATE EXTERNAL TABLE IF NOT EXISTS XXX (
`1` string,
`6` varchar(30),
`7` varchar(5),
`8` varchar(10)
) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES('hbase.columns.mapping'=':key,
xx:1,
xx:5,
xx:6')
TBLPROPERTIES (
'hbase.table.name'='YYYYY'
);
The query looks as follows:
select * from XXX where `8` = '1990-10-10';
And we see the below error from the HiveServer:
[a3ed3b7b-d225-43af-9ac0-76917911a742 HiveServer2-Handler-Pool: Thread-128-EventThread]: Error while calling watcher
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@1d7573cd rejected from java.util.concurrent.ThreadPoolExecutor@194ae4bb[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 2]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063) ~[?:1.8.0_242]
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830) ~[?:1.8.0_242]
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379) ~[?:1.8.0_242]
at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112) ~[?:1.8.0_242]
at java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:678) ~[?:1.8.0_242]
at org.apache.hadoop.hbase.zookeeper.ZKWatcher.process(ZKWatcher.java:541) ~[hbase-zookeeper-2.2.3.7.1.6.0-297.jar:2.2.3.7.1.6.0-297]
at org.apache.hadoop.hbase.zookeeper.PendingWatcher.process(PendingWatcher.java:40) ~[hbase-zookeeper-2.2.3.7.1.6.0-297.jar:2.2.3.7.1.6.0-297]
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:535) [zookeeper-3.5.5.7.1.6.0-297.jar:3.5.5.7.1.6.0-297]
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510) [zookeeper-3.5.5.7.1.6.0-297.jar:3.5.5.7.1.6.0-297]
We added the below config on the HiveServer (based on this: https://community.cloudera.com/t5/Support-Questions/HIVE-concurrency-request-erreur-when-run-several-same/td-p/319166), but we are still getting the issue.
<property>
<name>hive.server2.parallel.ops.in.session</name>
<value>true</value>
</property>
09-07-2021
02:02 AM
Dear team, We are getting the below error in the CDP 7.1.6 HiveServer logs. Can you please share the cause of this issue and any possible solution?
2021-09-03 13:12:25,571 ERROR org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-16886]: FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.exec.MoveTask. java.io.IOException: Fail to get checksum, since file /warehouse/tablespace/managed/hive/xxxxx/xxxxx/xxxxx/xxxxx/delta_0000003_0000003_0000/xxxxx.xxxxx is under construction.
2021-09-03 13:12:25,571 INFO org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-16886]: Completed executing command(queryId=hive_20210903131225_70117bf2-c60f-4564-83e9-8a60be421f63); Time taken: 0.12 seconds
2021-09-03 13:12:25,572 INFO org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-16886]: OK
2021-09-03 13:12:25,572 INFO org.apache.hadoop.hive.ql.lockmgr.DbTxnManager: [HiveServer2-Background-Pool: Thread-16886]: Stopped heartbeat for query: hive_20210903131225_70117bf2-c60f-4564-83e9-8a60be421f63
2021-09-03 13:12:25,578 ERROR org.apache.hive.service.cli.operation.Operation: [HiveServer2-Background-Pool: Thread-16886]: Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.exec.MoveTask. java.io.IOException: Fail to get checksum, since file /warehouse/tablespace/managed/hive/xxxxx/xxxxx/xxxxx/xxxxx/delta_0000003_0000003_0000/xxxxx.xxxxx is under construction.
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:362) ~[hive-service-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:241) ~[hive-service-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) ~[hive-service-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322) [hive-service-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at java.security.AccessController.doPrivileged(Native Method) [?:1.8.0_242]
at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_242]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898) [hadoop-common-3.1.1.7.1.6.0-297.jar:?]
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340) [hive-service-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_242]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_242]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_242]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_242]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_242]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_242]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Fail to get checksum, since file /warehouse/tablespace/managed/hive/xxxxx/xxxxx/xxxxx/xxxxx/delta_0000003_0000003_0000/xxxxx.xxxxx is under construction.
at org.apache.hadoop.hive.ql.metadata.Hive.addWriteNotificationLog(Hive.java:3509) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:2245) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.ql.exec.MoveTask.handleStaticParts(MoveTask.java:515) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:432) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:742) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:497) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:491) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) ~[hive-exec-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225) ~[hive-service-3.1.3000.7.1.6.0-297.jar:3.1.3000.7.1.6.0-297]
... 13 more
05-26-2021
04:21 AM
I edited my solution above a bit. We found that the issue was related to some kind of routing from the Oozie WF to the YARN logs. What we wanted was to view the logs from the Oozie WF manager. Accessing the logs from the YARN RM UI works, but we weren't able to view the logs directly from the Oozie WF manager. We already have the correct configurations in place in the MapReduce service.
05-26-2021
03:43 AM
Hello @Scharan and @Shelton Thank you for the reply. Please note that there are groups too, and the group names must be separated from the user names with a space, or else everything will be treated as users; see the example below. Source: https://docs.cloudera.com/cdp-private-cloud-base/7.1.6/yarn-security/topics/yarn-admin-acl.html
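A small illustration of that format (the user and group names are hypothetical):
# yarn.admin.acl snippet: users before the space, groups after it
cat >> yarn-site-snippet.xml <<'EOF'
<property>
  <name>yarn.admin.acl</name>
  <value>yarn,alice ops_admins,hadoop_admins</value>
</property>
EOF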