Member since: 07-01-2015
Posts: 460
Kudos Received: 77
Solutions: 43
My Accepted Solutions
Title | Views | Posted
---|---|---
| 766 | 11-26-2019 11:47 PM
| 696 | 11-25-2019 11:44 AM
| 5450 | 08-07-2019 12:48 AM
| 1287 | 04-17-2019 03:09 AM
| 2196 | 02-18-2019 12:23 AM
11-26-2019
11:47 PM
1 Kudo
The solution is quite simple: I was not aware that the service-wide configurations live not in the roles but in the service itself. So the solution is to use the ServicesResourceApi endpoint and its read_service_config method. Something like this:
# imports needed by this snippet
import cm_client
from cm_client.rest import ApiException

def get_service_config(self, service_name):
    """Returns the service-wide configuration of the given service."""
    services_instance = cm_client.ServicesResourceApi(self.api)
    view = 'summary'
    try:
        api_response = services_instance.read_service_config(
            self.cluster_name, service_name, view=view)
        return api_response.to_dict()
    except ApiException as exception:
        print(f"Exception when calling ServicesResourceApi->read_service_config: {exception}\n")
11-25-2019
11:44 AM
It looks like the Java class com.cloudera.enterprise.dbutil.DbProvisioner expects the user to have superuser privileges on PostgreSQL, so the Create DB and Create Role attributes alone are not enough (AWS RDS unfortunately does not provide superuser). I had to work around the issue by creating the databases upfront.
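A rough sketch of that workaround, assuming psycopg2 is available; the endpoint and credentials are placeholders, and the role/database names are taken from the error log in the post below purely as an illustration (Director generates new random names on each run):
import psycopg2

conn = psycopg2.connect(host='my-rds-endpoint', dbname='postgres',
                        user='dbroot', password='...')
conn.autocommit = True  # CREATE DATABASE cannot run inside a transaction
cur = conn.cursor()
# Without superuser, PostgreSQL only lets you create a database owned by a
# role you are a member of - hence the GRANT first.
cur.execute('GRANT cmadmin_dkndwlxo TO dbroot')
cur.execute("CREATE DATABASE scm_75ilop0jdikuhinujsfs7l5m1n "
            "OWNER cmadmin_dkndwlxo ENCODING 'UTF8'")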
11-25-2019
03:32 AM
Hi Cloudera,
Cloudera Altus Director cannot create the databases for CM and throws the error below, even though it has root user access to the external AWS RDS PostgreSQL database:
org.postgresql.util.PSQLException: ERROR: must be member of role "cmadmin_dkndwlxo"
I could not find any hint in the docs about what this exact role means and why the root user must be a member of it.
PostgreSQL error log:
2019-11-25 11:07:06 UTC:10.150.1.7(43878):dbroot@postgres:[12215]:ERROR: must be member of role "cmadmin_dkndwlxo"
2019-11-25 11:07:06 UTC:10.150.1.7(43878):dbroot@postgres:[12215]:STATEMENT: create database scm_75ilop0jdikuhinujsfs7l5m1n owner cmadmin_dkndwlxo encoding 'UTF8'
2019-11-25 11:07:17 UTC:10.150.1.7(43880):dbroot@postgres:[12313]:ERROR: must be member of role "cmadmin_wrhjespw"
2019-11-25 11:07:17 UTC:10.150.1.7(43880):dbroot@postgres:[12313]:STATEMENT: create database scm_38kegs9qab7j5l6hgqo069h3am owner cmadmin_wrhjespw encoding 'UTF8'
2019-11-25 11:07:28 UTC:10.150.1.7(43882):dbroot@postgres:[12422]:ERROR: must be member of role "cmadmin_kfelwpnh"
2019-11-25 11:07:28 UTC:10.150.1.7(43882):dbroot@postgres:[12422]:STATEMENT: create database scm_5vrk2jc93r9h4nq9n87c3majfp owner cmadmin_kfelwpnh encoding 'UTF8'
2019-11-25 11:07:48 UTC:10.150.1.7(43884):dbroot@postgres:[12703]:ERROR: must be member of role "cmadmin_xxyehrrb"
2019-11-25 11:07:48 UTC:10.150.1.7(43884):dbroot@postgres:[12703]:STATEMENT: create database scm_fprfmbk5dq8n7n659594goeukg owner cmadmin_xxyehrrb encoding 'UTF8'
2019-11-25 11:08:19 UTC:10.150.1.7(43886):dbroot@postgres:[13017]:ERROR: must be member of role "cmadmin_qgathjfw"
2019-11-25 11:08:19 UTC:10.150.1.7(43886):dbroot@postgres:[13017]:STATEMENT: create database scm_fo6j4rn05hdlrid3g0l584urjs owner cmadmin_qgathjfw encoding 'UTF8'
PostgreSQL users:
test=> \du
List of roles
Role name | Attributes | Member of
------------------+------------------------------------------------+-----------------
cmadmin_dkndwlxo | | {}
cmadmin_kfelwpnh | | {}
cmadmin_qgathjfw | | {}
cmadmin_wrhjespw | | {}
cmadmin_xxyehrrb | | {}
dbroot | Create role, Create DB +| {rds_superuser}
| Password valid until infinity |
Any hints?
Thanks
Labels: Cloudera Manager
11-20-2019
11:48 AM
Hi,
I am wondering if it is possible to get Service-Wide configurations via the read_config method of the RoleConfigGroupsResourceApi class.
https://archive.cloudera.com/cm6/6.3.0/generic/jar/cm_api/swagger-html-sdk-docs/python/docs/RoleConfigGroupsResourceApi.html#read_config
The read_roles method of RolesResourceApi returns these roles:
CD-HDFS-eHtEMKVf-DATANODE-BASE
CD-HDFS-eHtEMKVf-SECONDARYNAMENODE-BASE
CD-HDFS-eHtEMKVf-HTTPFS-BASE
CD-HDFS-eHtEMKVf-DATANODE-BASE
CD-HDFS-eHtEMKVf-DATANODE-BASE
CD-HDFS-eHtEMKVf-NAMENODE-BASE
But when I query all these roles, I cannot find the Service-Wide "Advanced Configuration Snippet" property for core-site.xml.
Reading configuration for CD-HDFS-eHtEMKVf-DATANODE-BASE
{'items': [{'default': None,
'description': 'For advanced use only, key-value pairs (one on '
"each line) to be inserted into a role's "
'environment. Applies to configurations of this '
'role except client configuration.',
'display_name': 'DataNode Environment Advanced Configuration '
'Snippet (Safety Valve)',
'name': 'DATANODE_role_env_safety_valve',
'related_name': '',
'required': False,
'sensitive': False,
'validation_message': None,
'validation_state': 'OK',
'validation_warnings_suppressed': False,
'value': None},
{'default': '{"critical":"never","warning":"1000000.0"}',
'description': 'The health test thresholds of the number of blocks '
'on a DataNode',
'display_name': 'DataNode Block Count Thresholds',
'name': 'datanode_block_count_thresholds',
'related_name': '',
'required': False,
'sensitive': False,
'validation_message': None,
'validation_state': 'OK',
'validation_warnings_suppressed': None,
'value': None},
{'default': None,
Maybe I should search in other classes? Please advise.
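For reference, this is roughly how I enumerated the groups and read their configs - a sketch assuming an authenticated cm_client session like in my other snippets, with method names as given in the linked Swagger docs:
import cm_client

def dump_role_group_configs(api_client, cluster_name, service_name):
    groups_api = cm_client.RoleConfigGroupsResourceApi(api_client)
    for group in groups_api.read_role_config_groups(cluster_name, service_name).items:
        print('Reading configuration for', group.name)
        config = groups_api.read_config(cluster_name, group.name,
                                        service_name, view='full')
        # Only role-scoped properties show up here; the service-wide
        # entries never appear in this output.
        for item in config.items:
            print(item.name, item.value)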
Thanks
Labels: Cloudera Manager
11-07-2019
12:54 AM
As @Shelton mentioned, I do not recommend using public IP addresses at all. If you need to understand what is happening in the background, check the Cloudera Manager logs on the CM instance. Also check whether you are able to SSH from the CM host to the new nodes from the command line. The Cloudera Python scripts make heavy use of the Python socket library to convert IPs to hostnames and back, and in my experience this information should be consistent on all hosts - do not mix in public IP addresses.
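A quick consistency check along those lines - a sketch to run on every host; all three values should agree across the cluster:
import socket

fqdn = socket.getfqdn()
ip = socket.gethostbyname(fqdn)
reverse_name = socket.gethostbyaddr(ip)[0]
# Forward and reverse lookups should agree, and the IP should be the
# private address the cluster uses, not a public one.
print(fqdn, ip, reverse_name)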
11-06-2019
11:05 PM
Hi,
I would like to know if there is some way to restrict how much disk space a YARN user can use in the NodeManager user cache. I would like to avoid a single user accidentally filling up the entire disk.
Is there a way to set, let's say, that every user can have X GB for the usercache in YARN? If not, can I somehow instruct YARN to use a different folder (drive) for non-production users and thus avoid the consumption of all free space?
Thanks
Labels: Apache YARN
10-10-2019
02:28 AM
I removed the balancer override (so it is now true) and the DN is still OK. So I don't know what the reason is, but it is definitely not solved. I think under some conditions this can happen to anybody running CDH 5.
10-10-2019
02:10 AM
And after 2 minutes (while it kept writing the same NPE error) it suddenly "fixed" itself and the DN started:
2019-10-10 11:04:04,846 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception
javax.management.RuntimeMBeanException: java.lang.NullPointerException
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:839)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrowMaybeMBeanException(DefaultMBeanServerInterceptor.java:852)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:651)
at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
at org.apache.hadoop.jmx.JMXJsonServlet.writeAttribute(JMXJsonServlet.java:342)
at org.apache.hadoop.jmx.JMXJsonServlet.listBeans(JMXJsonServlet.java:320)
at org.apache.hadoop.jmx.JMXJsonServlet.doGet(JMXJsonServlet.java:210)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1301)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.datanode.DataNode.getDiskBalancerStatus(DataNode.java:2917)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:193)
at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:175)
at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:117)
at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:54)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83)
at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
... 31 more
2019-10-10 11:04:05,054 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data3/cdh/current: 56073ms
2019-10-10 11:04:42,106 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data2/cdh/current: 93125ms
2019-10-10 11:04:42,106 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Total time to add all replicas to map: 93126ms
2019-10-10 11:04:42,170 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/data/data9/cdh, DS-190d4a84-4811-4186-9fda-a6cfe07008ec): no suitable block pools found to scan. Waiting 551660352 ms.
2019-10-10 11:04:42,184 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/data/data2/cdh, DS-1e368637-4201-4558-99c1-25d7ab6bb6d4): no suitable block pools found to scan. Waiting 551660354 ms.
2019-10-10 11:04:42,200 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Now rescanning bpid BP-76826636-10.197.31.86-1501521881839 on volume /data/data8/cdh, after more than 504 hour(s)
2019-10-10 11:04:42,205 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Now rescanning bpid BP-76826636-10.197.31.86-1501521881839 on volume /data/data1/cdh, after more than 504 hour(s)
2019-10-10 11:04:42,227 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/data/data3/cdh, DS-6d2daa74-6042-4e3e-a91f-1c91393777f4): no suitable block pools found to scan. Waiting 551660336 ms.
2019-10-10 11:04:42,276 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Now rescanning bpid BP-76826636-10.197.31.86-1501521881839 on volume /data/data11/cdh, after more than 504 hour(s)
So I am not sure now - is this because of disabling the balancer, or something else?
10-10-2019
02:01 AM
Hi Cloudera, I have an issue similar to this one: https://community.cloudera.com/t5/Support-Questions/Datanode-is-not-connecting-to-namenode-CDH-5-14-0/m-p/65172#M55187, but in my case the solution of disabling the disk balancer did not help.
STARTUP_MSG: build = http://github.com/cloudera/hadoop -r 2d822203265a2827554b84cbb46c69b86ccca149; compiled by 'jenkins' on 2018-08-09T16:22Z
STARTUP_MSG: java = 1.8.0_161
************************************************************/
2019-10-10 10:32:12,421 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: registered UNIX signal handlers for [TERM, HUP, INT]
2019-10-10 10:32:12,880 INFO org.apache.hadoop.security.UserGroupInformation: Login successful for user hdfs/ip-10-197-27-68.eu-west-1.compute.internal@REALM.LOCAL using keytab file hdfs.keytab
2019-10-10 10:32:13,074 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2019-10-10 10:32:13,114 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2019-10-10 10:32:13,114 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2019-10-10 10:32:13,119 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Initialized block scanner with targetBytesPerSec 1048576
2019-10-10 10:32:13,120 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: File descriptor passing is enabled.
2019-10-10 10:32:13,121 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Configured hostname is ip-10-197-27-68.eu-west-1.compute.internal
2019-10-10 10:32:13,151 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting DataNode with maxLockedMemory = 8589934592
2019-10-10 10:32:13,170 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened streaming server at /10.197.27.68:50010
2019-10-10 10:32:13,172 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 10485760 bytes/s
2019-10-10 10:32:13,172 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Number threads for balancing is 50
2019-10-10 10:32:13,175 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 10485760 bytes/s
2019-10-10 10:32:13,175 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Number threads for balancing is 50
2019-10-10 10:32:13,175 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Listening on UNIX domain socket: /var/run/hdfs-sockets/dn
2019-10-10 10:32:13,219 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2019-10-10 10:32:13,224 INFO org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2019-10-10 10:32:13,228 INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.datanode is not defined
2019-10-10 10:32:13,235 INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2019-10-10 10:32:13,236 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context datanode
2019-10-10 10:32:13,237 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2019-10-10 10:32:13,237 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2019-10-10 10:32:13,248 INFO org.apache.hadoop.http.HttpServer2: Jetty bound to port 44480
2019-10-10 10:32:13,248 INFO org.mortbay.log: jetty-6.1.26.cloudera.4
2019-10-10 10:32:13,435 INFO org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:44480
2019-10-10 10:32:13,781 INFO org.apache.hadoop.hdfs.server.datanode.web.DatanodeHttpServer: Listening HTTPS traffic on /10.197.27.68:50475
2019-10-10 10:32:13,786 INFO org.apache.hadoop.util.JvmPauseMonitor: Starting JVM pause monitor
2019-10-10 10:32:13,786 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnUserName = hdfs/ip-10-197-27-68.eu-west-1.compute.internal@REALM.LOCAL
2019-10-10 10:32:13,786 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: supergroup = hdfs
2019-10-10 10:32:13,811 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 300
2019-10-10 10:32:13,822 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 50020
2019-10-10 10:32:13,936 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened IPC server at /10.197.27.68:50020
2019-10-10 10:32:13,966 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request received for nameservices: hanameservice
2019-10-10 10:32:13,982 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting BPOfferServices for nameservices: hanameservice
2019-10-10 10:32:13,988 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (Datanode Uuid unassigned) service to ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022 starting to offer service
2019-10-10 10:32:13,988 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (Datanode Uuid unassigned) service to ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022 starting to offer service
2019-10-10 10:32:13,992 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2019-10-10 10:32:13,992 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2019-10-10 10:32:15,042 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:15,042 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:16,043 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:16,043 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:17,043 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:17,043 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:18,044 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:18,044 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:19,045 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:19,045 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:20,046 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:20,046 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:21,046 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:21,046 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:22,047 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:22,047 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:23,048 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:23,048 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:24,048 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:24,049 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:24,050 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022
2019-10-10 10:32:24,050 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022
2019-10-10 10:32:29,154 INFO org.apache.hadoop.hdfs.server.common.Storage: Using 6 threads to upgrade data directories (dfs.datanode.parallel.volumes.load.threads.num=6, dataDirs=6)
2019-10-10 10:32:29,169 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/data1/cdh/in_use.lock acquired by nodename 4465@ip-10-197-27-68.eu-west-1.compute.internal
2019-10-10 10:32:29,192 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/data2/cdh/in_use.lock acquired by nodename 4465@ip-10-197-27-68.eu-west-1.compute.internal
2019-10-10 10:32:29,195 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/data3/cdh/in_use.lock acquired by nodename 4465@ip-10-197-27-68.eu-west-1.compute.internal
2019-10-10 10:32:29,221 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/data8/cdh/in_use.lock acquired by nodename 4465@ip-10-197-27-68.eu-west-1.compute.internal
2019-10-10 10:32:29,240 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/data9/cdh/in_use.lock acquired by nodename 4465@ip-10-197-27-68.eu-west-1.compute.internal
2019-10-10 10:32:29,255 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/data11/cdh/in_use.lock acquired by nodename 4465@ip-10-197-27-68.eu-west-1.compute.internal
2019-10-10 10:32:29,276 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,276 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/data1/cdh/current/BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,295 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,295 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/data2/cdh/current/BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,313 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,314 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/data3/cdh/current/BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,331 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,331 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/data8/cdh/current/BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,347 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,348 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/data9/cdh/current/BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,363 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,363 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/data11/cdh/current/BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,364 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Setting up storage: nsid=1710566395;bpid=BP-76826636-10.197.31.86-1501521881839;lv=-56;nsInfo=lv=-60;cid=cluster2;nsid=1710566395;c=0;bpid=BP-76826636-10.197.31.86-1501521881839;dnuuid=2de9411f-0f62-431e-bdfb-c1bbc7c20655
2019-10-10 10:32:29,384 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy: Available space volume choosing policy initialized: dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold = 10737418240, dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction = 0.75
2019-10-10 10:32:29,392 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-37471644-43b9-4631-be36-b72215d9c152
2019-10-10 10:32:29,393 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /data/data1/cdh/current, StorageType: DISK
2019-10-10 10:32:29,393 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-1e368637-4201-4558-99c1-25d7ab6bb6d4
2019-10-10 10:32:29,393 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /data/data2/cdh/current, StorageType: DISK
2019-10-10 10:32:29,393 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-6d2daa74-6042-4e3e-a91f-1c91393777f4
2019-10-10 10:32:29,394 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /data/data3/cdh/current, StorageType: DISK
2019-10-10 10:32:29,394 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-3605e8a7-240c-4f46-bd94-fb9a76240925
2019-10-10 10:32:29,394 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /data/data8/cdh/current, StorageType: DISK
2019-10-10 10:32:29,394 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-190d4a84-4811-4186-9fda-a6cfe07008ec
2019-10-10 10:32:29,395 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /data/data9/cdh/current, StorageType: DISK
2019-10-10 10:32:29,395 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-709f30d4-d700-48f9-972d-6def31844ab7
2019-10-10 10:32:29,395 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /data/data11/cdh/current, StorageType: DISK
2019-10-10 10:32:29,398 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Registered FSDatasetState MBean
2019-10-10 10:32:29,399 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Volume reference is released.
2019-10-10 10:32:29,399 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding block pool BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,399 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data1/cdh/current...
2019-10-10 10:32:29,399 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data2/cdh/current...
2019-10-10 10:32:29,399 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data3/cdh/current...
2019-10-10 10:32:29,400 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data8/cdh/current...
2019-10-10 10:32:29,400 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data9/cdh/current...
2019-10-10 10:32:29,400 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data11/cdh/current...
2019-10-10 10:32:29,407 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Cached dfsUsed found for /data/data1/cdh/current/BP-76826636-10.197.31.86-1501521881839/current: 863853522944
2019-10-10 10:32:29,407 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Cached dfsUsed found for /data/data2/cdh/current/BP-76826636-10.197.31.86-1501521881839/current: 865228335039
2019-10-10 10:32:29,407 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Cached dfsUsed found for /data/data8/cdh/current/BP-76826636-10.197.31.86-1501521881839/current: 865263112192
2019-10-10 10:32:29,407 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Cached dfsUsed found for /data/data3/cdh/current/BP-76826636-10.197.31.86-1501521881839/current: 860011779047
2019-10-10 10:32:29,408 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Cached dfsUsed found for /data/data9/cdh/current/BP-76826636-10.197.31.86-1501521881839/current: 735340314624
2019-10-10 10:32:29,408 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Cached dfsUsed found for /data/data11/cdh/current/BP-76826636-10.197.31.86-1501521881839/current: 1472138194944
2019-10-10 10:32:29,411 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-76826636-10.197.31.86-1501521881839 on /data/data8/cdh/current: 11ms
2019-10-10 10:32:29,411 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-76826636-10.197.31.86-1501521881839 on /data/data3/cdh/current: 11ms
2019-10-10 10:32:29,411 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-76826636-10.197.31.86-1501521881839 on /data/data2/cdh/current: 12ms
2019-10-10 10:32:29,411 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-76826636-10.197.31.86-1501521881839 on /data/data11/cdh/current: 11ms
2019-10-10 10:32:29,411 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-76826636-10.197.31.86-1501521881839 on /data/data9/cdh/current: 11ms
2019-10-10 10:32:29,411 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-76826636-10.197.31.86-1501521881839 on /data/data1/cdh/current: 12ms
2019-10-10 10:32:29,412 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Total time to scan all replicas for block pool BP-76826636-10.197.31.86-1501521881839: 13ms
2019-10-10 10:32:29,414 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data1/cdh/current...
2019-10-10 10:32:29,414 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data2/cdh/current...
2019-10-10 10:32:29,414 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data3/cdh/current...
2019-10-10 10:32:29,415 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data9/cdh/current...
2019-10-10 10:32:29,415 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data11/cdh/current...
2019-10-10 10:32:29,415 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data8/cdh/current...
2019-10-10 10:32:30,051 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:31,936 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data9/cdh/current: 2521ms
2019-10-10 10:32:32,156 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data8/cdh/current: 2741ms
2019-10-10 10:32:33,780 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data11/cdh/current: 4365ms
2019-10-10 10:32:33,824 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception
javax.management.RuntimeMBeanException: java.lang.NullPointerException
I verified the override for this particular DataNode:
<!--'dfs.disk.balancer.enabled', originally set to 'true' (non-final), is overridden below by a safety valve-->
<property>
  <name>dfs.disk.balancer.enabled</name>
  <value>false</value>
</property>
@pifta mentioned that there should be some error during startup about a race condition, but I don't see any kind of error there except the NPE. There was also a suggestion that filled disks can cause this issue, but every data volume has at least 80 GB available. Thanks
09-12-2019
10:51 PM
This issue is fixed in Altus Director 6.2+. The known issue was: "Director unable to install unlimited strength JCE with OpenJDK - Director is unable to properly detect the version of OpenJDK being used and thus is unable to install the unlimited strength JCE when bootstrapping a deployment if it is requested." Cloudera Issue: DIR-8957
08-20-2019
12:49 AM
Hi, it is not clear from the documentation which enterprise license must be activated for using Auto-TLS. The docs say: "Auto-TLS, first introduced in Cloudera Manager 5.13 on Cloudera Director 2.6, is now available for on-premises clusters in Cloudera Enterprise 6. An Enterprise license is required to enable Auto-TLS." On the pricing page multiple different subscriptions are mentioned - does it mean that any of these licenses is sufficient? Or does "Enterprise license" == "Enterprise Data Hub"? https://www.cloudera.com/products/pricing/product-features.html lists: Essentials, Data Science & Engineering, Operational DB, Data Warehouse, Enterprise Data Hub. Thanks
Labels: Cloudera Manager
08-09-2019
02:04 AM
@eMazarakis I noticed the "t" flag on the directory. That is the sticky bit; whenever you see it in the HDFS permissions, it tells you that only a file's owner (besides the directory owner and the superuser) can delete or rename entries in that directory, nobody else - even if write access is granted. So in your case only the hive user can remove this directory. Maybe, as @EricL pointed out, you have impersonation enabled, so the query is running under a different user. Either way, you need to search for this permission issue in the logs.
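A small local illustration of the same POSIX convention (HDFS permissions follow it as well); the path below is just an example:
import os
import stat

os.makedirs('/tmp/shared', exist_ok=True)
os.chmod('/tmp/shared', 0o1777)  # the leading 1 sets the sticky bit -> 'drwxrwxrwt'
mode = os.stat('/tmp/shared').st_mode
# With the sticky bit set, only a file's owner (or root) can delete or
# rename entries in this directory, even though it is world-writable.
print(bool(mode & stat.S_ISVTX))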
08-07-2019
03:07 AM
Hi Cloudera, I am a little bit confused: I was not able to clearly find out whether the Auto-TLS feature renews the certificates automatically before they expire, or whether the administrator has to check their validity and perform some manual steps (is that what "rotating" means?). Also, one doc says that Auto-TLS is not possible on existing clusters, while the other says it is possible from 6.2: https://www.cloudera.com/documentation/enterprise/latest/topics/how_to_configure_cm_tls.html https://www.cloudera.com/documentation/enterprise/latest/topics/auto_tls.html#rotate_auto-tls So my question is: what are the steps required to renew the certificates in a CDH 5.15 (CM 5.15) cluster which was deployed using Auto-TLS? Thanks
Labels: Cloudera Manager
08-07-2019
12:48 AM
1 Kudo
Hi, the probable root cause is that the Spark job submitted by the Jupyter notebook has different memory config parameters. So I don't think the issue is Jupyter, but rather the executor and driver memory settings - YARN is not able to provide enough resources (i.e. memory):
19/08/06 23:10:41 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Check your cluster settings (see the sketch below):
- how much memory YARN has allocated to the NodeManagers and how big a container can be
- what the submit options of your Spark job are
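A hedged example of setting these options explicitly when building the session from a notebook - the application name and sizes are placeholders and must fit within your NodeManager container limits:
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName('notebook-job')
         .config('spark.executor.memory', '2g')    # must fit in a YARN container
         .config('spark.executor.instances', '2')
         .config('spark.driver.memory', '1g')
         .getOrCreate())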
08-06-2019
08:25 AM
But from the error message it looks like the service is using SSL. Are you sure that you are not using Cloudera Manager's Auto-TLS feature? Verify it in the Impala service configuration.
08-06-2019
08:23 AM
You can use a script like this to create snapshots of old and new files - i.e. find entries older than 3 days and entries newer than 3 days. Just make sure you use the correct path to the Cloudera jars; in the case of CDH 5.15:
#!/bin/bash
now=`date +"%Y-%m-%dT%H:%M:%S"`
hdfs dfs -rm /data/cleanup_report/part=older3days/*
hdfs dfs -rm /data/cleanup_report/part=newer3days/*
hadoop jar /opt/cloudera/parcels/CDH/jars/search-mr-1.0.0-cdh5.15.1.jar org.apache.solr.hadoop.HdfsFindTool -find /data -type d -mtime +3 | sed "s/^/${now}\tolder3days\t/" | hadoop fs -put - /data/cleanup_report/part=older3days/data.csv
hadoop jar /opt/cloudera/parcels/CDH/jars/search-mr-1.0.0-cdh5.15.1.jar org.apache.solr.hadoop.HdfsFindTool -find /data -type d -mtime -3 | sed "s/^/${now}\tnewer3days\t/" | hadoop fs -put - /data/cleanup_report/part=newer3days/data.csv
Then create an external table with partitions on top of this HDFS folder.
08-06-2019
07:30 AM
Hi @capacman, this issue is not fixed in CDH 5.15. Not sure about 5.16, but I guess not if there is nothing in the release notes. Tomas
08-06-2019
07:23 AM
Hi, I don't think it is so easy to do. At least I tried it once - downloading and compiling from source. That was the easier part: I just had to install some development libraries, gcc, and other tools. But the issue is that Hue in CDH runs with specific versions of Python packages, especially pyOpenSSL, pysaml2, asn1crypto, and others. The problem was that I had to change (upgrade/downgrade) the system packages to make the "external" Hue work, but then the other services and components stopped working. I am sorry for this generic answer; I don't have the exact details anymore - I already deleted that environment. Please let me know if you find any solution to this.
08-06-2019
07:14 AM
Hi, it looks like a permission issue which is silently ignored. Can you please post the ACLs of the HDFS path - the root folder where the table is stored - and the ACLs of the table path as well? Maybe you can also check the HiveServer log for any kind of permission issue. And finally I would check the NameNode logs: if the file is not deleted because of missing permissions, there will be a log message about it. Tomas
05-18-2019
10:29 PM
No, it was just one insert, and on repeating it it succeeded, so I am not able to reproduce it and thus see no pattern. This is CDH 5.15. Can you give me a detailed hint on how to get the full stack trace (from the Impala daemon?) of the failed fragment? I don't have the query profile (already deleted), but as I remember, one of the fragments (out of 10) was waiting for almost 2 hours on the HDFS sink while the others finished within a minute. Maybe it is an HDFS issue?
05-15-2019
11:54 PM
Hi,
one of our INSERT queries failed in Impala; one of the fragments could not write its data into HDFS. The message is quite descriptive, but I am not able to find out the root cause of this failure - HDFS did not report any issue at that time, and neither did the Impala daemons.
Query Status: Failed to write data (length: 38425) to Hdfs file:
hdfs://hanameservice/data/target_table/data/_impala_insert_staging/e64692c97276103f_d0ba1f0500000000/.e64692c97276103f-d0ba1f0500000001_1863061457_dir/e64692c97276103f-d0ba1f0500000001_2018403269_data.0.parq
Error(255): Unknown error 255 Root cause: IllegalMonitorStateException:
The Impala daemon log:
I0516 06:43:06.444519 32188 krpc-data-stream-recvr.cc:557] cancelled stream: fragment_instance_id=44488c54916f20c9:261655f90000000e node_id=5
I0516 06:43:06.444725 32188 query-state.cc:412] Instance completed. instance_id=44488c54916f20c9:261655f90000000e #in-flight=3 status=OK
I0516 06:43:06.444736 32188 query-exec-mgr.cc:155] ReleaseQueryState(): query_id=44488c54916f20c9:261655f900000000 refcnt=2
I0516 06:43:06.793349 32186 query-state.cc:412] Instance completed. instance_id=44488c54916f20c9:261655f900000015 #in-flight=2 status=OK
I0516 06:43:06.793372 32186 query-exec-mgr.cc:155] ReleaseQueryState(): query_id=44488c54916f20c9:261655f900000000 refcnt=1
I0516 06:43:06.899813 10865 status.cc:125] Failed to write data (length: 38425) to Hdfs file: hdfs://hanameservice/target_table/data/_impala_insert_staging/e64692c97276103f_d0ba1f0500000000/.e64692c97276103f-d0ba1f0500000001_1863061457_dir/e64692c97276103f-d0ba1f0500000001_2018403269_data.0.parq
Error(255): Unknown error 255
Root cause: IllegalMonitorStateException:
@ 0x966e3a
@ 0x107e9fb
@ 0xe1aea3
@ 0xe1b127
@ 0xe1c54d
@ 0xdecd8c
@ 0xdedbc3
@ 0xdef090
@ 0xbadc17
@ 0xbb06af
@ 0xb9e74a
@ 0xd607ef
@ 0xd60fea
@ 0x12d8b5a
@ 0x7fa01674fdd5
@ 0x7fa016478ead
I0516 06:43:06.944746 10865 runtime-state.cc:170] Error from query e64692c97276103f:d0ba1f0500000000: Failed to close HDFS file: hdfs://hanameservice/target_table/data/_impala_insert_staging/e64692c97276103f_d0ba1f0500000000/.e64692c97276103f-d0ba1f0500000001_1863061457_dir/e64692c97276103f-d0ba1f0500000001_2018403269_data.0.parq
Error(255): Unknown error 255
Root cause: IllegalMonitorStateException:
I0516 06:43:06.966231 10865 query-state.cc:412] Instance completed. instance_id=e64692c97276103f:d0ba1f0500000001 #in-flight=1 status=GENERAL: Failed to write data (length: 38425) to Hdfs file: hdfs://hanameservice/target_table/data/_impala_insert_staging/e64692c97276103f_d0ba1f0500000000/.e64692c97276103f-d0ba1f0500000001_1863061457_dir/e64692c97276103f-d0ba1f0500000001_2018403269_data.0.parq
Error(255): Unknown error 255
Root cause: IllegalMonitorStateException:
I0516 06:43:06.966250 10865 query-state.cc:425] Cancel: query_id=e64692c97276103f:d0ba1f0500000000
Any hints what can be the root cause of this issue?
Thanks
Labels: Apache Impala, HDFS
05-15-2019
11:31 PM
Hi Cloudera, I see a lot of these warnings in the Impala Daemon logs:
W0516 07:12:24.227567 1049 ShortCircuitCache.java:826] ShortCircuitCache(0x119fb869): could not load 1399296933_BP-76826636-10.197.31.86-1501521881839 due to InvalidToken exception.
Does this indicate some bad configuration? What can I do to eliminate these warnings? Thanks
Labels: Apache Impala
04-18-2019
03:53 AM
@Harsh J
hadoop distcp s3a://<BUCKET>/2019/01/DETAIL_USAGE/* s3a://<BUCKET>/usage_for_spark_production/2019/01/
I do not pass any special params to the tool, just the source directory with an asterisk and the destination directory. Usually it copies without a problem, but it has happened twice that it left a file with the weird suffix.
04-18-2019
12:20 AM
No options are passed - just something like: hadoop distcp s3a://location1 s3a://location2
04-17-2019
10:28 PM
Hi, I have a job copying data from S3 to S3 using distcp which from time to time leaves an unfinished file (LOAD00000092.csv.gz.____distcpSplit____0.83821251) in the S3 bucket. The job runs fine on YARN: no error is logged during the copy and no error is logged during the container execution. Is there any way to configure distcp to avoid using splits? Or why is this happening? Any tips or advice on how to overcome this are welcome. Thanks, T
Labels: Apache YARN
04-17-2019
03:35 AM
1 Kudo
Make sure that the command above returns not just the short name of the server but the fully qualified domain name - in your case "mugzy.c.essential-rider-208218.internal". You can achieve this by editing the /etc/hosts file, or check the GCP documentation on FQDNs for VMs for your specific Linux OS.
04-17-2019
03:07 AM
1 Kudo
You can check the Kudu tablet server UI to verify what is consuming the memory. I think one reason could be that you have too many tablets across the cluster (i.e. they require too much metadata). Another root cause can be a table into which you constantly insert small amounts of data, always appending; this can produce many small on-disk replicas. The solution is to re-create the whole table - this will "compact" the replicas and thus consume less memory.