About Tomas79

Tomas79 · ‎11-26-2019

The solution is quite simple: I was not aware that service-wide configurations are stored on the service, not on its roles. So the solution is to use the ServicesResourceApi endpoint and its read_service_config method. Something like this:

    import cm_client
    from cm_client.rest import ApiException

    # Method of a wrapper class that holds self.api (a cm_client.ApiClient)
    # and self.cluster_name.
    def get_service_config(self, service_name):
        """Returns the configuration of the service"""
        services_instance = cm_client.ServicesResourceApi(self.api)
        view = 'summary'
        try:
            api_response = services_instance.read_service_config(
                self.cluster_name, service_name, view=view)
            return api_response.to_dict()
        except ApiException as exception:
            # On failure the error is printed and the method returns None.
            print(f"Exception when calling ServicesResourceApi->read_service_config: {exception}\n")
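Once the service configuration has been fetched, a single service-wide property can be picked out of the returned dictionary. The sketch below assumes the dictionary has the same 'items' shape as the role configuration dump in the original question (a list of entries with 'name', 'value' and 'default' keys). The helper name find_config_value, the sample data and the property names in it are illustrative assumptions, not taken from the post.

```python
from typing import Any, Optional


def find_config_value(config: dict, name: str) -> Optional[Any]:
    """Return the effective value of the config entry called `name`.

    Falls back to the entry's default when no explicit value is set,
    and returns None when the entry is absent.
    """
    for item in config.get("items") or []:
        if item.get("name") == name:
            value = item.get("value")
            return value if value is not None else item.get("default")
    return None


# Hypothetical sample mimicking the shape of read_service_config(...).to_dict()
sample = {
    "items": [
        {"name": "core_site_safety_valve",
         "value": "<property>...</property>", "default": None},
        {"name": "dfs_replication", "value": None, "default": "3"},
    ]
}

print(find_config_value(sample, "core_site_safety_valve"))
print(find_config_value(sample, "dfs_replication"))
```

Falling back to the default is a design choice for readability of the result; drop that branch if only explicitly overridden values are of interest.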

Tomas79 · ‎11-25-2019

It looks like the Java class com.cloudera.enterprise.dbutil.DbProvisioner expects the user to have superuser privileges on PostgreSQL, so Create DB and Create role alone are not enough (AWS RDS unfortunately does not provide a superuser). I had to work around the issue by creating the databases up front.
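The workaround can be sketched as a small generator for the SQL that has to be run up front. It relies on a PostgreSQL rule that matches the error in the log: a non-superuser can create a database owned by another role only if it is a member of that role, hence the GRANT. The function name and all role, database and password values are hypothetical placeholders; how Director is then pointed at the pre-created databases is not covered by the post and is not shown here.

```python
from typing import List


def precreate_statements(admin_user: str, db_name: str,
                         owner: str, password: str) -> List[str]:
    """Build the SQL needed to pre-create one database without superuser rights."""

    def ident(name: str) -> str:
        # Quote an identifier, doubling embedded double quotes.
        return '"' + name.replace('"', '""') + '"'

    def literal(value: str) -> str:
        # Quote a string literal, doubling embedded single quotes.
        return "'" + value.replace("'", "''") + "'"

    return [
        f"CREATE ROLE {ident(owner)} LOGIN PASSWORD {literal(password)};",
        # The admin user must be a member of the owner role to create
        # a database on its behalf.
        f"GRANT {ident(owner)} TO {ident(admin_user)};",
        f"CREATE DATABASE {ident(db_name)} OWNER {ident(owner)} ENCODING 'UTF8';",
    ]


# Hypothetical names, for illustration only.
for statement in precreate_statements("dbroot", "scm", "cmadmin", "change-me"):
    print(statement)
```

The statements can be run with psql as the RDS master user before the deployment is started.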

Tomas79 · ‎11-25-2019

Hi Cloudera,

Cloudera Altus Director cannot create the databases for CM and throws this error, even though it has root user access to the external AWS RDS PostgreSQL database:

org.postgresql.util.PSQLException: ERROR: must be member of role "cmadmin_dkndwlxo"

I could not find any hint in the docs about what exactly this role means and why the root user must be a member of it.

PostgreSQL error log:

2019-11-25 11:07:06 UTC:10.150.1.7(43878):dbroot@postgres:[12215]:ERROR: must be member of role "cmadmin_dkndwlxo"
2019-11-25 11:07:06 UTC:10.150.1.7(43878):dbroot@postgres:[12215]:STATEMENT: create database scm_75ilop0jdikuhinujsfs7l5m1n owner cmadmin_dkndwlxo encoding 'UTF8'
2019-11-25 11:07:17 UTC:10.150.1.7(43880):dbroot@postgres:[12313]:ERROR: must be member of role "cmadmin_wrhjespw"
2019-11-25 11:07:17 UTC:10.150.1.7(43880):dbroot@postgres:[12313]:STATEMENT: create database scm_38kegs9qab7j5l6hgqo069h3am owner cmadmin_wrhjespw encoding 'UTF8'
2019-11-25 11:07:28 UTC:10.150.1.7(43882):dbroot@postgres:[12422]:ERROR: must be member of role "cmadmin_kfelwpnh"
2019-11-25 11:07:28 UTC:10.150.1.7(43882):dbroot@postgres:[12422]:STATEMENT: create database scm_5vrk2jc93r9h4nq9n87c3majfp owner cmadmin_kfelwpnh encoding 'UTF8'
2019-11-25 11:07:48 UTC:10.150.1.7(43884):dbroot@postgres:[12703]:ERROR: must be member of role "cmadmin_xxyehrrb"
2019-11-25 11:07:48 UTC:10.150.1.7(43884):dbroot@postgres:[12703]:STATEMENT: create database scm_fprfmbk5dq8n7n659594goeukg owner cmadmin_xxyehrrb encoding 'UTF8'
2019-11-25 11:08:19 UTC:10.150.1.7(43886):dbroot@postgres:[13017]:ERROR: must be member of role "cmadmin_qgathjfw"
2019-11-25 11:08:19 UTC:10.150.1.7(43886):dbroot@postgres:[13017]:STATEMENT: create database scm_fo6j4rn05hdlrid3g0l584urjs owner cmadmin_qgathjfw encoding 'UTF8'

PostgreSQL users:

test=> \du
                                    List of roles
    Role name     |                   Attributes                   |    Member of
------------------+------------------------------------------------+-----------------
 cmadmin_dkndwlxo |                                                | {}
 cmadmin_kfelwpnh |                                                | {}
 cmadmin_qgathjfw |                                                | {}
 cmadmin_wrhjespw |                                                | {}
 cmadmin_xxyehrrb |                                                | {}
 dbroot           | Create role, Create DB                        +| {rds_superuser}
                  | Password valid until infinity                  |

Any hints? Thanks

Tomas79 · ‎11-20-2019

Hi,

I am wondering whether it is possible to get Service-Wide configurations via the read_config method of the RoleConfigGroupsResourceApi class.

https://archive.cloudera.com/cm6/6.3.0/generic/jar/cm_api/swagger-html-sdk-docs/python/docs/RoleConfigGroupsResourceApi.html#read_config

The read_roles method of RolesResourceApi returns these roles:

CD-HDFS-eHtEMKVf-DATANODE-BASE
CD-HDFS-eHtEMKVf-SECONDARYNAMENODE-BASE
CD-HDFS-eHtEMKVf-HTTPFS-BASE
CD-HDFS-eHtEMKVf-DATANODE-BASE
CD-HDFS-eHtEMKVf-DATANODE-BASE
CD-HDFS-eHtEMKVf-NAMENODE-BASE

But when I query all these roles, I cannot find the Service-Wide property for the advanced configuration of core-site.xml.

Reading configuration for CD-HDFS-eHtEMKVf-DATANODE-BASE
{'items': [{'default': None,
            'description': 'For advanced use only, key-value pairs (one on '
                           "each line) to be inserted into a role's "
                           'environment. Applies to configurations of this '
                           'role except client configuration.',
            'display_name': 'DataNode Environment Advanced Configuration '
                            'Snippet (Safety Valve)',
            'name': 'DATANODE_role_env_safety_valve',
            'related_name': '',
            'required': False,
            'sensitive': False,
            'validation_message': None,
            'validation_state': 'OK',
            'validation_warnings_suppressed': False,
            'value': None},
           {'default': '{"critical":"never","warning":"1000000.0"}',
            'description': 'The health test thresholds of the number of blocks '
                           'on a DataNode',
            'display_name': 'DataNode Block Count Thresholds',
            'name': 'datanode_block_count_thresholds',
            'related_name': '',
            'required': False,
            'sensitive': False,
            'validation_message': None,
            'validation_state': 'OK',
            'validation_warnings_suppressed': None,
            'value': None},
           {'default': None,

Maybe I should search in other classes? Please advise. Thanks

Tomas79 · ‎11-06-2019

Hi, I would like to know whether there is a way to restrict how much disk space a YARN user can use in the YARN NodeManager user cache. I would like to prevent a single user from accidentally filling up the entire disk. Is there a way to specify, say, that every user can have X GB for the usercache in YARN? If not, can I somehow instruct YARN to use a different folder (drive) for non-production users and thus avoid consuming all the free space? Thanks

Tomas79 · ‎10-10-2019

I removed the balancer override (so it is now true) and the DN is still OK. So I don't know what the reason is, but it is definitely not solved. I think under some conditions this can happen to anybody running on CDH5.

Tomas79 · ‎10-10-2019

And after 2 minutes (during which it kept writing the same NPE error) it suddenly "fixes" itself, and the DN starts:

2019-10-10 11:04:04,846 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception
javax.management.RuntimeMBeanException: java.lang.NullPointerException
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:839)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrowMaybeMBeanException(DefaultMBeanServerInterceptor.java:852)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:651)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
    at org.apache.hadoop.jmx.JMXJsonServlet.writeAttribute(JMXJsonServlet.java:342)
    at org.apache.hadoop.jmx.JMXJsonServlet.listBeans(JMXJsonServlet.java:320)
    at org.apache.hadoop.jmx.JMXJsonServlet.doGet(JMXJsonServlet.java:210)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
    at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1301)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.datanode.DataNode.getDiskBalancerStatus(DataNode.java:2917)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
    at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
    at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:193)
    at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:175)
    at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:117)
    at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:54)
    at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
    at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83)
    at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
    ... 31 more
2019-10-10 11:04:05,054 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data3/cdh/current: 56073ms
2019-10-10 11:04:42,106 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data2/cdh/current: 93125ms
2019-10-10 11:04:42,106 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Total time to add all replicas to map: 93126ms
2019-10-10 11:04:42,170 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/data/data9/cdh, DS-190d4a84-4811-4186-9fda-a6cfe07008ec): no suitable block pools found to scan. Waiting 551660352 ms.
2019-10-10 11:04:42,184 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/data/data2/cdh, DS-1e368637-4201-4558-99c1-25d7ab6bb6d4): no suitable block pools found to scan. Waiting 551660354 ms.
2019-10-10 11:04:42,200 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Now rescanning bpid BP-76826636-10.197.31.86-1501521881839 on volume /data/data8/cdh, after more than 504 hour(s)
2019-10-10 11:04:42,205 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Now rescanning bpid BP-76826636-10.197.31.86-1501521881839 on volume /data/data1/cdh, after more than 504 hour(s)
2019-10-10 11:04:42,227 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/data/data3/cdh, DS-6d2daa74-6042-4e3e-a91f-1c91393777f4): no suitable block pools found to scan. Waiting 551660336 ms.
2019-10-10 11:04:42,276 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Now rescanning bpid BP-76826636-10.197.31.86-1501521881839 on volume /data/data11/cdh, after more than 504 hour(s)

So now I am not sure: is this because of disabling the disk balancer, or something else?
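To see which volumes delay the startup, the "Time to add replicas to map" entries can be pulled out of the DataNode log and sorted by duration. This is a minimal sketch that relies only on the message format visible in the log excerpts in this thread; the function name is hypothetical, and whether slow volumes are the actual cause of the delay is an open assumption.

```python
import re
from typing import List, Tuple

# Matches e.g. "... on volume /data/data3/cdh/current: 56073ms"
PATTERN = re.compile(
    r"Time to add replicas to map for block pool \S+ on volume (\S+): (\d+)ms"
)


def slowest_volumes(log_text: str) -> List[Tuple[str, int]]:
    """Return (volume, milliseconds) pairs, slowest first."""
    found = [(volume, int(ms)) for volume, ms in PATTERN.findall(log_text)]
    return sorted(found, key=lambda pair: pair[1], reverse=True)


# Three entries copied from the logs in this thread, joined as on the forum page.
PREFIX = ("INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: "
          "Time to add replicas to map for block pool "
          "BP-76826636-10.197.31.86-1501521881839 on volume ")
sample_log = (
    "2019-10-10 10:32:31,936 " + PREFIX + "/data/data9/cdh/current: 2521ms "
    "2019-10-10 11:04:05,054 " + PREFIX + "/data/data3/cdh/current: 56073ms "
    "2019-10-10 11:04:42,106 " + PREFIX + "/data/data2/cdh/current: 93125ms"
)

for volume, ms in slowest_volumes(sample_log):
    print(f"{volume}: {ms} ms")
```

Because the pattern does not depend on line breaks, it also works on log text that has been flattened into a single line.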

Tomas79 · ‎10-10-2019

Hi Cloudera, I have an issue similar to this one: https://community.cloudera.com/t5/Support-Questions/Datanode-is-not-connecting-to-namenode-CDH-5-14-0/m-p/65172#M55187, but in my case the solution of disabling the disk balancer did not help.

STARTUP_MSG: build = http://github.com/cloudera/hadoop -r 2d822203265a2827554b84cbb46c69b86ccca149; compiled by 'jenkins' on 2018-08-09T16:22Z STARTUP_MSG: java = 1.8.0_161 ************************************************************/ 2019-10-10 10:32:12,421 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: registered UNIX signal handlers for [TERM, HUP, INT] 2019-10-10 10:32:12,880 INFO org.apache.hadoop.security.UserGroupInformation: Login successful for user hdfs/ip-10-197-27-68.eu-west-1.compute.internal@REALM.LOCAL using keytab file hdfs.keytab 2019-10-10 10:32:13,074 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties 2019-10-10 10:32:13,114 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s). 2019-10-10 10:32:13,114 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started 2019-10-10 10:32:13,119 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Initialized block scanner with targetBytesPerSec 1048576 2019-10-10 10:32:13,120 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: File descriptor passing is enabled. 
2019-10-10 10:32:13,121 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Configured hostname is ip-10-197-27-68.eu-west-1.compute.internal 2019-10-10 10:32:13,151 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting DataNode with maxLockedMemory = 8589934592 2019-10-10 10:32:13,170 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened streaming server at /10.197.27.68:50010 2019-10-10 10:32:13,172 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 10485760 bytes/s 2019-10-10 10:32:13,172 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Number threads for balancing is 50 2019-10-10 10:32:13,175 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 10485760 bytes/s 2019-10-10 10:32:13,175 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Number threads for balancing is 50 2019-10-10 10:32:13,175 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Listening on UNIX domain socket: /var/run/hdfs-sockets/dn 2019-10-10 10:32:13,219 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog 2019-10-10 10:32:13,224 INFO org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets. 
2019-10-10 10:32:13,228 INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.datanode is not defined 2019-10-10 10:32:13,235 INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter) 2019-10-10 10:32:13,236 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context datanode 2019-10-10 10:32:13,237 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static 2019-10-10 10:32:13,237 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs 2019-10-10 10:32:13,248 INFO org.apache.hadoop.http.HttpServer2: Jetty bound to port 44480 2019-10-10 10:32:13,248 INFO org.mortbay.log: jetty-6.1.26.cloudera.4 2019-10-10 10:32:13,435 INFO org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:44480 2019-10-10 10:32:13,781 INFO org.apache.hadoop.hdfs.server.datanode.web.DatanodeHttpServer: Listening HTTPS traffic on /10.197.27.68:50475 2019-10-10 10:32:13,786 INFO org.apache.hadoop.util.JvmPauseMonitor: Starting JVM pause monitor 2019-10-10 10:32:13,786 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnUserName = hdfs/ip-10-197-27-68.eu-west-1.compute.internal@REALM.LOCAL 2019-10-10 10:32:13,786 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: supergroup = hdfs 2019-10-10 10:32:13,811 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 300 2019-10-10 10:32:13,822 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 50020 2019-10-10 10:32:13,936 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened IPC server at /10.197.27.68:50020 2019-10-10 
10:32:13,966 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request received for nameservices: hanameservice 2019-10-10 10:32:13,982 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting BPOfferServices for nameservices: hanameservice 2019-10-10 10:32:13,988 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (Datanode Uuid unassigned) service to ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022 starting to offer service 2019-10-10 10:32:13,988 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (Datanode Uuid unassigned) service to ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022 starting to offer service 2019-10-10 10:32:13,992 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting 2019-10-10 10:32:13,992 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting 2019-10-10 10:32:15,042 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2019-10-10 10:32:15,042 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2019-10-10 10:32:16,043 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2019-10-10 10:32:16,043 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. 
Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2019-10-10 10:32:17,043 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2019-10-10 10:32:17,043 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2019-10-10 10:32:18,044 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2019-10-10 10:32:18,044 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2019-10-10 10:32:19,045 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2019-10-10 10:32:19,045 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2019-10-10 10:32:20,046 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. 
Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2019-10-10 10:32:20,046 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2019-10-10 10:32:21,046 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2019-10-10 10:32:21,046 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2019-10-10 10:32:22,047 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2019-10-10 10:32:22,047 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2019-10-10 10:32:23,048 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2019-10-10 10:32:23,048 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. 
Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2019-10-10 10:32:24,048 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2019-10-10 10:32:24,049 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2019-10-10 10:32:24,050 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022 2019-10-10 10:32:24,050 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022 2019-10-10 10:32:29,154 INFO org.apache.hadoop.hdfs.server.common.Storage: Using 6 threads to upgrade data directories (dfs.datanode.parallel.volumes.load.threads.num=6, dataDirs=6) 2019-10-10 10:32:29,169 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/data1/cdh/in_use.lock acquired by nodename 4465@ip-10-197-27-68.eu-west-1.compute.internal 2019-10-10 10:32:29,192 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/data2/cdh/in_use.lock acquired by nodename 4465@ip-10-197-27-68.eu-west-1.compute.internal 2019-10-10 10:32:29,195 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/data3/cdh/in_use.lock acquired by nodename 4465@ip-10-197-27-68.eu-west-1.compute.internal 2019-10-10 10:32:29,221 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/data8/cdh/in_use.lock acquired by nodename 4465@ip-10-197-27-68.eu-west-1.compute.internal 2019-10-10 10:32:29,240 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on 
/data/data9/cdh/in_use.lock acquired by nodename 4465@ip-10-197-27-68.eu-west-1.compute.internal 2019-10-10 10:32:29,255 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/data11/cdh/in_use.lock acquired by nodename 4465@ip-10-197-27-68.eu-west-1.compute.internal 2019-10-10 10:32:29,276 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-76826636-10.197.31.86-1501521881839 2019-10-10 10:32:29,276 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/data1/cdh/current/BP-76826636-10.197.31.86-1501521881839 2019-10-10 10:32:29,295 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-76826636-10.197.31.86-1501521881839 2019-10-10 10:32:29,295 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/data2/cdh/current/BP-76826636-10.197.31.86-1501521881839 2019-10-10 10:32:29,313 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-76826636-10.197.31.86-1501521881839 2019-10-10 10:32:29,314 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/data3/cdh/current/BP-76826636-10.197.31.86-1501521881839 2019-10-10 10:32:29,331 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-76826636-10.197.31.86-1501521881839 2019-10-10 10:32:29,331 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/data8/cdh/current/BP-76826636-10.197.31.86-1501521881839 2019-10-10 10:32:29,347 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-76826636-10.197.31.86-1501521881839 2019-10-10 10:32:29,348 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/data9/cdh/current/BP-76826636-10.197.31.86-1501521881839 2019-10-10 10:32:29,363 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid 
BP-76826636-10.197.31.86-1501521881839 2019-10-10 10:32:29,363 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/data11/cdh/current/BP-76826636-10.197.31.86-1501521881839 2019-10-10 10:32:29,364 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Setting up storage: nsid=1710566395;bpid=BP-76826636-10.197.31.86-1501521881839;lv=-56;nsInfo=lv=-60;cid=cluster2;nsid=1710566395;c=0;bpid=BP-76826636-10.197.31.86-1501521881839;dnuuid=2de9411f-0f62-431e-bdfb-c1bbc7c20655 2019-10-10 10:32:29,384 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy: Available space volume choosing policy initialized: dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold = 10737418240, dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction = 0.75 2019-10-10 10:32:29,392 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-37471644-43b9-4631-be36-b72215d9c152 2019-10-10 10:32:29,393 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /data/data1/cdh/current, StorageType: DISK 2019-10-10 10:32:29,393 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-1e368637-4201-4558-99c1-25d7ab6bb6d4 2019-10-10 10:32:29,393 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /data/data2/cdh/current, StorageType: DISK 2019-10-10 10:32:29,393 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-6d2daa74-6042-4e3e-a91f-1c91393777f4 2019-10-10 10:32:29,394 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /data/data3/cdh/current, StorageType: DISK 2019-10-10 10:32:29,394 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-3605e8a7-240c-4f46-bd94-fb9a76240925 2019-10-10 10:32:29,394 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /data/data8/cdh/current, StorageType: DISK
2019-10-10 10:32:29,394 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-190d4a84-4811-4186-9fda-a6cfe07008ec
2019-10-10 10:32:29,395 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /data/data9/cdh/current, StorageType: DISK
2019-10-10 10:32:29,395 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-709f30d4-d700-48f9-972d-6def31844ab7
2019-10-10 10:32:29,395 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /data/data11/cdh/current, StorageType: DISK
2019-10-10 10:32:29,398 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Registered FSDatasetState MBean
2019-10-10 10:32:29,399 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Volume reference is released.
2019-10-10 10:32:29,399 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding block pool BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,399 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data1/cdh/current...
2019-10-10 10:32:29,399 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data2/cdh/current...
2019-10-10 10:32:29,399 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data3/cdh/current...
2019-10-10 10:32:29,400 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data8/cdh/current...
2019-10-10 10:32:29,400 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data9/cdh/current...
2019-10-10 10:32:29,400 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data11/cdh/current...
2019-10-10 10:32:29,407 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Cached dfsUsed found for /data/data1/cdh/current/BP-76826636-10.197.31.86-1501521881839/current: 863853522944
2019-10-10 10:32:29,407 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Cached dfsUsed found for /data/data2/cdh/current/BP-76826636-10.197.31.86-1501521881839/current: 865228335039
2019-10-10 10:32:29,407 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Cached dfsUsed found for /data/data8/cdh/current/BP-76826636-10.197.31.86-1501521881839/current: 865263112192
2019-10-10 10:32:29,407 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Cached dfsUsed found for /data/data3/cdh/current/BP-76826636-10.197.31.86-1501521881839/current: 860011779047
2019-10-10 10:32:29,408 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Cached dfsUsed found for /data/data9/cdh/current/BP-76826636-10.197.31.86-1501521881839/current: 735340314624
2019-10-10 10:32:29,408 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Cached dfsUsed found for /data/data11/cdh/current/BP-76826636-10.197.31.86-1501521881839/current: 1472138194944
2019-10-10 10:32:29,411 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-76826636-10.197.31.86-1501521881839 on /data/data8/cdh/current: 11ms
2019-10-10 10:32:29,411 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-76826636-10.197.31.86-1501521881839 on /data/data3/cdh/current: 11ms
2019-10-10 10:32:29,411 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-76826636-10.197.31.86-1501521881839 on /data/data2/cdh/current: 12ms
2019-10-10 10:32:29,411 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-76826636-10.197.31.86-1501521881839 on /data/data11/cdh/current: 11ms
2019-10-10 10:32:29,411 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-76826636-10.197.31.86-1501521881839 on /data/data9/cdh/current: 11ms
2019-10-10 10:32:29,411 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-76826636-10.197.31.86-1501521881839 on /data/data1/cdh/current: 12ms
2019-10-10 10:32:29,412 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Total time to scan all replicas for block pool BP-76826636-10.197.31.86-1501521881839: 13ms
2019-10-10 10:32:29,414 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data1/cdh/current...
2019-10-10 10:32:29,414 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data2/cdh/current...
2019-10-10 10:32:29,414 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data3/cdh/current...
2019-10-10 10:32:29,415 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data9/cdh/current...
2019-10-10 10:32:29,415 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data11/cdh/current...
2019-10-10 10:32:29,415 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data8/cdh/current...
2019-10-10 10:32:30,051 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:31,936 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data9/cdh/current: 2521ms
2019-10-10 10:32:32,156 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data8/cdh/current: 2741ms
2019-10-10 10:32:33,780 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data11/cdh/current: 4365ms
2019-10-10 10:32:33,824 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception javax.management.RuntimeMBeanException: java.lang.NullPointerException
I verified the override for this particular DataNode:
</property>
<property>
<name>dfs.disk.balancer.enabled</name>
<value>false</value>
</property>
@pifta mentioned that the startup log should contain an error about a race condition, but I don't see any error there except the NPE. There was also a suggestion that full disks can cause this issue, but every data volume has at least 80 GB available. Thanks
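As a side note, one way to double-check an override like the one above is to read the property straight out of the configuration file the DataNode was started with. The following is only a minimal sketch: the helper name `read_hadoop_property` and the sample XML are hypothetical, and it assumes the usual Hadoop layout of `<property>` elements with `<name>` and `<value>` children under one root element.

```python
import xml.etree.ElementTree as ET


def read_hadoop_property(xml_text, name):
    """Return the value of a Hadoop-style property, or None if it is absent.

    Hypothetical helper. If the property appears more than once, this sketch
    returns the last occurrence, on the assumption that an override was
    appended after the defaults.
    """
    root = ET.fromstring(xml_text)
    found = None
    for prop in root.iter("property"):
        prop_name = prop.findtext("name", default="").strip()
        if prop_name == name:
            value = prop.findtext("value")
            found = value.strip() if value is not None else None
    return found


# Illustrative sample only, not taken from the cluster in question.
SAMPLE_HDFS_SITE = """<configuration>
  <property>
    <name>dfs.disk.balancer.enabled</name>
    <value>false</value>
  </property>
</configuration>"""

print(read_hadoop_property(SAMPLE_HDFS_SITE, "dfs.disk.balancer.enabled"))
```

On a real host the text would come from the hdfs-site.xml in the role's process directory rather than from an inline string.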

Tomas79 · ‎08-09-2019

@eMazarakis I noticed the "t" flag at the end of the directory's permissions. That is the sticky bit. Whenever you see it in the HDFS permissions, it means that only the owner (or the superuser) can drop the directory and nobody else can, even if write access is granted. So in your case only the hive user can remove this directory. Maybe, as @EricL pointed out, you have impersonation enabled, so the query is running as a different user. Either way, you need to search the logs for this permission issue.
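To illustrate how the flag can be spotted, here is a small sketch that checks for the sticky bit either in a permission string as printed by `hdfs dfs -ls` or in an octal mode. The function names are hypothetical, and the sample strings are made up for illustration.

```python
import stat


def has_sticky_bit(permission_string):
    """Check a permission string such as 'drwxrwxrwt'.

    A trailing '+' (shown when ACL entries exist) is ignored. A lowercase 't'
    means sticky bit plus execute for others, an uppercase 'T' means sticky
    bit without execute for others.
    """
    cleaned = permission_string.rstrip("+")
    return cleaned[-1:] in ("t", "T")


def mode_has_sticky_bit(octal_mode):
    """Check an octal mode string such as '1777'."""
    return bool(int(octal_mode, 8) & stat.S_ISVTX)


print(has_sticky_bit("drwxrwxrwt"))   # sticky bit set
print(has_sticky_bit("drwxr-xr-x"))   # sticky bit not set
print(mode_has_sticky_bit("1777"))    # sticky bit set
```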

Tomas79 · ‎08-07-2019

Hi, the probable root cause is that the Spark job submitted from the Jupyter notebook uses different memory configuration parameters. So I don't think the issue is Jupyter itself, but rather the executor and driver memory settings. YARN is not able to provide enough resources (i.e. memory):
19/08/06 23:10:41 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Check your cluster settings:
- how much memory YARN has allocated to the NodeManagers, and how big a container can be
- what the submit options of your Spark job are
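The two checks above can be roughly sketched as a calculation. This is a simplified illustration, not what YARN does internally: the function names are hypothetical, the overhead defaults (10% of executor memory, at least 384 MiB) are Spark's documented defaults for the executor memory overhead, and the sketch ignores YARN rounding the request up to a multiple of its minimum allocation.

```python
def yarn_container_request_mb(executor_memory_mb,
                              overhead_factor=0.10,
                              min_overhead_mb=384):
    """Approximate memory (MiB) that one Spark executor asks YARN for.

    Hypothetical helper: executor heap plus the memory overhead, where the
    overhead is the larger of min_overhead_mb and a fraction of the heap.
    """
    overhead_mb = max(min_overhead_mb, int(executor_memory_mb * overhead_factor))
    return executor_memory_mb + overhead_mb


def fits_in_yarn(executor_memory_mb, max_allocation_mb):
    """True if the request fits within the largest container YARN may grant
    (the value of yarn.scheduler.maximum-allocation-mb)."""
    return yarn_container_request_mb(executor_memory_mb) <= max_allocation_mb


# Example values are made up for illustration.
print(yarn_container_request_mb(8192))
print(fits_in_yarn(8192, 8192))
print(fits_in_yarn(4096, 8192))
```

If the request does not fit, either the executor memory in the submit options has to go down or the container limits on the YARN side have to go up.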

Last Visited	‎01-14-2021 05:46 AM

Member Since	‎07-01-2015 06:03 AM
Posts	460
Kudos received	79

Cloudera Community

Re: Read service-wide configuration values via API

Re: Cloudera Altus - create CM with existing postg...

Re: Spark job getting failed with Jupyter notebook

Re: Create Parameterized view Impala

Re: Unable to access NameNode in cross realm trust...

Cloudera Altus - create CM with existing postgre D...

Read service-wide configuration values via API

YARN user cache limit

Re: Datanode is not connecting after restart CDH5....

Datanode is not connecting after restart CDH5.15.1

Re: How to properly delete a hive table through HU...
