Member since: 07-01-2015
Posts: 460
Kudos Received: 78
Solutions: 43
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 1344 | 11-26-2019 11:47 PM |
| 1301 | 11-25-2019 11:44 AM |
| 9470 | 08-07-2019 12:48 AM |
| 2170 | 04-17-2019 03:09 AM |
| 3483 | 02-18-2019 12:23 AM |
11-26-2019
11:47 PM
1 Kudo
The solution is quite simple: I was not aware that the service-wide configurations are not in roles but in services. So the solution is to use the ServicesResourceApi endpoint and its read_service_config method. Something like this:

import cm_client
from cm_client.rest import ApiException

def get_service_config(self, service_name):
    """Returns the configuration of the service."""
    services_instance = cm_client.ServicesResourceApi(self.api)
    view = 'summary'
    try:
        api_response = services_instance.read_service_config(
            self.cluster_name, service_name, view=view)
        return api_response.to_dict()
    except ApiException as exception:
        print(f"Exception when calling ServicesResourceApi->read_service_config: {exception}\n")
11-25-2019
11:44 AM
It looks like the Java class com.cloudera.enterprise.dbutil.DbProvisioner expects the user to have superuser privileges on PostgreSQL, so Create DB and Create Role alone are not enough (AWS RDS unfortunately does not provide a real superuser). I worked around the issue by creating the databases upfront.
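For reference, a minimal sketch of that upfront creation (assuming psycopg2; the host, password, and the generated role/database names are placeholders). On RDS, the master user must first be granted membership in the intended owner role before it can create a database owned by that role:

import psycopg2

# Placeholder connection details for the RDS master user.
conn = psycopg2.connect(host='my-rds-host', dbname='postgres',
                        user='dbroot', password='secret')
conn.autocommit = True  # CREATE DATABASE cannot run inside a transaction block
cur = conn.cursor()
# Make the master user a member of the owner role, then create the database.
cur.execute('GRANT cmadmin_dkndwlxo TO dbroot')
cur.execute("CREATE DATABASE scm OWNER cmadmin_dkndwlxo ENCODING 'UTF8'")
cur.close()
conn.close()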
11-25-2019
03:32 AM
Hi Cloudera,
Cloudera Altus Director cannot create the databases for CM and throws this error, even though it has root user access to the external AWS RDS PostgreSQL database:
org.postgresql.util.PSQLException: ERROR: must be member of role "cmadmin_dkndwlxo"
I could not find any hint in the docs about what this exact role means or why the root user must be a member of it.
PostgreSQL error log:
2019-11-25 11:07:06 UTC:10.150.1.7(43878):dbroot@postgres:[12215]:ERROR: must be member of role "cmadmin_dkndwlxo"
2019-11-25 11:07:06 UTC:10.150.1.7(43878):dbroot@postgres:[12215]:STATEMENT: create database scm_75ilop0jdikuhinujsfs7l5m1n owner cmadmin_dkndwlxo encoding 'UTF8'
2019-11-25 11:07:17 UTC:10.150.1.7(43880):dbroot@postgres:[12313]:ERROR: must be member of role "cmadmin_wrhjespw"
2019-11-25 11:07:17 UTC:10.150.1.7(43880):dbroot@postgres:[12313]:STATEMENT: create database scm_38kegs9qab7j5l6hgqo069h3am owner cmadmin_wrhjespw encoding 'UTF8'
2019-11-25 11:07:28 UTC:10.150.1.7(43882):dbroot@postgres:[12422]:ERROR: must be member of role "cmadmin_kfelwpnh"
2019-11-25 11:07:28 UTC:10.150.1.7(43882):dbroot@postgres:[12422]:STATEMENT: create database scm_5vrk2jc93r9h4nq9n87c3majfp owner cmadmin_kfelwpnh encoding 'UTF8'
2019-11-25 11:07:48 UTC:10.150.1.7(43884):dbroot@postgres:[12703]:ERROR: must be member of role "cmadmin_xxyehrrb"
2019-11-25 11:07:48 UTC:10.150.1.7(43884):dbroot@postgres:[12703]:STATEMENT: create database scm_fprfmbk5dq8n7n659594goeukg owner cmadmin_xxyehrrb encoding 'UTF8'
2019-11-25 11:08:19 UTC:10.150.1.7(43886):dbroot@postgres:[13017]:ERROR: must be member of role "cmadmin_qgathjfw"
2019-11-25 11:08:19 UTC:10.150.1.7(43886):dbroot@postgres:[13017]:STATEMENT: create database scm_fo6j4rn05hdlrid3g0l584urjs owner cmadmin_qgathjfw encoding 'UTF8'
PostgreSQL users:
test=> \du
List of roles
Role name | Attributes | Member of
------------------+------------------------------------------------+-----------------
cmadmin_dkndwlxo | | {}
cmadmin_kfelwpnh | | {}
cmadmin_qgathjfw | | {}
cmadmin_wrhjespw | | {}
cmadmin_xxyehrrb | | {}
dbroot | Create role, Create DB +| {rds_superuser}
| Password valid until infinity |
Any hints?
Thanks
Labels:
- Cloudera Manager
11-20-2019
11:48 AM
Hi,
I am wondering if it is possible to get Service-Wide configurations via the read_config method of the RoleConfigGroupsResourceApi class.
https://archive.cloudera.com/cm6/6.3.0/generic/jar/cm_api/swagger-html-sdk-docs/python/docs/RoleConfigGroupsResourceApi.html#read_config
The read_roles method of RolesResourceApi returns these roles:
CD-HDFS-eHtEMKVf-DATANODE-BASE
CD-HDFS-eHtEMKVf-SECONDARYNAMENODE-BASE
CD-HDFS-eHtEMKVf-HTTPFS-BASE
CD-HDFS-eHtEMKVf-DATANODE-BASE
CD-HDFS-eHtEMKVf-DATANODE-BASE
CD-HDFS-eHtEMKVf-NAMENODE-BASE
But when I query all of these roles, I cannot find the Service-Wide Advanced Configuration Snippet property for core-site.xml:
Reading configuration for CD-HDFS-eHtEMKVf-DATANODE-BASE
{'items': [{'default': None,
'description': 'For advanced use only, key-value pairs (one on '
"each line) to be inserted into a role's "
'environment. Applies to configurations of this '
'role except client configuration.',
'display_name': 'DataNode Environment Advanced Configuration '
'Snippet (Safety Valve)',
'name': 'DATANODE_role_env_safety_valve',
'related_name': '',
'required': False,
'sensitive': False,
'validation_message': None,
'validation_state': 'OK',
'validation_warnings_suppressed': False,
'value': None},
{'default': '{"critical":"never","warning":"1000000.0"}',
'description': 'The health test thresholds of the number of blocks '
'on a DataNode',
'display_name': 'DataNode Block Count Thresholds',
'name': 'datanode_block_count_thresholds',
'related_name': '',
'required': False,
'sensitive': False,
'validation_message': None,
'validation_state': 'OK',
'validation_warnings_suppressed': None,
'value': None},
{'default': None,
Maybe I should search in other classes? Please advise,
Thanks
Labels:
- Cloudera Manager
11-06-2019
11:05 PM
Hi,
I would like to know if there is some way to restrict how much disk space a YARN user can use in the NodeManager's user cache. I would like to avoid one user accidentally filling up the entire disk.
Is there a way to say, for example, that every user gets X GB for the usercache in YARN? If not, can I somehow instruct YARN to use a different folder (drive) for non-production users and thus avoid consuming all free space?
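For reference, the closest setting I could find is yarn.nodemanager.localizer.cache.target-size-mb, but as far as I understand it only caps the NodeManager's total localized-resource cache, not per-user usage (the value below is just the default):

<property>
  <!-- Caps the total localized file cache per NodeManager; not a per-user quota. -->
  <name>yarn.nodemanager.localizer.cache.target-size-mb</name>
  <value>10240</value>
</property>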
Thanks
Labels:
- Apache YARN
10-10-2019
02:28 AM
I removed the balancer override (so it is now true) and the DN is still OK. So I don't know what the reason is, but it is definitely not solved. I think under some conditions this can happen to anybody running CDH 5.
10-10-2019
02:10 AM
And after 2 minutes (it kept writing the same NPE error) it suddenly "fixes" itself and the DN starts:
2019-10-10 11:04:04,846 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception
javax.management.RuntimeMBeanException: java.lang.NullPointerException
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:839)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrowMaybeMBeanException(DefaultMBeanServerInterceptor.java:852)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:651)
at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
at org.apache.hadoop.jmx.JMXJsonServlet.writeAttribute(JMXJsonServlet.java:342)
at org.apache.hadoop.jmx.JMXJsonServlet.listBeans(JMXJsonServlet.java:320)
at org.apache.hadoop.jmx.JMXJsonServlet.doGet(JMXJsonServlet.java:210)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1301)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.datanode.DataNode.getDiskBalancerStatus(DataNode.java:2917)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:193)
at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:175)
at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:117)
at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:54)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83)
at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
... 31 more
2019-10-10 11:04:05,054 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data3/cdh/current: 56073ms
2019-10-10 11:04:42,106 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data2/cdh/current: 93125ms
2019-10-10 11:04:42,106 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Total time to add all replicas to map: 93126ms
2019-10-10 11:04:42,170 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/data/data9/cdh, DS-190d4a84-4811-4186-9fda-a6cfe07008ec): no suitable block pools found to scan. Waiting 551660352 ms.
2019-10-10 11:04:42,184 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/data/data2/cdh, DS-1e368637-4201-4558-99c1-25d7ab6bb6d4): no suitable block pools found to scan. Waiting 551660354 ms.
2019-10-10 11:04:42,200 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Now rescanning bpid BP-76826636-10.197.31.86-1501521881839 on volume /data/data8/cdh, after more than 504 hour(s)
2019-10-10 11:04:42,205 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Now rescanning bpid BP-76826636-10.197.31.86-1501521881839 on volume /data/data1/cdh, after more than 504 hour(s)
2019-10-10 11:04:42,227 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/data/data3/cdh, DS-6d2daa74-6042-4e3e-a91f-1c91393777f4): no suitable block pools found to scan. Waiting 551660336 ms.
2019-10-10 11:04:42,276 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Now rescanning bpid BP-76826636-10.197.31.86-1501521881839 on volume /data/data11/cdh, after more than 504 hour(s)
So I am not sure now: is this because of the disable, or something else?
10-10-2019
02:01 AM
Hi Cloudera, I have a similar issue to this one: https://community.cloudera.com/t5/Support-Questions/Datanode-is-not-connecting-to-namenode-CDH-5-14-0/m-p/65172#M55187, but in my case disabling the disk balancer did not help.
STARTUP_MSG: build = http://github.com/cloudera/hadoop -r 2d822203265a2827554b84cbb46c69b86ccca149; compiled by 'jenkins' on 2018-08-09T16:22Z
STARTUP_MSG: java = 1.8.0_161
************************************************************/
2019-10-10 10:32:12,421 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: registered UNIX signal handlers for [TERM, HUP, INT]
2019-10-10 10:32:12,880 INFO org.apache.hadoop.security.UserGroupInformation: Login successful for user hdfs/ip-10-197-27-68.eu-west-1.compute.internal@REALM.LOCAL using keytab file hdfs.keytab
2019-10-10 10:32:13,074 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2019-10-10 10:32:13,114 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2019-10-10 10:32:13,114 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2019-10-10 10:32:13,119 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Initialized block scanner with targetBytesPerSec 1048576
2019-10-10 10:32:13,120 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: File descriptor passing is enabled.
2019-10-10 10:32:13,121 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Configured hostname is ip-10-197-27-68.eu-west-1.compute.internal
2019-10-10 10:32:13,151 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting DataNode with maxLockedMemory = 8589934592
2019-10-10 10:32:13,170 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened streaming server at /10.197.27.68:50010
2019-10-10 10:32:13,172 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 10485760 bytes/s
2019-10-10 10:32:13,172 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Number threads for balancing is 50
2019-10-10 10:32:13,175 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 10485760 bytes/s
2019-10-10 10:32:13,175 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Number threads for balancing is 50
2019-10-10 10:32:13,175 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Listening on UNIX domain socket: /var/run/hdfs-sockets/dn
2019-10-10 10:32:13,219 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2019-10-10 10:32:13,224 INFO org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2019-10-10 10:32:13,228 INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.datanode is not defined
2019-10-10 10:32:13,235 INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2019-10-10 10:32:13,236 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context datanode
2019-10-10 10:32:13,237 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2019-10-10 10:32:13,237 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2019-10-10 10:32:13,248 INFO org.apache.hadoop.http.HttpServer2: Jetty bound to port 44480
2019-10-10 10:32:13,248 INFO org.mortbay.log: jetty-6.1.26.cloudera.4
2019-10-10 10:32:13,435 INFO org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:44480
2019-10-10 10:32:13,781 INFO org.apache.hadoop.hdfs.server.datanode.web.DatanodeHttpServer: Listening HTTPS traffic on /10.197.27.68:50475
2019-10-10 10:32:13,786 INFO org.apache.hadoop.util.JvmPauseMonitor: Starting JVM pause monitor
2019-10-10 10:32:13,786 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnUserName = hdfs/ip-10-197-27-68.eu-west-1.compute.internal@REALM.LOCAL
2019-10-10 10:32:13,786 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: supergroup = hdfs
2019-10-10 10:32:13,811 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 300
2019-10-10 10:32:13,822 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 50020
2019-10-10 10:32:13,936 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened IPC server at /10.197.27.68:50020
2019-10-10 10:32:13,966 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request received for nameservices: hanameservice
2019-10-10 10:32:13,982 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting BPOfferServices for nameservices: hanameservice
2019-10-10 10:32:13,988 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (Datanode Uuid unassigned) service to ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022 starting to offer service
2019-10-10 10:32:13,988 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (Datanode Uuid unassigned) service to ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022 starting to offer service
2019-10-10 10:32:13,992 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2019-10-10 10:32:13,992 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2019-10-10 10:32:15,042 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:15,042 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:16,043 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:16,043 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:17,043 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:17,043 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:18,044 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:18,044 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:19,045 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:19,045 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:20,046 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:20,046 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:21,046 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:21,046 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:22,047 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:22,047 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:23,048 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:23,048 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:24,048 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:24,049 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:24,050 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: ip-10-197-31-86.eu-west-1.compute.internal/10.197.31.86:8022
2019-10-10 10:32:24,050 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022
2019-10-10 10:32:29,154 INFO org.apache.hadoop.hdfs.server.common.Storage: Using 6 threads to upgrade data directories (dfs.datanode.parallel.volumes.load.threads.num=6, dataDirs=6)
2019-10-10 10:32:29,169 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/data1/cdh/in_use.lock acquired by nodename 4465@ip-10-197-27-68.eu-west-1.compute.internal
2019-10-10 10:32:29,192 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/data2/cdh/in_use.lock acquired by nodename 4465@ip-10-197-27-68.eu-west-1.compute.internal
2019-10-10 10:32:29,195 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/data3/cdh/in_use.lock acquired by nodename 4465@ip-10-197-27-68.eu-west-1.compute.internal
2019-10-10 10:32:29,221 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/data8/cdh/in_use.lock acquired by nodename 4465@ip-10-197-27-68.eu-west-1.compute.internal
2019-10-10 10:32:29,240 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/data9/cdh/in_use.lock acquired by nodename 4465@ip-10-197-27-68.eu-west-1.compute.internal
2019-10-10 10:32:29,255 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/data11/cdh/in_use.lock acquired by nodename 4465@ip-10-197-27-68.eu-west-1.compute.internal
2019-10-10 10:32:29,276 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,276 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/data1/cdh/current/BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,295 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,295 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/data2/cdh/current/BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,313 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,314 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/data3/cdh/current/BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,331 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,331 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/data8/cdh/current/BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,347 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,348 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/data9/cdh/current/BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,363 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,363 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/data11/cdh/current/BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,364 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Setting up storage: nsid=1710566395;bpid=BP-76826636-10.197.31.86-1501521881839;lv=-56;nsInfo=lv=-60;cid=cluster2;nsid=1710566395;c=0;bpid=BP-76826636-10.197.31.86-1501521881839;dnuuid=2de9411f-0f62-431e-bdfb-c1bbc7c20655
2019-10-10 10:32:29,384 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy: Available space volume choosing policy initialized: dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold = 10737418240, dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction = 0.75
2019-10-10 10:32:29,392 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-37471644-43b9-4631-be36-b72215d9c152
2019-10-10 10:32:29,393 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /data/data1/cdh/current, StorageType: DISK
2019-10-10 10:32:29,393 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-1e368637-4201-4558-99c1-25d7ab6bb6d4
2019-10-10 10:32:29,393 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /data/data2/cdh/current, StorageType: DISK
2019-10-10 10:32:29,393 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-6d2daa74-6042-4e3e-a91f-1c91393777f4
2019-10-10 10:32:29,394 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /data/data3/cdh/current, StorageType: DISK
2019-10-10 10:32:29,394 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-3605e8a7-240c-4f46-bd94-fb9a76240925
2019-10-10 10:32:29,394 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /data/data8/cdh/current, StorageType: DISK
2019-10-10 10:32:29,394 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-190d4a84-4811-4186-9fda-a6cfe07008ec
2019-10-10 10:32:29,395 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /data/data9/cdh/current, StorageType: DISK
2019-10-10 10:32:29,395 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-709f30d4-d700-48f9-972d-6def31844ab7
2019-10-10 10:32:29,395 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /data/data11/cdh/current, StorageType: DISK
2019-10-10 10:32:29,398 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Registered FSDatasetState MBean
2019-10-10 10:32:29,399 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Volume reference is released.
2019-10-10 10:32:29,399 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding block pool BP-76826636-10.197.31.86-1501521881839
2019-10-10 10:32:29,399 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data1/cdh/current...
2019-10-10 10:32:29,399 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data2/cdh/current...
2019-10-10 10:32:29,399 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data3/cdh/current...
2019-10-10 10:32:29,400 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data8/cdh/current...
2019-10-10 10:32:29,400 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data9/cdh/current...
2019-10-10 10:32:29,400 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data11/cdh/current...
2019-10-10 10:32:29,407 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Cached dfsUsed found for /data/data1/cdh/current/BP-76826636-10.197.31.86-1501521881839/current: 863853522944
2019-10-10 10:32:29,407 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Cached dfsUsed found for /data/data2/cdh/current/BP-76826636-10.197.31.86-1501521881839/current: 865228335039
2019-10-10 10:32:29,407 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Cached dfsUsed found for /data/data8/cdh/current/BP-76826636-10.197.31.86-1501521881839/current: 865263112192
2019-10-10 10:32:29,407 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Cached dfsUsed found for /data/data3/cdh/current/BP-76826636-10.197.31.86-1501521881839/current: 860011779047
2019-10-10 10:32:29,408 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Cached dfsUsed found for /data/data9/cdh/current/BP-76826636-10.197.31.86-1501521881839/current: 735340314624
2019-10-10 10:32:29,408 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Cached dfsUsed found for /data/data11/cdh/current/BP-76826636-10.197.31.86-1501521881839/current: 1472138194944
2019-10-10 10:32:29,411 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-76826636-10.197.31.86-1501521881839 on /data/data8/cdh/current: 11ms
2019-10-10 10:32:29,411 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-76826636-10.197.31.86-1501521881839 on /data/data3/cdh/current: 11ms
2019-10-10 10:32:29,411 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-76826636-10.197.31.86-1501521881839 on /data/data2/cdh/current: 12ms
2019-10-10 10:32:29,411 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-76826636-10.197.31.86-1501521881839 on /data/data11/cdh/current: 11ms
2019-10-10 10:32:29,411 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-76826636-10.197.31.86-1501521881839 on /data/data9/cdh/current: 11ms
2019-10-10 10:32:29,411 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-76826636-10.197.31.86-1501521881839 on /data/data1/cdh/current: 12ms
2019-10-10 10:32:29,412 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Total time to scan all replicas for block pool BP-76826636-10.197.31.86-1501521881839: 13ms
2019-10-10 10:32:29,414 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data1/cdh/current...
2019-10-10 10:32:29,414 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data2/cdh/current...
2019-10-10 10:32:29,414 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data3/cdh/current...
2019-10-10 10:32:29,415 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data9/cdh/current...
2019-10-10 10:32:29,415 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data11/cdh/current...
2019-10-10 10:32:29,415 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data8/cdh/current...
2019-10-10 10:32:30,051 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-197-7-125.eu-west-1.compute.internal/10.197.7.125:8022. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-10-10 10:32:31,936 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data9/cdh/current: 2521ms
2019-10-10 10:32:32,156 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data8/cdh/current: 2741ms
2019-10-10 10:32:33,780 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-76826636-10.197.31.86-1501521881839 on volume /data/data11/cdh/current: 4365ms
2019-10-10 10:32:33,824 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw an exception
javax.management.RuntimeMBeanException: java.lang.NullPointerException
I verified the override for this particular DataNode:
</property>
<!--'dfs.disk.balancer.enabled', originally set to 'true' (non-final), is overridden below by a safety valve-->
<property>
<name>dfs.disk.balancer.enabled</name>
<value>false</value>
</property> @pifta mentioned that there should be some error in the startup about race condition, but I dont see any kind of error except the NPE there. Also there was a suggestion that filled disks can cause this issue, but every data volume has at least 80GB available. Thanks
08-09-2019
02:04 AM
@eMazarakis I noticed the "t" flag on the directory. That is the sticky bit: whenever you see it in the HDFS permissions, it tells you that only a file's owner (or the directory owner, or the superuser) can delete or rename entries in that directory, nobody else, even if write access is granted. So in your case only the hive user can remove this directory's contents. Maybe, as @EricL pointed out, you have impersonation, so the query is running under a different user. Either way, you need to search for this permission issue in the logs.
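For illustration, a hypothetical listing (paths made up); the trailing "t" in the permission string is the sticky bit, set and cleared via the leading digit of the octal mode:

$ hdfs dfs -ls /user/hive
drwxrwxrwt   - hive hive          0 2019-08-01 10:00 /user/hive/warehouse
$ hdfs dfs -chmod 1777 /user/hive/warehouse   # sets the sticky bit
$ hdfs dfs -chmod 0777 /user/hive/warehouse   # clears it (needs owner/superuser rights)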
08-07-2019
12:48 AM
1 Kudo
Hi, the probable root cause is that the Spark job submitted by the Jupyter notebook has different memory config parameters. So I don't think the issue is Jupyter, but rather the executor and driver memory settings: YARN is not able to provide enough resources (i.e. memory).
19/08/06 23:10:41 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Check your cluster settings:
- how much memory YARN has allocated to the NodeManagers, and how big a container can be
- what the submit options of your Spark job are
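As a minimal sketch of setting those values explicitly (assuming PySpark and that the notebook creates its own session; the sizes are placeholders to tune against what YARN can actually allocate per container):

from pyspark.sql import SparkSession

# Placeholder sizes; they must fit within YARN's maximum container allocation
# (yarn.scheduler.maximum-allocation-mb), otherwise the job waits forever
# with the "Initial job has not accepted any resources" warning.
spark = (SparkSession.builder
         .appName("notebook-job")
         .master("yarn")
         .config("spark.driver.memory", "2g")
         .config("spark.executor.memory", "2g")
         .config("spark.executor.instances", "2")
         .getOrCreate())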