Member since: 09-05-2018
Posts: 8
Kudos Received: 0
Solutions: 0
12-17-2018
02:22 AM
Hi satz, I found the reason just now. It's because we use OpenStack as our platform, and our DNS server IP starts with 100 (100.x.x.x), which I think conflicts with the CDSW Kubernetes network range. I changed DNS to local dnsmasq instances on the nodes and everything is fine now.
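For reference, the workaround described above (pointing the nodes at a local dnsmasq rather than the OpenStack resolver in the 100.x.x.x range) might look roughly like the sketch below. The upstream resolver address and the domain line are illustrative placeholders, not values taken from the original post:

```ini
# /etc/dnsmasq.conf -- minimal sketch (hypothetical values)

# Answer DNS queries locally on this node
listen-address=127.0.0.1

# Forward everything not known locally to a resolver OUTSIDE the
# 100.0.0.0/8 range that collides with CDSW's internal network
server=8.8.8.8

# Resolve cluster hosts from /etc/hosts entries
expand-hosts
domain=cdh.cdhtest.com
```

Each node would then point /etc/resolv.conf at 127.0.0.1 so lookups never hit the conflicting 100.x.x.x resolver.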
12-17-2018
02:12 AM
Hi, I couldn't find a way to upload an attachment, so I've pasted the error message here. I can run the Spark2 demo on other nodes in the cluster, but when I run it inside the Docker terminal it gets stuck. Reading and writing HDFS files from the Docker terminal works fine.
18/12/17 09:57:13 INFO retry.RetryInvocationHandler: Exception while invoking getClusterMetrics of class ApplicationClientProtocolPBClientImpl over rm138 after 1 fail over attempts. Trying to fail over immediately.
java.io.IOException: Failed on local exception: java.io.IOException: java.lang.IllegalArgumentException: Server has invalid Kerberos principal: yarn/master02.cdh.cdhtest.com@CDHTEST.COM, expecting: yarn/10.1.1.3@CDHTEST.COM; Host Details : local host is: "cmzr7xobj64n5n12/100.66.0.11"; destination host is: "master02.cdh.cdhtest.com":8032;
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
    at org.apache.hadoop.ipc.Client.call(Client.java:1508)
    at org.apache.hadoop.ipc.Client.call(Client.java:1441)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
    at com.sun.proxy.$Proxy21.getClusterMetrics(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:202)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
    at com.sun.proxy.$Proxy22.getClusterMetrics(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:483)
    at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:159)
    at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:159)
    at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
    at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:61)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:158)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:934)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:925)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:925)
    at org.apache.toree.kernel.api.Kernel.createSparkContext(Kernel.scala:354)
    at org.apache.toree.kernel.api.Kernel.createSparkContext(Kernel.scala:373)
    at org.apache.toree.boot.layer.StandardComponentInitialization$class.initializeSparkContext(ComponentInitialization.scala:103)
    at org.apache.toree.Main$$anon$1.initializeSparkContext(Main.scala:35)
    at org.apache.toree.boot.layer.StandardComponentInitialization$class.initializeComponents(ComponentInitialization.scala:86)
    at org.apache.toree.Main$$anon$1.initializeComponents(Main.scala:35)
    at org.apache.toree.boot.KernelBootstrap.initialize(KernelBootstrap.scala:100)
    at org.apache.toree.Main$.delayedEndpoint$org$apache$toree$Main$1(Main.scala:40)
    at org.apache.toree.Main$delayedInit$body.apply(Main.scala:24)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
    at scala.App$class.main(App.scala:76)
    at org.apache.toree.Main$.main(Main.scala:24)
    at org.apache.toree.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: java.lang.IllegalArgumentException: Server has invalid Kerberos principal: yarn/master02.cdh.cdhtest.com@CDHTEST.COM, expecting: yarn/10.1.1.3@CDHTEST.COM
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:718)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:681)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:769)
    at org.apache.hadoop.ipc.Client$Connection.access$3000(Client.java:396)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1557)
    at org.apache.hadoop.ipc.Client.call(Client.java:1480)
    ... 53 more
Caused by: java.lang.IllegalArgumentException: Server has invalid Kerberos principal: yarn/master02.cdh.cdhtest.com@CDHTEST.COM, expecting: yarn/10.1.1.3@CDHTEST.COM
    at org.apache.hadoop.security.SaslRpcClient.getServerPrincipal(SaslRpcClient.java:335)
    at org.apache.hadoop.security.SaslRpcClient.createSaslClient(SaslRpcClient.java:231)
    at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:159)
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:396)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:594)
    at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:396)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:761)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:757)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:756)
    ... 56 more
18/12/17 09:57:13 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm88
18/12/17 09:57:13 INFO retry.RetryInvocationHandler: Exception while invoking getClusterMetrics of class ApplicationClientProtocolPBClientImpl over rm88 after 2 fail over attempts. Trying to fail over after sleeping for 549ms.
java.net.ConnectException: Call From cmzr7xobj64n5n12/100.66.0.11 to master01.cdh.cdhtest.com:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
    at org.apache.hadoop.ipc.Client.call(Client.java:1508)
    at org.apache.hadoop.ipc.Client.call(Client.java:1441)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
    at com.sun.proxy.$Proxy21.getClusterMetrics(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:202)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
    at com.sun.proxy.$Proxy22.getClusterMetrics(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:483)
    at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:159)
    at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:159)
    at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
    at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:61)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:158)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:934)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:925)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:925)
    at org.apache.toree.kernel.api.Kernel.createSparkContext(Kernel.scala:354)
    at org.apache.toree.kernel.api.Kernel.createSparkContext(Kernel.scala:373)
    at org.apache.toree.boot.layer.StandardComponentInitialization$class.initializeSparkContext(ComponentInitialization.scala:103)
    at org.apache.toree.Main$$anon$1.initializeSparkContext(Main.scala:35)
    at org.apache.toree.boot.layer.StandardComponentInitialization$class.initializeComponents(ComponentInitialization.scala:86)
    at org.apache.toree.Main$$anon$1.initializeComponents(Main.scala:35)
    at org.apache.toree.boot.KernelBootstrap.initialize(KernelBootstrap.scala:100)
    at org.apache.toree.Main$.delayedEndpoint$org$apache$toree$Main$1(Main.scala:40)
    at org.apache.toree.Main$delayedInit$body.apply(Main.scala:24)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
    at scala.App$class.main(App.scala:76)
    at org.apache.toree.Main$.main(Main.scala:24)
    at org.apache.toree.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:648)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:744)
    at org.apache.hadoop.ipc.Client$Connection.access$3000(Client.java:396)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1557)
    at org.apache.hadoop.ipc.Client.call(Client.java:1480)
    ... 53 more
18/12/17 09:57:13 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm138
18/12/17 09:57:13 WARN ipc.Client: Exception encountered while connecting to the server : java.lang.IllegalArgumentException: Server has invalid Kerberos principal: yarn/master02.cdh.cdhtest.com@CDHTEST.COM, expecting: yarn/10.1.1.3@CDHTEST.COM
18/12/17 09:57:13 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs@CDHTEST.COM (auth:KERBEROS) cause:java.io.IOException: java.lang.IllegalArgumentException: Server has invalid Kerberos principal: yarn/master02.cdh.cdhtest.com@CDHTEST.COM, expecting: yarn/10.1.1.3@CDHTEST.COM
18/12/17 09:57:13 INFO retry.RetryInvocationHandler: Exception while invoking getClusterMetrics of class ApplicationClientProtocolPBClientImpl over rm138 after 3 fail over attempts. Trying to fail over immediately.
java.io.IOException: Failed on local exception: java.io.IOException: java.lang.IllegalArgumentException: Server has invalid Kerberos principal: yarn/master02.cdh.cdhtest.com@CDHTEST.COM, expecting: yarn/10.1.1.3@CDHTEST.COM; Host Details : local host is: "cmzr7xobj64n5n12/100.66.0.11"; destination host is: "master02.cdh.cdhtest.com":8032;
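The "invalid Kerberos principal" message in the trace above arises because the Hadoop client builds the server principal it expects from whatever name it can resolve for the address it is connecting to: the _HOST placeholder in the configured principal (e.g. yarn/_HOST@REALM) is replaced with the canonical hostname, and when reverse DNS inside the CDSW engine cannot map the ResourceManager's address back to master02.cdh.cdhtest.com, the raw IP leaks in and the client expects yarn/10.1.1.3@CDHTEST.COM. A minimal sketch of that substitution (the function and its no-reverse-DNS fallback are illustrative, not Hadoop's actual code):

```python
from typing import Optional

def expected_server_principal(pattern: str,
                              canonical_name: Optional[str],
                              ip: str) -> str:
    """Substitute _HOST in a configured Kerberos principal pattern.

    If reverse DNS yielded no canonical hostname, fall back to the raw
    IP -- which produces exactly the mismatch seen in the log above.
    (Hypothetical helper for illustration, not Hadoop's implementation.)
    """
    host = canonical_name if canonical_name else ip
    return pattern.replace("_HOST", host)

# Healthy DNS inside the engine: matches the RM's real principal.
print(expected_server_principal("yarn/_HOST@CDHTEST.COM",
                                "master02.cdh.cdhtest.com", "10.1.1.3"))
# Broken reverse DNS: the IP leaks into the expected principal.
print(expected_server_principal("yarn/_HOST@CDHTEST.COM", None, "10.1.1.3"))
```

This is why fixing DNS resolution inside the engine (as described in the later reply) makes the SASL handshake succeed.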
12-17-2018
01:42 AM
Hello everyone:
When I try to start a Scala session, it gets stuck on 'Scala session (Base Image v6) starting...'.
But I can reach the terminal, and /tmp/spark-driver.log says:
WARN ui.JettyUtils: GET /jobs/ failed: java.util.NoSuchElementException java.util.NoSuchElementException
Additionally, when I try to run a PySpark program, it gets stuck and the Spark UI shows the same error.
Does anyone know what happened?
Thanks a lot!
09-12-2018
02:15 AM
Hi everyone: CDSW supports using GPUs for machine-learning jobs, but it seems one GPU can only be used by one user at a time. Can one GPU be shared by multiple users, or must a GPU be dedicated to a single Docker container so that nobody else can use it at the same time? Thanks!
09-05-2018
06:40 PM
Hi,
I'm trying to find a way to limit the resources a user can request.
It's easy to limit the CPU and RAM a user can use in CDH with YARN dynamic resource pools.
But I don't know how to limit the Docker engine instances a user can request in CDSW.
I have read all the official documents, but they don't mention anything about this.
In my tests, a user can open as many Docker engines as he or she wants and make CDSW unavailable for everybody else.
So, is there some way to limit the resources a user can request inside a CDSW cluster?
Thanks a lot.
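For comparison, the YARN dynamic resource pools mentioned above ultimately translate into a Fair Scheduler allocation file, where per-pool and per-user caps can be expressed. A sketch (the pool name and limit values are made up for illustration, not taken from any real cluster):

```xml
<?xml version="1.0"?>
<allocations>
  <!-- hypothetical pool: cap what jobs submitted to it can consume -->
  <queue name="users">
    <maxResources>8192 mb, 4 vcores</maxResources>
    <maxRunningApps>2</maxRunningApps>
  </queue>
  <!-- default cap on concurrent running apps per user -->
  <userMaxAppsDefault>2</userMaxAppsDefault>
</allocations>
```

The open question in this post is that CDSW engines are scheduled by Kubernetes rather than YARN, so these YARN-side caps do not constrain how many engines a CDSW user can launch.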