Member since 06-08-2017, 19 posts, 0 kudos received, 1 solution

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4000 | 05-01-2019 11:51 AM |
05-01-2019 11:51 AM
This was failing because my Python executable was not in .zip or .egg format. After I created the executable in .zip format, the job was accepted.
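A minimal sketch of the repackaging step, assuming the application's Python modules live in a local source directory and are shipped with spark2-submit via `--py-files`. The helper `package_py_files` and the package name `my_app` are hypothetical; the archive is built with the standard library's `shutil.make_archive`.

```python
import os
import shutil
import tempfile
import zipfile


def package_py_files(src_dir, out_base):
    """Zip the contents of src_dir into out_base + '.zip' and return its path.

    Hypothetical helper. Paths are stored relative to src_dir, so a package
    at src_dir/my_app/__init__.py becomes my_app/__init__.py in the archive,
    the layout Python's zipimport expects for an entry on PYTHONPATH.
    """
    return shutil.make_archive(out_base, "zip", root_dir=src_dir)


if __name__ == "__main__":
    work = tempfile.mkdtemp()
    src = os.path.join(work, "src")
    os.makedirs(os.path.join(src, "my_app"))  # hypothetical package name
    with open(os.path.join(src, "my_app", "__init__.py"), "w") as fh:
        fh.write("VERSION = '0.1'\n")

    archive = package_py_files(src, os.path.join(work, "deps"))
    with zipfile.ZipFile(archive) as zf:
        print(sorted(zf.namelist()))
    # The archive could then be passed on the command line, for example:
    #   spark2-submit --py-files deps.zip main.py
```

Packaging by relative path (rather than zipping the parent directory itself) keeps the package importable by name once the archive is on the Python path.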
04-29-2019 08:23 PM
Hi, I am upgrading from Spark 1.6.0 to Spark 2.1 on the CDH 5.10 platform. When I run the spark2-submit command for a Python implementation, it fails with the error below. It looks like some path property is expected during initialization, while the SparkContext object is being created, and it is not being set. Please suggest whether any specific configuration is missing or required for Spark 2. Error details:

sc = SparkContext(conf=conf)
File "/apps/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/pyspark.zip/pyspark/context.py", line 118, in __init__
File "/apps/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/pyspark.zip/pyspark/context.py", line 182, in _do_init
File "/apps/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/pyspark.zip/pyspark/context.py", line 249, in _initialize_context
File "/apps/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1401, in __call__
File "/apps/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalArgumentException: Can not create a Path from an empty string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
at org.apache.hadoop.fs.Path.<init>(Path.java:135)
at org.apache.hadoop.fs.Path.<init>(Path.java:94)
at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:368)
at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:481)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$13.apply(Client.scala:629)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$13.apply(Client.scala:627)
at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:627)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:874)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:171)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:171)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:236)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
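The spark-submit help describes `--py-files` as a comma-separated list of .zip, .egg, or .py files. A small pre-flight check along these lines could catch a wrongly packaged dependency, or an empty entry such as one left by a trailing comma, before submitting. This is a hypothetical helper, not something Spark ships, and it is only an assumption that an empty entry could produce the "Can not create a Path from an empty string" error above; the fix this thread actually reports is repackaging as a .zip.

```python
ALLOWED_SUFFIXES = (".zip", ".egg", ".py")


def check_py_files(py_files):
    """Return a list of problems found in a comma-separated --py-files value.

    Hypothetical helper: flags empty entries (e.g. from a stray or trailing
    comma) and entries that are not .zip, .egg or .py files.
    """
    if not py_files.strip():
        return []  # no --py-files given at all
    problems = []
    for index, entry in enumerate(py_files.split(",")):
        entry = entry.strip()
        if not entry:
            problems.append("entry %d is empty" % index)
        elif not entry.endswith(ALLOWED_SUFFIXES):
            problems.append("entry %d (%s) is not a .zip, .egg or .py file" % (index, entry))
    return problems


if __name__ == "__main__":
    print(check_py_files("deps.zip,helpers.py"))  # []
    print(check_py_files("deps.zip,my_app,"))     # flags 'my_app' and the empty last entry
```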
Labels: Apache Spark
04-29-2019 11:58 AM
Thanks, agreed. I also found the bug details. The page you shared, https://spark.apache.org/docs/1.6.0/#downloading, says it is compatible with 2.6+ and 3.1+, which is misleading, since 3.6 is also 3.1+. I have started working on upgrading my app to Spark 2. Any suggestions for a Spark 1.6 to Spark 2 migration guide on a Cloudera cluster?
04-29-2019 10:18 AM
Hi all, we are currently using Spark 1.6 on the CDH 5.10 platform and are upgrading from Python 2.7 to Python 3.6 using the Anaconda distribution. When I run spark-submit in client mode, the process fails with the error below:

File "/apps/cloudera/parcels/CDH-5.10.1-1.cdh5.10.1.p0.10/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 381, in namedtuple
TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'

We are not clear about the cause of the failure. We have checked the Spark documentation, and it says that Spark 1.6.0 is compatible with Python 3.0+. Any thoughts or suggestions on this would be helpful.

Thanks
Hemil
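The TypeError matches how keyword-only defaults work in Python 3: since Python 3.6, the extra options of `collections.namedtuple` are keyword-only, and their defaults live in `__kwdefaults__` rather than `__defaults__`. Below is a simplified reproduction, under the assumption that PySpark 1.6's serializers.py rebuilds `namedtuple` from its code object and positional defaults only; the helper names are hypothetical and this is not PySpark's actual code. The missing-argument names in the message vary by Python version (3.7+ no longer has `verbose`).

```python
import collections
import types


def copy_without_kwdefaults(f):
    """Clone a function from its code object and positional defaults only.

    Simplified stand-in for the kind of copy that breaks on Python 3.6+:
    keyword-only defaults (f.__kwdefaults__) are silently dropped.
    """
    return types.FunctionType(f.__code__, f.__globals__, f.__name__,
                              f.__defaults__, f.__closure__)


def copy_with_kwdefaults(f):
    """Same clone, but also carry over the keyword-only defaults."""
    g = copy_without_kwdefaults(f)
    g.__kwdefaults__ = f.__kwdefaults__
    return g


if __name__ == "__main__":
    broken = copy_without_kwdefaults(collections.namedtuple)
    try:
        broken("Point", "x y")
    except TypeError as exc:
        # e.g. "namedtuple() missing 3 required keyword-only arguments: ..."
        print(exc)

    fixed = copy_with_kwdefaults(collections.namedtuple)
    Point = fixed("Point", "x y")
    print(Point(1, 2))  # Point(x=1, y=2)
```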
Labels: Apache Spark
03-12-2019 07:40 AM
Hi, I am running a spark-submit job on a YARN cluster. During submission it uploads the dependent jars to the default HDFS staging directory, /user/<user id>/.sparkStaging/<yarn applicationId>/*.jar. While the job runs, I can see that the jar is uploaded, but spark-submit then fails with the error below. The file owner and group belong to the same ID that runs spark-submit. I also tried the configuration parameter spark.yarn.StagingDir, but even that didn't help. Your professional input would help in addressing this issue.

Error stack trace:
Diagnostics: File does not exist: hdfs://user/<user id>/.sparkStaging/<yarn application_id>/chill-java-0.5.0.jar
java.io.FileNotFoundException: File does not exist: hdfs://user/<user id>/.sparkStaging/<yarn application_id>/chill-java-0.5.0.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1257)
at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1249)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1249)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Thanks
Hemil
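One detail that may be worth checking, offered as a possibility rather than a diagnosis (the path in the post may simply have been edited for the forum): in a URI written as hdfs://user/<user id>/..., the component right after `//` is the authority, so `user` would be read as a NameNode host name. With three slashes, hdfs:///user/..., the authority is empty and Hadoop resolves it against the configured default filesystem. Python's standard `urllib.parse` shows the split; the user and application IDs below are hypothetical placeholders. Separately, Spark's documented property name is `spark.yarn.stagingDir`, and configuration keys are case-sensitive.

```python
from urllib.parse import urlparse


def split_hdfs_uri(uri):
    """Split a filesystem URI into (scheme, authority, path) with the stdlib parser."""
    parts = urlparse(uri)
    return parts.scheme, parts.netloc, parts.path


if __name__ == "__main__":
    # Hypothetical user and application IDs standing in for the forum placeholders.
    two_slashes = "hdfs://user/alice/.sparkStaging/application_1/chill-java-0.5.0.jar"
    three_slashes = "hdfs:///user/alice/.sparkStaging/application_1/chill-java-0.5.0.jar"
    print(split_hdfs_uri(two_slashes))
    # ('hdfs', 'user', '/alice/.sparkStaging/application_1/chill-java-0.5.0.jar')
    print(split_hdfs_uri(three_slashes))
    # ('hdfs', '', '/user/alice/.sparkStaging/application_1/chill-java-0.5.0.jar')
```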
Labels: Apache Spark, Apache YARN, HDFS
06-08-2017 06:12 PM
Hi, I am trying to understand the impact of, and the right design for, the ZooKeeper setup, since Kafka depends on ZooKeeper for its operations. ZooKeeper specifies 2F+1 nodes to tolerate F node failures, because the ensemble stays available only while a majority (quorum) of its nodes is up. Suppose I have 2 racks, with 4 ZooKeeper nodes on rack A and 5 on rack B (9 in total), and rack B goes down, taking 5 nodes with it. A 9-node ensemble needs 5 nodes for a quorum, so the 4 survivors cannot form one; to tolerate 5 failures under the 2F+1 rule I would need 11 nodes, whereas I have only 9. So ZooKeeper cannot survive the failure of the rack holding the larger share of nodes, and that impacts the Kafka cluster's behavior. Can you please provide your input on how best to set up ZooKeeper so that Kafka can keep working seamlessly on a 2-rack infrastructure?
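The quorum arithmetic above can be checked with a short sketch. It assumes only the majority rule (an ensemble of N voting members needs floor(N/2) + 1 of them up) and treats a rack failure as losing every node on that rack. The 4/5 layout is the one from the question; the 3/3/3 three-rack layout is a hypothetical comparison.

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of a ZooKeeper ensemble with ensemble_size voters."""
    return ensemble_size // 2 + 1


def survives_rack_loss(nodes_per_rack, failed_rack):
    """True if the nodes left after losing one rack can still form a quorum."""
    total = sum(nodes_per_rack)
    remaining = total - nodes_per_rack[failed_rack]
    return remaining >= quorum_size(total)


def survives_any_single_rack_loss(nodes_per_rack):
    """True if the ensemble keeps a quorum no matter which single rack fails."""
    return all(survives_rack_loss(nodes_per_rack, rack)
               for rack in range(len(nodes_per_rack)))


if __name__ == "__main__":
    print(quorum_size(9))                             # 5
    print(survives_rack_loss([4, 5], failed_rack=1))  # False: 4 left, 5 needed
    print(survives_any_single_rack_loss([3, 3, 3]))   # True: 6 left, 5 needed
    # No split of 1..20 nodes across two racks survives losing either rack.
    print(any(survives_any_single_rack_loss([a, n - a])
              for n in range(1, 21) for a in range(n + 1)))  # False
```

The last check reflects a general fact: with two racks, one rack always holds at least half of the nodes, so losing it leaves at most floor(N/2) nodes, which is never a majority. That is why adding a third failure domain, even one holding a single node, is commonly suggested.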