Created 06-13-2018 03:49 PM
I'm running spark-submit using the following command:
PYSPARK_PYTHON=./ROOT/myspark/bin/python /usr/hdp/current/spark2-client/bin/spark-submit \
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./ROOT/myspark/bin/python \
--master=yarn \
--deploy-mode=cluster \
--driver-memory=4g \
--archives=myspark.zip#ROOT \
--num-executors=32 \
--packages com.databricks:spark-avro_2.11:4.0.0 \
foo.py
myspark.zip is a zipped conda environment, created in Python with the zipfile package; the files are stored without deflation. foo.py is my application code. This normally works, but if myspark.zip is larger than 2 GB I get:
java.util.zip.ZipException: invalid CEN header (bad signature)
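For reference, the archive is built roughly as follows (a minimal sketch; the environment path is illustrative, not my actual path). Note that Python's zipfile switches to the Zip64 format once sizes pass 2 GiB, which lines up with the threshold where the failure starts:

import os
import zipfile

env_root = '/path/to/envs/myspark'  # illustrative path to the conda environment

# store every file uncompressed (ZIP_STORED = no deflation), with
# entry names relative to the environment root
with zipfile.ZipFile('myspark.zip', 'w', zipfile.ZIP_STORED, allowZip64=True) as zf:
    for dirpath, _, filenames in os.walk(env_root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            zf.write(full, os.path.relpath(full, env_root))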
My Java version is jdk1.8.0_112.
It looks like older versions of Java had this issue, but my current one should not. I've written a test Java class using java.util.zip that unzips myspark.zip without error, and I've checked that all my processes use the Java version above.
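(The check I'm doing amounts to the following, shown here as a Python sketch rather than the actual Java class: open the archive and CRC-verify every entry.)

import zipfile

with zipfile.ZipFile('myspark.zip') as zf:
    # testzip() returns the name of the first corrupt entry, or None
    # if every entry reads back cleanly
    bad = zf.testzip()
    if bad:
        print('first bad entry: %s' % bad)
    else:
        print('OK: %d entries' % len(zf.infolist()))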
Console output from YARN after running the above command (I've tried both --deploy-mode=cluster and --deploy-mode=client):
18/06/13 16:00:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/06/13 16:00:23 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
18/06/13 16:00:23 INFO RMProxy: Connecting to ResourceManager at myhost2.myfirm.com/10.87.11.17:8050
18/06/13 16:00:23 INFO Client: Requesting a new application from cluster with 6 NodeManagers
18/06/13 16:00:23 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (221184 MB per container)
18/06/13 16:00:23 INFO Client: Will allocate AM container, with 18022 MB memory including 1638 MB overhead
18/06/13 16:00:23 INFO Client: Setting up container launch context for our AM
18/06/13 16:00:23 INFO Client: Setting up the launch environment for our AM container
18/06/13 16:00:23 INFO Client: Preparing resources for our AM container
18/06/13 16:00:24 INFO Client: Use hdfs cache file as spark.yarn.archive for HDP, hdfsCacheFile:hdfs://myhost.myfirm.com:8020/hdp/apps/2.6.4.0-91/spark2/spark2-hdp-yarn-archive.tar.gz
18/06/13 16:00:24 INFO Client: Source and destination file systems are the same. Not copying hdfs://myhost.myfirm.com:8020/hdp/apps/2.6.4.0-91/spark2/spark2-hdp-yarn-archive.tar.gz
18/06/13 16:00:24 INFO Client: Uploading resource file:/home/myuser/.ivy2/jars/com.databricks_spark-avro_2.11-4.0.0.jar -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/com.databricks_spark-avro_2.11-4.0.0.jar
18/06/13 16:00:26 INFO Client: Uploading resource file:/home/myuser/.ivy2/jars/org.slf4j_slf4j-api-1.7.5.jar -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/org.slf4j_slf4j-api-1.7.5.jar
18/06/13 16:00:26 INFO Client: Uploading resource file:/home/myuser/.ivy2/jars/org.apache.avro_avro-1.7.6.jar -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/org.apache.avro_avro-1.7.6.jar
18/06/13 16:00:26 INFO Client: Uploading resource file:/home/myuser/.ivy2/jars/org.codehaus.jackson_jackson-core-asl-1.9.13.jar -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/org.codehaus.jackson_jackson-core-asl-1.9.13.jar
18/06/13 16:00:26 INFO Client: Uploading resource file:/home/myuser/.ivy2/jars/org.codehaus.jackson_jackson-mapper-asl-1.9.13.jar -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/org.codehaus.jackson_jackson-mapper-asl-1.9.13.jar
18/06/13 16:00:26 INFO Client: Uploading resource file:/home/myuser/.ivy2/jars/com.thoughtworks.paranamer_paranamer-2.3.jar -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/com.thoughtworks.paranamer_paranamer-2.3.jar
18/06/13 16:00:26 INFO Client: Uploading resource file:/home/myuser/.ivy2/jars/org.xerial.snappy_snappy-java-1.0.5.jar -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/org.xerial.snappy_snappy-java-1.0.5.jar
18/06/13 16:00:26 INFO Client: Uploading resource file:/home/myuser/.ivy2/jars/org.apache.commons_commons-compress-1.4.1.jar -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/org.apache.commons_commons-compress-1.4.1.jar
18/06/13 16:00:26 INFO Client: Uploading resource file:/home/myuser/.ivy2/jars/org.tukaani_xz-1.0.jar -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/org.tukaani_xz-1.0.jar
18/06/13 16:00:26 INFO Client: Source and destination file systems are the same. Not copying hdfs:/user/myuser/release/alphagenspark.zip#ROOT
18/06/13 16:00:26 INFO Client: Uploading resource file:/my/script/dir/spark/alphagen/foo.py -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/foo.py
18/06/13 16:00:26 INFO Client: Uploading resource file:/usr/hdp/current/spark2-client/python/lib/pyspark.zip -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/pyspark.zip
18/06/13 16:00:26 INFO Client: Uploading resource file:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/py4j-0.10.4-src.zip
18/06/13 16:00:26 WARN Client: Same path resource file:/home/myuser/.ivy2/jars/com.databricks_spark-avro_2.11-4.0.0.jar added multiple times to distributed cache.
18/06/13 16:00:26 WARN Client: Same path resource file:/home/myuser/.ivy2/jars/org.slf4j_slf4j-api-1.7.5.jar added multiple times to distributed cache.
18/06/13 16:00:26 WARN Client: Same path resource file:/home/myuser/.ivy2/jars/org.apache.avro_avro-1.7.6.jar added multiple times to distributed cache.
18/06/13 16:00:26 WARN Client: Same path resource file:/home/myuser/.ivy2/jars/org.codehaus.jackson_jackson-core-asl-1.9.13.jar added multiple times to distributed cache.
18/06/13 16:00:26 WARN Client: Same path resource file:/home/myuser/.ivy2/jars/org.codehaus.jackson_jackson-mapper-asl-1.9.13.jar added multiple times to distributed cache.
18/06/13 16:00:26 WARN Client: Same path resource file:/home/myuser/.ivy2/jars/com.thoughtworks.paranamer_paranamer-2.3.jar added multiple times to distributed cache.
18/06/13 16:00:26 WARN Client: Same path resource file:/home/myuser/.ivy2/jars/org.xerial.snappy_snappy-java-1.0.5.jar added multiple times to distributed cache.
18/06/13 16:00:26 WARN Client: Same path resource file:/home/myuser/.ivy2/jars/org.apache.commons_commons-compress-1.4.1.jar added multiple times to distributed cache.
18/06/13 16:00:26 WARN Client: Same path resource file:/home/myuser/.ivy2/jars/org.tukaani_xz-1.0.jar added multiple times to distributed cache.
18/06/13 16:00:27 INFO Client: Uploading resource file:/tmp/spark-6c26ae3b-7248-488f-bc33-9766251474bb/__spark_conf__4405623606341803690.zip -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/__spark_conf__.zip
18/06/13 16:00:27 INFO SecurityManager: Changing view acls to: myuser
18/06/13 16:00:27 INFO SecurityManager: Changing modify acls to: myuser
18/06/13 16:00:27 INFO SecurityManager: Changing view acls groups to:
18/06/13 16:00:27 INFO SecurityManager: Changing modify acls groups to:
18/06/13 16:00:27 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(myuser); groups with view permissions: Set(); users with modify permissions: Set(myuser); groups with modify permissions: Set()
18/06/13 16:00:27 INFO Client: Submitting application application_1528901858967_0019 to ResourceManager
18/06/13 16:00:27 INFO YarnClientImpl: Submitted application application_1528901858967_0019
18/06/13 16:00:28 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)
18/06/13 16:00:28 INFO Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1528923627110
final status: UNDEFINED
tracking URL: http://myhost2.myfirm.com:8088/proxy/application_1528901858967_0019/
user: myuser
18/06/13 16:00:29 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)
18/06/13 16:00:30 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)
18/06/13 16:00:31 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)
18/06/13 16:00:32 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)
18/06/13 16:00:33 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)
18/06/13 16:00:34 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)
18/06/13 16:00:35 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)
18/06/13 16:00:36 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)
18/06/13 16:00:37 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)
18/06/13 16:00:38 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)
18/06/13 16:00:39 INFO Client: Application report for application_1528901858967_0019 (state: FAILED)
18/06/13 16:00:39 INFO Client:
client token: N/A
diagnostics: Application application_1528901858967_0019 failed 2 times due to AM Container for appattempt_1528901858967_0019_000002 exited with exitCode: -1000
For more detailed output, check the application tracking page: http://myhost2.myfirm.com:8088/cluster/app/application_1528901858967_0019 Then click on links to logs of each attempt.
Diagnostics: java.util.zip.ZipException: invalid CEN header (bad signature)
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1528923627110
final status: FAILED
tracking URL: http://myhost2.myfirm.com:8088/cluster/app/application_1528901858967_0019
user: myuser
18/06/13 16:00:39 INFO Client: Deleted staging directory hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019
Exception in thread "main" org.apache.spark.SparkException: Application application_1528901858967_0019 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1187)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1233)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:782)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/06/13 16:00:39 INFO ShutdownHookManager: Shutdown hook called
18/06/13 16:00:39 INFO ShutdownHookManager: Deleting directory /tmp/spark-6c26ae3b-7248-488f-bc33-9766251474bb
Has anyone seen this before?
Created 06-13-2018 04:42 PM
Sharing the YARN application logs would help us review this issue. At a minimum, please share the full error stack and let us know which log it comes from. Also, do you still see this error when you run in yarn-client mode?
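You can collect the aggregated logs with the YARN CLI once the application has failed, e.g.:

yarn logs -applicationId application_1528901858967_0019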
Created 06-13-2018 09:26 PM
I've added some logs above. Client mode produces the same error as cluster mode.