Member since: 04-11-2016
Posts: 38
Kudos Received: 13
Solutions: 5
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 49911 | 01-04-2017 11:43 PM |
| | 4077 | 09-05-2016 04:07 PM |
| | 10640 | 09-05-2016 03:50 PM |
| | 2445 | 08-30-2016 08:15 PM |
| | 4047 | 08-30-2016 01:01 PM |
09-05-2016
04:16 PM
The easiest way is to just launch "spark-shell" at the command line; the banner shows the active version running on your cluster:
[root@xxxxxxx ~]# spark-shell
16/09/05 17:15:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.3.1
/_/
Using Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.7.0_71)
Type in expressions to have them evaluated.
Type :help for more information.
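If you want to script against that banner rather than read it by hand, a small shell sketch can pull the version token out. The helper name is made up for illustration; it just greps the "version X.Y.Z" token from whatever banner text you pipe in:

```shell
# Hypothetical helper: extract the version number from spark-shell banner text.
extract_spark_version() {
  grep -o 'version [0-9][0-9.]*' | head -n 1 | awk '{print $2}'
}

# Feed it a banner line like the one above:
printf '/___/ .__/\\_,_/_/ /_/\\_\\  version 1.3.1\n' | extract_spark_version
```

Newer Spark releases can also print the version with spark-submit --version, which avoids starting a full shell.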
... View more
09-05-2016
04:07 PM
Hi Surya,

Add the spark-sftp library via --packages or --jars in your spark-submit command. I am not sure Spark 1.4.1 would be able to handle it; look at upgrading Spark to 1.6.1.

Check:
https://github.com/springml/spark-sftp
https://spark-packages.org/package/springml/spark-sftp

Include this package in your Spark applications using spark-shell, pyspark, or spark-submit:
> $SPARK_HOME/bin/spark-shell --packages com.springml:spark-sftp_2.10:1.0.1

sbt: In your sbt build file, add:
libraryDependencies += "com.springml" % "spark-sftp_2.10" % "1.0.1"

Maven: In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>com.springml</groupId>
    <artifactId>spark-sftp_2.10</artifactId>
    <version>1.0.1</version>
  </dependency>
</dependencies>

Releases:
1bf5b3 (zip | jar) / Date: 2016-05-27 / License: Apache-2.0 / Scala version: 2.10
7d5b02 (zip | jar) / Date: 2016-01-11 / License: Apache-2.0 / Scala version: 2.10
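One gotcha with these coordinates: the _2.10 artifact suffix must match the Scala version your Spark build uses (the spark-shell banner prints it). A tiny shell sketch of that mapping, with a purely illustrative helper name:

```shell
# Map a Scala version (as printed by the spark-shell banner) to the artifact
# suffix you should request with --packages. Illustrative helper, not a Spark tool.
scala_suffix_for() {
  case "$1" in
    2.10.*) echo "_2.10" ;;
    2.11.*) echo "_2.11" ;;
    *)      echo "unknown" ;;
  esac
}

scala_suffix_for 2.10.4   # Spark 1.x builds in this thread ship Scala 2.10
```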
... View more
09-05-2016
03:50 PM
1 Kudo
# Check Commands
# --------------
# Spark Scala
# -----------
# Optionally export Spark Home
export SPARK_HOME=/usr/hdp/current/spark-client
# Spark submit example in local mode
spark-submit --class org.apache.spark.examples.SparkPi --driver-memory 512m --executor-memory 512m --executor-cores 1 $SPARK_HOME/lib/spark-examples*.jar 10
# Spark submit example in client mode
spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 $SPARK_HOME/lib/spark-examples*.jar 10
# Spark submit example in cluster mode
spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 $SPARK_HOME/lib/spark-examples*.jar 10
# Spark shell with yarn client
spark-shell --master yarn-client --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1
# Pyspark
# -------
# Optionally export Hadoop conf dir and PySpark python
export HADOOP_CONF_DIR=/etc/hadoop/conf
export PYSPARK_PYTHON=/path/to/bin/python
# PySpark submit example in local mode
spark-submit --verbose /usr/hdp/2.3.0.0-2557/spark/examples/src/main/python/pi.py 100
# PySpark submit example in client mode
spark-submit --verbose --master yarn-client /usr/hdp/2.3.0.0-2557/spark/examples/src/main/python/pi.py 100
# PySpark submit example in cluster mode
spark-submit --verbose --master yarn-cluster /usr/hdp/2.3.0.0-2557/spark/examples/src/main/python/pi.py 100
# PySpark shell with yarn client
pyspark --master yarn-client
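A common failure mode with PYSPARK_PYTHON is pointing it at an interpreter that does not exist or is not executable. A quick pre-flight check you could run before spark-submit (the helper name is made up, and /bin/sh stands in for your real python path):

```shell
# Fail fast if the interpreter PYSPARK_PYTHON points at is missing or not executable.
check_pyspark_python() {
  if [ -x "$1" ]; then
    echo "ok: $1"
  else
    echo "missing or not executable: $1" >&2
    return 1
  fi
}

check_pyspark_python /bin/sh   # substitute your actual PYSPARK_PYTHON path
```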
@jigar.patel
... View more
08-30-2016
08:15 PM
2 Kudos
Resolution done for Spark 2.0.0

Resolution for Spark Submit issue: add java-opts file in /usr/hdp/current/spark2-client/conf/
[root@sandbox conf]# cat java-opts
-Dhdp.version=2.5.0.0-817
Spark Submit working example:
[root@sandbox spark2-client]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 2g --executor-memory 2g --executor-cores 1 examples/jars/spark-examples*.jar 10
16/08/29 17:44:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/29 17:44:58 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
16/08/29 17:44:58 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
16/08/29 17:44:58 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
16/08/29 17:44:58 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (7680 MB per container)
16/08/29 17:44:58 INFO yarn.Client: Will allocate AM container, with 2248 MB memory including 200 MB overhead
16/08/29 17:44:58 INFO yarn.Client: Setting up container launch context for our AM
16/08/29 17:44:58 INFO yarn.Client: Setting up the launch environment for our AM container
16/08/29 17:44:58 INFO yarn.Client: Preparing resources for our AM container
16/08/29 17:44:58 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
16/08/29 17:45:00 INFO yarn.Client: Uploading resource file:/tmp/spark-38890bfc-d672-4c7d-bef9-d646c420836b/__spark_libs__3503948162159958877.zip -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1472397144295_0006/__spark_libs__3503948162159958877.zip
16/08/29 17:45:01 INFO yarn.Client: Uploading resource file:/usr/hdp/2.5.0.0-817/spark2/examples/jars/spark-examples_2.11-2.0.0.jar -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1472397144295_0006/spark-examples_2.11-2.0.0.jar
16/08/29 17:45:01 INFO yarn.Client: Uploading resource file:/tmp/spark-38890bfc-d672-4c7d-bef9-d646c420836b/__spark_conf__4613069544481307021.zip -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1472397144295_0006/__spark_conf__.zip
16/08/29 17:45:01 WARN yarn.Client: spark.yarn.am.extraJavaOptions will not take effect in cluster mode
16/08/29 17:45:01 INFO spark.SecurityManager: Changing view acls to: root
16/08/29 17:45:01 INFO spark.SecurityManager: Changing modify acls to: root
16/08/29 17:45:01 INFO spark.SecurityManager: Changing view acls groups to:
16/08/29 17:45:01 INFO spark.SecurityManager: Changing modify acls groups to:
16/08/29 17:45:01 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
16/08/29 17:45:01 INFO yarn.Client: Submitting application application_1472397144295_0006 to ResourceManager
16/08/29 17:45:01 INFO impl.YarnClientImpl: Submitted application application_1472397144295_0006
16/08/29 17:45:02 INFO yarn.Client: Application report for application_1472397144295_0006 (state: ACCEPTED)
16/08/29 17:45:02 INFO yarn.Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1472492701409
final status: UNDEFINED
tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472397144295_0006/
user: root
16/08/29 17:45:03 INFO yarn.Client: Application report for application_1472397144295_0006 (state: ACCEPTED)
16/08/29 17:45:04 INFO yarn.Client: Application report for application_1472397144295_0006 (state: ACCEPTED)
16/08/29 17:45:05 INFO yarn.Client: Application report for application_1472397144295_0006 (state: ACCEPTED)
16/08/29 17:45:06 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:06 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.0.2.15
ApplicationMaster RPC port: 0
queue: default
start time: 1472492701409
final status: UNDEFINED
tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472397144295_0006/
user: root
16/08/29 17:45:07 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:08 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:09 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:10 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:11 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:12 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:13 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:14 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:15 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:16 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:17 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:18 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:19 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:20 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:21 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:22 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:23 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:24 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:25 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:26 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:27 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:28 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:29 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:30 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:31 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:32 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:33 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:34 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:35 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:36 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:37 INFO yarn.Client: Application report for application_1472397144295_0006 (state: FINISHED)
16/08/29 17:45:37 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.0.2.15
ApplicationMaster RPC port: 0
queue: default
start time: 1472492701409
final status: SUCCEEDED
tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472397144295_0006/
user: root
16/08/29 17:45:37 INFO util.ShutdownHookManager: Shutdown hook called
16/08/29 17:45:37 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-38890bfc-d672-4c7d-bef9-d646c420836b
[root@sandbox spark2-client]#
Resolution for Spark Shell issue (lzo-codec): add the following 2 lines in your spark-defaults.conf
spark.driver.extraClassPath /usr/hdp/current/hadoop-client/lib/hadoop-lzo-0.6.0.2.5.0.0-817.jar
spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
Spark Shell working example:
[root@sandbox spark2-client]# ./bin/spark-shell --master yarn --deploy-mode client --driver-memory 2g --executor-memory 2g --executor-cores 1
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/08/29 17:47:09 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
16/08/29 17:47:21 WARN spark.SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://10.0.2.15:4041
Spark context available as 'sc' (master = yarn, app id = application_1472397144295_0007).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.0.0
/_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.7.0_101)
Type in expressions to have them evaluated.
Type :help for more information.
scala> sc.getConf.getAll.foreach(println)
(spark.eventLog.enabled,true)
(spark.yarn.scheduler.heartbeat.interval-ms,5000)
(hive.metastore.warehouse.dir,file:/usr/hdp/2.5.0.0-817/spark2/spark-warehouse)
(spark.repl.class.outputDir,/tmp/spark-fa16d4d3-8ec8-4b0e-a1da-5a2dffe39d08/repl-5dd28f29-ae03-4965-a535-18a95173b173)
(spark.yarn.am.extraJavaOptions,-Dhdp.version=2.5.0.0-817)
(spark.yarn.containerLauncherMaxThreads,25)
(spark.driver.extraJavaOptions,-Dhdp.version=2.5.0.0-817)
(spark.driver.extraLibraryPath,/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64)
(spark.driver.appUIAddress,http://10.0.2.15:4041)
(spark.driver.host,10.0.2.15)
(spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES,http://sandbox.hortonworks.com:8088/proxy/application_1472397144295_0007)
(spark.yarn.preserve.staging.files,false)
(spark.home,/usr/hdp/current/spark2-client)
(spark.app.name,Spark shell)
(spark.repl.class.uri,spark://10.0.2.15:37426/classes)
(spark.ui.port,4041)
(spark.yarn.max.executor.failures,3)
(spark.submit.deployMode,client)
(spark.yarn.executor.memoryOverhead,200)
(spark.ui.filters,org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter)
(spark.driver.extraClassPath,/usr/hdp/current/hadoop-client/lib/hadoop-lzo-0.6.0.2.5.0.0-817.jar)
(spark.executor.memory,2g)
(spark.yarn.driver.memoryOverhead,200)
(spark.hadoop.yarn.timeline-service.enabled,false)
(spark.executor.extraLibraryPath,/usr/hdp/current/hadoop-client/lib/native)
(spark.app.id,application_1472397144295_0007)
(spark.executor.id,driver)
(spark.yarn.queue,default)
(spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS,sandbox.hortonworks.com)
(spark.eventLog.dir,hdfs:///spark-history)
(spark.master,yarn)
(spark.driver.port,37426)
(spark.yarn.submit.file.replication,3)
(spark.sql.catalogImplementation,hive)
(spark.driver.memory,2g)
(spark.jars,)
(spark.executor.cores,1)
scala> val file = sc.textFile("/tmp/data")
file: org.apache.spark.rdd.RDD[String] = /tmp/data MapPartitionsRDD[1] at textFile at <console>:24
scala> val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:26
scala> counts.take(10)
res1: Array[(String, Int)] = Array((hadoop.tasklog.noKeepSplits=4,1), (log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.server.resourcemanager.appsummary.logger},1), (Unless,1), (this,4), (hadoop.mapreduce.jobsummary.log.file=hadoop-mapreduce.jobsummary.log,1), (under,4), (log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601},2), (log4j.appender.DRFAAUDIT.layout=org.apache.log4j.PatternLayout,1), (AppSummaryLogging,1), (log4j.appender.RMAUDIT.layout=org.apache.log4j.PatternLayout,1))
scala>
... View more
08-30-2016
01:01 PM
1 Kudo
Resolution for Spark Submit issue: add java-opts file in /usr/hdp/current/spark2-client/conf/
[root@sandbox conf]# cat java-opts
-Dhdp.version=2.5.0.0-817
Spark Submit working example:
[root@sandbox spark2-client]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 2g --executor-memory 2g --executor-cores 1 examples/jars/spark-examples*.jar 10
16/08/29 17:44:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/29 17:44:58 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
16/08/29 17:44:58 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
16/08/29 17:44:58 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
16/08/29 17:44:58 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (7680 MB per container)
16/08/29 17:44:58 INFO yarn.Client: Will allocate AM container, with 2248 MB memory including 200 MB overhead
16/08/29 17:44:58 INFO yarn.Client: Setting up container launch context for our AM
16/08/29 17:44:58 INFO yarn.Client: Setting up the launch environment for our AM container
16/08/29 17:44:58 INFO yarn.Client: Preparing resources for our AM container
16/08/29 17:44:58 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
16/08/29 17:45:00 INFO yarn.Client: Uploading resource file:/tmp/spark-38890bfc-d672-4c7d-bef9-d646c420836b/__spark_libs__3503948162159958877.zip -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1472397144295_0006/__spark_libs__3503948162159958877.zip
16/08/29 17:45:01 INFO yarn.Client: Uploading resource file:/usr/hdp/2.5.0.0-817/spark2/examples/jars/spark-examples_2.11-2.0.0.jar -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1472397144295_0006/spark-examples_2.11-2.0.0.jar
16/08/29 17:45:01 INFO yarn.Client: Uploading resource file:/tmp/spark-38890bfc-d672-4c7d-bef9-d646c420836b/__spark_conf__4613069544481307021.zip -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1472397144295_0006/__spark_conf__.zip
16/08/29 17:45:01 WARN yarn.Client: spark.yarn.am.extraJavaOptions will not take effect in cluster mode
16/08/29 17:45:01 INFO spark.SecurityManager: Changing view acls to: root
16/08/29 17:45:01 INFO spark.SecurityManager: Changing modify acls to: root
16/08/29 17:45:01 INFO spark.SecurityManager: Changing view acls groups to:
16/08/29 17:45:01 INFO spark.SecurityManager: Changing modify acls groups to:
16/08/29 17:45:01 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
16/08/29 17:45:01 INFO yarn.Client: Submitting application application_1472397144295_0006 to ResourceManager
16/08/29 17:45:01 INFO impl.YarnClientImpl: Submitted application application_1472397144295_0006
16/08/29 17:45:02 INFO yarn.Client: Application report for application_1472397144295_0006 (state: ACCEPTED)
16/08/29 17:45:02 INFO yarn.Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1472492701409
final status: UNDEFINED
tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472397144295_0006/
user: root
16/08/29 17:45:03 INFO yarn.Client: Application report for application_1472397144295_0006 (state: ACCEPTED)
16/08/29 17:45:04 INFO yarn.Client: Application report for application_1472397144295_0006 (state: ACCEPTED)
16/08/29 17:45:05 INFO yarn.Client: Application report for application_1472397144295_0006 (state: ACCEPTED)
16/08/29 17:45:06 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:06 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.0.2.15
ApplicationMaster RPC port: 0
queue: default
start time: 1472492701409
final status: UNDEFINED
tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472397144295_0006/
user: root
16/08/29 17:45:07 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:08 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:09 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:10 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:11 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:12 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:13 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:14 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:15 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:16 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:17 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:18 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:19 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:20 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:21 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:22 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:23 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:24 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:25 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:26 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:27 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:28 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:29 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:30 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:31 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:32 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:33 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:34 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:35 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:36 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:37 INFO yarn.Client: Application report for application_1472397144295_0006 (state: FINISHED)
16/08/29 17:45:37 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.0.2.15
ApplicationMaster RPC port: 0
queue: default
start time: 1472492701409
final status: SUCCEEDED
tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472397144295_0006/
user: root
16/08/29 17:45:37 INFO util.ShutdownHookManager: Shutdown hook called
16/08/29 17:45:37 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-38890bfc-d672-4c7d-bef9-d646c420836b
[root@sandbox spark2-client]#
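To reproduce the java-opts fix on your own box, the file only needs the single -Dhdp.version line. The version string below is the sandbox's from this post; substitute your cluster's (on HDP, hdp-select typically reports it). This sketch writes to a demo path so it is safe to run anywhere:

```shell
# Create the java-opts file (demo path; on a real cluster use
# /usr/hdp/current/spark2-client/conf instead).
CONF_DIR=/tmp/spark2-conf-demo
mkdir -p "$CONF_DIR"
echo "-Dhdp.version=2.5.0.0-817" > "$CONF_DIR/java-opts"
cat "$CONF_DIR/java-opts"
```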
Resolution for Spark Shell issue (lzo-codec): add the following 2 lines in your spark-defaults.conf
spark.driver.extraClassPath /usr/hdp/current/hadoop-client/lib/hadoop-lzo-0.6.0.2.5.0.0-817.jar
spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
Spark Shell working example:
[root@sandbox spark2-client]# ./bin/spark-shell --master yarn --deploy-mode client --driver-memory 2g --executor-memory 2g --executor-cores 1
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/08/29 17:47:09 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
16/08/29 17:47:21 WARN spark.SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://10.0.2.15:4041
Spark context available as 'sc' (master = yarn, app id = application_1472397144295_0007).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.0.0
/_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.7.0_101)
Type in expressions to have them evaluated.
Type :help for more information.
scala> sc.getConf.getAll.foreach(println)
(spark.eventLog.enabled,true)
(spark.yarn.scheduler.heartbeat.interval-ms,5000)
(hive.metastore.warehouse.dir,file:/usr/hdp/2.5.0.0-817/spark2/spark-warehouse)
(spark.repl.class.outputDir,/tmp/spark-fa16d4d3-8ec8-4b0e-a1da-5a2dffe39d08/repl-5dd28f29-ae03-4965-a535-18a95173b173)
(spark.yarn.am.extraJavaOptions,-Dhdp.version=2.5.0.0-817)
(spark.yarn.containerLauncherMaxThreads,25)
(spark.driver.extraJavaOptions,-Dhdp.version=2.5.0.0-817)
(spark.driver.extraLibraryPath,/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64)
(spark.driver.appUIAddress,http://10.0.2.15:4041)
(spark.driver.host,10.0.2.15)
(spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES,http://sandbox.hortonworks.com:8088/proxy/application_1472397144295_0007)
(spark.yarn.preserve.staging.files,false)
(spark.home,/usr/hdp/current/spark2-client)
(spark.app.name,Spark shell)
(spark.repl.class.uri,spark://10.0.2.15:37426/classes)
(spark.ui.port,4041)
(spark.yarn.max.executor.failures,3)
(spark.submit.deployMode,client)
(spark.yarn.executor.memoryOverhead,200)
(spark.ui.filters,org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter)
(spark.driver.extraClassPath,/usr/hdp/current/hadoop-client/lib/hadoop-lzo-0.6.0.2.5.0.0-817.jar)
(spark.executor.memory,2g)
(spark.yarn.driver.memoryOverhead,200)
(spark.hadoop.yarn.timeline-service.enabled,false)
(spark.executor.extraLibraryPath,/usr/hdp/current/hadoop-client/lib/native)
(spark.app.id,application_1472397144295_0007)
(spark.executor.id,driver)
(spark.yarn.queue,default)
(spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS,sandbox.hortonworks.com)
(spark.eventLog.dir,hdfs:///spark-history)
(spark.master,yarn)
(spark.driver.port,37426)
(spark.yarn.submit.file.replication,3)
(spark.sql.catalogImplementation,hive)
(spark.driver.memory,2g)
(spark.jars,)
(spark.executor.cores,1)
scala> val file = sc.textFile("/tmp/data")
file: org.apache.spark.rdd.RDD[String] = /tmp/data MapPartitionsRDD[1] at textFile at <console>:24
scala> val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:26
scala> counts.take(10)
res1: Array[(String, Int)] = Array((hadoop.tasklog.noKeepSplits=4,1), (log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.server.resourcemanager.appsummary.logger},1), (Unless,1), (this,4), (hadoop.mapreduce.jobsummary.log.file=hadoop-mapreduce.jobsummary.log,1), (under,4), (log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601},2), (log4j.appender.DRFAAUDIT.layout=org.apache.log4j.PatternLayout,1), (AppSummaryLogging,1), (log4j.appender.RMAUDIT.layout=org.apache.log4j.PatternLayout,1))
scala>
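The word count above can be sanity-checked outside Spark with plain shell tools on a small sample. This sketch builds its own input under /tmp rather than touching /tmp/data; it is the same flatMap/map/reduceByKey pipeline expressed with coreutils:

```shell
# Split on spaces, drop empty tokens, then count occurrences per word.
printf 'hadoop spark hadoop\nspark yarn\n' > /tmp/wordcount_sample.txt
tr ' ' '\n' < /tmp/wordcount_sample.txt | grep -v '^$' | sort | uniq -c | sort -rn
```

Running the same pipeline over a copy of your real input is a cheap way to cross-check the counts Spark returns.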
... View more
08-30-2016
08:15 AM
1 Kudo
Sandbox HDP-2.5.0, Spark 2.0.0 - Spark Submit in Yarn Cluster Mode -- Spark Shell LzoCodec not found

I have installed Spark 2.0.0 on Sandbox HDP-2.5.0 in accordance with Paul Hargis' great post: https://community.hortonworks.com/articles/53029/how-to-install-and-run-spark-20-on-hdp-25-sandbox.html Thanks Paul.

Spark-Submit in yarn-client mode works, as per the log here:
[root@sandbox ~]# cd /usr/hdp/current/spark2-client
[root@sandbox spark2-client]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --driver-memory 2g --executor-memory 2g --executor-cores 1 examples/jars/spark-examples*.jar 10
16/08/28 14:38:42 INFO spark.SparkContext: Running Spark version 2.0.0
16/08/28 14:38:42 INFO spark.SecurityManager: Changing view acls to: root
16/08/28 14:38:42 INFO spark.SecurityManager: Changing modify acls to: root
16/08/28 14:38:42 INFO spark.SecurityManager: Changing view acls groups to:
16/08/28 14:38:42 INFO spark.SecurityManager: Changing modify acls groups to:
16/08/28 14:38:42 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
16/08/28 14:38:43 INFO util.Utils: Successfully started service 'sparkDriver' on port 36008.
16/08/28 14:38:43 INFO spark.SparkEnv: Registering MapOutputTracker
16/08/28 14:38:43 INFO spark.SparkEnv: Registering BlockManagerMaster
16/08/28 14:38:43 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-b5149ef4-928d-455e-bf83-2159e12f88f7
16/08/28 14:38:43 INFO memory.MemoryStore: MemoryStore started with capacity 912.3 MB
16/08/28 14:38:43 INFO spark.SparkEnv: Registering OutputCommitCoordinator
16/08/28 14:38:43 INFO util.log: Logging initialized @2226ms
16/08/28 14:38:43 INFO server.Server: jetty-9.2.z-SNAPSHOT
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6e1e5b02{/jobs,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@ae918c9{/jobs/json,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4d5a39b7{/jobs/job,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5e83450d{/jobs/job/json,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7c2a88f4{/stages,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4c858adb{/stages/json,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@535f571c{/stages/stage,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@18501a07{/stages/stage/json,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@32dcce09{/stages/pool,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3e5acaf5{/stages/pool/json,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3ac2bace{/storage,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@46764885{/storage/json,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7f9337e6{/storage/rdd,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1a3b1e79{/storage/rdd/json,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1f4da763{/environment,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@232864a3{/environment/json,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@30e71b5d{/executors,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@14b58fc0{/executors/json,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1bf090df{/executors/threadDump,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4eb72ecd{/executors/threadDump/json,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5c61bd1a{/static,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@14c62558{/,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5cbdbf0f{/api,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2d4aa15a{/stages/stage/kill,null,AVAILABLE}
16/08/28 14:38:43 INFO server.ServerConnector: Started ServerConnector@51fcbb35{HTTP/1.1}{0.0.0.0:4041}
16/08/28 14:38:43 INFO server.Server: Started @2388ms
16/08/28 14:38:43 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.2.15:4041
16/08/28 14:38:43 INFO spark.SparkContext: Added JAR file:/usr/hdp/2.5.0.0-817/spark2/examples/jars/spark-examples_2.11-2.0.0.jar at spark://10.0.2.15:36008/jars/spark-examples_2.11
-2.0.0.jar with timestamp 1472395123767
16/08/28 14:38:44 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
16/08/28 14:38:44 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
16/08/28 14:38:44 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (7680 MB per container)
16/08/28 14:38:44 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/08/28 14:38:44 INFO yarn.Client: Setting up the launch environment for our AM container
16/08/28 14:38:44 INFO yarn.Client: Preparing resources for our AM container
16/08/28 14:38:44 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
16/08/28 14:38:48 INFO yarn.Client: Uploading resource file:/tmp/spark-a10e8972-1076-4a61-a014-8419767250f0/__spark_libs__6748274495232790272.zip -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1472394965674_0001/__spark_libs__6748274495232790272.zip
16/08/28 14:38:48 INFO yarn.Client: Uploading resource file:/tmp/spark-a10e8972-1076-4a61-a014-8419767250f0/__spark_conf__6530127439911581770.zip -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1472394965674_0001/__spark_conf__.zip
16/08/28 14:38:48 INFO spark.SecurityManager: Changing view acls to: root
16/08/28 14:38:48 INFO spark.SecurityManager: Changing modify acls to: root
16/08/28 14:38:48 INFO spark.SecurityManager: Changing view acls groups to:
16/08/28 14:38:48 INFO spark.SecurityManager: Changing modify acls groups to:
16/08/28 14:38:48 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
16/08/28 14:38:48 INFO yarn.Client: Submitting application application_1472394965674_0001 to ResourceManager
16/08/28 14:38:48 INFO impl.YarnClientImpl: Submitted application application_1472394965674_0001
16/08/28 14:38:49 INFO yarn.Client: Application report for application_1472394965674_0001 (state: ACCEPTED)
16/08/28 14:38:49 INFO yarn.Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1472395128618
final status: UNDEFINED
tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472394965674_0001/
user: root
16/08/28 14:38:51 INFO yarn.Client: Application report for application_1472394965674_0001 (state: ACCEPTED)
16/08/28 14:38:52 INFO yarn.Client: Application report for application_1472394965674_0001 (state: ACCEPTED)
16/08/28 14:38:52 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
16/08/28 14:38:52 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> sandbox.hortonworks.com, PROXY_URI_BASES -> http://sandbox.hortonworks.com:8088/proxy/application_1472394965674_0001), /proxy/application_1472394965674_0001
16/08/28 14:38:52 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
16/08/28 14:38:53 INFO yarn.Client: Application report for application_1472394965674_0001 (state: RUNNING)
16/08/28 14:38:53 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.0.2.15
ApplicationMaster RPC port: 0
queue: default
start time: 1472395128618
final status: UNDEFINED
tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472394965674_0001/
user: root
16/08/28 14:38:53 INFO cluster.YarnClientSchedulerBackend: Application application_1472394965674_0001 has started running.
16/08/28 14:38:53 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 35756.
16/08/28 14:38:53 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.2.15, 35756)
16/08/28 14:38:53 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.0.2.15:35756 with 912.3 MB RAM, BlockManagerId(driver, 10.0.2.15, 35756)
16/08/28 14:38:53 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.2.15, 35756)
16/08/28 14:38:54 INFO scheduler.EventLoggingListener: Logging events to hdfs:///spark-history/application_1472394965674_0001
16/08/28 14:38:56 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.0.2.15:36932) with ID 1
16/08/28 14:38:56 INFO storage.BlockManagerMasterEndpoint: Registering block manager sandbox.hortonworks.com:41061 with 912.3 MB RAM, BlockManagerId(1, sandbox.hortonworks.com, 41061)
16/08/28 14:38:57 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.0.2.15:36936) with ID 2
16/08/28 14:38:57 INFO storage.BlockManagerMasterEndpoint: Registering block manager sandbox.hortonworks.com:41746 with 912.3 MB RAM, BlockManagerId(2, sandbox.hortonworks.com, 41746)
16/08/28 14:38:57 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
16/08/28 14:38:57 WARN spark.SparkContext: Use an existing SparkContext, some configuration may not take effect.
16/08/28 14:38:57 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@46a61277{/SQL,null,AVAILABLE}
16/08/28 14:38:57 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@b4b5885{/SQL/json,null,AVAILABLE}
16/08/28 14:38:57 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2bcd7bea{/SQL/execution/json,null,AVAILABLE}
16/08/28 14:38:57 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@59bde227{/static/sql,null,AVAILABLE}
16/08/28 14:38:57 INFO internal.SharedState: Warehouse path is 'file:/usr/hdp/2.5.0.0-817/spark2/spark-warehouse'.
16/08/28 14:38:57 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 10 output partitions
16/08/28 14:38:57 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
16/08/28 14:38:57 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/08/28 14:38:57 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
16/08/28 14:38:57 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1832.0 B, free 912.3 MB)
16/08/28 14:38:57 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1169.0 B, free 912.3 MB)
16/08/28 14:38:57 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1012
16/08/28 14:38:57 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.0.2.15:35756 (size: 1169.0 B, free: 912.3 MB)
16/08/28 14:38:57 INFO scheduler.DAGScheduler: Submitting 10 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34)
16/08/28 14:38:57 INFO cluster.YarnScheduler: Adding task set 0.0 with 10 tasks
16/08/28 14:38:57 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, sandbox.hortonworks.com, partition 1, PROCESS_LOCAL, 5411 bytes)
16/08/28 14:38:58 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Launching task 0 on executor id: 2 hostname: sandbox.hortonworks.com.
16/08/28 14:38:58 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Launching task 1 on executor id: 1 hostname: sandbox.hortonworks.com.
16/08/28 14:38:58 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on sandbox.hortonworks.com:41746 (size: 1169.0 B, free: 912.3 MB)
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, sandbox.hortonworks.com, partition 2, PROCESS_LOCAL, 5411 bytes)
16/08/28 14:38:59 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Launching task 2 on executor id: 1 hostname: sandbox.hortonworks.com.
16/08/28 14:38:59 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Launching task 3 on executor id: 2 hostname: sandbox.hortonworks.com.
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1084 ms on sandbox.hortonworks.com (1/10)
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 1061 ms on sandbox.hortonworks.com (2/10)
16/08/28 14:38:59 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Launching task 4 on executor id: 1 hostname: sandbox.hortonworks.com.
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 88 ms on sandbox.hortonworks.com (3/10)
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, sandbox.hortonworks.com, partition 5, PROCESS_LOCAL, 5411 bytes)
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 101 ms on sandbox.hortonworks.com (4/10)
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, sandbox.hortonworks.com, partition 6, PROCESS_LOCAL, 5411 bytes)
16/08/28 14:38:59 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Launching task 6 on executor id: 1 hostname: sandbox.hortonworks.com.
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, sandbox.hortonworks.com, partition 7, PROCESS_LOCAL, 5411 bytes)
16/08/28 14:38:59 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Launching task 7 on executor id: 2 hostname: sandbox.hortonworks.com.
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 48 ms on sandbox.hortonworks.com (6/10)
16/08/28 14:38:59 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Launching task 8 on executor id: 1 hostname: sandbox.hortonworks.com.
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 48 ms on sandbox.hortonworks.com (7/10)
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, sandbox.hortonworks.com, partition 9, PROCESS_LOCAL, 5411 bytes)
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 40 ms on sandbox.hortonworks.com (8/10)
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Finished task 8.0 in stage 0.0 (TID 8) in 38 ms on sandbox.hortonworks.com (9/10)
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9) in 31 ms on sandbox.hortonworks.com (10/10)
16/08/28 14:38:59 INFO scheduler.DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 1.293 s
16/08/28 14:38:59 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 1.605653 s
Pi is roughly 3.1418151418151417
16/08/28 14:38:59 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2d4aa15a{/stages/stage/kill,null,UNAVAILABLE}
Spark-submit in yarn-cluster mode fails, as per the log here:
[root@sandbox spark2-client]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 2g --executor-memory 2g --executor-cores 1 examples/jars/spark-examples*.jar 10
16/08/28 14:41:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/28 14:41:08 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
16/08/28 14:41:08 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
16/08/28 14:41:09 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
16/08/28 14:41:09 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (7680 MB per container)
16/08/28 14:41:09 INFO yarn.Client: Will allocate AM container, with 2248 MB memory including 200 MB overhead
16/08/28 14:41:09 INFO yarn.Client: Setting up container launch context for our AM
16/08/28 14:41:09 INFO yarn.Client: Setting up the launch environment for our AM container
16/08/28 14:41:09 INFO yarn.Client: Preparing resources for our AM container
16/08/28 14:41:09 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
16/08/28 14:41:10 INFO yarn.Client: Uploading resource file:/tmp/spark-e72e7961-7ec9-4282-806d-9d95e2d7f0fc/__spark_libs__4204158628332382181.zip -> hdfs://sandbox.hortonworks.com:8
020/user/root/.sparkStaging/application_1472394965674_0002/__spark_libs__4204158628332382181.zip
16/08/28 14:41:11 INFO yarn.Client: Uploading resource file:/usr/hdp/2.5.0.0-817/spark2/examples/jars/spark-examples_2.11-2.0.0.jar -> hdfs://sandbox.hortonworks.com:8020/user/root/
.sparkStaging/application_1472394965674_0002/spark-examples_2.11-2.0.0.jar
16/08/28 14:41:12 INFO yarn.Client: Uploading resource file:/tmp/spark-e72e7961-7ec9-4282-806d-9d95e2d7f0fc/__spark_conf__2789110900476377363.zip -> hdfs://sandbox.hortonworks.com:8
020/user/root/.sparkStaging/application_1472394965674_0002/__spark_conf__.zip
16/08/28 14:41:12 WARN yarn.Client: spark.yarn.am.extraJavaOptions will not take effect in cluster mode
16/08/28 14:41:12 INFO spark.SecurityManager: Changing view acls to: root
16/08/28 14:41:12 INFO spark.SecurityManager: Changing modify acls to: root
16/08/28 14:41:12 INFO spark.SecurityManager: Changing view acls groups to:
16/08/28 14:41:12 INFO spark.SecurityManager: Changing modify acls groups to:
16/08/28 14:41:12 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(
); users with modify permissions: Set(root); groups with modify permissions: Set()
16/08/28 14:41:12 INFO yarn.Client: Submitting application application_1472394965674_0002 to ResourceManager
16/08/28 14:41:12 INFO impl.YarnClientImpl: Submitted application application_1472394965674_0002
16/08/28 14:41:13 INFO yarn.Client: Application report for application_1472394965674_0002 (state: ACCEPTED)
16/08/28 14:41:13 INFO yarn.Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1472395272580
final status: UNDEFINED
tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472394965674_0002/
user: root
16/08/28 14:41:14 INFO yarn.Client: Application report for application_1472394965674_0002 (state: ACCEPTED)
16/08/28 14:41:15 INFO yarn.Client: Application report for application_1472394965674_0002 (state: FAILED)
16/08/28 14:41:15 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1472394965674_0002 failed 2 times due to AM Container for appattempt_1472394965674_0002_000002 exited with exitCode: 1
For more detailed output, check the application tracking page: http://sandbox.hortonworks.com:8088/cluster/app/application_1472394965674_0002 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e17_1472394965674_0002_02_000001
Exit code: 1
Exception message: /hadoop/yarn/local/usercache/root/appcache/application_1472394965674_0002/container_e17_1472394965674_0002_02_000001/launch_container.sh: line 25: $PWD:$PWD/__spa
rk_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-
doop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework
/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/
hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
Stack trace: ExitCodeException exitCode=1: /hadoop/yarn/local/usercache/root/appcache/application_1472394965674_0002/container_e17_1472394965674_0002_02_000001/launch_container.sh:
line 25: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:
/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-f
yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/
hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
at org.apache.hadoop.util.Shell.runCommand(Shell.java:909)
at org.apache.hadoop.util.Shell.run(Shell.java:820)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1099)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
start time: 1472395272580
final status: FAILED
tracking URL: http://sandbox.hortonworks.com:8088/cluster/app/application_1472394965674_0002
16/08/28 14:41:15 INFO yarn.Client: Deleting staging directory hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1472394965674_0002
Exception in thread "main" org.apache.spark.SparkException: Application application_1472394965674_0002 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1132)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1175)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:729)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/08/28 14:41:15 INFO util.ShutdownHookManager: Shutdown hook called
16/08/28 14:41:15 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-e72e7961-7ec9-4282-806d-9d95e2d7f0fc
[root@sandbox spark2-client]#
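The "bad substitution" on /usr/hdp/${hdp.version}/... in launch_container.sh above usually means the hdp.version Java system property was never supplied to the YARN containers, so the classpath template cannot be expanded. A minimal workaround sketch, assuming the stack version 2.5.0.0-817 seen in the paths above (verify yours before applying):

```shell
# Sketch of the usual workaround: define hdp.version explicitly so the
# generated launch_container.sh can expand /usr/hdp/${hdp.version}/... entries.
# 2.5.0.0-817 is taken from the log paths above; adjust for your stack version.
CONF=$(mktemp)   # illustration only; on the sandbox this is $SPARK_HOME/conf/spark-defaults.conf
echo "spark.driver.extraJavaOptions -Dhdp.version=2.5.0.0-817" >> "$CONF"
echo "spark.yarn.am.extraJavaOptions -Dhdp.version=2.5.0.0-817" >> "$CONF"
cat "$CONF"
```

With those two properties in spark-defaults.conf, re-running the same spark-submit in yarn-cluster mode should no longer fail on the substitution; passing --conf spark.driver.extraJavaOptions=-Dhdp.version=2.5.0.0-817 on the command line is an equivalent one-off.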
Any help to resolve this would be appreciated.
In spark-shell mode, launched with the following command, I am encountering a LzoCodec not found error, as per the log here:
[root@sandbox spark2-client]# ./bin/spark-shell --master yarn
Setting default log level to "WARN".
16/08/28 14:44:42 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
16/08/28 14:44:54 WARN spark.SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://10.0.2.15:4041
Spark context available as 'sc' (master = yarn, app id = application_1472394965674_0003).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.7.0_101)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val file = sc.textFile("/tmp/data")
file: org.apache.spark.rdd.RDD[String] = /tmp/data MapPartitionsRDD[1] at textFile at <console>:24
scala> val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:186)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:248)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:246)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:246)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:248)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:246)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:248)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:246)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:65)
at org.apache.spark.rdd.PairRDDFunctions$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:328)
at org.apache.spark.rdd.PairRDDFunctions$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:328)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:327)
... 48 elided
Caused by: java.lang.reflect.InvocationTargetException: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:139)
at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:180)
at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
... 83 more
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:132)
... 85 more
scala>
Any help to resolve this would be appreciated. Thanks. Amit
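The LzoCodec error above means core-site.xml lists com.hadoop.compression.lzo.LzoCodec in io.compression.codecs, but the Spark 2 driver and executor classpaths do not include the hadoop-lzo jar. A sketch of one way to supply it, assuming the HDP 2.5 sandbox jar path used elsewhere in this thread (verify it exists on your box):

```shell
# Sketch: put the hadoop-lzo jar on both driver and executor classpaths so the
# codec class referenced by core-site.xml can be loaded. The jar path is an
# assumption based on the HDP 2.5.0.0-817 layout; adjust for your stack.
LZO_JAR=/usr/hdp/2.5.0.0-817/hadoop/lib/hadoop-lzo-0.6.0.2.5.0.0-817.jar
./bin/spark-shell --master yarn \
  --jars "$LZO_JAR" \
  --conf spark.driver.extraClassPath="$LZO_JAR" \
  --conf spark.executor.extraClassPath="$LZO_JAR"
```

The same two extraClassPath properties can instead be set once in spark-defaults.conf so every shell and spark-submit picks them up.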
... View more
Labels:
08-29-2016
02:07 PM
Zeppelin + PySpark (1.6.* or 2.0.0) - I want to know how I can add Python libraries such as NumPy/Pandas/scikit-learn. Additional question: if I install Anaconda Python and its repo, how do I need to configure the Zeppelin interpreters so that PySpark works well with the Anaconda Python repo?
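One common setup, sketched under the assumption that Anaconda is installed at /opt/anaconda (adjust to your actual install path): install the libraries into that Python, then point both Zeppelin and the Spark executors at it.

```shell
# Sketch (all paths are assumptions): install the libraries into the Anaconda
# environment, then make Zeppelin's PySpark interpreter use that Python binary.
/opt/anaconda/bin/conda install -y numpy pandas scikit-learn

# In zeppelin-env.sh, export the same Python for the driver and executors:
#   export PYSPARK_PYTHON=/opt/anaconda/bin/python
# In the Zeppelin UI, set the spark interpreter property:
#   zeppelin.pyspark.python = /opt/anaconda/bin/python
# then restart the spark interpreter so %pyspark picks up the new Python.
```

On a multi-node cluster the same Anaconda path must exist on every NodeManager host, otherwise executors will fail to launch the Python worker.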
... View more
Labels:
- Apache Zeppelin
08-28-2016
09:29 PM
2 Kudos
Sandbox HDP-2.5.0 TP, Spark 1.6.2 - I am encountering the following errors while running a simple word count on spark-shell:

ERROR GPLNativeCodeLoader: Could not load native gpl library
ERROR LzoCodec: Cannot load native-lzo without native-hadoop

[root@sandbox ~]# cd $SPARK_HOME
[root@sandbox spark-client]# ./bin/spark-shell --master yarn-client --driver-memory 512m --executor-memory 512m --jars /usr/hdp/2.5.0.0-817/hadoop/lib/hadoop-lzo-0.6.0.2.5.0.0-817.jar

The following code is submitted at the Spark CLI:

val file = sc.textFile("/tmp/data")
val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.saveAsTextFile("/tmp/wordcount")
This yields the following error:

ERROR GPLNativeCodeLoader: Could not load native gpl library
ERROR LzoCodec: Cannot load native-lzo without native-hadoop

The same error appears with or without adding the --jars parameter, as here under:

--jars /usr/hdp/2.5.0.0-817/hadoop/lib/hadoop-lzo-0.6.0.2.5.0.0-817.jar

Full log:

[root@sandbox ~]# cd $SPARK_HOME
[root@sandbox spark-client]# ./bin/spark-shell --master yarn-client --driver-memory 512m --executor-memory 512m --jars /usr/hdp/2.5.0.0-817/hadoop/lib/hadoop-lzo-0.6.0.2.5.0.0-817.jar
16/08/27 16:28:23 INFO SecurityManager: Changing view acls to: root
16/08/27 16:28:23 INFO SecurityManager: Changing modify acls to: root
16/08/27 16:28:23 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/08/27 16:28:23 INFO HttpServer: Starting HTTP Server
16/08/27 16:28:23 INFO Server: jetty-8.y.z-SNAPSHOT
16/08/27 16:28:23 INFO AbstractConnector: Started SocketConnector@0.0.0.0:43011
16/08/27 16:28:23 INFO Utils: Successfully started service 'HTTP class server' on port 43011.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.2
      /_/

Using Scala version 2.10.5 (OpenJDK 64-Bit Server VM, Java 1.7.0_101)
Type in expressions to have them evaluated.
Type :help for more information.
16/08/27 16:28:26 INFO SparkContext: Running Spark version 1.6.2
16/08/27 16:28:26 INFO SecurityManager: Changing view acls to: root
16/08/27 16:28:26 INFO SecurityManager: Changing modify acls to: root
16/08/27 16:28:26 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/08/27 16:28:26 INFO Utils: Successfully started service 'sparkDriver' on port 45506.
16/08/27 16:28:27 INFO Slf4jLogger: Slf4jLogger started
16/08/27 16:28:27 INFO Remoting: Starting remoting
16/08/27 16:28:27 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.0.2.15:44829]
16/08/27 16:28:27 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 44829.
16/08/27 16:28:27 INFO SparkEnv: Registering MapOutputTracker
16/08/27 16:28:27 INFO SparkEnv: Registering BlockManagerMaster
16/08/27 16:28:27 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-0776b175-5dd7-49b9-adf7-f2cbd85a1e1b
16/08/27 16:28:27 INFO MemoryStore: MemoryStore started with capacity 143.6 MB
16/08/27 16:28:27 INFO SparkEnv: Registering OutputCommitCoordinator
16/08/27 16:28:27 INFO Server: jetty-8.y.z-SNAPSHOT
16/08/27 16:28:27 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/08/27 16:28:27 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/08/27 16:28:27 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.2.15:4040
16/08/27 16:28:27 INFO HttpFileServer: HTTP File server directory is /tmp/spark-61ecb98e-989c-4396-9b30-032c4d5a2b90/httpd-857ce699-7db0-428c-9af5-1dca4ec5330d
16/08/27 16:28:27 INFO HttpServer: Starting HTTP Server
16/08/27 16:28:27 INFO Server: jetty-8.y.z-SNAPSHOT
16/08/27 16:28:27 INFO AbstractConnector: Started SocketConnector@0.0.0.0:37515
16/08/27 16:28:27 INFO Utils: Successfully started service 'HTTP file server' on port 37515.
16/08/27 16:28:27 INFO SparkContext: Added JAR file:/usr/hdp/2.5.0.0-817/hadoop/lib/hadoop-lzo-0.6.0.2.5.0.0-817.jar at http://10.0.2.15:37515/jars/hadoop-lzo-0.6.0.2.5.0.0-817.jar with timestamp 1472315307772
spark.yarn.driver.memoryOverhead is set but does not apply in client mode.
16/08/27 16:28:28 INFO TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
16/08/27 16:28:28 INFO RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
16/08/27 16:28:28 INFO Client: Requesting a new application from cluster with 1 NodeManagers
16/08/27 16:28:28 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (2250 MB per container)
16/08/27 16:28:28 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/08/27 16:28:28 INFO Client: Setting up container launch context for our AM
16/08/27 16:28:28 INFO Client: Setting up the launch environment for our AM container
16/08/27 16:28:28 INFO Client: Using the spark assembly jar on HDFS because you are using HDP, defaultSparkAssembly:hdfs://sandbox.hortonworks.com:8020/hdp/apps/2.5.0.0-817/spark/spark-hdp-assembly.jar
16/08/27 16:28:28 INFO Client: Preparing resources for our AM container
16/08/27 16:28:28 INFO Client: Using the spark assembly jar on HDFS because you are using HDP, defaultSparkAssembly:hdfs://sandbox.hortonworks.com:8020/hdp/apps/2.5.0.0-817/spark/spark-hdp-assembly.jar
16/08/27 16:28:28 INFO Client: Source and destination file systems are the same.
Not copying hdfs://sandbox.hortonworks.com:8020/hdp/apps/2.5.0.0-817/spark/spark-hdp-assembly.jar
16/08/27 16:28:29 INFO Client: Uploading resource file:/tmp/spark-61ecb98e-989c-4396-9b30-032c4d5a2b90/__spark_conf__5084804354575467223.zip -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1472312154461_0006/__spark_conf__5084804354575467223.zip
16/08/27 16:28:29 INFO SecurityManager: Changing view acls to: root
16/08/27 16:28:29 INFO SecurityManager: Changing modify acls to: root
16/08/27 16:28:29 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/08/27 16:28:29 INFO Client: Submitting application 6 to ResourceManager
16/08/27 16:28:29 INFO YarnClientImpl: Submitted application application_1472312154461_0006
16/08/27 16:28:29 INFO SchedulerExtensionServices: Starting Yarn extension services with app application_1472312154461_0006 and attemptId None
16/08/27 16:28:30 INFO Client: Application report for application_1472312154461_0006 (state: ACCEPTED)
16/08/27 16:28:30 INFO Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1472315309252
final status: UNDEFINED
tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472312154461_0006/
user: root
16/08/27 16:28:31 INFO Client: Application report for application_1472312154461_0006 (state: ACCEPTED)
16/08/27 16:28:32 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
16/08/27 16:28:32 INFO YarnClientSchedulerBackend: Add WebUI Filter.
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpF ilter, Map(PROXY_HOSTS -> sandbox.hortonworks.com, PROXY_URI_BASES -> http://sandbox.hortonworks.com:8088/proxy/applicatio n_1472312154461_0006), /proxy/application_1472312154461_0006 16/08/27 16:28:32 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter 16/08/27 16:28:32 INFO Client: Application report for application_1472312154461_0006 (state: RUNNING) 16/08/27 16:28:32 INFO Client: client token: N/A diagnostics: N/A ApplicationMaster host: 10.0.2.15 ApplicationMaster RPC port: 0 queue: default start time: 1472315309252 final status: UNDEFINED tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472312154461_0006/ user: root 16/08/27 16:28:32 INFO YarnClientSchedulerBackend: Application application_1472312154461_0006 has started running. 16/08/27 16:28:32 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on p ort 34124. 16/08/27 16:28:32 INFO NettyBlockTransferService: Server created on 34124 16/08/27 16:28:32 INFO BlockManagerMaster: Trying to register BlockManager 16/08/27 16:28:32 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.2.15:34124 with 143.6 MB RAM, BlockManag erId(driver, 10.0.2.15, 34124) 16/08/27 16:28:32 INFO BlockManagerMaster: Registered BlockManager 16/08/27 16:28:32 INFO EventLoggingListener: Logging events to hdfs:///spark-history/application_1472312154461_0006 16/08/27 16:28:36 INFO YarnClientSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (sandbox.hortonworks.com: 39728) with ID 1 16/08/27 16:28:36 INFO BlockManagerMasterEndpoint: Registering block manager sandbox.hortonworks.com:38362 with 143.6 MB R AM, BlockManagerId(1, sandbox.hortonworks.com, 38362) 16/08/27 16:28:57 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxReg isteredResourcesWaitingTime: 30000(ms) 16/08/27 16:28:57 INFO SparkILoop: Created 
spark context.. Spark context available as sc. 16/08/27 16:28:58 INFO HiveContext: Initializing execution hive, version 1.2.1 16/08/27 16:28:58 INFO ClientWrapper: Inspected Hadoop version: 2.7.1.2.5.0.0-817 16/08/27 16:28:58 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.7.1.2.5.0.0-8 17 16/08/27 16:28:58 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.Objec tStore 16/08/27 16:28:58 INFO ObjectStore: ObjectStore, initialize called 16/08/27 16:28:58 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored 16/08/27 16:28:58 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored 16/08/27 16:28:59 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies) 16/08/27 16:28:59 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies) 16/08/27 16:29:00 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,Stor ageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order" 16/08/27 16:29:01 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-o nly" so does not have its own datastore table. 16/08/27 16:29:01 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" s o does not have its own datastore table. 16/08/27 16:29:02 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-o nly" so does not have its own datastore table. 16/08/27 16:29:02 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" s o does not have its own datastore table. 
16/08/27 16:29:02 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY 16/08/27 16:29:02 INFO ObjectStore: Initialized ObjectStore 16/08/27 16:29:02 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0 16/08/27 16:29:02 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException 16/08/27 16:29:03 INFO HiveMetaStore: Added admin role in metastore 16/08/27 16:29:03 INFO HiveMetaStore: Added public role in metastore 16/08/27 16:29:03 INFO HiveMetaStore: No user is added in admin role, since config is empty 16/08/27 16:29:03 INFO HiveMetaStore: 0: get_all_databases 16/08/27 16:29:03 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_all_databases 16/08/27 16:29:03 INFO HiveMetaStore: 0: get_functions: db=default pat=* 16/08/27 16:29:03 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_functions: db=default pat=* 16/08/27 16:29:03 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-o nly" so does not have its own datastore table. 16/08/27 16:29:03 INFO SessionState: Created local directory: /tmp/6ebb0a60-b229-4dad-94a3-e2386ba7b4ec_resources 16/08/27 16:29:03 INFO SessionState: Created HDFS directory: /tmp/hive/root/6ebb0a60-b229-4dad-94a3-e2386ba7b4ec 16/08/27 16:29:03 INFO SessionState: Created local directory: /tmp/root/6ebb0a60-b229-4dad-94a3-e2386ba7b4ec 16/08/27 16:29:03 INFO SessionState: Created HDFS directory: /tmp/hive/root/6ebb0a60-b229-4dad-94a3-e2386ba7b4ec/_tmp_spac e.db 16/08/27 16:29:03 INFO HiveContext: default warehouse location is /user/hive/warehouse 16/08/27 16:29:03 INFO HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes. 
16/08/27 16:29:03 INFO ClientWrapper: Inspected Hadoop version: 2.7.1.2.5.0.0-817 16/08/27 16:29:03 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.7.1.2.5.0.0-8 17 16/08/27 16:29:04 INFO metastore: Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083 16/08/27 16:29:04 INFO metastore: Connected to metastore. 16/08/27 16:29:04 INFO SessionState: Created local directory: /tmp/83a1e2d3-8c24-4f12-9841-fab259a77514_resources 16/08/27 16:29:04 INFO SessionState: Created HDFS directory: /tmp/hive/root/83a1e2d3-8c24-4f12-9841-fab259a77514 16/08/27 16:29:04 INFO SessionState: Created local directory: /tmp/root/83a1e2d3-8c24-4f12-9841-fab259a77514 16/08/27 16:29:04 INFO SessionState: Created HDFS directory: /tmp/hive/root/83a1e2d3-8c24-4f12-9841-fab259a77514/_tmp_spac e.db 16/08/27 16:29:04 INFO SparkILoop: Created sql context (with Hive support).. SQL context available as sqlContext. scala> val file = sc.textFile("/tmp/data") 16/08/27 16:29:20 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 234.8 KB, free 234.8 KB) 16/08/27 16:29:20 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 28.1 KB, free 262.9 KB) 16/08/27 16:29:20 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.0.2.15:34124 (size: 28.1 KB, free: 143.6 MB) 16/08/27 16:29:20 INFO SparkContext: Created broadcast 0 from textFile at <console>:27 file: org.apache.spark.rdd.RDD[String] = /tmp/data MapPartitionsRDD[1] at textFile at <console>:27 scala> val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _) 16/08/27 16:29:35 ERROR GPLNativeCodeLoader: Could not load native gpl library java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1889) at java.lang.Runtime.loadLibrary0(Runtime.java:849) at java.lang.System.loadLibrary(System.java:1088) at 
com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:32) at com.hadoop.compression.lzo.LzoCodec.<clinit>(LzoCodec.java:71) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:278) at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2147) at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2112) at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:132) at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:179) at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136) at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:189) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202) at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:242) at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:240) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:240) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:242) at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:240) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:240) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at 
org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:242) at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:240) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:240) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:242) at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:240) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:240) at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:65) at org.apache.spark.rdd.PairRDDFunctions$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:331) at org.apache.spark.rdd.PairRDDFunctions$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:331) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) at org.apache.spark.rdd.RDD.withScope(RDD.scala:323) at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:330) at $line19.$read$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:29) at $line19.$read$iwC$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:34) at $line19.$read$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:36) at $line19.$read$iwC$iwC$iwC$iwC$iwC.<init>(<console>:38) at $line19.$read$iwC$iwC$iwC$iwC.<init>(<console>:40) at $line19.$read$iwC$iwC$iwC.<init>(<console>:42) at $line19.$read$iwC$iwC.<init>(<console>:44) at $line19.$read$iwC.<init>(<console>:46) at $line19.$read.<init>(<console>:48) at $line19.$read$.<init>(<console>:52) at $line19.$read$.<clinit>(<console>) at $line19.$eval$.<init>(<console>:7) at $line19.$eval$.<clinit>(<console>) at $line19.$eval.$print(<console>) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$loop(SparkILoop.scala:670) at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1.apply$mcZ$sp(SparkILoop.s cala:997) at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1.apply(SparkILoop.scala:94 5) at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1.apply(SparkILoop.scala:94 5) at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$process(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) at org.apache.spark.repl.Main$.main(Main.scala:31) at org.apache.spark.repl.Main.main(Main.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:731) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 16/08/27 16:29:35 ERROR LzoCodec: Cannot load native-lzo without native-hadoop 16/08/27 16:29:35 INFO FileInputFormat: Total input paths to process : 1 counts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:29 scala> Please help to fix this issue.
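Note that the GPLNativeCodeLoader/LzoCodec errors above concern the native LZO compression libraries (libgplcompression) not being on the JVM's java.library.path; the job itself still completes, since the counts RDD is created. One possible direction, assuming the stock HDP layout (the native-library path below is illustrative, verify it on your cluster), is to expose Hadoop's native libraries to the driver JVM before launching the shell:

```shell
# Hypothetical HDP native-library location; adjust for your install.
NATIVE_LIB_DIR=/usr/hdp/current/hadoop-client/lib/native
export LD_LIBRARY_PATH="$NATIVE_LIB_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
# Then relaunch the shell so the driver JVM can find libgplcompression:
#   spark-shell --driver-library-path "$NATIVE_LIB_DIR"
```

The same path can also be set cluster-wide via spark.driver.extraLibraryPath and spark.executor.extraLibraryPath in spark-defaults.conf.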
... View more
08-24-2016
06:05 AM
How do we set SPARK_MAJOR_VERSION? In which conf file? Are there any other related conf files to maintain?
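From what I understand so far, SPARK_MAJOR_VERSION is an environment variable picked up by HDP's spark-shell/spark-submit wrapper scripts rather than a setting in a dedicated conf file, so it would be exported in the shell session (the value 2 below is illustrative):

```shell
# Select Spark 2.x for this session; put the export in ~/.bashrc or a
# profile script to make it permanent.
export SPARK_MAJOR_VERSION=2
echo "SPARK_MAJOR_VERSION=$SPARK_MAJOR_VERSION"
```

After exporting, spark-shell and spark-submit on recent HDP releases should report which Spark version they selected at launch.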
... View more