
Sandbox HDP-2.5.0 Spark 2.0.0 - Spark Submit Yarn Cluster Mode -- Spark Shell LzoCodec not found

Solved

I installed Spark 2.0.0 on the HDP-2.5.0 sandbox following Paul Hargis's great post:

https://community.hortonworks.com/articles/53029/how-to-install-and-run-spark-20-on-hdp-25-sandbox.h...

Thanks Paul.

spark-submit in yarn-client mode works, as this log shows:

[root@sandbox ~]# cd /usr/hdp/current/spark2-client
[root@sandbox spark2-client]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --driver-memory 2g --executor-memory 2g --executor-cores 1 examples/jars/spark-examples*.jar 10
16/08/28 14:38:42 INFO spark.SparkContext: Running Spark version 2.0.0                                                                                                               
16/08/28 14:38:42 INFO spark.SecurityManager: Changing view acls to: root                                                                                                            
16/08/28 14:38:42 INFO spark.SecurityManager: Changing modify acls to: root                                                                                                          
16/08/28 14:38:42 INFO spark.SecurityManager: Changing view acls groups to:                                                                                                          
16/08/28 14:38:42 INFO spark.SecurityManager: Changing modify acls groups to:                                                                                                        
16/08/28 14:38:42 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
16/08/28 14:38:43 INFO util.Utils: Successfully started service 'sparkDriver' on port 36008.                                                                                         
16/08/28 14:38:43 INFO spark.SparkEnv: Registering MapOutputTracker                                                                                                                  
16/08/28 14:38:43 INFO spark.SparkEnv: Registering BlockManagerMaster                                                                                                                
16/08/28 14:38:43 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-b5149ef4-928d-455e-bf83-2159e12f88f7                                                       
16/08/28 14:38:43 INFO memory.MemoryStore: MemoryStore started with capacity 912.3 MB                                                                                                
16/08/28 14:38:43 INFO spark.SparkEnv: Registering OutputCommitCoordinator                                                                                                           
16/08/28 14:38:43 INFO util.log: Logging initialized @2226ms                                                                                                                         
16/08/28 14:38:43 INFO server.Server: jetty-9.2.z-SNAPSHOT                                                                                                                           
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6e1e5b02{/jobs,null,AVAILABLE}                                                                  
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@ae918c9{/jobs/json,null,AVAILABLE}                                                              
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4d5a39b7{/jobs/job,null,AVAILABLE}                                                              
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5e83450d{/jobs/job/json,null,AVAILABLE}                                                         
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7c2a88f4{/stages,null,AVAILABLE}                                                                
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4c858adb{/stages/json,null,AVAILABLE}                                                           
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@535f571c{/stages/stage,null,AVAILABLE}                                                          
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@18501a07{/stages/stage/json,null,AVAILABLE}                                                     
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@32dcce09{/stages/pool,null,AVAILABLE}                                                           
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3e5acaf5{/stages/pool/json,null,AVAILABLE}                                                      
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3ac2bace{/storage,null,AVAILABLE}                                                               
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@46764885{/storage/json,null,AVAILABLE}                                                          
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7f9337e6{/storage/rdd,null,AVAILABLE}                                                           
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1a3b1e79{/storage/rdd/json,null,AVAILABLE}                                                      
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1f4da763{/environment,null,AVAILABLE}                                                           
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@232864a3{/environment/json,null,AVAILABLE}                                                      
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@30e71b5d{/executors,null,AVAILABLE}                                                             
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@14b58fc0{/executors/json,null,AVAILABLE}                                                        
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1bf090df{/executors/threadDump,null,AVAILABLE}                                                  
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4eb72ecd{/executors/threadDump/json,null,AVAILABLE}                                             
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5c61bd1a{/static,null,AVAILABLE}                                                                
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@14c62558{/,null,AVAILABLE}                                                                      
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5cbdbf0f{/api,null,AVAILABLE}                                                                   
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2d4aa15a{/stages/stage/kill,null,AVAILABLE}                                                     
16/08/28 14:38:43 INFO server.ServerConnector: Started ServerConnector@51fcbb35{HTTP/1.1}{0.0.0.0:4041}                                                                              
16/08/28 14:38:43 INFO server.Server: Started @2388ms                                                                                                                                
16/08/28 14:38:43 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.2.15:4041
16/08/28 14:38:43 INFO spark.SparkContext: Added JAR file:/usr/hdp/2.5.0.0-817/spark2/examples/jars/spark-examples_2.11-2.0.0.jar at spark://10.0.2.15:36008/jars/spark-examples_2.11-2.0.0.jar with timestamp 1472395123767
16/08/28 14:38:44 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
16/08/28 14:38:44 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
16/08/28 14:38:44 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (7680 MB per container)                       
16/08/28 14:38:44 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead                                                                         
16/08/28 14:38:44 INFO yarn.Client: Setting up the launch environment for our AM container                                                                                           

16/08/28 14:38:44 INFO yarn.Client: Preparing resources for our AM container                                                                                                         
16/08/28 14:38:44 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.                                     
16/08/28 14:38:44 INFO yarn.Client: Uploading resource file:/tmp/spark-a10e8972-1076-4a61-a014-8419767250f0/__spark_libs__6748274495232790272.zip -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1472394965674_0001/__spark_libs__6748274495232790272.zip
16/08/28 14:38:48 INFO yarn.Client: Uploading resource file:/tmp/spark-a10e8972-1076-4a61-a014-8419767250f0/__spark_conf__6530127439911581770.zip -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1472394965674_0001/__spark_conf__.zip
16/08/28 14:38:48 INFO spark.SecurityManager: Changing view acls to: root
16/08/28 14:38:48 INFO spark.SecurityManager: Changing modify acls to: root
16/08/28 14:38:48 INFO spark.SecurityManager: Changing view acls groups to: 
16/08/28 14:38:48 INFO spark.SecurityManager: Changing modify acls groups to: 
16/08/28 14:38:48 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
16/08/28 14:38:48 INFO yarn.Client: Submitting application application_1472394965674_0001 to ResourceManager                                                                         
16/08/28 14:38:48 INFO impl.YarnClientImpl: Submitted application application_1472394965674_0001                                                                                     
16/08/28 14:38:49 INFO yarn.Client: Application report for application_1472394965674_0001 (state: ACCEPTED)
16/08/28 14:38:49 INFO yarn.Client: 
         client token: N/A
         diagnostics: AM container is launched, waiting for AM container to Register with RM
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1472395128618
         final status: UNDEFINED
         tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472394965674_0001/
         user: root
16/08/28 14:38:51 INFO yarn.Client: Application report for application_1472394965674_0001 (state: ACCEPTED)                                                                          

16/08/28 14:38:52 INFO yarn.Client: Application report for application_1472394965674_0001 (state: ACCEPTED)
16/08/28 14:38:52 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
16/08/28 14:38:52 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> sandbox.hortonworks.com, PROXY_URI_BASES -> http://sandbox.hortonworks.com:8088/proxy/application_1472394965674_0001), /proxy/application_1472394965674_0001
16/08/28 14:38:52 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
16/08/28 14:38:53 INFO yarn.Client: Application report for application_1472394965674_0001 (state: RUNNING)
16/08/28 14:38:53 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: 10.0.2.15
         ApplicationMaster RPC port: 0
         queue: default
         start time: 1472395128618
         final status: UNDEFINED
         tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472394965674_0001/
         user: root
16/08/28 14:38:53 INFO cluster.YarnClientSchedulerBackend: Application application_1472394965674_0001 has started running.                                                           
16/08/28 14:38:53 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 35756.                                            
16/08/28 14:38:53 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.2.15, 35756)                                                                 

16/08/28 14:38:53 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.0.2.15:35756 with 912.3 MB RAM, BlockManagerId(driver, 10.0.2.15, 35756)                     
16/08/28 14:38:53 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.2.15, 35756)                                                                  
16/08/28 14:38:54 INFO scheduler.EventLoggingListener: Logging events to hdfs:///spark-history/application_1472394965674_0001                                                        

16/08/28 14:38:56 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.0.2.15:36932) with ID 1                                    
16/08/28 14:38:56 INFO storage.BlockManagerMasterEndpoint: Registering block manager sandbox.hortonworks.com:41061 with 912.3 MB RAM, BlockManagerId(1, sandbox.hortonworks.com, 41061)
16/08/28 14:38:57 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.0.2.15:36936) with ID 2
16/08/28 14:38:57 INFO storage.BlockManagerMasterEndpoint: Registering block manager sandbox.hortonworks.com:41746 with 912.3 MB RAM, BlockManagerId(2, sandbox.hortonworks.com, 41746)
16/08/28 14:38:57 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
16/08/28 14:38:57 WARN spark.SparkContext: Use an existing SparkContext, some configuration may not take effect.
16/08/28 14:38:57 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@46a61277{/SQL,null,AVAILABLE}                                                                   
16/08/28 14:38:57 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@b4b5885{/SQL/json,null,AVAILABLE}                                                               
16/08/28 14:38:57 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2bcd7bea{/SQL/execution/json,null,AVAILABLE}                                                    

16/08/28 14:38:57 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@59bde227{/static/sql,null,AVAILABLE}                                                            
16/08/28 14:38:57 INFO internal.SharedState: Warehouse path is 'file:/usr/hdp/2.5.0.0-817/spark2/spark-warehouse'.                                                                   
16/08/28 14:38:57 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 10 output partitions                                                                      

16/08/28 14:38:57 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)                                                                               
16/08/28 14:38:57 INFO scheduler.DAGScheduler: Parents of final stage: List()                                                                                                        
16/08/28 14:38:57 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents                               

16/08/28 14:38:57 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1832.0 B, free 912.3 MB)                                                     
16/08/28 14:38:57 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1169.0 B, free 912.3 MB)                                               
16/08/28 14:38:57 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1012
16/08/28 14:38:57 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.0.2.15:35756 (size: 1169.0 B, free: 912.3 MB)
16/08/28 14:38:57 INFO scheduler.DAGScheduler: Submitting 10 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34)                                       
16/08/28 14:38:57 INFO cluster.YarnScheduler: Adding task set 0.0 with 10 tasks                                                                                                      
16/08/28 14:38:57 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, sandbox.hortonworks.com, partition 1, PROCESS_LOCAL, 5411 bytes)                             

16/08/28 14:38:58 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Launching task 0 on executor id: 2 hostname: sandbox.hortonworks.com.                                        
16/08/28 14:38:58 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Launching task 1 on executor id: 1 hostname: sandbox.hortonworks.com.                                        
16/08/28 14:38:58 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on sandbox.hortonworks.com:41746 (size: 1169.0 B, free: 912.3 MB)                                

16/08/28 14:38:59 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, sandbox.hortonworks.com, partition 2, PROCESS_LOCAL, 5411 bytes)                             
16/08/28 14:38:59 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Launching task 2 on executor id: 1 hostname: sandbox.hortonworks.com.                                        
16/08/28 14:38:59 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Launching task 3 on executor id: 2 hostname: sandbox.hortonworks.com.

16/08/28 14:38:59 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1084 ms on sandbox.hortonworks.com (1/10)                                                 
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 1061 ms on sandbox.hortonworks.com (2/10)                                                 
16/08/28 14:38:59 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Launching task 4 on executor id: 1 hostname: sandbox.hortonworks.com.

16/08/28 14:38:59 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 88 ms on sandbox.hortonworks.com (3/10)                                                   
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, sandbox.hortonworks.com, partition 5, PROCESS_LOCAL, 5411 bytes)                             
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 101 ms on sandbox.hortonworks.com (4/10)

16/08/28 14:38:59 INFO scheduler.TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, sandbox.hortonworks.com, partition 6, PROCESS_LOCAL, 5411 bytes)                             
16/08/28 14:38:59 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Launching task 6 on executor id: 1 hostname: sandbox.hortonworks.com.                                        
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, sandbox.hortonworks.com, partition 7, PROCESS_LOCAL, 5411 bytes)                             

16/08/28 14:38:59 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Launching task 7 on executor id: 2 hostname: sandbox.hortonworks.com.                                        
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 48 ms on sandbox.hortonworks.com (6/10)                                                   
16/08/28 14:38:59 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Launching task 8 on executor id: 1 hostname: sandbox.hortonworks.com.

16/08/28 14:38:59 INFO scheduler.TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 48 ms on sandbox.hortonworks.com (7/10)                                                   
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, sandbox.hortonworks.com, partition 9, PROCESS_LOCAL, 5411 bytes)                             
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 40 ms on sandbox.hortonworks.com (8/10)

16/08/28 14:38:59 INFO scheduler.TaskSetManager: Finished task 8.0 in stage 0.0 (TID 8) in 38 ms on sandbox.hortonworks.com (9/10)                                                   
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9) in 31 ms on sandbox.hortonworks.com (10/10)                                                  
16/08/28 14:38:59 INFO scheduler.DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 1.293 s                                                                        

16/08/28 14:38:59 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 1.605653 s                                                                           
Pi is roughly 3.1418151418151417                                                                                                                                                     
16/08/28 14:38:59 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2d4aa15a{/stages/stage/kill,null,UNAVAILABLE}  

spark-submit in yarn-cluster mode fails, as this log shows:

[root@sandbox spark2-client]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 2g --executor-memory 2g --executor-cores 1 examples/jars/spark-examples*.jar 10
16/08/28 14:41:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/28 14:41:08 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
16/08/28 14:41:08 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
16/08/28 14:41:09 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
16/08/28 14:41:09 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (7680 MB per container)
16/08/28 14:41:09 INFO yarn.Client: Will allocate AM container, with 2248 MB memory including 200 MB overhead
16/08/28 14:41:09 INFO yarn.Client: Setting up container launch context for our AM
16/08/28 14:41:09 INFO yarn.Client: Setting up the launch environment for our AM container
16/08/28 14:41:09 INFO yarn.Client: Preparing resources for our AM container
16/08/28 14:41:09 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
16/08/28 14:41:10 INFO yarn.Client: Uploading resource file:/tmp/spark-e72e7961-7ec9-4282-806d-9d95e2d7f0fc/__spark_libs__4204158628332382181.zip -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1472394965674_0002/__spark_libs__4204158628332382181.zip
16/08/28 14:41:11 INFO yarn.Client: Uploading resource file:/usr/hdp/2.5.0.0-817/spark2/examples/jars/spark-examples_2.11-2.0.0.jar -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1472394965674_0002/spark-examples_2.11-2.0.0.jar
16/08/28 14:41:12 INFO yarn.Client: Uploading resource file:/tmp/spark-e72e7961-7ec9-4282-806d-9d95e2d7f0fc/__spark_conf__2789110900476377363.zip -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1472394965674_0002/__spark_conf__.zip
16/08/28 14:41:12 WARN yarn.Client: spark.yarn.am.extraJavaOptions will not take effect in cluster mode
16/08/28 14:41:12 INFO spark.SecurityManager: Changing view acls to: root
16/08/28 14:41:12 INFO spark.SecurityManager: Changing modify acls to: root
16/08/28 14:41:12 INFO spark.SecurityManager: Changing view acls groups to: 
16/08/28 14:41:12 INFO spark.SecurityManager: Changing modify acls groups to: 
16/08/28 14:41:12 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
16/08/28 14:41:12 INFO yarn.Client: Submitting application application_1472394965674_0002 to ResourceManager
16/08/28 14:41:12 INFO impl.YarnClientImpl: Submitted application application_1472394965674_0002
16/08/28 14:41:13 INFO yarn.Client: Application report for application_1472394965674_0002 (state: ACCEPTED)
16/08/28 14:41:13 INFO yarn.Client: 
         client token: N/A
         diagnostics: AM container is launched, waiting for AM container to Register with RM
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1472395272580
         final status: UNDEFINED
         tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472394965674_0002/
         user: root
16/08/28 14:41:14 INFO yarn.Client: Application report for application_1472394965674_0002 (state: ACCEPTED)
16/08/28 14:41:15 INFO yarn.Client: Application report for application_1472394965674_0002 (state: FAILED)
16/08/28 14:41:15 INFO yarn.Client: 
         client token: N/A
         diagnostics: Application application_1472394965674_0002 failed 2 times due to AM Container for appattempt_1472394965674_0002_000002 exited with  exitCode: 1
For more detailed output, check the application tracking page: http://sandbox.hortonworks.com:8088/cluster/app/application_1472394965674_0002 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e17_1472394965674_0002_02_000001
Exit code: 1
Exception message: /hadoop/yarn/local/usercache/root/appcache/application_1472394965674_0002/container_e17_1472394965674_0002_02_000001/launch_container.sh: line 25: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
Stack trace: ExitCodeException exitCode=1: /hadoop/yarn/local/usercache/root/appcache/application_1472394965674_0002/container_e17_1472394965674_0002_02_000001/launch_container.sh: line 25: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution

        at org.apache.hadoop.util.Shell.runCommand(Shell.java:909)
        at org.apache.hadoop.util.Shell.run(Shell.java:820)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1099)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Failing this attempt. Failing the application.
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         start time: 1472395272580
         final status: FAILED
         tracking URL: <a href="http://sandbox.hortonworks.com:8088/cluster/app/application_1472394965674_0002">http://sandbox.hortonworks.com:8088/cluster/app/application_1472394965674_0002</a>
16/08/28 14:41:15 INFO yarn.Client: Deleting staging directory hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1472394965674_0002
Exception in thread "main" org.apache.spark.SparkException: Application application_1472394965674_0002 finished with failed status
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1132)
        at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1175)
        at org.apache.spark.deploy.yarn.Client.main(Client.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/08/28 14:41:15 INFO util.ShutdownHookManager: Shutdown hook called
16/08/28 14:41:15 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-e72e7961-7ec9-4282-806d-9d95e2d7f0fc
[root@sandbox spark2-client]#
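For context, the "bad substitution" above is a bash error, not a Spark one: `${hdp.version}` is a token YARN is supposed to replace before launch_container.sh runs, and because "." is not legal in a shell variable name, bash aborts when it meets the literal token. A minimal reproduction of the shell behaviour (the path is just an illustrative fragment of the classpath from the log):

```shell
# ${hdp.version} is meant to be substituted by YARN before the launch script
# runs; left as-is, bash rejects it ("." is illegal in a variable name) and
# prints the same "bad substitution" error seen in the container diagnostics.
bash -c 'echo /usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar' 2>&1 || true
```

This is why the accepted fix below works: supplying `-Dhdp.version=...` lets the value be filled in so the literal token never reaches the script.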

Any help to resolve this would be appreciated.

In Spark Shell mode, launched with the following command, I am encountering an LzoCodec not found error, as per the log here:

[root@sandbox spark2-client]# ./bin/spark-shell --master yarn
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/08/28 14:44:42 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
16/08/28 14:44:54 WARN spark.SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at <a href="http://10.0.2.15:4041/">http://10.0.2.15:4041</a>
Spark context available as 'sc' (master = yarn, app id = application_1472394965674_0003).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.7.0_101)

Type in expressions to have them evaluated.                                                                                                                                          
Type :help for more information.                                                                                                                                                     
scala> val file = sc.textFile("/tmp/data")
file: org.apache.spark.rdd.RDD[String] = /tmp/data MapPartitionsRDD[1] at textFile at <console>:24

scala> val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
java.lang.RuntimeException: Error in configuring object
  at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
  at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
  at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
  at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:186)
  at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
  at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:65)
  at org.apache.spark.rdd.PairRDDFunctions$$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:328)
  at org.apache.spark.rdd.PairRDDFunctions$$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:328)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
  at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:327)
  ... 48 elided
Caused by: java.lang.reflect.InvocationTargetException
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
  at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:139)
  at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:180)
  at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
  ... 83 more
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
  at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
  at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:132)
  ... 85 more
scala>    

Any help to resolve this would be appreciated.

Thanks.

Amit

1 ACCEPTED SOLUTION


Re: Sandbox HDP-2.5.0 Spark 2.0.0 - Spark Submit Yarn Cluster Mode -- Spark Shell LzoCodec not found

Contributor

Resolution for the Spark Submit issue: add a java-opts file in /usr/hdp/current/spark2-client/conf/ containing the HDP version:

[root@sandbox conf]# cat java-opts                                                              
-Dhdp.version=2.5.0.0-817
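If you are on a different HDP build, the version string must match the build directory under /usr/hdp (on the sandbox you can read it from `hdp-select status hadoop-client`). A minimal sketch of creating the file, where CONF_DIR and HDP_VERSION are placeholder values for this example, not part of the original answer:

```shell
# Sketch: write conf/java-opts so spark-submit passes -Dhdp.version to the JVM,
# letting YARN substitute ${hdp.version} in the container classpath.
# CONF_DIR and HDP_VERSION are example values; on the HDP-2.5.0 sandbox use
# /usr/hdp/current/spark2-client/conf and the build string from hdp-select.
CONF_DIR=./conf
HDP_VERSION=2.5.0.0-817
mkdir -p "$CONF_DIR"
echo "-Dhdp.version=${HDP_VERSION}" > "$CONF_DIR/java-opts"
cat "$CONF_DIR/java-opts"   # -> -Dhdp.version=2.5.0.0-817
```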

Spark Submit working example:

[root@sandbox spark2-client]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 2g --executor-memory 2g --executor-cores 1 examples/jars/spark-examples*.jar 10
16/08/29 17:44:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable                        
16/08/29 17:44:58 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.                          
16/08/29 17:44:58 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050                                                             
16/08/29 17:44:58 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers                                                                          
16/08/29 17:44:58 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (7680 MB per container)             
16/08/29 17:44:58 INFO yarn.Client: Will allocate AM container, with 2248 MB memory including 200 MB overhead                                                              
16/08/29 17:44:58 INFO yarn.Client: Setting up container launch context for our AM                                                                                         
16/08/29 17:44:58 INFO yarn.Client: Setting up the launch environment for our AM container                                                                                 
16/08/29 17:44:58 INFO yarn.Client: Preparing resources for our AM container                                                                                               
16/08/29 17:44:58 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.                           
16/08/29 17:45:00 INFO yarn.Client: Uploading resource file:/tmp/spark-38890bfc-d672-4c7d-bef9-d646c420836b/__spark_libs__3503948162159958877.zip -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1472397144295_0006/__spark_libs__3503948162159958877.zip
16/08/29 17:45:01 INFO yarn.Client: Uploading resource file:/usr/hdp/2.5.0.0-817/spark2/examples/jars/spark-examples_2.11-2.0.0.jar -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1472397144295_0006/spark-examples_2.11-2.0.0.jar
16/08/29 17:45:01 INFO yarn.Client: Uploading resource file:/tmp/spark-38890bfc-d672-4c7d-bef9-d646c420836b/__spark_conf__4613069544481307021.zip -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1472397144295_0006/__spark_conf__.zip
16/08/29 17:45:01 WARN yarn.Client: spark.yarn.am.extraJavaOptions will not take effect in cluster mode                                                                    
16/08/29 17:45:01 INFO spark.SecurityManager: Changing view acls to: root                                                                                                  
16/08/29 17:45:01 INFO spark.SecurityManager: Changing modify acls to: root                                                                                                
16/08/29 17:45:01 INFO spark.SecurityManager: Changing view acls groups to:                                                                                                
16/08/29 17:45:01 INFO spark.SecurityManager: Changing modify acls groups to:                                                                                              
16/08/29 17:45:01 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
16/08/29 17:45:01 INFO yarn.Client: Submitting application application_1472397144295_0006 to ResourceManager                                                               
16/08/29 17:45:01 INFO impl.YarnClientImpl: Submitted application application_1472397144295_0006                                                                           
16/08/29 17:45:02 INFO yarn.Client: Application report for application_1472397144295_0006 (state: ACCEPTED)                                                                
16/08/29 17:45:02 INFO yarn.Client:                                                                                                                                        
         client token: N/A                                                                                                                                                 
         diagnostics: AM container is launched, waiting for AM container to Register with RM                                                                               
         ApplicationMaster host: N/A                                                                                                                                       
         ApplicationMaster RPC port: -1                                                                                                                                    
         queue: default                                                                                                                                                    
         start time: 1472492701409                                                                                                                                         
         final status: UNDEFINED                                                                                                                                           
 tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472397144295_0006/ 
         user: root                                                                                                                                                        
16/08/29 17:45:03 INFO yarn.Client: Application report for application_1472397144295_0006 (state: ACCEPTED)                                                                
16/08/29 17:45:04 INFO yarn.Client: Application report for application_1472397144295_0006 (state: ACCEPTED)                                                                
16/08/29 17:45:05 INFO yarn.Client: Application report for application_1472397144295_0006 (state: ACCEPTED)                                                                
16/08/29 17:45:06 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:06 INFO yarn.Client:                                                                                                                                        
         client token: N/A                                                                                                                                                 
         diagnostics: N/A                                                                                                                                                  
         ApplicationMaster host: 10.0.2.15                                                                                                                                 
         ApplicationMaster RPC port: 0                                                                                                                                     
         queue: default                                                                                                                                                    
         start time: 1472492701409                                                                                                                                         
         final status: UNDEFINED                                                                                                                                           
 tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472397144295_0006/ 
         user: root                                                                                                                                                        
16/08/29 17:45:07 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:08 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:09 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:10 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:11 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:12 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:13 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:14 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:15 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:16 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:17 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:18 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:19 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:20 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:21 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:22 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:23 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:24 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:25 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:26 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:27 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:28 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:29 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:30 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:31 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:32 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:33 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:34 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:35 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:36 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:37 INFO yarn.Client: Application report for application_1472397144295_0006 (state: FINISHED)                                                                
16/08/29 17:45:37 INFO yarn.Client:                                                                                                                                        
         client token: N/A                                                                                                                                                 
         diagnostics: N/A                                                                                                                                                  
         ApplicationMaster host: 10.0.2.15                                                                                                                                 
         ApplicationMaster RPC port: 0                                                                                                                                     
         queue: default                                                                                                                                                    
         start time: 1472492701409                                                                                                                                         
         final status: SUCCEEDED                                                                                                                                           
 tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472397144295_0006/ 
         user: root                                                                                                                                                        
16/08/29 17:45:37 INFO util.ShutdownHookManager: Shutdown hook called                                                                                                      
16/08/29 17:45:37 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-38890bfc-d672-4c7d-bef9-d646c420836b                                                        
[root@sandbox spark2-client]#                                                                                                                                              

Resolution for the Spark Shell issue (LzoCodec): add the following two lines to your spark-defaults.conf:

spark.driver.extraClassPath /usr/hdp/current/hadoop-client/lib/hadoop-lzo-0.6.0.2.5.0.0-817.jar                                                                            
spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64                                            
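The same two lines can be appended from the shell; a sketch, where CONF_DIR is a placeholder for /usr/hdp/current/spark2-client/conf on the sandbox, and the hadoop-lzo jar's build suffix (2.5.0.0-817 here) must match what is actually under /usr/hdp on your box:

```shell
# Sketch: append the LZO driver classpath and native library path to
# spark-defaults.conf. CONF_DIR is an example value; check the exact
# hadoop-lzo-*.jar name in /usr/hdp/current/hadoop-client/lib/ first.
CONF_DIR=./conf
mkdir -p "$CONF_DIR"
cat >> "$CONF_DIR/spark-defaults.conf" <<'EOF'
spark.driver.extraClassPath /usr/hdp/current/hadoop-client/lib/hadoop-lzo-0.6.0.2.5.0.0-817.jar
spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
EOF
grep -c '^spark\.driver\.extra' "$CONF_DIR/spark-defaults.conf"   # prints 2 on a fresh file
```

These settings only affect the driver; since TextInputFormat instantiates the codec on the driver when computing partitions, that is enough for this error, and spark.executor.extraClassPath can be set the same way if executors also need the codec.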

Spark Shell working example:

[root@sandbox spark2-client]# ./bin/spark-shell --master yarn --deploy-mode client --driver-memory 2g --executor-memory 2g --executor-cores 1                              
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).                                                                                                                      
16/08/29 17:47:09 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.                           
16/08/29 17:47:21 WARN spark.SparkContext: Use an existing SparkContext, some configuration may not take effect.                                                           
Spark context Web UI available at http://10.0.2.15:4041 
Spark context available as 'sc' (master = yarn, app id = application_1472397144295_0007).                                                                                  
Spark session available as 'spark'.                                                                                                                                        
Welcome to                                                                                                                                                                 
      ____              __                                                                                                                                                 
     / __/__  ___ _____/ /__                                                                                                                                               
    _\ \/ _ \/ _ `/ __/  '_/                                                                                                                                               
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0                                                                                                                                
      /_/                                                                                                                                                                  
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.7.0_101)                                                                                                      
Type in expressions to have them evaluated.                                                                                                                                
Type :help for more information.                                                                                                                                           

scala> sc.getConf.getAll.foreach(println)                                                                                                                                  
(spark.eventLog.enabled,true)                                                                                                                                              
(spark.yarn.scheduler.heartbeat.interval-ms,5000)                                                                                                                          
(hive.metastore.warehouse.dir,file:/usr/hdp/2.5.0.0-817/spark2/spark-warehouse)                                                                                            
(spark.repl.class.outputDir,/tmp/spark-fa16d4d3-8ec8-4b0e-a1da-5a2dffe39d08/repl-5dd28f29-ae03-4965-a535-18a95173b173)                                                     
(spark.yarn.am.extraJavaOptions,-Dhdp.version=2.5.0.0-817)                                                                                                                 
(spark.yarn.containerLauncherMaxThreads,25)                                                                                                                                
(spark.driver.extraJavaOptions,-Dhdp.version=2.5.0.0-817)                                                                                                                  
(spark.driver.extraLibraryPath,/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64)                                         
(spark.driver.appUIAddress,http://10.0.2.15:4041) 
(spark.driver.host,10.0.2.15)                                                                                                                                              
(spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES,http://sandbox.hortonworks.com:8088/proxy/application_1472397144295_0007) 
(spark.yarn.preserve.staging.files,false)                                                                                                                                  
(spark.home,/usr/hdp/current/spark2-client)                                                                                                                                
(spark.app.name,Spark shell)                                                                                                                                               
(spark.repl.class.uri,spark://10.0.2.15:37426/classes)                                                                                                                     
(spark.ui.port,4041)                                                                                                                                                       
(spark.yarn.max.executor.failures,3)                                                                                                                                       
(spark.submit.deployMode,client)                                                                                                                                           
(spark.yarn.executor.memoryOverhead,200)                                                                                                                                   
(spark.ui.filters,org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter)                                                                                              
(spark.driver.extraClassPath,/usr/hdp/current/hadoop-client/lib/hadoop-lzo-0.6.0.2.5.0.0-817.jar)                                                                          
(spark.executor.memory,2g)                                                                                                                                                 
(spark.yarn.driver.memoryOverhead,200)                                                                                                                                     
(spark.hadoop.yarn.timeline-service.enabled,false)                                                                                                                         
(spark.executor.extraLibraryPath,/usr/hdp/current/hadoop-client/lib/native)                                                                                                
(spark.app.id,application_1472397144295_0007)                                                                                                                              
(spark.executor.id,driver)                                                                                                                                                 
(spark.yarn.queue,default)                                                                                                                                                 
(spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS,sandbox.hortonworks.com)                                                               
(spark.eventLog.dir,hdfs:///spark-history)                                                                                                                                 
(spark.master,yarn)                                                                                                                                                        
(spark.driver.port,37426)                                                                                                                                                  
(spark.yarn.submit.file.replication,3)                                                                                                                                     
(spark.sql.catalogImplementation,hive)                                                                                                                                     
(spark.driver.memory,2g)                                                                                                                                                   
(spark.jars,)                                                                                                                                                              
(spark.executor.cores,1)                                                                                                                                                   

scala> val file = sc.textFile("/tmp/data")                                                                                                                                 
file: org.apache.spark.rdd.RDD[String] = /tmp/data MapPartitionsRDD[1] at textFile at <console>:24                                                                         

scala> val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)                                                                        
counts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:26                                                                            

scala> counts.take(10)                                                                                                                                                     
res1: Array[(String, Int)] = Array((hadoop.tasklog.noKeepSplits=4,1), (log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.server.resourcemanager.appsummary.logger},1), (Unless,1), (this,4), (hadoop.mapreduce.jobsummary.log.file=hadoop-mapreduce.jobsummary.log,1), (under,4), (log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601},2), (log4j.appender.DRFAAUDIT.layout=org.apache.log4j.PatternLayout,1), (AppSummaryLogging,1), (log4j.appender.RMAUDIT.layout=org.apache.log4j.PatternLayout,1))

scala>                                                                                                                                                                     
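The flatMap/map/reduceByKey pipeline above can be sketched with plain coreutils on a small made-up file, to show what it computes without a cluster (the sample lines are hypothetical; the real /tmp/data in this run was a log4j properties file):

```shell
# Coreutils equivalent of the RDD word count above, run on a
# hypothetical stand-in for /tmp/data.
printf 'hadoop spark hadoop\nspark yarn\n' > /tmp/wordcount-demo.txt
tr -s ' ' '\n' < /tmp/wordcount-demo.txt |  # flatMap: one word per line
  sort | uniq -c |                          # reduceByKey: count duplicates
  sort -rn                                  # highest counts first
```

The Spark version does the same grouping-and-summing, just partitioned across executors.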


2 REPLIES 2

Re: Sandbox HDP-2.5.0 Spark 2.0.0 - Spark Submit Yarn Cluster Mode -- Spark Shell LzoCodec not found

Contributor

Resolution for the Spark Submit (YARN cluster mode) issue: create a java-opts file in /usr/hdp/current/spark2-client/conf/ containing the HDP version flag:

[root@sandbox conf]# cat java-opts                                                              
-Dhdp.version=2.5.0.0-817
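The file can be created with a one-liner. The version string 2.5.0.0-817 and the conf path are sandbox assumptions — check the version directory actually installed with `ls /usr/hdp`. As above, `CONF_DIR` falls back to a scratch directory so the sketch is safe to dry-run:

```shell
# Create java-opts so spark-submit passes the HDP version to the JVM
# (without it, cluster-mode containers fail resolving ${hdp.version}).
# NOTE: version string and path are HDP-2.5.0 sandbox assumptions.
CONF_DIR="${SPARK_CONF_DIR:-$(mktemp -d)}"
echo "-Dhdp.version=2.5.0.0-817" > "$CONF_DIR/java-opts"
cat "$CONF_DIR/java-opts"
```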

Spark Submit working example:

[root@sandbox spark2-client]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 2g --executor-memory 2g --executor-cores 1 examples/jars/spark-examples*.jar 10
16/08/29 17:44:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable                        
16/08/29 17:44:58 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.                          
16/08/29 17:44:58 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050                                                             
16/08/29 17:44:58 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers                                                                          
16/08/29 17:44:58 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (7680 MB per container)             
16/08/29 17:44:58 INFO yarn.Client: Will allocate AM container, with 2248 MB memory including 200 MB overhead                                                              
16/08/29 17:44:58 INFO yarn.Client: Setting up container launch context for our AM                                                                                         
16/08/29 17:44:58 INFO yarn.Client: Setting up the launch environment for our AM container                                                                                 
16/08/29 17:44:58 INFO yarn.Client: Preparing resources for our AM container                                                                                               
16/08/29 17:44:58 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.                           
16/08/29 17:45:00 INFO yarn.Client: Uploading resource file:/tmp/spark-38890bfc-d672-4c7d-bef9-d646c420836b/__spark_libs__3503948162159958877.zip -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1472397144295_0006/__spark_libs__3503948162159958877.zip
16/08/29 17:45:01 INFO yarn.Client: Uploading resource file:/usr/hdp/2.5.0.0-817/spark2/examples/jars/spark-examples_2.11-2.0.0.jar -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1472397144295_0006/spark-examples_2.11-2.0.0.jar
16/08/29 17:45:01 INFO yarn.Client: Uploading resource file:/tmp/spark-38890bfc-d672-4c7d-bef9-d646c420836b/__spark_conf__4613069544481307021.zip -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1472397144295_0006/__spark_conf__.zip
16/08/29 17:45:01 WARN yarn.Client: spark.yarn.am.extraJavaOptions will not take effect in cluster mode                                                                    
16/08/29 17:45:01 INFO spark.SecurityManager: Changing view acls to: root                                                                                                  
16/08/29 17:45:01 INFO spark.SecurityManager: Changing modify acls to: root                                                                                                
16/08/29 17:45:01 INFO spark.SecurityManager: Changing view acls groups to:                                                                                                
16/08/29 17:45:01 INFO spark.SecurityManager: Changing modify acls groups to:                                                                                              
16/08/29 17:45:01 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
16/08/29 17:45:01 INFO yarn.Client: Submitting application application_1472397144295_0006 to ResourceManager                                                               
16/08/29 17:45:01 INFO impl.YarnClientImpl: Submitted application application_1472397144295_0006                                                                           
16/08/29 17:45:02 INFO yarn.Client: Application report for application_1472397144295_0006 (state: ACCEPTED)                                                                
16/08/29 17:45:02 INFO yarn.Client:                                                                                                                                        
         client token: N/A                                                                                                                                                 
         diagnostics: AM container is launched, waiting for AM container to Register with RM                                                                               
         ApplicationMaster host: N/A                                                                                                                                       
         ApplicationMaster RPC port: -1                                                                                                                                    
         queue: default                                                                                                                                                    
         start time: 1472492701409                                                                                                                                         
         final status: UNDEFINED                                                                                                                                           
 tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472397144295_0006/ 
         user: root                                                                                                                                                        
16/08/29 17:45:03 INFO yarn.Client: Application report for application_1472397144295_0006 (state: ACCEPTED)                                                                
16/08/29 17:45:04 INFO yarn.Client: Application report for application_1472397144295_0006 (state: ACCEPTED)                                                                
16/08/29 17:45:05 INFO yarn.Client: Application report for application_1472397144295_0006 (state: ACCEPTED)                                                                
16/08/29 17:45:06 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)                                                                 
16/08/29 17:45:06 INFO yarn.Client:                                                                                                                                        
         client token: N/A                                                                                                                                                 
         diagnostics: N/A                                                                                                                                                  
         ApplicationMaster host: 10.0.2.15                                                                                                                                 
         ApplicationMaster RPC port: 0                                                                                                                                     
         queue: default                                                                                                                                                    
         start time: 1472492701409                                                                                                                                         
         final status: UNDEFINED                                                                                                                                           
 tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472397144295_0006/ 
         user: root                                                                                                                                                        
16/08/29 17:45:07 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
[... identical RUNNING application reports (17:45:08 through 17:45:36) elided ...]
16/08/29 17:45:37 INFO yarn.Client: Application report for application_1472397144295_0006 (state: FINISHED)                                                                
16/08/29 17:45:37 INFO yarn.Client:                                                                                                                                        
         client token: N/A                                                                                                                                                 
         diagnostics: N/A                                                                                                                                                  
         ApplicationMaster host: 10.0.2.15                                                                                                                                 
         ApplicationMaster RPC port: 0                                                                                                                                     
         queue: default                                                                                                                                                    
         start time: 1472492701409                                                                                                                                         
         final status: SUCCEEDED                                                                                                                                           
 tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472397144295_0006/ 
         user: root                                                                                                                                                        
16/08/29 17:45:37 INFO util.ShutdownHookManager: Shutdown hook called                                                                                                      
16/08/29 17:45:37 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-38890bfc-d672-4c7d-bef9-d646c420836b                                                        
[root@sandbox spark2-client]#                                                                                                                                              

(spark.yarn.driver.memoryOverhead,200)                                                                                                                                     
(spark.hadoop.yarn.timeline-service.enabled,false)                                                                                                                         
(spark.executor.extraLibraryPath,/usr/hdp/current/hadoop-client/lib/native)                                                                                                
(spark.app.id,application_1472397144295_0007)                                                                                                                              
(spark.executor.id,driver)                                                                                                                                                 
(spark.yarn.queue,default)                                                                                                                                                 
(spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS,sandbox.hortonworks.com)                                                               
(spark.eventLog.dir,hdfs:///spark-history)                                                                                                                                 
(spark.master,yarn)                                                                                                                                                        
(spark.driver.port,37426)                                                                                                                                                  
(spark.yarn.submit.file.replication,3)                                                                                                                                     
(spark.sql.catalogImplementation,hive)                                                                                                                                     
(spark.driver.memory,2g)                                                                                                                                                   
(spark.jars,)                                                                                                                                                              
(spark.executor.cores,1)                                                                                                                                                   

scala> val file = sc.textFile("/tmp/data")                                                                                                                                 
file: org.apache.spark.rdd.RDD[String] = /tmp/data MapPartitionsRDD[1] at textFile at <console>:24                                                                         

scala> val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)                                                                        
counts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:26                                                                            

scala> counts.take(10)                                                                                                                                                     
res1: Array[(String, Int)] = Array((hadoop.tasklog.noKeepSplits=4,1), (log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.server.resourcemanager.appsummary.logger},1), (Unless,1), (this,4), (hadoop.mapreduce.jobsummary.log.file=hadoop-mapreduce.jobsummary.log,1), (under,4), (log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601},2), (log4j.appender.DRFAAUDIT.layout=org.apache.log4j.PatternLayout,1), (AppSummaryLogging,1), (log4j.appender.RMAUDIT.layout=org.apache.log4j.PatternLayout,1))

scala>                                                                                                                                                                     
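
For reference, the configuration dump above shows that the LzoCodec problem is resolved once the hadoop-lzo jar is on the driver classpath (see spark.driver.extraClassPath). A sketch of how that session could be launched from the command line, assuming the HDP-2.5.0 jar path shown in the config; the executor-side setting is an assumption, since the dump only shows the driver entry:

```shell
# Sketch: start the Spark 2.0 shell on YARN with the hadoop-lzo jar on the
# classpath. The jar version matches HDP-2.5.0 (2.5.0.0-817); adjust the
# path for your install. spark.executor.extraClassPath is an assumption
# here -- only the driver entry appears in the getConf output above.
cd /usr/hdp/current/spark2-client
./bin/spark-shell --master yarn --deploy-mode client \
  --driver-memory 2g --executor-memory 2g --executor-cores 1 \
  --conf spark.driver.extraClassPath=/usr/hdp/current/hadoop-client/lib/hadoop-lzo-0.6.0.2.5.0.0-817.jar \
  --conf spark.executor.extraClassPath=/usr/hdp/current/hadoop-client/lib/hadoop-lzo-0.6.0.2.5.0.0-817.jar
```

The same two `--conf` flags can equally be set as spark.driver.extraClassPath / spark.executor.extraClassPath in spark-defaults.conf so every shell and spark-submit run picks them up.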


Re: Sandbox HDP-2.5.0 Spark 2.0.0 - Spark Submit Yarn Cluster Mode -- Spark Shell LzoCodec not found

Yep, this worked for me as well. Thanks.