Member since: 09-24-2015

Posts: 98
Kudos Received: 76
Solutions: 18

        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 3362 | 08-29-2016 04:42 PM |
|  | 6358 | 08-09-2016 08:43 PM |
|  | 2382 | 07-19-2016 04:08 PM |
|  | 3019 | 07-07-2016 04:05 PM |
|  | 8402 | 06-29-2016 08:25 PM |

06-27-2016 04:51 PM

@alain TSAFACK
I think you need the --files option to pass the Python script to all executor instances. So for example:

 ./bin/spark-submit --class my.main.Class \
    --master yarn-cluster \
    --jars my-other-jar.jar,my-other-other-jar.jar \
    --files return.py \
    my-main-jar.jar \
    app_arg1 app_arg2
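
As a rough sketch of how the executors could then invoke the shipped script (the RDD.pipe pattern and the input path are illustrative assumptions, not part of the original question):

 // Inside my.main.Class: with --files return.py on YARN, the script is placed
 // in each executor container's working directory, so it can be called by name.
 val records = sc.textFile("hdfs:///tmp/input.txt")   // illustrative input
 val piped = records.pipe("python return.py")         // stream rows through the script
 piped.take(5).foreach(println)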
 
						
					
06-24-2016 09:28 PM

I was able to run your example on the Hortonworks 2.4 Sandbox (a slightly newer version than your 2.3.2). However, it appears you have drastically increased the memory requirements between your two examples. You allocate only 512m to the driver and executor in "yarn-client" mode, but 4g and 2g in the second example; with 3 executors requested, that is 4g + 3 x 2g = 10 GB before YARN overhead, so over 10 GB of RAM in total. Here is the command I actually ran to replicate the "cluster" deploy mode:

 ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --num-executors 1 --driver-memory 1024m --executor-memory 1024m --executor-cores 1 lib/spark-examples*.jar 10

... and here is the result in the YARN application logs:

 Log Type: stdout
 Log Upload Time: Fri Jun 24 21:19:42 +0000 2016
 Log Length: 23
 Pi is roughly 3.142752

Therefore, it is possible your job was never submitted to the run queue because it requested too many resources. Please make sure it is not stuck in the 'ACCEPTED' state in the ResourceManager UI.
						
					
06-23-2016 06:31 PM

Agreed, you should at least upgrade the lower HDP version (...2.3.0...) to the newer HDP version (2.3.4.0-3485). It is best to use the default Spark version that comes with the HDP install. Please see Table 1.1 at the link below, which describes the version associations between HDP, Ambari, and Spark: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_spark-guide/content/ch_introduction-spark.html
						
					
06-16-2016 06:48 PM (3 Kudos)

Spark includes some Jackson libraries as its own dependencies, including this one:

 <fasterxml.jackson.version>2.6.5</fasterxml.jackson.version>

Therefore, if your additional third-party library also pulls in Jackson at a different version, the classloader will run into conflicts. You can use the Maven Shade plugin to "relocate" the classes inside the third-party jar, as described here: https://maven.apache.org/plugins/maven-shade-plugin/examples/class-relocation.html Here is an example of relocating the "com.fasterxml.jackson" packages: http://stackoverflow.com/questions/34764732/relocating-fastxml-jackson-classes-to-my-package-fastxml-jackson
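
If the third-party jar happens to be built with sbt rather than Maven, sbt-assembly has an equivalent relocation mechanism; a minimal build.sbt sketch (the sbt-assembly plugin and the target package name "shaded.jackson" are assumptions for illustration):

 // build.sbt: relocate the bundled Jackson classes inside the assembled jar
 // so they cannot clash with the Jackson version that Spark itself ships
 assemblyShadeRules in assembly := Seq(
   ShadeRule.rename("com.fasterxml.jackson.**" -> "shaded.jackson.@1").inAll
 )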
						
					
06-06-2016 07:15 PM

@Timothy Spann Be aware that Henning's post, while architecturally sound, relies on the Hive Streaming API, which implies reliance on Hive transaction support. Current advice is not to rely on transactions, at least until the Hive LLAP Tech Preview comes out at the end of June 2016.
						
					
05-27-2016 04:04 PM

@Sean Glover The Apache Spark download will allow you to build Spark in multiple ways, using various build flags to include or exclude components: http://spark.apache.org/docs/latest/building-spark.html Without Hive, you can still create a SQLContext, but it will be native to Spark rather than a HiveContext. Without a HiveContext, you cannot reference the Hive Metastore, use Hive UDFs, etc. Other tools such as the Zeppelin data science notebook also default to creating a HiveContext (configurable), so they will need the Hive dependencies as well.
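
For reference, a minimal spark-shell sketch of the two Spark 1.x contexts (assuming sc is the existing SparkContext; the HiveContext line only works when Spark was built with the Hive profile):

 // Plain Spark SQLContext: works without any Hive dependencies,
 // but has no access to the Hive Metastore or Hive UDFs
 val sqlContext = new org.apache.spark.sql.SQLContext(sc)

 // HiveContext: needs the Hive build of Spark, exposes the Metastore tables
 val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)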
						
					
05-25-2016 01:45 PM (1 Kudo)

Actually, if you don't specify local mode (--master "local"), then you will be running in Standalone mode, described here:

"Standalone mode: By default, applications submitted to the standalone mode cluster will run in FIFO (first-in-first-out) order, and each application will try to use all available nodes. You can limit the number of nodes an application uses by setting the spark.cores.max configuration property in it, or change the default for applications that don't set this setting through spark.deploy.defaultCores. Finally, in addition to controlling cores, each application's spark.executor.memory setting controls its memory use."

Also, I think you have the port wrong for the monitoring web UI; try port 4040 instead of 8080, like this: http://<driver-node>:4040
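
As a quick illustration (a sketch only; the application name and the values are placeholders), those properties can be set on the SparkConf before the SparkContext is created:

 import org.apache.spark.{SparkConf, SparkContext}

 // Cap what this application takes from the standalone cluster
 val conf = new SparkConf()
   .setAppName("my-app")                  // placeholder name
   .set("spark.cores.max", "4")           // total cores across the cluster
   .set("spark.executor.memory", "2g")    // memory per executor
 val sc = new SparkContext(conf)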
						
					
05-24-2016 04:43 PM

If you are running on YARN (master set to "yarn", or in earlier versions "yarn-client" / "yarn-cluster"), then you can discover the state of the Spark job from the YARN ResourceManager UI. In Ambari, select the YARN service in the left-hand panel, choose "Quick Links", and click "ResourceManager UI". It opens a web page on port 8088. Here is an example (click 'Applications' in the left panel to see all states):
						
					
05-23-2016 06:27 PM

FYI: Here is the quickest way to discover if you have access to your Hive "default" database tables:

 val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
 val tables = sqlContext.sql("show tables")
 tables: org.apache.spark.sql.DataFrame = [tableName: string, isTemporary: boolean]

 tables.show()
 +---------+-----------+
 |tableName|isTemporary|
 +---------+-----------+
 |sample_07|      false|
 |sample_08|      false|
 +---------+-----------+
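
As a quick follow-up (a sketch; sample_07 is just one of the tables listed above), you can query it through the same context:

 // Pull a few rows from one of the listed tables
 sqlContext.sql("SELECT * FROM sample_07 LIMIT 5").show()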
 
						
					
05-23-2016 06:20 PM (2 Kudos)

The Spark History Server UI has a link at the bottom called "Show Incomplete Applications". Click on this link and it will show the applications that are still running, such as Zeppelin (see image).
						
					