Member since: 01-23-2017
114 Posts
19 Kudos Received
4 Solutions

        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2813 | 03-26-2018 04:53 AM |
|  | 31317 | 12-01-2017 07:15 AM |
|  | 1261 | 11-28-2016 11:30 AM |
|  | 2192 | 10-25-2016 11:26 AM |
05-23-2018 02:29 PM
2 Kudos

This article discusses the process of manually updating the Oozie sharelib and the prerequisites for the Spark Oozie sharelib.

Copy the existing sharelib from HDFS to a local directory:
# mkdir oozie_share_lib
# hadoop fs -copyToLocal <current-share-lib-directory> oozie_share_lib/lib

Once the existing sharelib has been copied from HDFS to the local directory as above, update the Oozie sharelib:

/usr/hdp/current/oozie-client/bin/oozie-setup.sh sharelib create -fs /user/oozie/share/lib/ -locallib oozie_share_lib/

This creates a new sharelib, including the Spark Oozie sharelib:

the destination path for sharelib is: /user/oozie/share/lib/lib_20180502070613
Fixing oozie spark sharelib
Spark is locally installed at /usr/hdp/2.6.3.0-235/oozie/../spark
Renaming spark to spark_orig in /user/oozie/share/lib/lib_20180502070613
Creating new  spark directory in /user/oozie/share/lib/lib_20180502070613
Copying Oozie spark sharelib jar to /user/oozie/share/lib/lib_20180502070613/spark
Copying oozie_share_lib/lib/spark/oozie-sharelib-spark-4.2.0.2.6.3.0-235.jar to /user/oozie/share/lib/lib_20180502070613/spark
Copying local spark libraries to /user/oozie/share/lib/lib_20180502070613/spark
Copying local spark python libraries to /user/oozie/share/lib/lib_20180502070613/spark
Copying local spark hive site to /user/oozie/share/lib/lib_20180502070613/spark

However, listing the corresponding HDFS folder shows that the Spark libraries were not added to the Spark Oozie sharelib:

$ hadoop fs -ls /user/oozie/share/lib/lib_20180502070613/spark
Found 1 items
-rwxrwxrwx   3 oozie hadoop  191121639 2018-05-02 07:18 /user/oozie/share/lib/lib_20180502070613/spark/spark-assembly-1.6.3.2.6.3.0-235-hadoop2.7.3.2.6.3.0-235.jar

This means the Oozie sharelib update is not working as expected for Spark, even though the log reports "Spark is locally installed at /usr/hdp/2.6.3.0-235/oozie/../spark". In this case the Spark client was not installed on the node from which the sharelib update command was run.

From a node where the Spark client is installed, the Oozie sharelib update does properly update the Spark Oozie sharelib:

the destination path for sharelib is: /user/oozie/share/lib/lib_20180502064112
Fixing oozie spark sharelib
Spark is locally installed at /usr/hdp/2.6.3.0-235/oozie/../spark
Renaming spark to spark_orig in /user/oozie/share/lib/lib_20180502064112
Creating new  spark directory in /user/oozie/share/lib/lib_20180502064112
Copying Oozie spark sharelib jar to /user/oozie/share/lib/lib_20180502064112/spark
Copying oozie-new-sharelib/lib/spark/oozie-sharelib-spark-4.2.0.2.6.3.0-235.jar to /user/oozie/share/lib/lib_20180502064112/spark
Copying local spark libraries to /user/oozie/share/lib/lib_20180502064112/spark
Ignoring file /usr/hdp/2.6.3.0-235/oozie/../spark/lib/spark-examples-1.6.3.2.6.3.0-235-hadoop2.7.3.2.6.3.0-235.jar
Copying /usr/hdp/2.6.3.0-235/oozie/../spark/lib/datanucleus-core-3.2.10.jar to /user/oozie/share/lib/lib_20180502064112/spark
Copying /usr/hdp/2.6.3.0-235/oozie/../spark/lib/spark-assembly-1.6.3.2.6.3.0-235-hadoop2.7.3.2.6.3.0-235.jar to /user/oozie/share/lib/lib_20180502064112/spark
Ignoring file /usr/hdp/2.6.3.0-235/oozie/../spark/lib/spark-hdp-assembly.jar
Copying /usr/hdp/2.6.3.0-235/oozie/../spark/lib/datanucleus-rdbms-3.2.9.jar to /user/oozie/share/lib/lib_20180502064112/spark
Copying /usr/hdp/2.6.3.0-235/oozie/../spark/lib/datanucleus-api-jdo-3.2.6.jar to /user/oozie/share/lib/lib_20180502064112/spark
Copying local spark python libraries to /user/oozie/share/lib/lib_20180502064112/spark
Copying /usr/hdp/2.6.3.0-235/oozie/../spark/python/lib/pyspark.zip to /user/oozie/share/lib/lib_20180502064112/spark
Copying /usr/hdp/2.6.3.0-235/oozie/../spark/python/lib/py4j-0.9-src.zip to /user/oozie/share/lib/lib_20180502064112/spark
Ignoring file /usr/hdp/2.6.3.0-235/oozie/../spark/python/lib/PY4J_LICENSE.txt
Copying local spark hive site to /user/oozie/share/lib/lib_20180502064112/spark
Copying /etc/spark/conf/hive-site.xml to /user/oozie/share/lib/lib_20180502064112/spark

Here we can see that Oozie is able to pick up the files from the local Spark installation (/usr/hdp/2.6.3.0-235/spark/ and /etc/spark/conf/) and copy them to HDFS under /user/oozie/share/lib/lib_20180502064112/spark, because the Spark client is installed on this node:

$ hadoop fs -ls /user/oozie/share/lib/lib_20180502064112/spark
Found 8 items
-rw-r--r--   3 oozie hdfs     339666 2018-05-02 06:41 /user/oozie/share/lib/lib_20180502064112/spark/datanucleus-api-jdo-3.2.6.jar
-rw-r--r--   3 oozie hdfs    1890075 2018-05-02 06:41 /user/oozie/share/lib/lib_20180502064112/spark/datanucleus-core-3.2.10.jar
-rw-r--r--   3 oozie hdfs    1809447 2018-05-02 06:41 /user/oozie/share/lib/lib_20180502064112/spark/datanucleus-rdbms-3.2.9.jar
-rw-r--r--   3 oozie hdfs       1918 2018-05-02 06:41 /user/oozie/share/lib/lib_20180502064112/spark/hive-site.xml
-rw-r--r--   3 oozie hdfs      23278 2018-05-02 06:41 /user/oozie/share/lib/lib_20180502064112/spark/oozie-sharelib-spark-4.2.0.2.6.3.0-235.jar
-rw-r--r--   3 oozie hdfs      44846 2018-05-02 06:41 /user/oozie/share/lib/lib_20180502064112/spark/py4j-0.9-src.zip
-rw-r--r--   3 oozie hdfs     358253 2018-05-02 06:41 /user/oozie/share/lib/lib_20180502064112/spark/pyspark.zip
-rw-r--r--   3 oozie hdfs  191121639 2018-05-02 06:41 /user/oozie/share/lib/lib_20180502064112/spark/spark-assembly-1.6.3.2.6.3.0-235-hadoop2.7.3.2.6.3.0-235.jar

In short, for the Spark Oozie sharelib to be updated properly, the Spark client must be installed on the node/server from which the manual Oozie sharelib update is run.
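As a follow-up (not part of the original output above), once the new sharelib directory exists in HDFS, the running Oozie server can be told to pick it up and the result verified with the Oozie admin CLI; the Oozie server URL below is a placeholder:

# Refresh the running Oozie server so it uses the newest sharelib directory
oozie admin -oozie http://<oozie-server>:11000/oozie -sharelibupdate

# Verify that the spark sharelib now lists the expected jars
oozie admin -oozie http://<oozie-server>:11000/oozie -shareliblist spark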
						
					
05-03-2018 05:14 AM

@Satya P

The error:

StandaloneSchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.

together with the option:

--master spark://111.33.22.111:50070

Is there any specific reason to use the NameNode port 50070 instead of the Spark-related ports?

Thanks
Venkat
						
					
04-20-2018 11:28 AM
1 Kudo

				
		
	
		
					
This has been identified as a bug in Spark 2.2, which is fixed in Spark 2.3.
						
					
04-18-2018 01:20 PM

@heta desai You can use the parameters based on your environment, and here is a reference with details about the LDAP error codes.

Thanks
Venkat
						
					
04-18-2018 11:39 AM

@heta desai This is what we use for add.ldif:

dn: CN=<username>,OU=prod1,OU=Hadoop,OU=Users,OU=UK,DC=global,DC=org
changetype: add
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: user
distinguishedName: CN=<username>,OU=prod1,OU=Hadoop,OU=Users,OU=UK,DC=global,DC=org
cn: <username>
userAccountControl: 514
unicodePwd::IgBTAHQAYQBnAGkAbgBnAEAAMgAwADEANwAiAA==
accountExpires: 0
userPrincipalName: <username>@GLOBAL.ORG

This works for us. Please check your DNs, OUs, and the corresponding objectClass values to be specified; these are entirely environment dependent.

Thanks
Venkat
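In case it helps, here is a rough sketch (not from the original reply) of how the unicodePwd value can be generated and how the LDIF can be applied; the password, AD host, and admin bind DN are placeholders:

# Active Directory expects unicodePwd to be the password wrapped in double quotes,
# encoded as UTF-16LE and then base64-encoded for the LDIF
printf '"<password>"' | iconv -f UTF-8 -t UTF-16LE | base64

# Apply the LDIF over LDAPS (AD only accepts password attributes over a secure connection)
ldapadd -H ldaps://<ad-server>:636 -D "CN=<admin-user>,DC=global,DC=org" -W -f add.ldif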
						
					
04-18-2018 10:38 AM

@heta desai Can you please check whether an ldapsearch with the same user you are trying to connect as, and against the same OU, works?

Thanks
Venkat
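As a sketch (the host is a placeholder; the bind DN and search base follow the DN pattern from the earlier LDIF), such a check could look like:

ldapsearch -H ldaps://<ad-server>:636 \
    -D "CN=<username>,OU=prod1,OU=Hadoop,OU=Users,OU=UK,DC=global,DC=org" -W \
    -b "OU=prod1,OU=Hadoop,OU=Users,OU=UK,DC=global,DC=org" \
    "(sAMAccountName=<username>)"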
						
					
04-17-2018 02:55 PM

@Kiran Nittala Both --files and --conf spark.yarn.dist.files work. Is there any specific reason we have to pass these parameters even though the files hive-site.xml and hbase-site.xml are already present in /etc/spark2/conf?

Thanks
Venkat
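For illustration (the main class and jar are placeholders, not from this thread), the two working forms being compared are roughly:

spark-submit --master yarn --deploy-mode cluster \
  --files /etc/spark2/conf/hive-site.xml,/etc/spark2/conf/hbase-site.xml \
  --class <main-class> <application-jar>

spark-submit --master yarn --deploy-mode cluster \
  --conf spark.yarn.dist.files=/etc/spark2/conf/hive-site.xml,/etc/spark2/conf/hbase-site.xml \
  --class <main-class> <application-jar>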
						
					
04-17-2018 02:39 PM

@Vinod K C I haven't come across any document, but in an HDP installation you can find this in /etc/spark2/conf/spark-env.sh:

# Options read in YARN client mode
#SPARK_EXECUTOR_INSTANCES="2" #Number of workers to start (Default: 2)
#SPARK_EXECUTOR_CORES="1" #Number of cores for the workers (Default: 1).
#SPARK_EXECUTOR_MEMORY="1G" #Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
#SPARK_DRIVER_MEMORY="512M" #Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
#SPARK_YARN_APP_NAME="spark" #The name of your application (Default: Spark)
#SPARK_YARN_QUEUE="default" #The hadoop queue to use for allocation requests (Default: default)
#SPARK_YARN_DIST_FILES="" #Comma separated list of files to be distributed with the job.
#SPARK_YARN_DIST_ARCHIVES="" #Comma separated list of archives to be distributed with the job.

But this covers only YARN client mode, and the job is not picking up the files available in /etc/spark2/conf either.

Thanks
Venkat
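One option worth noting (my suggestion, not something verified in this thread): since --conf spark.yarn.dist.files is reported to work elsewhere in this thread, the same property could be set once in /etc/spark2/conf/spark-defaults.conf instead of on every spark-submit, for example:

spark.yarn.dist.files  /etc/spark2/conf/hive-site.xml,/etc/spark2/conf/hbase-site.xml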
						
					
04-13-2018 11:37 AM

@Rohit Khose As I mentioned, --files does work. But when the file is given as part of SPARK_YARN_DIST_FILES, and the file is also available at /etc/spark2/conf/hive-site.xml, Spark should be able to pick it up. Is there any specific reason it is not getting picked up?

Thanks
Venkat
						
					
04-13-2018 11:01 AM

We are on HDP 2.6.3, using Spark 2.2, and running the job in YARN cluster mode via spark-submit. spark-env.sh contains:

SPARK_YARN_DIST_FILES="/etc/spark2/conf/hive-site.xml,/etc/spark2/conf/hbase-site.xml"

but these values are not honored.

spark-submit --class com.virtuslab.sparksql.MainClass --master yarn --deploy-mode cluster /tmp/spark-hive-test/spark_sql_under_the_hood-spark2.2.0.jar

The job tries to connect to Hive and fetch data from a table, but it fails with a "table or view not found in database" error:

         diagnostics: User class threw exception: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'xyz' not found in database 'qwerty';
         ApplicationMaster host: 121.121.121.121
         ApplicationMaster RPC port: 0
         queue: default
         start time: 1523616607943
         final status: FAILED
         tracking URL: https://managenode002xxserver:8090/proxy/application_1523374609937_10224/
         user: abc123
Exception in thread "main" org.apache.spark.SparkException: Application application_1523374609937_10224 finished with failed status
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1187)
        at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1233)
        at org.apache.spark.deploy.yarn.Client.main(Client.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:782)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

The same job works when we pass the --files parameter:

spark-submit --class com.virtuslab.sparksql.MainClass --master yarn --deploy-mode cluster --files /etc/spark2/conf/hive-site.xml /tmp/spark-hive-test/spark_sql_under_the_hood-spark2.2.0.jar

Result attached. Any pointers on why it is not picking up SPARK_YARN_DIST_FILES?

Thanks
Venkat
						
					
Labels:
- Apache Spark
- Apache YARN