Member since 02-21-2017 · 8 Posts · 2 Kudos Received · 0 Solutions


04-18-2017 09:59 AM · 1 Kudo
Hi @chitrartha sur,

I resolved the issue with the Hadoop home. Here are the steps I took:

1. Download the Spark binary package from http://spark.apache.org/downloads.html
2. Unpack the Spark archive; in my case it was spark-2.0.2-bin-hadoop2.7
3. In the Eclipse project, add a new library with all the Spark jars (taken from spark-2.0.2-bin-hadoop2.7/jars)
4. Copy hdfs-site.xml, core-site.xml and yarn-site.xml from the cluster and put them under src/main/resources
5. In hdfs-site.xml define the following property:

    <property>
      <name>dfs.client.use.datanode.hostname</name>
      <value>true</value>
    </property>

6. In the run configuration of the main class add the SPARK_HOME environment variable: SPARK_HOME=D:/spark-2.0.2-bin-hadoop2.7
7. In the C:/Windows/System32/drivers/etc/hosts file add a line with the IP address and hostname of the Hadoop sandbox, e.g. 192.168.144.133 sandbox.hortonworks.com

Then the code goes:

    SparkConf conf = new SparkConf();
    conf.set("spark.master", "yarn-client");
    conf.set("spark.local.ip", "IP_OF_SANDBOX");
    conf.set("spark.driver.host", "IP_OF_MY_LOCAL_WINDOWS_MACHINE");
    conf.set("spark.sql.hive.metastore.jars", "builtin");
    conf.setAppName("Application name");
    this.sparkSession = SparkSession.builder().config(conf).getOrCreate();

    System.setProperty("HADOOP_USER_NAME", "root");
    System.setProperty("SPARK_YARN_MODE", "yarn");

Those steps were enough to connect to the cluster. But I am stuck at the step of submitting Spark jobs: Spark picks up my request and starts running, but then the job just hangs in the ACCEPTED state, and it stays there forever.
						
					
02-22-2017 04:58 PM
Hi @Jan J,

If you already have a cluster with Hive tables in it, you don't need to create those tables with Spark again; you can just connect to the existing ones. Please try the following:

1. Pack your code into a jar file and move it to your cluster. Make the Hive query calls with SparkSession.sql("YOUR_QUERY").
2. Run the spark-submit tool with 'driver-java-options' pointing to the local metastore:

    --driver-java-options "-Dhive.metastore.uris=thrift://localhost:9083"

Best regards,
Olga
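For illustration, a minimal sketch of such a driver class (the class name, app name and query are placeholders; the metastore URI is picked up from the JVM options passed via spark-submit):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class HiveQueryApp {
        public static void main(String[] args) {
            // Hive support lets SparkSession.sql() see the tables already registered in the metastore
            SparkSession spark = SparkSession.builder()
                    .appName("Hive query example")
                    .enableHiveSupport()
                    .getOrCreate();

            // Query an existing Hive table instead of creating it again with Spark
            Dataset<Row> rows = spark.sql("SELECT * FROM your_existing_table LIMIT 10");
            rows.show();

            spark.stop();
        }
    }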
						
					
02-22-2017 02:45 PM
Hi all,

I changed the configuration to the following:

    SparkConf conf = new SparkConf();
    conf.set("spark.master", "yarn-client");
    conf.set("spark.local.ip", "192.168.144.133");
    conf.set("spark.driver.host", "localhost");
    conf.set("spark.sql.hive.metastore.jars", "builtin");
    conf.setAppName("Data Analyzer");
    this.sparkSession = SparkSession.builder().config(conf).getOrCreate();

and updated the dependencies in pom.xml with spark-yarn. In pom.xml I now have the following dependencies:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>2.0.1</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-yarn_2.10</artifactId>
        <version>2.0.1</version>
        <scope>provided</scope>
    </dependency>

I keep core-site.xml, hdfs-site.xml and yarn-site.xml in the src/main/resources folder. And now I have another issue. Here is the stack trace:

    at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:225)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:250)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
    at org.apache.hadoop.yarn.conf.YarnConfiguration.<clinit>(YarnConfiguration.java:345)
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil.newConfiguration(YarnSparkHadoopUtil.scala:71)
    at org.apache.spark.deploy.SparkHadoopUtil.<init>(SparkHadoopUtil.scala:54)
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil.<init>(YarnSparkHadoopUtil.scala:56)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.lang.Class.newInstance(Class.java:442)
    at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:414)
    at org.apache.spark.deploy.SparkHadoopUtil$.yarn$lzycompute(SparkHadoopUtil.scala:412)
    at org.apache.spark.deploy.SparkHadoopUtil$.yarn(SparkHadoopUtil.scala:412)
    at org.apache.spark.deploy.SparkHadoopUtil$.get(SparkHadoopUtil.scala:437)
    at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:2223)
    at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:104)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:320)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:165)
    at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:256)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:420)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2275)

I also noticed that if I remove the Hadoop config files from src/main/resources, the application behaves the same way, so it seems to me that the application ignores them. Should I put them in another folder?

Best regards,
Olga
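(For what it's worth, a quick way I could check whether those files are visible on the runtime classpath at all is a resource lookup like the sketch below, placed anywhere in the driver code; DataAnalyzer is just a placeholder for the main class name.)

    // Print where (or whether) the Hadoop config files are found on the classpath
    java.net.URL coreSite = DataAnalyzer.class.getClassLoader().getResource("core-site.xml");
    java.net.URL hdfsSite = DataAnalyzer.class.getClassLoader().getResource("hdfs-site.xml");
    System.out.println("core-site.xml -> " + coreSite);
    System.out.println("hdfs-site.xml -> " + hdfsSite);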
						
					
02-21-2017 04:25 PM
Hi Adnan,

Thanks a lot for sharing the info. Currently I can't move our project to CueSheet, but it is nevertheless interesting to know about it.

Best regards,
Olga
						
					
02-21-2017 10:12 AM · 1 Kudo
Hi all,

We have a Spark application written in Java that uses yarn-client mode. We build the application into a jar file and then run it on the cluster with the spark-submit tool. It works fine and everything runs well on the cluster.

But it is not very easy to test our application directly on the cluster: for every change, even a small one, I have to create a jar file and push it to the cluster. That's why I would like to run the application from my Eclipse (on Windows) against the cluster remotely.

I use the spark-sql_2.11 module and instantiate the SparkSession as follows:

    SparkSession.builder()
        .appName("Data Analyzer")
        .master("yarn-client")
        .config("spark.sql.hive.metastore.jars", "builtin")
        .getOrCreate();

I also copied core-site.xml, hdfs-site.xml, yarn-site.xml and hive-site.xml from my test cluster (HDP 2.5) and put them on the classpath.

But when running the application from Eclipse I get the following error:

    org.apache.spark.SparkException: Unable to load YARN support
       at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:417)
       at org.apache.spark.deploy.SparkHadoopUtil$.yarn$lzycompute(SparkHadoopUtil.scala:412)
       at org.apache.spark.deploy.SparkHadoopUtil$.yarn(SparkHadoopUtil.scala:412)
       at org.apache.spark.deploy.SparkHadoopUtil$.get(SparkHadoopUtil.scala:437)
       at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:2223)
       at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:104)
       at org.apache.spark.SparkEnv$.create(SparkEnv.scala:320)
       at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:165)
       at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:256)
       at org.apache.spark.SparkContext.<init>(SparkContext.scala:420)
       at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2275)
       at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831)
       at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823)
       at scala.Option.getOrElse(Option.scala:121)
       at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)

It seems my application can't find the YARN jars/configs.

Could you please help me understand what I'm doing wrong? Is it possible to run a Java application in yarn-client mode from Eclipse remotely against the cluster, and what steps should we follow to make it work? It would be great if you could share your ideas or give me some hints on how to overcome this issue.

Best regards,
Olga
						
					
Labels:
- Apache Spark
- Apache YARN