Member since 02-21-2017 | 8 Posts | 2 Kudos Received | 0 Solutions
04-18-2017 09:59 AM | 1 Kudo
Hi @chitrartha sur, I resolved the issue with Hadoop home. Here are the steps that I did:

1. Download the Spark binary package from http://spark.apache.org/downloads.html and unpack it; in my case it was spark-2.0.2-bin-hadoop2.7.
2. In the Eclipse project add a new library with all the Spark jars (taken from spark-2.0.2-bin-hadoop2.7/jars).
3. Copy hdfs-site.xml, core-site.xml and yarn-site.xml from the cluster and put them under src/main/resources.
4. In hdfs-site.xml define the following property:

<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>

5. In the run configuration of the main class add the SPARK_HOME environment variable: SPARK_HOME=D:/spark-2.0.2-bin-hadoop2.7
6. In the C:/Windows/System32/drivers/etc/hosts file add a line with the IP address and hostname of the Hadoop sandbox, e.g. 192.168.144.133 sandbox.hortonworks.com

Then the code goes:

SparkConf conf = new SparkConf();
conf.set("spark.master", "yarn-client");
conf.set("spark.local.ip", "IP_OF_SANDBOX");
conf.set("spark.driver.host", "IP_OF_MY_LOCAL_WINDOWS_MACHINE");
conf.set("spark.sql.hive.metastore.jars", "builtin");
conf.setAppName("Application name");
this.sparkSession = SparkSession.builder().config(conf).getOrCreate();
System.setProperty("HADOOP_USER_NAME", "root");
System.setProperty("SPARK_YARN_MODE", "yarn");

Those steps were enough to connect to the cluster. But I got stuck at the step of submitting Spark jobs. Spark gets a ping from me and starts running, but then just hangs with status ACCEPTED, and it lasts forever.
02-22-2017 04:58 PM
Hi @Jan J, If you already have a cluster with Hive tables in it, you don't need to create those tables with Spark once more. You can just connect to the existing ones. Please try the following:

1. Pack your code into a jar file and move it somewhere onto your cluster. Make Hive query calls with SparkSession.sql("YOUR_QUERY") (see the sketch below).
2. Run the spark-submit tool with 'driver-java-options' set to the local metastore: --driver-java-options "-Dhive.metastore.uris=thrift://localhost:9083"

Best regards, Olga
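A minimal sketch of what step 1 can look like in Java; the table name in the query is a placeholder, the metastore URI is supplied at submit time as in step 2, and enableHiveSupport() is my own addition to make SparkSession.sql() resolve tables from the existing Hive metastore:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ExistingHiveTablesSketch {
    public static void main(String[] args) {
        // Hive support lets sql() see tables that already exist in the metastore
        // instead of recreating them with Spark
        SparkSession spark = SparkSession.builder()
                .appName("Existing Hive tables sketch")
                .enableHiveSupport()
                .getOrCreate();

        // Placeholder query against an existing table
        Dataset<Row> rows = spark.sql("SELECT * FROM your_table LIMIT 10");
        rows.show();

        spark.stop();
    }
}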
02-22-2017 02:45 PM
Hi all, I changed the configuration to the following:

SparkConf conf = new SparkConf();
conf.set("spark.master", "yarn-client");
conf.set("spark.local.ip", "192.168.144.133");
conf.set("spark.driver.host", "localhost");
conf.set("spark.sql.hive.metastore.jars", "builtin");
conf.setAppName("Data Analyzer");
this.sparkSession = SparkSession.builder().config(conf).getOrCreate();

and updated the dependencies in pom.xml with spark-yarn.jar. In pom.xml I have the following dependencies:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.10</artifactId>
  <version>2.0.1</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-yarn_2.10</artifactId>
  <version>2.0.1</version>
  <scope>provided</scope>
</dependency>

I keep core-site.xml/hdfs-site.xml/yarn-site.xml in the src/main/resources folder. And now I have another issue. Here is the stack trace:

at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:225)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:250)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
at org.apache.hadoop.yarn.conf.YarnConfiguration.<clinit>(YarnConfiguration.java:345)
at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil.newConfiguration(YarnSparkHadoopUtil.scala:71)
at org.apache.spark.deploy.SparkHadoopUtil.<init>(SparkHadoopUtil.scala:54)
at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil.<init>(YarnSparkHadoopUtil.scala:56)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:414)
at org.apache.spark.deploy.SparkHadoopUtil$.yarn$lzycompute(SparkHadoopUtil.scala:412)
at org.apache.spark.deploy.SparkHadoopUtil$.yarn(SparkHadoopUtil.scala:412)
at org.apache.spark.deploy.SparkHadoopUtil$.get(SparkHadoopUtil.scala:437)
at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:2223)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:104)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:320)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:165)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:256)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:420)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2275)

Also I noticed that if I remove the Hadoop config files from src/main/resources, the application behaves the same way. So it seems to me that the application ignores them. Should I put them in another folder? Best regards, Olga
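As a side note on the first frame: Hadoop's Shell class resolves the Hadoop home from the hadoop.home.dir system property, falling back to the HADOOP_HOME environment variable, and on Windows it also expects winutils.exe under that directory's bin folder. A hedged sketch of one possible workaround (the D:/hadoop path is an assumption, not something from this thread):

// Point Hadoop at a local directory that contains bin/winutils.exe.
// The D:/hadoop path is a placeholder; this must run before the first
// SparkSession/SparkContext is created, because Shell's static
// initializer reads the value only once.
System.setProperty("hadoop.home.dir", "D:/hadoop");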
02-21-2017 04:25 PM
Hi Adnan, Thanks a lot for sharing the info. Currently I can't move our project to CueSheet, but it's nevertheless interesting to know about it. Best regards, Olga
02-21-2017 10:12 AM | 1 Kudo
Hi all, We have a Spark application written in Java that uses yarn-client mode. We build the application into a jar file and then run it on the cluster with the spark-submit tool. It works fine and everything runs well on the cluster. But it is not very easy to test our application directly on the cluster: for every change, even a small one, I have to create the jar file and push it onto the cluster. That's why I would like to run the application from my Eclipse (on Windows) against the cluster remotely. I use the spark-sql_2.11 module and instantiate the SparkSession as follows:

SparkSession.builder().appName("Data Analyzer").master("yarn-client").config("spark.sql.hive.metastore.jars", "builtin").getOrCreate();

I also copied core-site.xml, hdfs-site.xml, yarn-site.xml and hive-site.xml from my test cluster (HDP 2.5) and put them on the classpath. But when running the application from Eclipse I get the following error:

org.apache.spark.SparkException: Unable to load YARN support
at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:417)
at org.apache.spark.deploy.SparkHadoopUtil$.yarn$lzycompute(SparkHadoopUtil.scala:412)
at org.apache.spark.deploy.SparkHadoopUtil$.yarn(SparkHadoopUtil.scala:412)
at org.apache.spark.deploy.SparkHadoopUtil$.get(SparkHadoopUtil.scala:437)
at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:2223)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:104)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:320)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:165)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:256)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:420)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2275)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)

It seems like my application can't find the YARN jars/configs. Could you please help me understand what I'm doing wrong? Is it possible to run a Java application in yarn-client mode from Eclipse remotely against the cluster? And what steps should we follow to make it work? It would be great if you could share your ideas or give me some hints on how to overcome this issue. Best regards, Olga
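A small diagnostic sketch that may help narrow this down (purely illustrative, using only the class named in the stack trace above): it checks whether the driver JVM can actually see the YARN config file and the spark-yarn classes on its classpath.

// Is yarn-site.xml visible on the classpath?
java.net.URL yarnSite = Thread.currentThread().getContextClassLoader().getResource("yarn-site.xml");
System.out.println("yarn-site.xml on classpath: " + yarnSite);

// Are the spark-yarn classes present? "Unable to load YARN support" often
// means the spark-yarn artifact is missing from the project's classpath.
try {
    Class.forName("org.apache.spark.deploy.yarn.YarnSparkHadoopUtil");
    System.out.println("spark-yarn classes found");
} catch (ClassNotFoundException e) {
    System.out.println("spark-yarn classes missing: " + e);
}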
Labels:
- Apache Spark
- Apache YARN