
Help to start Spark with no errors

Rising Star

Hi, I'm trying to execute queries with Spark SQL over Hive tables stored in HDFS on a single node, but I'm having some problems getting Spark to start correctly. I already have Hadoop and Hive installed, and I have already created the tables in Hive with the data stored in HDFS.

I will describe my Hadoop and Hive configuration, and I hope that someone who has already tried to execute queries with Spark over Hive tables can help me and tell me the steps to install Spark correctly for this purpose.

I installed hadoop-2.7.1: I extracted the files, added the environment variables, and configured core-site.xml and hdfs-site.xml.

core-site.xml:

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

hdfs-site.xml:

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>

Then I formatted the namenode with:

hadoop namenode -format

Then I started Hadoop with:

./start-yarn.sh
./start-dfs.sh

And it seems that everything works:

[hadoopdadmin@hadoop sbin]$ jps
9601 NameNode
9699 DataNode
10003 Jps
9091 ResourceManager
9894 SecondaryNameNode
9191 NodeManager

Then, after Hadoop was installed, I downloaded Hive 1.2.1 and just extracted the files and added the environment variables.

The .bashrc file is like this now:

export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk.x86_64
export HADOOP_HOME=/usr/local/hadoop-2.7.1
export HIVE_HOME=/usr/local/apache-hive-1.2.1-bin
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HIVE_HOME/bin

To start Hive I just type hive, and it seems to work:

[hadoopadmin@hadoopSingleNode ~]$ hive
Logging initialized using configuration in jar:file:/usr/local/apache-hive-1.2.1-bin/lib/hive-common-1.2.1.jar!/hive-log4j.properties

hive> 

I have some tables in Hive that I created with this command:

CREATE TABLE customer (
  C_CUSTKEY INT, C_NAME STRING, C_ADDRESS STRING, C_NATIONKEY INT,
  C_PHONE STRING, C_ACCTBAL DOUBLE, C_MKTSEGMENT STRING, C_COMMENT STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE LOCATION '/tables/customer';

Now it's time to install Spark to query these Hive tables. What I'm doing is just downloading this version, "http://www.apache.org/dyn/closer.lua/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz", extracting the files, and configuring the environment variables. After this, when I run spark-shell I get a lot of errors.
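The environment variables I added for Spark are roughly these (the install path below is only an assumption based on where the tarball extracts, so it may differ on your machine):

# assumed extraction path under /usr/local, matching the Hadoop and Hive installs
export SPARK_HOME=/usr/local/spark-1.6.1-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin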

I have already tried a lot of things but nothing has fixed the issues, so can someone see what is wrong in my configuration steps, or what is missing here?

Errors that appear after executing the spark-shell command:

[Screenshots of the spark-shell errors attached: 3157-img1.png, 3158-img2.png]

1 ACCEPTED SOLUTION

Guru

@John Cod

Spark shell attempts to start a SQLContext by default. The first thing I would check is whether you are pointing Spark at your existing Hive metastore. In your {SPARK_HOME}/conf folder you should have a hive-site.xml file. Make sure you have the following configuration:

<property>
      <name>hive.metastore.uris</name>
      <value>thrift://{IP of meta store host}:{port meta store listening}</value>
</property>
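For a single-node setup like yours, assuming the Hive metastore Thrift service runs locally on its default port 9083, a minimal {SPARK_HOME}/conf/hive-site.xml could look like this (the host and port are assumptions; adjust them to your environment):

<configuration>
  <property>
    <!-- point Spark at the existing Hive metastore service -->
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
  </property>
</configuration>

If the metastore service is not already running, you can start it with "hive --service metastore" (it listens on port 9083 by default).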

This should tell spark-shell to connect to your existing metastore instead of trying to create a default one, which is what it looks like it is trying to do. The SQLContext should now be able to start up, and you should be able to access Hive using the default SQLContext.

val result = sqlContext.sql("SELECT * FROM {hive table name}")
result.show

If the HiveContext was not created by default, then do this and retry the query:

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
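For example, with the customer table you created in Hive (assuming it lives in the default database), something like:

// query the Hive table through the HiveContext created above
val customers = hiveContext.sql("SELECT * FROM customer")
customers.show()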


2 REPLIES


Rising Star

Thank you, really. Now it is working! It is just showing some warnings like "version information not found in metastore..." and "failed to get database default, returning NoSuchObjectException". But since they are only warnings, it should be working fine, right?