
I can't find my tables in Spark SQL using Beeline

Contributor

I'm using the latest HDP (2.4) Sandbox. I have a class that starts a HiveThriftServer2 and registers a table called cumulativeVitals. When I look for the table using Beeline, I can't find it.

Code:

package stlhug.streaming;

import java.sql.Date;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

import stlhug.domain.VitalRecord;
import stlhug.streaming.function.ConvertStreamToDataFrame;
import stlhug.streaming.function.MapMessageToVital;

public class PatientVitalStream2 {

	@SuppressWarnings("deprecation")
	public static void main(String[] args) {
		String zkQuorum = args[1];
		String kafkaConsumerGroupId = args[2];

		SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("PatientVitals");
		JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));

		// One HiveContext is shared by the streaming job and the Thrift server.
		HiveContext sqlContext = new HiveContext(jssc.sparkContext().sc());
		sqlContext.setConf("hive.server2.thrift.port", "10001");

		// Register cumulativeVitals with a single seed row so the first query succeeds.
		initializeTable(jssc, sqlContext, "cumulativeVitals");
		sqlContext.sql("select * from cumulativeVitals").show();

		// One receiver thread on the patientVitals topic.
		Map<String, Integer> map = new HashMap<String, Integer>();
		map.put("patientVitals", 1);
		JavaPairReceiverInputDStream<String, String> kafkaStream = KafkaUtils.createStream(jssc, zkQuorum,
				kafkaConsumerGroupId, map);

		JavaDStream<VitalRecord> vitals = kafkaStream.map(new MapMessageToVital());

		// Append each micro-batch to the cumulativeVitals table.
		vitals.foreach(new ConvertStreamToDataFrame("cumulativeVitals"));

		HiveThriftServer2.startWithContext(sqlContext);
		jssc.start();
		jssc.awaitTermination();
	}

	//TODO: http://stackoverflow.com/questions/32362206/spark-dataframe-saveastable-with-partitionby-creates-no-...
	@SuppressWarnings("deprecation")
	private static void initializeTable(JavaStreamingContext jssc, HiveContext sqlContext, String tableName) {
		List<VitalRecord> records = new ArrayList<VitalRecord>();
		VitalRecord record = new VitalRecord(0, new Date(new java.util.Date().getTime()), 0, 0, 0, 0);
		records.add(record);
		JavaRDD<VitalRecord> recordRDD = jssc.sparkContext().parallelize(records);
		DataFrame recordDF = sqlContext.createDataFrame(recordRDD, VitalRecord.class);
		// Temp table: visible only through this HiveContext, never written to the Hive metastore.
		recordDF.registerTempTable(tableName);
//		String fileName = "/home/"+new Random().nextInt(20000);
//		recordDF.saveAsParquetFile(fileName);
//		DataFrame parquetFile = sqlContext.parquetFile(fileName);
//		sqlContext.registerDataFrameAsTable(parquetFile, tableName);
	}
}
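Note: registerTempTable only registers the DataFrame with the HiveContext that created it; nothing is written to the Hive metastore. If the table needs to be visible beyond that context, a persistent write along these lines might work (a minimal sketch against the same Spark 1.6 DataFrame API; the helper name persistTable is made up for illustration):

	// Sketch only: write the seed DataFrame through the Hive metastore instead of
	// registering it as a temp table, so it survives this HiveContext.
	private static void persistTable(DataFrame recordDF, String tableName) {
		// SaveMode.Overwrite (org.apache.spark.sql.SaveMode) replaces any existing table.
		recordDF.write().mode(org.apache.spark.sql.SaveMode.Overwrite).saveAsTable(tableName);
	}

With registerTempTable, show tables lists the table with isTemporary = true, and only in sessions served by the owning context; saveAsTable produces a regular metastore table (isTemporary = false).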



Start Beeline and connect to the database:

[root@sandbox spark-client]# pwd
/usr/hdp/current/spark-client
[root@sandbox spark-client]# ./bin/beeline
Beeline version 1.6.0.2.4.0.0-169 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10001
Connecting to jdbc:hive2://localhost:10001
Enter username for jdbc:hive2://localhost:10001: root
Enter password for jdbc:hive2://localhost:10001: **********
16/04/22 22:19:10 INFO Utils: Supplied authorities: localhost:10001
16/04/22 22:19:10 INFO Utils: Resolved authority: localhost:10001
16/04/22 22:19:10 INFO HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://localhost:10001
Connected to: Spark SQL (version 1.6.0)
Driver: Spark Project Core (version 1.6.0.2.4.0.0-169)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10001> show tables;
+------------+--------------+--+
| tableName  | isTemporary  |
+------------+--------------+--+
| sample_07  | false        |
| sample_08  | false        |
+------------+--------------+--+
2 rows selected (0.305 seconds)
0: jdbc:hive2://localhost:10001> 

6 Replies


If you create a table via Spark/Beeline and you can see that table, but not the tables that exist within Hive, that typically means Spark isn't configured to use the Hive metastore.

Can you verify that there is a hive-site.xml under spark/conf and that it points to the right host and port for the Hive metastore?
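One quick way to check is to print the metastore URI the HiveContext actually resolved and the tables it can see. A throwaway sketch (class name made up, standard Spark 1.6 Java API):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.HiveContext;

public class MetastoreCheck {
	public static void main(String[] args) {
		JavaSparkContext sc = new JavaSparkContext(
				new SparkConf().setMaster("local[2]").setAppName("MetastoreCheck"));
		HiveContext sqlContext = new HiveContext(sc);
		// Empty/unset here means hive-site.xml was not picked up and Spark fell back
		// to a local Derby metastore instead of the cluster's Hive metastore.
		System.out.println("hive.metastore.uris = " + sqlContext.getConf("hive.metastore.uris", "<not set>"));
		// Against the right metastore this should list the Hive tables
		// (sample_07 and sample_08 on the sandbox).
		for (String name : sqlContext.tableNames()) {
			System.out.println(name);
		}
		sc.stop();
	}
}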

Contributor

Under /usr/hdp/current/spark-client/conf on the sandbox:

[root@sandbox conf]# cat hive-site.xml 
  <configuration> 
    <property> 
      <name>hive.metastore.uris</name> 
      <value>thrift://sandbox.hortonworks.com:9083</value> 
    </property> 
  </configuration>


Contributor

This wasn't set up for me by default. It also wasn't mentioned in any of the documentation that I've reviewed.

Thanks for the help!

Expert Contributor

You need a hive-site.xml with HMS (Hive metastore service) properties under your SPARK_CONF_DIR.
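If editing files under SPARK_CONF_DIR isn't convenient, the same property can also be set on the HiveContext itself. A sketch, reusing the jssc streaming context from the code in the question and the URI from the sandbox hive-site.xml shown earlier:

	// Sketch: point the HiveContext at the Hive metastore programmatically,
	// mirroring the hive-site.xml entry above. Set this before any table access;
	// once the metastore client has been initialized the setting may no longer apply.
	HiveContext sqlContext = new HiveContext(jssc.sparkContext().sc());
	sqlContext.setConf("hive.metastore.uris", "thrift://sandbox.hortonworks.com:9083");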

Contributor

This is set up by default on the HDP sandbox. See reply above.