07-07-2021
04:34 PM
@Shelton Thank you for your response. I tried your suggestion, but the table still did not show up in beeline. I checked the MySQL metastore database as well, and the new table's info does not appear there either. I still see the following in the stderr file, even though my hive-site.xml is set to use MySQL:

21/07/07 23:07:56 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY

Not sure if I am missing something else. These are the relevant properties in my hive-site.xml:

<name>hive.metastore.db.type</name>
<value>mysql</value>

<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://<mysql end point>:3306/metastore_db</value>
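For reference, a minimal sketch of the properties a MySQL-backed metastore usually needs in hive-site.xml; the host, user, and password values are placeholders, not taken from the post, and the driver class may differ with newer MySQL connectors (com.mysql.cj.jdbc.Driver). If Spark never actually picks this file up from the classpath, it falls back to the embedded Derby metastore, which is consistent with the DERBY log line above.

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://<mysql-host>:3306/metastore_db</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value><metastore-user></value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value><metastore-password></value>
</property>

The MySQL JDBC driver jar also needs to be on Spark's classpath for the connection to work.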
07-07-2021
09:07 AM
I have a Spark cluster with one master and two workers on different servers. I have copied hive-site.xml into the Spark conf directory on all three servers and started the thrift server on the master, pointing at the Spark master. I use beeline to connect to the thrift server and run Spark SQL queries.

I created a Scala application that loads data from a CSV into a DataFrame and then into a Spark SQL table backed by S3 parquet. The metadata is in MySQL. I run the app on the cluster with spark-submit, but I do not see the table created by the app in beeline. However, when I use spark-shell on the master (connected to the Spark master) and do the same CSV load and table creation, I can see the table in beeline. Am I missing something with the Scala app? Also, do we need HDFS to make this work?

hive.metastore.uris in hive-site.xml is not set currently. I am not sure what to set it to, since I don't have anything running on port 9083.

I started the thrift server (which runs on port 10000) from the Spark sbin directory like this:

/opt/spark/sbin/start-thriftserver.sh --master spark://<master-ip>:7077 --total-executor-cores 1

This is my Scala code (see the note after it about the metastore):
import org.apache.spark.sql.SparkSession

object CsvToHiveTable {
  def main(args: Array[String]): Unit = {
    println("Hello World!")
    val warehouseLocation = "/home/ubuntu/test/hive_warehouse"
    // Build a Hive-enabled session so the table is registered in the metastore
    val spark = SparkSession
      .builder()
      .appName("Spark SQL basic example")
      .config("spark.sql.warehouse.dir", warehouseLocation)
      .enableHiveSupport()
      .getOrCreate()
    // Load the CSV passed as the first argument and expose it as a temp view
    val df = spark.read.format("csv").load(args(0))
    df.createOrReplaceTempView("my_temp_table")
    // CTAS into an S3-backed parquet table (STORED AS must come before LOCATION)
    spark.sql("create table test1_0522 stored as PARQUET location 's3a://<test-bucket>/data/test1_0522' as select * from my_temp_table")
    spark.sql("SHOW TABLES").show()
  }
}
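One thing worth checking: if the driver launched by spark-submit never sees hive-site.xml, enableHiveSupport() silently falls back to a local Derby metastore, so the table lands somewhere beeline never looks. A minimal sketch of pointing the session at a shared metastore explicitly; this assumes a standalone Hive metastore service is listening on port 9083, and <metastore-host> is a placeholder:

import org.apache.spark.sql.SparkSession

// Sketch: point the session at a shared Hive metastore explicitly,
// instead of relying on hive-site.xml being found on the classpath.
// Assumes a metastore service at thrift://<metastore-host>:9083.
val spark = SparkSession
  .builder()
  .appName("Spark SQL basic example")
  .config("hive.metastore.uris", "thrift://<metastore-host>:9083")
  .config("spark.sql.warehouse.dir", "/home/ubuntu/test/hive_warehouse")
  .enableHiveSupport()
  .getOrCreate()

Alternatively, shipping the same hive-site.xml with the job via spark-submit --files /path/to/hive-site.xml, or confirming it sits in $SPARK_HOME/conf on the node where the driver actually runs, should make the app and the thrift server share one metastore, so tables created by either become visible in beeline.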
Labels:
- Apache Spark