Created 06-29-2017 01:39 PM
I went through the tutorial below to understand how to load data into Hive in ORC format using Spark. I understand how to create a table in Hive from Spark; however, I have one question: how does Spark decide which database the table should be created in? And if the same table name exists in two different Hive databases, which table will Spark insert values into?
https://hortonworks.com/tutorial/using-hive-with-orc-from-apache-spark/
Created 06-29-2017 02:33 PM
In Hive, if you do not specify the database name in your query, the table is resolved against the current database. Unless you have switched to another database with a USE statement, that is the default database, whose name is literally 'default'.
So the query in the tutorial you shared:
hiveContext.sql("create table yahoo_orc_table (date STRING, open_price FLOAT, high_price FLOAT, low_price FLOAT, close_price FLOAT, volume INT, adj_price FLOAT) stored as orc")
This creates yahoo_orc_table in the default database.
If you want to create it in a specific database, say 'hardikdatabase', then qualify the table name as databasename.tablename, as shown below (hardikdatabase.yahoo_orc_table):
hiveContext.sql("create table hardikdatabase.yahoo_orc_table (date STRING, open_price FLOAT, high_price FLOAT, low_price FLOAT, close_price FLOAT, volume INT, adj_price FLOAT) stored as orc")
The same rule applies when you read data from Hive or insert into an existing table: qualify the table name with its database in the same way, unless the table lives in the default database.
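For reading, a minimal sketch might look like the following. It assumes the Spark 1.x HiveContext API used in the tutorial, and that 'hardikdatabase' and its yahoo_orc_table already exist; the object name and application name are made up for illustration. In spark-shell, sc already exists, so you would skip creating the SparkContext.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical object name, for illustration only.
object ReadFromNamedDatabase {
  def main(args: Array[String]): Unit = {
    // The master URL is expected to come from spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("ReadFromNamedDatabase"))
    val hiveContext = new HiveContext(sc)

    // Option 1: qualify the table name with its database.
    val qualified = hiveContext.sql("select * from hardikdatabase.yahoo_orc_table")
    qualified.show(5)

    // Option 2: switch the current database first; unqualified names
    // are then resolved against hardikdatabase instead of 'default'.
    hiveContext.sql("use hardikdatabase")
    val unqualified = hiveContext.sql("select * from yahoo_orc_table")
    unqualified.show(5)

    sc.stop()
  }
}
```

Qualifying the name is usually the safer choice when the same table name exists in more than one database, because the query then does not depend on which database happens to be current.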
As always, if this answer helps you, please consider accepting it.
Created 06-30-2017 05:07 PM
Thanks, it helped a lot to clear up my confusion.