Currently we use the "create table as select from temp_table" trick to store a DataFrame into Hive, to avoid ending up with a table that only Spark can read.
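For reference, the trick looks roughly like this (a minimal sketch; temp_table and my_hive_table are placeholder names):

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext()
hiveContext = HiveContext(sc)

# Some DataFrame we want to persist
df = hiveContext.createDataFrame([("a", "2016-01-01")], ["col1", "col2"])

# Expose the DataFrame to SQL, then let Hive materialize it in a native format
df.registerTempTable("temp_table")
hiveContext.sql("CREATE TABLE my_hive_table STORED AS ORC AS SELECT * FROM temp_table")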
Now I am trying to use hiveContext.createExternalTable(...) to store a DataFrame in the Hive metastore, backed by an externally stored file, but I run into the well-known issue again: the table is only readable by Spark, even when we provide a schema.
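What we are attempting looks roughly like this (a sketch; the path, format, and table name are hypothetical):

from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# hiveContext is the existing HiveContext from above
schema = StructType([
    StructField("col1", StringType()),
    StructField("col2", TimestampType()),
])

# Register a table in the metastore over files that already live on HDFS
hiveContext.createExternalTable(
    "my_external_table",
    path="/data/exports/my_table",  # hypothetical HDFS location
    source="parquet",
    schema=schema,
)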
How can I make the table usable outside of a Spark context (i.e., proper SQL queries over an ODBC connection, etc.)?
FYI: we use PySpark.
Why do you need to create your table from Spark?
Typically you can create your tables with a Hive query and deploy them separately, outside of the Spark job. And then insert your data into the tables using Spark!
An example could be:
CREATE TABLE your_table_name (col1 String, col2 Timestamp) STORED AS ORC
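From PySpark that could look roughly like this (a sketch; hiveContext, df, and your_table_name stand in for your own objects):

# Create the table once with plain HiveQL so Hive owns the storage format...
hiveContext.sql(
    "CREATE TABLE IF NOT EXISTS your_table_name "
    "(col1 STRING, col2 TIMESTAMP) STORED AS ORC"
)

# ...then append rows from the DataFrame; column order must match the table
df.write.insertInto("your_table_name")

Because the table is defined with a Hive-native format (ORC), it remains readable from Hive, ODBC clients, etc., not only from Spark.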
I have created the table using the above command with SaveMode.Ignore (as I want to create a new table). Using 'hadoop fs -ls /apps/hive/warehouse/test.db' (where test is my database) I can see:
drwxr-xr-x - psudhir hdfs 0 2016-01-04 05:02 /apps/hive/warehouse/test.db/myTableName
but I am unable to view or load those tables, i.e. when I run hiveContext.sql("show tables") I can only see the old tables; the new tables are not visible, and I cannot access them from my HiveContext object.
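For context, the sequence described here is roughly the following (a sketch using the names from the comment; "ignore" corresponds to SaveMode.Ignore):

# Switch to the target database, then write the DataFrame as a new table,
# skipping the write if a table of that name already exists
hiveContext.sql("use test")
df.write.mode("ignore").saveAsTable("myTableName")

# List what the HiveContext can actually see
hiveContext.sql("show tables").show()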