Here is the scenario:
I have data in my Hive DB in 2 tables, and I want to connect Tableau to these 2 tables to build my reports.
We have a business requirement to truncate the tables quite often and fetch new reports with new data (the key data in a few columns stays the same, but other columns keep changing and we want to visualize those changes). That requirement can't be changed.
We have a Hortonworks cluster. We used the Hive ODBC driver to connect to the tables, and it all works fine except for the performance.
When we used the Spark ODBC driver and connected through the Spark Thrift server, performance was far better than with Hive ODBC.
But this has a problem: whenever we truncate and load new data into the tables, Tableau fails with the errors below:
[Microsoft][SparkODBC] (35) Error from server: error code: '0' error message:
'java.lang.IllegalArgumentException: orcFileOperator: path
hdfs://server:8020/HIVE/my.db/mytable/yearmonth=201702/daytimestamp=02070200 does not have valid orc files matching the pattern'. The table "[my].[mytable]" does not exist
The Hive data is stored under the /HIVE directory in HDFS, partitioned by yearmonth and daytimestamp.
We tried the workarounds below to truncate the tables, but they don't help:
Created a dummy record in the table with the key "DELETEID" and executed the query below:
insert overwrite table mytable PARTITION(yearmonth, daytimestamp) select * from mytable where myid = "DELETEID";
This erases only the records that share the "DELETEID" row's partition timestamps and has no effect beyond that.
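For reference, Hive's documented way to clear partitioned data is an explicit partition drop, which removes both the HDFS files and the metastore entries together (the partition values below are illustrative, not our real ones):

```sql
-- Drop one specific partition; files and metastore entry go together.
ALTER TABLE mytable DROP IF EXISTS PARTITION (yearmonth='201702', daytimestamp='02070200');

-- Or drop a whole range of partitions using a comparator.
ALTER TABLE mytable DROP IF EXISTS PARTITION (yearmonth < '201703');

-- After loading new files directly into HDFS, re-register the partitions.
MSCK REPAIR TABLE mytable;
```

This avoids the insert-overwrite dummy-record trick entirely, though we have not yet verified whether it helps the Spark ODBC path.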
We also went ahead and removed the files in HDFS:
"hdfs dfs -rm -R -skipTrash /HIVE/my.db/mytable/*"
After uploading the data again and refreshing the reports, Tableau still refers to one of the OLD HDFS paths of the table data and doesn't work.
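Since the stale path suggests the Spark Thrift server is serving cached table metadata rather than re-reading Hive, one thing we have not ruled out is forcing a metadata refresh there after every reload (REFRESH TABLE is standard Spark SQL; the table name is ours):

```sql
-- Invalidate Spark's cached metadata and file listing for the table.
REFRESH TABLE my.mytable;
```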
Interestingly, with the Hive CLI I can see the table and query the data, likewise through the Hive view in Ambari and with Hive ODBC in Tableau, but it fails consistently with the above error for Tableau --> Spark ODBC --> Spark Thrift --> Hive.
I'm quite sure it would work if we removed the partitioning, but as the data grows, partitioning becomes necessary.
Has anyone faced similar problems with Spark ODBC? Please share suggestions.