After a few days of battling with the new HDP 3.0 to get data out through HiveServer2, I think I really need all the help I can get! Here goes: - I understand (now!) that in HDP 3.0 there are 2 'default' databases, 1 in Spark and 1 in Hive. - I applied the "cp /etc/hive/conf/hive-site.xml /etc/spark2/conf" move so that I can write tables into the Hive warehouse from Spark (pyspark in Zeppelin, in fact).
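For reference, this is the copy step I ran on the node hosting the Zeppelin/Spark2 client (paths as on a stock HDP 3.0 install; adjust if your conf dirs differ):

```shell
# Copy the Hive client configuration into the Spark2 conf dir so that
# Spark's HiveContext points at the same metastore as HiveServer2.
cp /etc/hive/conf/hive-site.xml /etc/spark2/conf/

# Restart the Spark2 interpreter in Zeppelin afterwards so it picks up
# the new hive-site.xml.
```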
Based on that, the following is happening: when I create tables from Zeppelin pyspark, the tables are created in the Spark warehouse AND in the Hive warehouse, BUT with no data:
sql_create_table = "CREATE TABLE IF NOT EXISTS car_rental (banner STRING, store STRING, start_date DATE, return_date DATE, car_sipp STRING, car_type STRING) STORED AS ORC TBLPROPERTIES ('transactional' = 'true')"
table_create = hiveContext.sql(sql_create_table)
With this code, I do find a directory 'car_rental' in /warehouse/tablespace/managed/hive/, BUT no data files are saved under that 'car_rental' dir. The data (all rows from the df 'cars'), however, is perfect in /apps/hive/warehouse.....
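For completeness, the step that loads the df 'cars' into the table is roughly this (a sketch from memory, not verbatim; 'cars' is a DataFrame built earlier in the same note):

```python
# 'cars' is the PySpark DataFrame holding the rental rows (built earlier
# in the notebook). insertInto() appends into the pre-created Hive table
# by position, so the DataFrame columns must line up with the column
# order in the CREATE TABLE statement above.
cars.write.insertInto("car_rental")
```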
So now: I'm perfectly content with the data in /apps/hive/warehouse, BUT:
1. When doing a basic 'select * from car_rental' in Data Analytics Studio, no rows are returned (it's probably searching in the default DB under /warehouse/tablespace/managed/hive/).
2. MUCH MORE IMPORTANTLY, the same thing happens when querying the data externally through HiveServer2 (the table appears empty).
Please help.... I urgently need to be able to access the full pyspark-saved tables from HiveServer2!