Created on 11-07-2021 04:23 AM - edited 11-07-2021 05:01 AM
I'm loading a PySpark (Spark 2.4, Python 3.7.4) data frame into Hive using the HiveWarehouseConnector. The Hive tables are partitioned and ORC-formatted. The data frame can contain multiple partition values, and I need to load all of them in a single write, overwriting any partition that already exists.
df.select(columns).write \
    .format(HiveWarehouseSession().HIVE_WAREHOUSE_CONNECTOR) \
    .mode('overwrite') \
    .option('inferSchema', 'true') \
    .option('table', 'tablename') \
    .option('partition', 'partition_column') \
    .save()
I got the error below, even though I'm passing the correct partition column name.
Caused by: java.lang.IllegalArgumentException: Invalid partition spec: partition_column.
It doesn't work in append mode either; I get the same error.
How can I accomplish this load? Kindly help.
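One thing that may be worth checking: in Cloudera's HWC examples, the `partition` option takes a partition *spec*, where static partitions are written as `col='value'` and dynamic partitions as the bare column name (e.g. `"c1='val1',c2"`). As a sketch only, here is a hypothetical helper (`partition_spec` is not part of HWC) that builds such a spec string:

```python
# Hypothetical helper, not an HWC API: build the value for the
# HWC "partition" write option. Static partitions become col='value',
# dynamic partitions are appended as bare column names.
def partition_spec(static=None, dynamic=None):
    parts = [f"{col}='{val}'" for col, val in (static or {}).items()]
    parts.extend(dynamic or [])
    return ",".join(parts)

# Example: partition_spec(static={"load_date": "2021-11-07"},
#                         dynamic=["region"])
# produces "load_date='2021-11-07',region", which could then be passed:
# df.select(columns).write \
#     .format(HiveWarehouseSession().HIVE_WAREHOUSE_CONNECTOR) \
#     .mode('overwrite') \
#     .option('table', 'tablename') \
#     .option('partition', partition_spec(dynamic=['partition_column'])) \
#     .save()
```

This is only a guess at the spec format; as the replies below note, the error may also be a product bug that requires a patch.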
Created 11-25-2021 10:53 AM
Created 12-08-2021 01:11 PM
Hi @aranjireddy ,
We have the same issue. Is there any workaround?
Is CDH/CDP 7.1.8 available?
Thank you.
Created 12-08-2021 01:28 PM
Hi @nmartinez ,
CDP 7.1.8 is not available yet; please contact Cloudera support for the patch.
Thanks,
Anji
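Until the patch is applied, one possible workaround (a sketch, unverified) is to write the frame to an unpartitioned staging table with HWC and then overwrite the target partitions with a dynamic-partition `INSERT OVERWRITE`, issued through `hive.executeUpdate()`. The helper below (`insert_overwrite_sql`, a hypothetical name, not an HWC API) just builds the SQL string; note that Hive dynamic partitioning requires the partition columns last in the SELECT list and typically `hive.exec.dynamic.partition.mode=nonstrict`:

```python
# Hypothetical helper: build a dynamic-partition INSERT OVERWRITE
# statement that copies a staging table into a partitioned target.
def insert_overwrite_sql(target, staging, partition_cols, data_cols):
    # Hive requires partition columns at the end of the SELECT list
    # when using dynamic partitioning.
    select_cols = ", ".join(data_cols + partition_cols)
    return (
        f"INSERT OVERWRITE TABLE {target} "
        f"PARTITION ({', '.join(partition_cols)}) "
        f"SELECT {select_cols} FROM {staging}"
    )

# Assumed usage with an HWC session named `hive` and a staging table
# name of your choosing (both are assumptions, not from the thread):
# df.select(columns).write \
#     .format(HiveWarehouseSession().HIVE_WAREHOUSE_CONNECTOR) \
#     .mode('overwrite').option('table', 'staging_tablename').save()
# hive.executeUpdate(insert_overwrite_sql(
#     "tablename", "staging_tablename",
#     partition_cols=["partition_column"], data_cols=["col1", "col2"]))
```

This only overwrites the partitions present in the staging data, which matches the original requirement of overwriting existing partitions in a single load.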