Support Questions

Find answers, ask questions, and share your expertise

Spark Hivewarehouseconnector loading dynamic partitions in CDH-7.1.7-1

New Contributor

I'm loading a PySpark (Spark 2.4, Python 3.7.4) DataFrame into Hive using the Hive Warehouse Connector. The Hive tables are partitioned and ORC-formatted. The DataFrame can contain multiple partition values, and I need to load all of those partitions in a single write, overwriting any partition that already exists.

from pyspark_llap import HiveWarehouseSession

# 'inferSchema' is a read option and has no effect on writes, so it is dropped
df.select(columns).write \
    .format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR) \
    .mode('overwrite') \
    .option('table', 'tablename') \
    .option('partition', 'partition_column') \
    .save()

I get the error below, even though I'm passing the correct partition column name:

Caused by: java.lang.IllegalArgumentException: Invalid partition spec: partition_column

The same error occurs in append mode. How can I accomplish this load? Kindly help.

1 ACCEPTED SOLUTION

Contributor

Hi @Rajmn ,

This is a bug in CDH 7.1.7; it's fixed in CDH 7.1.8.

Thanks, Anji


3 REPLIES


New Contributor

Hi @aranjireddy ,

We have the same issue. Is there any workaround?

Is CDH/CDP 7.1.8 available?

 

Thank you.

Contributor

Hi @nmartinez ,

CDP 7.1.8 is not available yet. Please contact Cloudera support for a patch.

Thanks,

Anji
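If patching isn't an option right away, one common workaround pattern (a sketch, not a confirmed fix for this bug) is to write the DataFrame to a non-partitioned staging table first, then have Hive itself overwrite the affected partitions with a dynamic-partition INSERT OVERWRITE via the connector's executeUpdate. The table and column names below ('tablename', 'staging_tablename', 'col1', 'col2', 'partition_column') are placeholders, not from the original post. The helper only builds the Hive SQL string:

```python
def build_overwrite_sql(target, staging, data_cols, partition_col):
    """Build a Hive INSERT OVERWRITE statement that uses dynamic
    partitioning, so only partitions present in the staging table
    are overwritten in the target table."""
    # Partition column must come last in the SELECT list for
    # dynamic partitioning.
    cols = ', '.join(data_cols + [partition_col])
    return (
        "INSERT OVERWRITE TABLE {t} PARTITION ({p}) "
        "SELECT {c} FROM {s}".format(t=target, p=partition_col,
                                     c=cols, s=staging)
    )

# Hypothetical usage on the cluster (requires HWC, not runnable locally):
# hive = HiveWarehouseSession.session(spark).build()
# df.write.format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR) \
#     .mode('overwrite').option('table', 'staging_tablename').save()
# hive.executeUpdate("SET hive.exec.dynamic.partition.mode=nonstrict")
# hive.executeUpdate(build_overwrite_sql('tablename', 'staging_tablename',
#                                        ['col1', 'col2'], 'partition_column'))
```

This sidesteps the 'partition' write option entirely, at the cost of an extra copy through the staging table.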