
Spark Hivewarehouseconnector loading dynamic partitions in CDH-7.1.7-1

New Contributor

I'm loading a PySpark (Spark 2.4, Python 3.7.4) DataFrame into Hive using the Hive Warehouse Connector. The Hive tables are partitioned and ORC-formatted. A DataFrame can contain multiple partition values, and I need to load all of those partition values in a single write, overwriting any partition that already exists.

df.select(columns).write\
.format(HiveWarehouseSession().HIVE_WAREHOUSE_CONNECTOR)\
.mode('overwrite')\
.option('inferSchema', 'true')\
.option('table','tablename')\
.option('partition','partition_column')\
.save()
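For context, the HWC `partition` option takes a Hive-style partition spec in which static partitions carry a quoted value and dynamic partitions are listed bare, e.g. `c1='val1',c2`. A minimal sketch of assembling such a spec string (the helper name `build_partition_spec` is hypothetical, not part of HWC):

```python
# Hypothetical helper: build the Hive-style partition spec string that the
# HWC "partition" option expects. Static partitions are written as
# col='value'; dynamic partition columns are listed bare, in order.
def build_partition_spec(static=None, dynamic=None):
    parts = []
    for col, val in (static or {}).items():
        parts.append("{}='{}'".format(col, val))
    parts.extend(dynamic or [])
    return ",".join(parts)

# One static partition plus one dynamic partition column.
spec = build_partition_spec(static={"country": "US"}, dynamic=["day"])
print(spec)  # country='US',day
```

For a fully dynamic load like the one above, the spec is just the bare partition column name(s), which matches the `.option('partition', 'partition_column')` form in the snippet.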


I got the error below, even though I'm passing the correct partition column name.
Caused by: java.lang.IllegalArgumentException: Invalid partition spec: partition_column.

Append mode doesn't work either; I get the same error.

How can I accomplish this load? Kindly help.

1 ACCEPTED SOLUTION

Cloudera Employee

Hi @Rajmn ,

It's a bug in CDH 7.1.7; it's fixed in CDH 7.1.8.

Thanks, Anji


3 REPLIES


New Contributor

Hi @aranjireddy ,

We have the same issue. Is there any workaround?

Is CDH/CDP 7.1.8 available?


Thank you.

Cloudera Employee

Hi @nmartinez ,

CDP 7.1.8 is not available yet. Please contact Cloudera support for the patch.

Thanks, Anji
