HIVE_WAREHOUSE_CONNECTOR: Saving a Spark DataFrame to Hive creates duplicate records

Hey,

I am trying to write a DataFrame with the HIVE_WAREHOUSE_CONNECTOR using the code below:

new_df.write.format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR).mode("overwrite").option("table","dbname.tab_name").save()

But I find that the write always runs as 8 tasks, regardless of the number of records, and each task writes every record, so the same record ends up in the table 8 times. If I have 10 records in my DataFrame, the table saved by save() has 80 records!
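
The arithmetic behind the symptom can be spelled out with a small pure-Python simulation. This is not HWC or Spark code, and the function names are hypothetical: it contrasts what a partitioned write is expected to do (each task writes only its own slice of the rows) with the behavior described above (each of the 8 tasks writes the whole DataFrame). Round-robin assignment is a simplification of how Spark actually distributes rows across tasks.

```python
from typing import List


def split_into_partitions(rows: List[int], num_tasks: int) -> List[List[int]]:
    """Assign rows to tasks round-robin (a simplification of Spark partitioning)."""
    parts: List[List[int]] = [[] for _ in range(num_tasks)]
    for i, row in enumerate(rows):
        parts[i % num_tasks].append(row)
    return parts


def expected_write(rows: List[int], num_tasks: int) -> List[int]:
    """Expected behavior: each task writes only the rows in its own partition."""
    written: List[int] = []
    for part in split_into_partitions(rows, num_tasks):
        written.extend(part)
    return written


def observed_write(rows: List[int], num_tasks: int) -> List[int]:
    """Reported symptom: every task writes the entire DataFrame."""
    written: List[int] = []
    for _ in range(num_tasks):
        written.extend(rows)
    return written


records = list(range(10))  # hypothetical stand-in for 10 DataFrame rows
print(len(expected_write(records, 8)))  # 10
print(len(observed_write(records, 8)))  # 80
```

With 10 rows and 8 tasks, the expected write still produces 10 rows in total (two tasks get 2 rows, six get 1), while the observed behavior multiplies the row count by the number of tasks.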

I think I am missing something, but I could not find anything. Here is a minimal reproduction with a one-row DataFrame:

>>> new_df = hive.executeQuery("select * from testdb.d_emp_ext_orc limit 1")
>>> new_df.count()
1
>>> new_df.write.format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR).mode("overwrite").option("table", "bench.newtab").save()
>>> hive.executeQuery("select count(*) from bench.newtab").show()
19/08/21 10:08:25 WARN TaskSetManager: Stage 24 contains a task of very large size (458 KB). The maximum recommended task size is 100 KB.
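
To tell whether the saved table is an exact multiple of the source (which points to every task writing the full DataFrame, rather than to a few stray extra rows), the two row counts can be compared. In the session above they could be obtained with new_df.count() and hive.executeQuery("select count(*) from bench.newtab").collect()[0][0]. The helper below is a hypothetical pure-Python sketch that only does the arithmetic on those two numbers.

```python
from typing import Optional


def duplication_factor(source_count: int, target_count: int) -> Optional[int]:
    """Return k if target_count == k * source_count, otherwise None.

    A result equal to the number of write tasks (8 here) suggests each task
    wrote the whole DataFrame; None means the counts are not an exact multiple.
    """
    if source_count <= 0:
        raise ValueError("source_count must be positive")
    k, remainder = divmod(target_count, source_count)
    return k if remainder == 0 else None


# Numbers from the post: 10 source records, 80 records in the saved table.
print(duplication_factor(10, 80))  # 8
print(duplication_factor(10, 10))  # 1
print(duplication_factor(10, 75))  # None
```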