
Load a DataFrame into a partitioned Hive table

 

I have an existing Hive table that is already partitioned and contains data, and during a nightly batch I insert data into it from a DataFrame (created from an Avro file).
While inserting, do I need to partition the DataFrame by the same columns as the Hive table's partition columns,
or can I call insertInto on the table directly?

 

So far I have been doing it like this, which works fine: df.coalesce(4).write.insertInto(table)

 

Now I am thinking about this instead: df.write.partitionBy('country', 'year', 'month').insertInto(table)

 

Also, does it improve performance in the latter case?
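
For context, here is a minimal sketch of the current working approach, assuming a SparkSession named spark with Hive support, a placeholder Avro path, and a hypothetical table name mydb.events; the exact Avro reader format depends on the Spark version and the spark-avro package in use.

from pyspark.sql import SparkSession

# Hypothetical session setup; the dynamic-partition settings let Hive
# route inserted rows into the table's existing partitions.
spark = (SparkSession.builder
         .appName("nightly-avro-load")                            # placeholder name
         .enableHiveSupport()
         .config("hive.exec.dynamic.partition", "true")
         .config("hive.exec.dynamic.partition.mode", "nonstrict")
         .getOrCreate())

# Placeholder path; on older Spark the format may be "com.databricks.spark.avro".
df = spark.read.format("avro").load("/data/nightly/events.avro")

# Current approach from the post: coalesce and insert into the existing
# partitioned table, relying on dynamic partitioning to place the rows.
df.coalesce(4).write.insertInto("mydb.events")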
